MCE Error on 2010 MacPro

JacquesT

Morning,

I've recently upgraded my MacPro 5,1 (2010 model, dual 6-core Xeons, with 6 of 8 DIMM slots populated with ECC memory) to Debian Bookworm, though my query applies to many distributions running a 6.x kernel. With 5.x kernels I never had any MCE errors reported during boot. After upgrading to Debian Bookworm I'm now getting the error below (snipped from 'dmesg | grep mce'). The message is not unique to Debian; it appears in some form in almost all of the 6.x-based distributions I've tried: openSUSE, Arch, Linux Mint, Fedora, etc.

On a fresh power-on boot:
mce: [Hardware Error]: Machine check events logged
mce: [Hardware Error]: CPU 1: Machine Check: 0 Bank 8: ea1d6740008000b1
mce: [Hardware Error]: TSC 0 MISC 40000
mce: [Hardware Error]: PROCESSOR 0:206c2 TIME 1746170939 SOCKET 1 APIC 20 microcode 1f

On restart:
mce: [Hardware Error]: Machine check events logged
mce: [Hardware Error]: CPU 1: Machine Check: 0 Bank 8: ea1d6740008000b1
mce: [Hardware Error]: TSC 0
mce: [Hardware Error]: PROCESSOR 0:206c2 TIME 1746170939 SOCKET 1 APIC 20 microcode 1f

TSC stands for 'Time Stamp Counter', I believe, though I have no idea what the 0 error code or MISC 40000 refers to. There is no memory in Bank 4/8.
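
For what it's worth, the fields on those lines can be unpacked by hand. Below is a minimal Python sketch, assuming the log follows the kernel's usual "Machine Check: <MCG_STATUS> Bank <n>: <MCi_STATUS>" and "PROCESSOR <vendor>:<CPUID signature> TIME <Unix seconds>" layout, and that the status word uses the architectural IA32_MCi_STATUS bit positions from the Intel SDM. The function names are made up for illustration. Under those assumptions the 0 is the global MCG_STATUS value rather than an error code, TSC is just the timestamp counter value stored with the record, and MISC is the raw content of the bank's MISC register.

```python
from datetime import datetime, timezone

# Architectural flag bits of IA32_MCi_STATUS (Intel SDM layout, assumed here).
STATUS_FLAGS = {
    "VAL": 63,    # register holds a valid error record
    "OVER": 62,   # an earlier record was overwritten
    "UC": 61,     # error was not corrected by hardware
    "EN": 60,     # reporting was enabled for this error
    "MISCV": 59,  # the MISC register holds extra information
    "ADDRV": 58,  # the ADDR register holds an address
    "PCC": 57,    # processor context may be corrupt
}


def decode_status_flags(status: int) -> dict:
    """Return the top flag bits of a 64-bit MCi_STATUS value."""
    return {name: bool((status >> bit) & 1) for name, bit in STATUS_FLAGS.items()}


def decode_cpuid_signature(sig: int) -> dict:
    """Split a CPUID signature (e.g. 0x206c2) into family, model and stepping."""
    family = (sig >> 8) & 0xF
    model = (sig >> 4) & 0xF
    if family in (6, 15):
        model |= ((sig >> 16) & 0xF) << 4  # extended model bits
    if family == 15:
        family += (sig >> 20) & 0xFF       # extended family bits
    return {"family": family, "model": model, "stepping": sig & 0xF}


def decode_time(seconds: int) -> str:
    """Render the TIME field (seconds since the Unix epoch) as UTC."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc).isoformat()


# Values copied from the dmesg output above.
print(decode_status_flags(0xEA1D6740008000B1))
print(decode_cpuid_signature(0x206C2))
print(decode_time(1746170939))
```

With the values from the log this gives VAL, OVER, UC, MISCV and PCC set and EN and ADDRV clear; family 6, model 44 (0x2c), stepping 2; and a TIME of 2025-05-02 07:28:59 UTC. EN being clear might suggest the record was logged without reporting enabled for it, but that is an interpretation rather than something the log states.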

Things I have tried to resolve / troubleshoot this:
  1. Cleaned out the machine (it is immaculate inside) and applied fresh thermal paste to both CPUs and the northbridge.
  2. Swapped the CPUs around: same error message and code.
  3. Reduced to a single DIMM, and also rotated the DIMMs: same error.
  4. Tried various distributions: same error, but I sometimes don't see the 0 or MISC 40000 after the TSC identifier.
  5. Tried booting via BIOS compatibility mode (from the installer CD rather than a USB stick): same issue.
  6. Tried swapping graphics cards: same issue.
  7. Ran the Apple Hardware Test: no errors were reported.
  8. The machine will happily work all day in Windows or macOS with no crashes, freezes or any other issues.
  9. The EFI firmware is up to date and is the latest available for the machine.
I have also tried to run 'rasdaemon' to see if I could get any log text, but ras-mc-ctl cannot read any of the machine labels. mcelog is deprecated.

Any ideas / suggestions would be appreciated.

Thank you.

Jacques
 


An MCE points to a hardware problem, most commonly graphics. One common fix (it doesn't fix everything) is adding nomodeset to the kernel boot parameters.
 
Thank you for the suggestion. I have just tried adding 'nomodeset' to the grub kernel line and no change unfortunately, same error message pops up. I'm starting to wonder if this is a bogus EFI/BIOS related message as the machine works perfectly in all other regards.
 
"I'm starting to wonder if this is a bogus EFI/BIOS related message as the machine works perfectly in all other regards."
That is a possibility with Macs.
 
An error in the MCE log refers to a hardware problem. That means that the solution, if there really is such a problem, is a hardware repair or replacement, and not just a software matter.

Basically, if the MCE log shows a hardware error, the easiest thing to do is to check the particular hardware it references with an application designed to test it.

The error messages in post #1 are about the CPU rather than, say, memory. The TSC (time stamp counter) is inside the CPU, so it's the CPU that one might investigate.
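
One way to cross-check which unit the record points at is to look at the low 16 bits of the status word from post #1, which carry the MCA error code. A rough Python sketch follows, assuming the Intel SDM compound error code format applies to this CPU. The function name is just for illustration, and only the memory-controller form is decoded.

```python
# Transaction types for the MMM field of a memory-controller error code,
# as listed in the Intel SDM compound error code table (assumed to apply here).
MEMORY_TRANSACTION = {
    0b000: "generic undefined request",
    0b001: "memory read error",
    0b010: "memory write error",
    0b011: "address/command error",
    0b100: "memory scrubbing error",
}


def decode_mca_error_code(status: int) -> dict:
    """Decode bits 31:0 of an MCi_STATUS value (memory-controller form only)."""
    code = status & 0xFFFF                    # bits 15:0, MCA error code
    model_specific = (status >> 16) & 0xFFFF  # bits 31:16, model-specific code
    result = {"mca_code": code, "model_specific_code": model_specific}

    # Memory-controller errors have the form 0000 0000 1MMM CCCC.
    if (code & 0xFF80) == 0x0080:
        mmm = (code >> 4) & 0x7
        channel = code & 0xF
        result["kind"] = "memory controller"
        result["transaction"] = MEMORY_TRANSACTION.get(mmm, "reserved")
        # A channel field of 1111 means the channel is not specified.
        result["channel"] = None if channel == 0xF else channel
    else:
        result["kind"] = "not decoded by this sketch"
    return result


# Status word copied from post #1.
print(decode_mca_error_code(0xEA1D6740008000B1))
```

For ea1d6740008000b1 the low 16 bits are 0x00b1, which under that assumption matches the memory-controller form (address/command error, channel 1). On this CPU generation the memory controller sits inside the CPU package, so that is still the CPU in a sense, but it would point at the memory side of it rather than the TSC.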

A common app to check the CPU is the stress command. Its man page has some helpful usage examples, so there's no need to describe them here. It also has a "dry-run" option, which reports what the command would do without actually doing it, so that you can sort of "try before you buy".

If it all checks out, then you can ignore the MCE output.

There's another app called s-tui which can check the CPU and opens a curses display in the terminal. It's possible to watch the CPUs on screen in monitor mode or stress mode, and then see whether one or more CPUs drop out, do something unexpected, or produce an error message.

If you wish to check the memory anyway, a commonly used test is memtest86+.

If the machine works "perfectly" as mentioned in post #3, then you might simply ignore the MCE messages and happily compute as normal until a more "real" issue appears. It's possible that there's a mismatch between the MCE reporting software and the kernel's idea of the hardware, so to speak. There is a brief reference to this in my notes, but it's from too long ago to hold any details relevant to this case.
 
Thank you for the suggestions. I installed both stress and s-tui and ran a 5-minute stress test with the command below. This made all the fans ramp up, but not to full speed.

stress -c 24 --io 8 --vm 256 --vm-bytes 256M --timeout 300

The 5-minute test pegged almost all the cores at 100%, apart from some IO workers here and there which were jumping between 96% and 100%. Memory use was about 40GiB. Temperatures were well under control as monitored via s-tui and no glitches occurred; the machine purred all the way through the test.
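
As a sanity check on the memory figure, the upper bound requested by the --vm options can be worked out with a few lines of Python. This is only a sketch: it assumes stress treats the M suffix as 1024*1024 bytes and that each --vm worker holds at most --vm-bytes at any one time. The helper names are hypothetical.

```python
def parse_size(text: str) -> int:
    """Parse a size such as '256M', assuming binary multiples (K=1024, M=1024**2, G=1024**3)."""
    units = {"B": 1, "K": 1024, "M": 1024**2, "G": 1024**3}
    suffix = text[-1].upper()
    if suffix in units:
        return int(text[:-1]) * units[suffix]
    return int(text)  # no suffix: plain bytes


def vm_ceiling_bytes(workers: int, vm_bytes: str) -> int:
    """Upper bound on memory held if every vm worker has its allocation live at once."""
    return workers * parse_size(vm_bytes)


# Values from the command above: --vm 256 --vm-bytes 256M
ceiling = vm_ceiling_bytes(256, "256M")
print(f"{ceiling / 1024**3:.0f} GiB ceiling")
```

That works out to a ceiling of 64 GiB. Since the vm workers repeatedly allocate and free their memory, seeing around 40GiB in use at a given moment is not surprising.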

The last thing I will do is run memtest86+.
 
Memtest86+ complete: I did one pass with no errors, and no ECC errors either. I think this is likely a kernel/hardware mismatch that has crept in with 6.x.
 

