MCE Error on 2010 MacPro

JacquesT

Morning,

I've recently upgraded my Mac Pro 5,1 (2010 model, dual 6-core Xeon, with 6 of 8 DIMM slots populated with ECC memory) to Debian Bookworm, though my query relates to many distributions running kernel 6.x. With kernel 5.x I never had any MCE errors reported during the boot phase of my machine. After upgrading to Debian Bookworm I'm now getting the error message below (snipped from 'dmesg | grep mce'). This message is not unique to Debian; it appears in some form in almost all of the kernel 6.x based distributions I've tried: openSUSE, Arch, Linux Mint, Fedora, etc.

On fresh power on boot
mce: [Hardware Error]: Machine check events logged
mce: [Hardware Error]: CPU 1: Machine Check: 0 Bank 8: ea1d6740008000b1
mce: [Hardware Error]: TSC 0 MISC 40000
mce: [Hardware Error]: PROCESSOR 0:206c2 TIME 1746170939 SOCKET 1 APIC 20 microcode 1f

On restart
mce: [Hardware Error]: Machine check events logged
mce: [Hardware Error]: CPU 1: Machine Check: 0 Bank 8: ea1d6740008000b1
mce: [Hardware Error]: TSC 0
mce: [Hardware Error]: PROCESSOR 0:206c2 TIME 1746170939 SOCKET 1 APIC 20 microcode 1f

TSC stands for 'Time Stamp Counter', I believe, though I have no idea what the 0 error code or MISC 40000 refers to. There is no memory in Bank 4/8.

Things I have tried to resolve / troubleshoot this:
  1. Cleaned out the machine (it is immaculate inside), applied fresh thermal paste to both CPUs and also northbridge.
  2. Swapped the CPUs around, same error message and coding.
  3. Reduced to just 1x Dimm, and also rotated dimms, same error.
  4. Tried various distributions, same error but I sometimes don't see the 0 or MISC 40000 after the TSC identifier.
  5. Tried booting via BIOS compatibility mode (from installer CD rather than USB stick), same issue.
  6. Tried swapping gfx cards, same issue.
  7. Ran the Apple Hardware Test and no errors were reported.
  8. The machine will happily work all day in Windows or macOS with no crashes, freezes or any issues.
  9. The EFI firmware is up to date and the latest available for the machine.
I have also tried to run 'rasdaemon' to see if I could get any log text, but ras-mc-ctl cannot read any of the machine labels. mcelog is deprecated.

Any ideas / suggestions would be appreciated.

Thank you.

Jacques
 


MCE is a hardware problem, most commonly graphics. One common fix (it doesn't fix everything) is adding nomodeset to the kernel boot parameters.
 
Thank you for the suggestion. I have just tried adding 'nomodeset' to the grub kernel line and no change unfortunately, same error message pops up. I'm starting to wonder if this is a bogus EFI/BIOS related message as the machine works perfectly in all other regards.
 
"I'm starting to wonder if this is a bogus EFI/BIOS related message as the machine works perfectly in all other regards."
That is a possibility with Macs.
 
An MCE (machine check exception) message like this one refers to a hardware problem. That means that the solution, if there really is such a problem, is a hardware repair or replacement, and not just a software matter.

Basically, if the MCE log shows a hardware error, then the easiest thing to do is to check the particular hardware it references with an application designed to test it.

The error messages in post #1 are about the CPU rather than, say, memory. The TSC (time stamp counter) is inside the CPU, so it's the CPU that one might investigate.

A common app to check the CPU is the stress command. Its man page has some helpful examples, so there's no need to describe them here. It also has a "dry-run" option, which shows what the command would do without actually doing it, so that you can sort of "try before you buy".

If it all checks out, then you can ignore the MCE messages.

There's another app called s-tui which can check the CPU and opens up a curses display in the terminal. It's possible to watch the CPUs on screen in monitor mode or stress mode, and then see if one or more CPUs drop out, do something unexpected, or produce an error message.

If you wish to check the memory anyway, a commonly used test is memtest86+.

If the machine works "perfectly" as mentioned in post #3, then you might simply ignore the MCE messages and happily compute as normal until a more "real" issue appears. It's possible that there's a mismatch between the MCE logging software and the kernel's idea of the hardware, so to speak. There is a brief reference to this in my notes, but it's from too long ago to have any relevant details for this case.
 
Thank you for the suggestions. I installed both stress and s-tui and ran a 5-minute stress test with the command below. This made all the fans ramp up, but not to full speed.

stress -c 24 --io 8 --vm 256 --vm-bytes 256M --timeout 300

The 5min test pegged almost all the cores to 100%, apart from some IO workers here and there which were jumping between 96% and 100%. Memory was running at about 40GiB. Temperatures were well under control as monitored via s-tui and no machine glitching occurred, it purred all the way through the test.

The last thing I will do is run memtest86+.
 
Memtest86+ complete: I did 1 pass with no errors, and no ECC errors either. I think this is likely a kernel / hardware mismatch that has crept in with 6.x.
 
Hi,

I have a Mac Pro 5,1 (Dual X5690) and have been struggling with this exact issue for weeks. It’s actually quite reassuring to find someone else out there experiencing the exact same problem, so I joined Linux.org solely to reply to this thread.

Like you, I encountered a nearly identical MCE error on Linux, which led me to your post. Before finding this thread, I spent a lot of time troubleshooting the hardware across different environments. I should note that I have experienced absolutely no BSODs or kernel panics whatsoever. Here is a summary of what I’ve done and observed so far:

I initially encountered a single WHEA-Logger ID 1 error in Windows 11 upon powering on the machine. This happened even on a clean, fresh installation of Windows 11 with no other software installed. Since there was no other WHEA information provided except for a 2,157-digit hexadecimal RawData string, I had no choice but to decode it myself.

Using the "decodewhearecord" tool alongside the Intel 64 and IA-32 Architectures Software Developer’s Manuals for reference, I managed to decode the RawData and found that the output was nearly identical to yours. (As a side note, according to the Intel manuals, "Bank 8" in this context refers to an internal architectural register bank, not the physical DIMM Slot 8, which is a common misconception.)

Furthermore, when I installed Linux, I got an MCE that perfectly matches your CPU 1: Machine Check: 0 Bank 8: ea1xxxx0008000b1 log, where everything except for the xxxx portion is identical to yours. Crucially, the APIC ID is flagged as 20 in Linux (which translates to decimal 32 in Windows). This specific APIC ID and bank error remain exactly the same whether decoded from the Windows 11 WHEA RawData or when swapping the physical CPUs.
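
For anyone who wants to repeat the decoding, here is a minimal Python sketch of how the status value from post #1 could be split into fields. The bit positions and the memory-controller code pattern are my reading of the machine-check chapter of the Intel manuals, so treat them as assumptions and check them against the manual for this CPU family. The function and table names (decode_status, describe_mca_code, FLAG_BITS, MEM_REQUEST) are made up for this illustration.

```python
import re

# Flag bit positions in IA32_MCi_STATUS (assumption: taken from the Intel SDM,
# verify against the manual for this CPU family).
FLAG_BITS = {
    "VAL": 63,    # register holds a valid error
    "OVER": 62,   # an earlier error was overwritten
    "UC": 61,     # error was not corrected by hardware
    "EN": 60,     # reporting was enabled for this error
    "MISCV": 59,  # the MISC register holds extra information
    "ADDRV": 58,  # the ADDR register holds an address
    "PCC": 57,    # processor context may be corrupted
}

# Request types for codes of the form 0000 0000 1MMM CCCC (also an assumption).
MEM_REQUEST = {
    0: "generic undefined request",
    1: "memory read error",
    2: "memory write error",
    3: "address/command error",
    4: "memory scrubbing error",
}


def decode_status(status):
    """Split a 64-bit MCi_STATUS value into flags and error-code fields."""
    fields = {name: bool((status >> bit) & 1) for name, bit in FLAG_BITS.items()}
    fields["mca_code"] = status & 0xFFFF            # bits 15:0
    fields["model_code"] = (status >> 16) & 0xFFFF  # bits 31:16
    return fields


def describe_mca_code(code):
    """Describe codes matching 0000 0000 1MMM CCCC; return None otherwise."""
    if (code & 0xFF80) != 0x0080:
        return None
    request = MEM_REQUEST.get((code >> 4) & 0x7, "reserved")
    channel = code & 0xF
    where = "channel not specified" if channel == 0xF else f"channel {channel}"
    return f"memory controller: {request}, {where}"


# Log line copied from post #1.
LINE = "mce: [Hardware Error]: CPU 1: Machine Check: 0 Bank 8: ea1d6740008000b1"
match = re.search(r"CPU (\d+): Machine Check: \d+ Bank (\d+): ([0-9a-f]+)", LINE)
cpu, bank = int(match.group(1)), int(match.group(2))
status = int(match.group(3), 16)

fields = decode_status(status)
print(f"CPU {cpu}, bank {bank}")
for name, value in fields.items():
    print(name, value if isinstance(value, bool) else hex(value))
print(describe_mca_code(fields["mca_code"]))

# The APIC ID in the Linux log is hexadecimal; Windows shows it in decimal.
print("APIC", int("20", 16))
```

With the value from post #1, the sketch reports the EN flag as clear and the low 16 bits as 0xb1. If the table I assumed applies to this CPU, that pattern matches the memory-controller form of the error code rather than a DIMM slot number, but this is an interpretation of the register layout and not something the log itself confirms.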

To break down how this behaves across different operating systems: macOS shows absolutely no errors or issues whatsoever; Windows 11 throws that single WHEA error immediately after booting and nothing else; and Linux similarly triggers the MCE only once during the boot process.

To troubleshoot, I have already tried the following:
  • Deep cleaned the inside of my Mac Pro.
  • Swapped the positions of the two CPUs (no change).
  • Reduced RAM from 8 DIMMs to 6, 4, and 2 DIMMs, and rotated them regardless of the configuration (no change).
  • Passed Apple Service Diagnostic (ASD) 3D149. (As a side note, ASD 3D149 is different from the basic Apple Hardware Test; it is a proprietary, highly rigorous verification tool used by Apple technicians.)
  • Passed Prime95 (Small/Large/Blend) in Windows 11 and Memtest86.
  • Verified that the EFI firmware is on the latest available version.
Additionally, I did some further testing with single-CPU configurations. As you may know, the dual-CPU Mac Pro model can boot with only the right CPU socket (CPU A) populated, but will not boot with only the left socket (CPU B) populated. Interestingly, when I booted the machine using just CPU A, the error completely disappeared, regardless of which of the two CPUs was installed in it.

Looking at both your case and mine, the machine runs flawlessly all day in both macOS and Windows with zero crashes, freezes, or performance issues. Given that swapping hardware components changes absolutely nothing and our error profiles match so perfectly, I am strongly inclined to believe this is not an actual hardware failure, but rather a unique characteristic or minor firmware quirk inherent to the dual-CPU Mac Pro 5,1 models themselves.

P.S. English is not my native language, so I used a translator to write this post. I hope everything is clear!
 