CPU Machine Check Exception (MCE) error in logs. Machine Check: 0 Bank 2: 8c005fc040200151

BiggiDiggi · October 25, 2024, 4:04pm

The full text of the error is as follows:

mce: [Hardware Error]: CPU 22: Machine Check: 0 Bank 2: 8c00004040400151
mce: [Hardware Error]: TSC 0 ADDR 99c43d80 MISC 40
mce: [Hardware Error]: PROCESSOR 0:b0671 TIME 1729871239 SOCKET 0 APIC 4c microcode 129
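
For anyone reading a similar line: the 64-bit value after "Bank 2:" is the bank's status register, and its top bits are flags. Below is a minimal sketch that unpacks them, assuming the architectural IA32_MCi_STATUS layout from Intel's Software Developer's Manual (bit 63 VAL, 62 OVER, 61 UC, 60 EN, 59 MISCV, 58 ADDRV, 57 PCC, bits 52:38 corrected error count, 31:16 model-specific code, 15:0 MCA error code). The function name decode_mci_status is made up for illustration.

```shell
#!/bin/sh
# Decode the flag bits of a machine-check status value (sketch).
# Assumption: the architectural IA32_MCi_STATUS bit layout applies.
# The value is split into two 32-bit halves so the arithmetic stays in
# range on shells that use signed 64-bit integers.
decode_mci_status() {
    hex=${1#0x}
    case $hex in
        *[!0-9a-fA-F]*) echo "not a hex value: $1" >&2; return 1 ;;
    esac
    if [ "${#hex}" -ne 16 ]; then
        echo "expected 16 hex digits, got: $1" >&2
        return 1
    fi
    hi_hex=$(printf '%s' "$hex" | cut -c1-8)    # bits 63..32
    lo_hex=$(printf '%s' "$hex" | cut -c9-16)   # bits 31..0
    hi=$((0x$hi_hex))
    lo=$((0x$lo_hex))
    # Flag bits 63..57 sit at positions 31..25 of the high half.
    printf 'VAL=%d OVER=%d UC=%d EN=%d MISCV=%d ADDRV=%d PCC=%d\n' \
        $(( (hi >> 31) & 1 )) $(( (hi >> 30) & 1 )) $(( (hi >> 29) & 1 )) \
        $(( (hi >> 28) & 1 )) $(( (hi >> 27) & 1 )) $(( (hi >> 26) & 1 )) \
        $(( (hi >> 25) & 1 ))
    # Bits 52..38 are the corrected error count; the low half holds the codes.
    printf 'CECNT=%d MSCOD=0x%04x MCACOD=0x%04x\n' \
        $(( (hi >> 6) & 0x7fff )) $(( (lo >> 16) & 0xffff )) $(( lo & 0xffff ))
}

decode_mci_status 8c00004040400151
```

If that layout applies here, UC=0 would mean the hardware logged the event as a corrected error, and MISCV=1 and ADDRV=1 match the ADDR and MISC fields printed on the second line. As far as I can tell, the MCA error code 0x0151 falls in the cache hierarchy class, but treat that reading as unconfirmed.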

I first saw it in the stream of boot messages that scrolls past before the login screen. Because it goes by so quickly and is printed in gray, I couldn’t read it properly. I was traveling for work, so I didn’t pay much attention for the first week or so.
Now that I am back, I ran

journalctl --since "1 minute ago"

and realized that it appears every second!
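
To put a number on "every second", the events can be counted rather than eyeballed. This is a rough sketch: count_mce_events is a hypothetical helper, and it assumes each event starts with a "CPU n: Machine Check" line like the one quoted above.

```shell
#!/bin/sh
# Count machine-check events in kernel log text read from stdin (sketch).
# Each event spans three log lines, so only the first line is matched.
count_mce_events() {
    # grep -c prints 0 but exits non-zero when nothing matches; mask that.
    grep -c 'mce: \[Hardware Error\]: CPU [0-9]*: Machine Check' || true
}

# Real use would be something like:
#   journalctl -k --since "1 minute ago" | count_mce_events
# Demo on the lines from the first post:
count_mce_events <<'EOF'
mce: [Hardware Error]: CPU 22: Machine Check: 0 Bank 2: 8c00004040400151
mce: [Hardware Error]: TSC 0 ADDR 99c43d80 MISC 40
mce: [Hardware Error]: PROCESSOR 0:b0671 TIME 1729871239 SOCKET 0 APIC 4c microcode 129
EOF
```

A count near 60 for a one-minute window would match one event per second.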

I did some searching online and found a few similar errors, but none with the same code. They usually involve Bank 2, 5, or 6. For some people these were “nothing to worry about”, whereas for others they may have led to hardware issues. One very technical discussion I found talks about changing the CPU voltage in the BIOS and so on, but this is way above my understanding.

Dell told me to update my BIOS to the latest version, which I did. Nothing changed, and I have heard nothing from them since. I have always used the default settings and have never overclocked my CPU in any way. I dual boot Windows and EndeavourOS. I updated the BIOS through Windows (which is more or less the only thing I use it for).

I also ran the 20 min BIOS diagnostic and that did not show any errors.

I have been using my current setup for over a year with no problems and have been super happy.

I am using the linux-zen kernels. Tried a session with the linux-lts ones and the same error was constantly appearing in the logs.

Here is the result from

journalctl -k -b -0 | eos-sendlog

and here the result of

inxi -Fxxc0z | eos-sendlog

I am not sure what else to provide as information. Any help will be greatly appreciated, or steps into deciphering the technical language on other similar errors.

1093i3511 · October 25, 2024, 5:00pm

I’m by no means very experienced with this,
but from your journalctl:

kernel: microcode: Current revision: 0x00000129

Am I correct in assuming that you don’t have early microcode loading enabled as a boot parameter?
I can’t really tell whether that is the latest microcode revision available for that Raptor Lake CPU, or whether this may be related to the known issues with that line of processors.

The simplest way to check would be to see whether the intel-ucode package is installed on your system.

yay -Q intel-ucode

Furthermore, the Arch wiki may help.

BiggiDiggi · October 25, 2024, 6:04pm

Am I correct in assuming that you don’t have early microcode loading enabled as a boot parameter?

I have never set anything like that myself, and I am not sure how to check. From the Arch wiki page you linked I tried the command:

journalctl -k --grep='microcode:'

which should have an indication for the early loading. I do not get that indication. In particular, the wiki says that one should see something like:

kernel: microcode: Current revision: 0x00000012
kernel: microcode: Updated early from: 0x0000000e

I only see the first line with the code version being 129. (I think this is the line you have also copied in your answer).
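
The check described above can be scripted. A minimal sketch follows, assuming the two message formats quoted from the wiki; early_update_status is a hypothetical name. Note that a missing "Updated early" line does not by itself prove early loading is off: one plausible explanation is that the firmware already loaded the same revision, leaving the kernel nothing to update.

```shell
#!/bin/sh
# Summarise the kernel's microcode messages read from stdin (sketch).
# Assumes the "Current revision:" / "Updated early from:" wording quoted above.
early_update_status() {
    awk '
        /microcode: Current revision:/   { current = $NF }
        /microcode: Updated early from:/ { previous = $NF }
        END {
            if (current == "") { print "no microcode lines found"; exit }
            if (previous != "")
                print "updated early: " previous " -> " current
            else
                print "no early update logged; running " current
        }'
}

# Real use would be something like:
#   journalctl -k --grep='microcode:' | early_update_status
# Demo on the single line seen on this machine:
printf '%s\n' 'kernel: microcode: Current revision: 0x00000129' | early_update_status
```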

The rest of the wiki page is rather hard for me to understand.

The BIOS update I installed is the one that shipped the microcode 129. And the error I posted includes “microcode 129”. It did include that number even before I updated the BIOS.

Is this early loading something I have to enable from the BIOS or it is done differently?

I see sections about initramfs and dracut in the wiki, but I have never understood what these are and do. But I know I see dracut do things after the kernel gets updated.

After upgrading the BIOS I have not done any updates on EndeavourOS, because it was fully updated before I did the BIOS upgrade. Do I need to rebuild the kernel or linux-firmware so they are aware of this new microcode?

1093i3511 · October 25, 2024, 7:13pm

Sorry for the late reply, my DSL connection is a bit unstable currently.

With regard to the microcode, I’ve seen other reports about a newer revision available for the Raptor Lake series of CPUs, namely revision 12b. Intel themselves confirmed that this new revision fixes stability and performance issues for these CPUs.

As you’re running the zen kernel, I’m almost certain that early microcode loading is supported. I’m using the zen kernel as well, but with an AMD processor, and I enabled early microcode support explicitly during the initial install of EnOS.

Try to install the microcode package via yay -S intel-ucode. But I don’t know if dracut will automatically include it directly. There might be an additional cmdline parameter to be set via /etc/dracut.conf.d/cmdline.conf

For now, after you’ve installed the package, check whether dracut rebuilds the initramfs images, then reboot the system. After that, check with journalctl -k --grep='microcode:' whether the second line is present.

Edit: The section of the Arch wiki which is actually of interest is this one.

BiggiDiggi · October 25, 2024, 8:26pm

No worries. I hope your connection problems are resolved soon.

I forgot to mention in my previous answer that I do indeed have intel-ucode installed. I also did rebuild with dracut, using

sudo dracut-rebuild

as suggested on the EndeavourOS Discovery page for dracut (I am using grub). I restarted after that and nothing changed: I am still getting the MCE error in the log. Is this all one needs to do to “install the microcode package via yay -S intel-ucode”?

With regard to the microcode, I’ve seen other reports about a newer revision available for the Raptor Lake series of CPUs, namely revision 12b. Intel themselves confirmed that this new revision fixes stability and performance issues for these CPUs.

Yes, microcode 129 is the latest part of that fix, and it is what I got with the BIOS update today.

Edit: The section of the Arch wiki which is actually of interest is this one.

I did also follow the steps in the section after your previous reply to double check that I do not have any pending updates for the microcode and indeed I am up to date according to that check.

Another thing I do not understand is whether one always needs early loading of the microcode enabled.

The wiki article states:

“The Linux microcode loader supports three loading methods:

Built-in microcode
Early loading
Late loading”

Early loading is the second option. How does this “built-in” option work? Is this what dracut is supposed to do after the microcode is updated?

And lastly, the dracut subsection of the early loading section of the wiki article only gives a link to the dracut man page. I saw an option

early_microcode="{yes|no}"

there. Am I meant to change this somehow and then rebuild again?

1093i3511 · October 25, 2024, 9:25pm

Sadly I can’t check it for myself.

But I’m certain that the revision 12b is the newer one and 129 has been an earlier fix from intel to address that issue.
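
Since the revisions are hexadecimal numbers, they can be compared directly. A small sketch, with ucode_rev_cmp as a made-up helper name:

```shell
#!/bin/sh
# Compare two microcode revisions given as hex numbers (sketch).
# Prints whether the first revision is older than, the same as, or newer
# than the second.
ucode_rev_cmp() {
    a=$(( $1 ))
    b=$(( $2 ))
    if [ "$a" -lt "$b" ]; then
        echo older
    elif [ "$a" -gt "$b" ]; then
        echo newer
    else
        echo same
    fi
}

# The running revision can usually be read from /proc/cpuinfo, e.g.:
#   awk '/^microcode/ { print $3; exit }' /proc/cpuinfo
ucode_rev_cmp 0x129 0x12b
```

Here 0x129 is 297 and 0x12b is 299 in decimal, so 129 comes out as the older revision.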

Without interfering with your current configuration, you could check whether intel-ucode.img is present in the /boot folder and referenced in the menu entries of your grub.cfg.

That would be the built-in option for the microcode. The microcode image would be loaded in addition to the initramfs image (initramfs-linux.img).

I would definitely check the iucode-tool

ricklinux · October 25, 2024, 11:03pm

You should check your memory with memtest. Run a long test.

BiggiDiggi · October 25, 2024, 11:46pm

But I’m certain that the revision 12b is the newer one and 129 has been an earlier fix from intel to address that issue.

You are correct. Sorry I got confused. Sadly, Dell (my laptop maker) hasn’t released a BIOS with that version of the microcode.

On top of that, I noticed that the errors disappear from the logs when my CPU is under heavy load, which ties in with what Intel says in that article on the 12b revision:

“4. Microcode and BIOS code requesting elevated core voltages which can cause Vmin shift especially during periods of idle and/or light activity.
a. Mitigation: Intel is releasing microcode 0x12B, which encompasses 0x125 and 0x129 microcode updates, and addresses elevated voltage requests by the processor during idle and/or light activity periods.”

So, I have to wait it seems. This is very annoying.

I do have an intel-ucode.img file in my /boot folder and it is included in the menu entries for all the linux kernels in grub.cfg in the way suggested by the wiki article. So this is all as it should be.
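
For anyone repeating this check, here is a rough sketch that counts the initrd lines in a grub.cfg that do not mention intel-ucode.img. The helper name initrd_lines_missing_ucode is hypothetical, and /boot/grub/grub.cfg is assumed to be the config location; adjust the path if yours differs.

```shell
#!/bin/sh
# Count "initrd" lines in a GRUB config that lack intel-ucode.img (sketch).
# A result of 0 means every initrd line references the microcode image.
initrd_lines_missing_ucode() {
    awk '$1 == "initrd" && $0 !~ /intel-ucode\.img/ { n++ }
         END { print n + 0 }' "$1"
}

# Real use would be something like (path is an assumption):
#   initrd_lines_missing_ucode /boot/grub/grub.cfg
# Demo on a two-entry sample:
sample=$(mktemp)
printf '\tinitrd\t/boot/intel-ucode.img /boot/initramfs-linux-zen.img\n' > "$sample"
printf '\tinitrd\t/boot/initramfs-linux-lts.img\n' >> "$sample"
initrd_lines_missing_ucode "$sample"
rm -f "$sample"
```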

With regards to the early loading: it turns out that the

early_microcode="{yes|no}"

option is on “yes” by default for dracut. So I do not know why the output of journalctl -k --grep='microcode:' does not reflect that.
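
To rule out a local override of that default, the dracut config files can be scanned for the setting. This is only a sketch under stated assumptions: the default is taken to be "yes" as described above, the last assignment found is treated as the effective one, and effective_early_microcode is a made-up helper that ignores anything passed on the dracut command line.

```shell
#!/bin/sh
# Report the early_microcode value set in the given dracut config files,
# falling back to "yes" when none of them sets it (sketch).
# Assumption: later files override earlier ones; commented lines are ignored.
effective_early_microcode() {
    value=yes
    for f in "$@"; do
        [ -f "$f" ] || continue
        v=$(sed -n 's/^[[:space:]]*early_microcode=//p' "$f" | tr -d "\"'" | tail -n 1)
        if [ -n "$v" ]; then
            value=$v
        fi
    done
    printf '%s\n' "$value"
}

# Real use would be something like:
#   effective_early_microcode /etc/dracut.conf /etc/dracut.conf.d/*.conf
# Demo with a file that does not exist, so the default is reported:
effective_early_microcode /nonexistent/dracut.conf
```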

I tried to understand the manpage of iucode-tool, but it looks very complicated, and I would be afraid to work with it at the moment. Also, I do not understand: if there is no BIOS for my machine with the 12b microcode, is there anything I can do manually to get it on my machine? If yes, then I will look for a guide on doing it.

I found out that there is an Intel microcode repository on GitHub. However, as far as I can see, the latest release there is from 10th September, whereas the announcement of 12b is from 25th September. Furthermore, the release notes for the newest release on that GitHub page do not seem to indicate the version of the microcode.

BiggiDiggi · October 25, 2024, 11:46pm

I will look into that