Crashes and CPU Errors at Boot since Kernel 5.10

Ever since I updated to zen kernel 5.10, my PC has been crashing: all monitors go black, whatever sound is playing keeps repeating, and after a few seconds it reboots. It happens at random, whether I am gaming or simply doing homework. On boot I get MCE Hardware Errors for CPU [thread number] (photo: DSC_0001).
The motherboard is an Asus ROG Crosshair VIII Hero (Wi-Fi) with a Ryzen 9 3900X. The BIOS is updated to the latest version.

Might this help: https://bbs.archlinux.org/viewtopic.php?pid=1842331#p1842331

I’ll give it a shot with processor.max_cstate=5 first and if not processor.max_cstate=1.
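
In case it helps anyone trying the same thing, here is a rough sketch of how parameters like these could be added, assuming GRUB is the bootloader and the line to change is GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub. The helper name add_kernel_params is made up for illustration. It only transforms text and does not write any file.

```python
import re

# Matches the GRUB_CMDLINE_LINUX_DEFAULT line, capturing the quote style and value.
_CMDLINE = re.compile(
    r'^(GRUB_CMDLINE_LINUX_DEFAULT=)(["\'])(.*)\2[ \t]*$', re.MULTILINE
)


def add_kernel_params(grub_text, params):
    """Return grub_text with each parameter set on GRUB_CMDLINE_LINUX_DEFAULT.

    A parameter whose name is already present is replaced, so switching
    from processor.max_cstate=5 to processor.max_cstate=1 does not leave
    both on the line. Hypothetical helper, not part of any GRUB tooling.
    """

    def patch(match):
        prefix, quote, value = match.group(1), match.group(2), match.group(3)
        items = value.split()
        for param in params:
            key = param.split("=", 1)[0]
            # Drop any earlier setting of the same name, then append the new one.
            items = [item for item in items if item.split("=", 1)[0] != key]
            items.append(param)
        return f"{prefix}{quote}{' '.join(items)}{quote}"

    new_text, count = _CMDLINE.subn(patch, grub_text)
    if count == 0:
        raise ValueError("GRUB_CMDLINE_LINUX_DEFAULT not found")
    return new_text


# Example with an illustrative config, not the poster's actual file:
example = 'GRUB_CMDLINE_LINUX_DEFAULT="quiet loglevel=3"\n'
print(add_kernel_params(example, ["processor.max_cstate=5"]), end="")
```

The result would still have to be written back to /etc/default/grub as root, followed by grub-mkconfig -o /boot/grub/grub.cfg and a reboot. With systemd-boot the parameters go on the options line of the loader entry instead.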

This looks like a microcode error, so I doubt that’s going to work.

Edit: Those are CPU errors, and the message is just telling you the microcode version. They are not necessarily caused by the microcode. Just FYI.

It was reported today that kernel 5.10.8 was just released with a lot of fixes, especially for Btrfs issues. I would assume it will land in Arch very soon, so you might want to wait and try that.

Also worth trying linux-lqx - this has been more stable for me recently. I tried linux and linux-zen 5.10.6 and 5.10.7 and they both regularly froze (e.g. screen blank, ten seconds later fans to 100%).

Oh, the joys of a new LTS kernel… :joy:

:sweat_smile: Yeaaah… I guess the lesson is not to name something LTS in advance, until it’s actually tested :laughing:

Nah, “LTS” just means it will need support for a long time to fix all the bugs… :wink:

I just had a quick look at it and it seems to have everything zen has and some other things. I will give it a go. This release has been the messiest I’ve seen and I’ve been using Linux for 6 years now. All my friends think I’m a fool for using Linux and with kernel 5.10 I kind of actually look like the fool.

I think it is the build, drive format, GPUs, CPUs, or a combination of things.

I have absolutely zero issues with 5.10 zen (gaming PC, Intel CPU, Nvidia GPU) and 5.10 standard (laptop, Intel CPU and GPU), both ext4 formatted.

Linux kernel 5.10.8 is now in Arch…hit the update.

It seems worse after updating to kernel 5.10.8. I tried both linux-zen and linux-lqx, and I crashed twice during a 24-man raid in FFXIV.

(photo: DSC_0006)

Have you tried disabling iommu?

If that doesn’t work, just use the 5.4 LTS kernel and check each new 5.10 point release to see if it has been fixed.

There is a logging tool in the AUR: mcelog.

I got that from this:

I added processor.max_cstate=5 and then set up mcelog as @ringo suggested. Fingers crossed. I will try disabling iommu later if I am still crashing. I’m trying things one at a time so I can pinpoint the solution when I find it.
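
When testing parameters one at a time like this, it is worth confirming after each reboot that the running kernel actually received the parameter. A minimal sketch, assuming the usual /proc/cmdline file; the function names are made up for illustration, and the parser is deliberately simple (it does not handle quoted values containing spaces).

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into a {name: value} dict.

    Bare flags such as "quiet" map to None. Simplified: quoted values
    that contain spaces are not handled.
    """
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def read_running_cmdline(path="/proc/cmdline"):
    """Return the command line the running kernel was booted with."""
    with open(path) as handle:
        return handle.read()


# Illustrative command line, not taken from the poster's machine:
example = "BOOT_IMAGE=/vmlinuz-linux-zen root=UUID=abcd rw quiet processor.max_cstate=5"
print(parse_cmdline(example).get("processor.max_cstate"))
```

On a live system, parse_cmdline(read_running_cmdline()) would show whether processor.max_cstate or amd_iommu made it through.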

So it happened again with processor.max_cstate=5 and kernel 5.10.9. mcelog gives:

mcelog: ERROR: AMD Processor family 23: mcelog does not support this processor. Please use the edac_mce_amd module instead.
CPU is unsupported

I also tried rasdaemon, and it reports "No MCE errors."
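
Since mcelog refuses to run on family 23, another way to keep track of the errors is to pull the hardware-error lines straight out of the kernel log. A rough sketch, assuming the log text comes from something like journalctl -k -b -1 (kernel messages from the previous boot, which needs a persistent journal). The sample lines are illustrative and not from the poster's machine, the exact message layout can differ between kernel versions, and the function names are made up.

```python
import re
from collections import Counter

# Assumed layout of an MCE report line: "... CPU <n>: Machine Check: ... Bank <m>: ..."
_CPU_BANK = re.compile(r"\bCPU\s*(\d+)\b.*?\bBank\s*(\d+)\b")


def hardware_error_lines(log_text):
    """Return the log lines the kernel tagged with [Hardware Error]."""
    return [line for line in log_text.splitlines() if "[Hardware Error]" in line]


def count_by_cpu_bank(lines):
    """Count error lines per (cpu, bank) pair, skipping lines without both."""
    counts = Counter()
    for line in lines:
        match = _CPU_BANK.search(line)
        if match:
            counts[(int(match.group(1)), int(match.group(2)))] += 1
    return counts


# Illustrative sample only:
sample = "\n".join([
    "kernel: mce: [Hardware Error]: CPU 2: Machine Check: 0 Bank 5: bea0000000000108",
    "kernel: usb 1-1: new high-speed USB device number 2 using xhci_hcd",
    "kernel: mce: [Hardware Error]: CPU 2: Machine Check: 0 Bank 5: bea0000000000108",
])
for (cpu, bank), n in count_by_cpu_bank(hardware_error_lines(sample)).items():
    print(f"CPU {cpu} bank {bank}: {n} report(s)")
```

If the same CPU and bank keep showing up across boots, that is at least a consistent data point to bring to a bug report or an RMA request.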

Have you tried disabling SMT or the uOP cache?

Please see this post in the Manjaro forum:

They see the same MCE error messages as you, but those do not cause a hang. They are just cosmetic and can eventually be suppressed with kernel command-line parameters.

So chances are that the mce messages you see are just coincidental but not the root cause of the crash.

Please also have a look at the wiki page: https://en.wikipedia.org/wiki/Machine-check_exception

Specifically where it says:

Machine checks are a hardware problem, not a software problem.

amd_iommu=off