Crashes and CPU Errors at Boot since Kernel 5.10

@ricklinux I have not touched SMT nor uOP.

@mbod I changed the kernel parameter to add .rasdaemon, and it seems odd that I do not see those messages when rebooting or booting normally; they only show up after a crash.

@otherbarry I added that now so we will wait and see.

A big shoutout and thank you to everyone helping troubleshoot this.

I suggest you focus on a hardware issue.

Have you recently changed HW or BIOS?
I see that you have a new Ryzen 9 3900X. Is that CPU fully supported by your board?
What RAM do you have, and what speed is it running at?
Are you overclocking anything?
What is memtest86 telling you?

No recent hardware changes, though I did update the BIOS multiple times. Looking at rasdaemon, I see the following for MCE errors:
1 2021-01-21 12:28:37 -0500 error: Corrected error, no action required., CPU 2, bank Platform Security Processor (bank=25), mcg mcgstatus=0, mci CECC, mcgcap=0x0000011c, status=0x98004000003e0000, misc=0xd01a000b00000000, walltime=0x6009b9c6, cpuid=0x00870f10, bank=0x00000019

and this for disk errors:
1089 2021-01-21 16:58:56 -0500 error: dev=0:66304, sector=721912480, nr_sector=16, error='unknown block error', rwbs='RA', cmd='',
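
For anyone who wants to read those two log lines field by field, here is a rough Python sketch. It is not part of rasdaemon; the function names are made up for illustration. It assumes the usual x86 MCA layout for the top bits of the status register (VAL, OVER, UC, EN, MISCV, ADDRV, PCC in bits 63 down to 57), treats bit 46 as the AMD corrected-ECC flag (an assumption that agrees with the "CECC" in the log), reads walltime as a Unix timestamp in hex, and assumes the block layer's 512-byte sector unit.

```python
import re
from datetime import datetime, timezone

# Fields copied from the two rasdaemon lines quoted above.
MCE_LINE = ("mcgcap=0x0000011c, status=0x98004000003e0000, "
            "misc=0xd01a000b00000000, walltime=0x6009b9c6, "
            "cpuid=0x00870f10, bank=0x00000019")
DISK_LINE = "dev=0:66304, sector=721912480, nr_sector=16, rwbs='RA'"

# Assumed x86 MCA status flags (bit position -> name).
STATUS_FLAGS = {63: "VAL", 62: "OVER", 61: "UC", 60: "EN",
                59: "MISCV", 58: "ADDRV", 57: "PCC"}
AMD_CECC_BIT = 46    # assumption: AMD corrected-ECC flag
SECTOR_BYTES = 512   # assumption: block-layer sectors are 512 bytes


def parse_fields(line):
    """Split 'key=value, key=value' text into a dict of strings."""
    return {k: v.strip("'") for k, v in re.findall(r"(\w+)=([^,\s]+)", line)}


def decode_status(status):
    """Return the names of the flag bits that are set in the status value."""
    return [name for bit, name in STATUS_FLAGS.items() if (status >> bit) & 1]


def summarize_mce(line):
    """Pull bank, flags and event time out of an MCE record."""
    fields = parse_fields(line)
    status = int(fields["status"], 16)
    return {
        "bank": int(fields["bank"], 16),
        "flags": decode_status(status),
        "uncorrected": bool((status >> 61) & 1),
        "cecc": bool((status >> AMD_CECC_BIT) & 1),
        "time_utc": datetime.fromtimestamp(int(fields["walltime"], 16),
                                           tz=timezone.utc),
    }


def summarize_disk(line):
    """Convert the sector numbers of a block error into byte positions."""
    fields = parse_fields(line)
    return {
        "byte_offset": int(fields["sector"]) * SECTOR_BYTES,
        "length_bytes": int(fields["nr_sector"]) * SECTOR_BYTES,
    }


if __name__ == "__main__":
    print(summarize_mce(MCE_LINE))
    print(summarize_disk(DISK_LINE))
```

Under those assumptions the UC bit is clear, which fits the "Corrected error, no action required" text, and the hex bank value 0x19 matches the bank=25 shown earlier in the same line.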

Excuse me if it has already been mentioned, but did you update your BIOS?
Not necessarily the latest version, but it should have AGESA 1.0.8.0 or maybe 1.1.0.0.

I had plenty of RAM related issues with Ryzen 9, like for example very long startup time due to faulty memory training, reboots with MCE errors etc.
They have been solved with a more recent BIOS version.

I do not experience this phenomenon at the moment.

A lot of people using NVIDIA drivers notice it, BUT:

It seems to be more prevalent with users of the dkms version, which makes sense, since several commands that have been in the kernel forever, and that many, many dkms drivers (NVIDIA and others!) use, have been removed.

Searching online shows that Fedora users with NVIDIA systems are the second hardest hit by the 5.10 kernels after Arch-based users, and they use a similar system to install the drivers. And again, it also hits users of wifi and network cards that use dkms modules.

Basically this is not really a kernel problem as much as a driver problem, where hardware manufacturers haven’t adjusted their drivers in time to cope with the changes in 5.10.
(This does not really explain why the latest dkms NVIDIA drivers work very well in the pre-release 5.11.x kernel however; I don’t know if they added those commands back?).

Anyway, there are other problems as well; there are Intel CPU lockups, hardware accelerated graphics lockup, wifi stops working and so on and so forth. 5.10 is just a huge step backwards in usability which makes the idea that it should be the next LTS a little… scary, to be honest.

I’m aware of the nVidia problems, but my old Manjaro box with a GTX 1070 works just fine.
I wouldn’t go as far as saying that 5.10 is a step backwards, maybe it is in terms of nVidia drivers (and the kernel devs do not give a flying f*ck about closed source blobs), but certainly not in general.

It is a step backwards when it comes to actual usability, the thing a majority of people using computers care about, and it is another nail in the coffin of “the year of the linux desktop”.

Going out on a tangent here, but the people who are willing to adapt their hardware to their software are far, far, FAR fewer than the people who demand that their software work on their hardware (coincidentally, I’d imagine, it is about the same percentages as the people actually using Linux on a daily basis vs the people who don’t…).

Anyway, my point is that most people who want a good graphics card, for example, buy NVIDIA simply because, no matter what Linux users claim, they are objectively superior, and have been superior for roughly 15 years or so.
This means that people who have a powerful laptop, or a mid-range desktop or above, and want to try Linux, are further discouraged every time they are met with a “Nah, you see, we don’t support your hardware because of ideological reasons; come back when you have bought a new graphics card just to be able to try Linux”. And it’s even worse with wifi of course, which is why Debian will never ever ever be a beginner’s distro, unlike Ubuntu-based distros like well… Ubuntu, Pop and Mint.

Anyway, this is also the reason of course why I dual boot. Linux might be open source and superior in its way of “doing” things, but Windows and OSX are far superior in actually working.

I don’t need no stinking Nvidja on my Linux. :rofl:

To be honest, I absolutely do not care about “the year of the linux desktop”. And I care even less about “percentages”.
Why don’t we simply admit that (desktop) Linux is not for everyone and that it doesn’t have to be for everyone?

Yes, but that isn’t worth much if they do not work.
I had nVidia for 15 years, and only recently switched to AMD.
Remember, AMD was even more crap than nVidia before they open-sourced.

That is my point; however, for most people the conclusion is the opposite of yours, namely “Linux doesn’t work”.

Anyway, my main point is that this is fixed in 5.11, since there are both new changes in 5.11 AND NVIDIA has made a concerted effort to BE compatible with whatever changes are made in 5.11.

Which makes 5.10 as LTS precarious unless they backport a number of patches from 5.11.

Well that is good news isn’t it?

I’m gonna try just that tomorrow and boot up the old machine and I can give you a little feedback on both 5.10 and 5.11-rc4.

My point - and your point - is that Linux is very dependent on good hardware->driver support.
Every one of us has at some time had a problem in this regard, and I don’t think this is going to change.
But when I look back at my first try with Linux in ± 1998, things have drastically improved. The underlying problems will remain.

(As noted in another thread, since I don’t game on this setup I installed the nouveau drivers instead, along with the zen kernel which I usually use, and so far it is behaving.)

As for nouveau, this might help you: https://nouveau.freedesktop.org/FeatureMatrix.html

Thanks!
I usually never use that driver, but Manjaro’s devs have officially recommended that owners of my specific NVIDIA card model use it until further notice IF they run 5.10 and don’t game on their Linux setup, so I am heeding their advice.

If I have enough time, I’ll test it on Manjaro unstable and report back.
It’s a 1070 and not representative of other model generations, but I can already say that when I last used it, it ran 5.10.? and nVidia 455.?.? on KDE.
Hope I can replace the ? :slight_smile:

I have a 1050 so basically the same generation, but the default drivers are now 460.x.x and those do not play well with 5.10.(higher than 3); at least not the dkms version.

I have also read of some instances where people set the RAM to 3000 instead of 3200 and the errors went away.

I have received multiple BIOS updates lately and applied all of them, but they are beta BIOS versions. I should probably revert to the last stable version and see.

@Beardedgeek72 I have an AMD RX 5700 XT. Also I hope 5.11 will be a much better kernel.

Since adding this, I haven’t had a full crash with my PC rebooting on an MCE hardware error. I got to raid the whole session tonight. The only issue was that at one point everything froze on my 3 monitors, though I could still hear my raid group, game and music. I ended up hard restarting. This is the second time; it also happened during raid last Friday. Interestingly, I played and streamed games all week without it happening. Raids are the only time I use my parsing plugin / overlay, so that might be the cause.
