Regular crashes on heavy load

Hello,
I’ve been using EndeavourOS for over a year now and everything was fine until a month ago. Now the system regularly freezes completely whenever I’m doing anything “heavy”, like big 3D games, program compilation, IDE loading or shader compiling.

I have no idea what is happening or what exactly is breaking. I have noticed that if sound is playing when this happens (music or anything else), the last 2 seconds repeat in a loop.

The screens just freeze, and I cannot do anything (even Ctrl + Alt + F2 doesn’t do anything).

Generally I’m able to manage whenever I have a problem, but here, I don’t know where to start or which log I should look for.

I don’t know what I could have done that would have this effect. The most “exotic” thing I do is manually install dotnet SDK updates; I’m not absolutely sure this couldn’t have an impact, but I would be surprised.
I regularly update my system and all seems fine.

Thank you for any help or suggestions.

inxi: https://0x0.st/HDuY.txt

One thing I would do right off the bat is update your BIOS. There’s a new update that came out today, and yours is about 2 years behind.

2 Likes

I had a similar issue recently. In my case the culprit was the kernel (the default Arch Linux kernel). With certain specific games it would at some point fail to use the VRAM properly, let it fill up and crash, then try to force a GPU reset, only for the reset to fail within the OS, according to journalctl. Doing REISUB to restart made it work again (or a 10-second power button press).

I looked for firmware updates and such, and it turned out that I had no such issues with the LTS kernel. Might be worth a try.
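If you want to try this, a quick way to confirm which kernel you actually booted is to look at `uname -r`. The helper below is only a sketch: `kernel_flavor` is a made-up name, and the patterns assume the usual Arch package naming (release strings ending in `-lts` for linux-lts, `-zen` for linux-zen, and containing `-arch` for the default kernel).

```shell
# Sketch: classify the running kernel by its release string.
# kernel_flavor is a hypothetical helper; the patterns are assumptions
# based on how Arch names its kernel packages.
kernel_flavor() {
    release="${1:-$(uname -r)}"   # default to the running kernel
    case "$release" in
        *-lts)   echo lts ;;
        *-zen)   echo zen ;;
        *-arch*) echo default ;;
        *)       echo unknown ;;
    esac
}

# Installing the LTS kernel alongside the current one (not run here):
#   sudo pacman -S linux-lts linux-lts-headers
# Depending on the bootloader, the boot menu may need regenerating
# before the new entry shows up.
```

After rebooting into the LTS entry, `kernel_flavor` with no argument should print `lts` if the naming assumption holds.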

I would also look at journalctl -e around the time of the crash next time it happens, and see what the journal says.
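Since a hard freeze forces a reboot, the interesting messages usually end up in the previous boot's journal rather than the current one. A minimal sketch, assuming the journal is stored persistently (otherwise `-b -1` has nothing to show); `show_prev_boot_tail` is just an illustrative name:

```shell
# Sketch: show warnings and errors logged near the end of the previous boot.
# show_prev_boot_tail is a hypothetical helper name, not a standard tool.
show_prev_boot_tail() {
    lines="${1:-200}"   # how many trailing journal lines to print
    if ! command -v journalctl >/dev/null 2>&1; then
        echo "journalctl not found" >&2
        return 1
    fi
    # -b -1: previous boot; -p warning: priority warning or worse;
    # -n: last N lines; --no-pager: plain output for piping or saving
    journalctl -b -1 -p warning -n "$lines" --no-pager
}

# Usage after rebooting from a freeze:
#   show_prev_boot_tail 100
```

If the freeze is abrupt enough, the last messages may never reach the disk, so an empty tail does not rule out a kernel or hardware problem.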

1 Like

Looks like there’s no swap; if so, best to create one.
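To double-check whether swap is really missing, the kernel reports the total in /proc/meminfo. A small sketch (`swap_total_kb` is a made-up helper name; the optional file argument is only there so it can be tried on a saved copy):

```shell
# Sketch: print the configured swap size in kB, as reported by the kernel.
# swap_total_kb is a hypothetical helper name.
swap_total_kb() {
    # $1: path to a meminfo-style file (defaults to /proc/meminfo)
    awk '/^SwapTotal:/ { print $2 }' "${1:-/proc/meminfo}"
}

# Usage:
#   swap_total_kb      # prints 0 when no swap is active
#   swapon --show      # lists active swap devices/files, if any
```

A result of 0 means no swap file, partition or zram device is currently active.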

2 Likes

So after a year, performance now degrades severely when you use the memory-hog apps…
…and according to you, performance never suffered with those apps until lately.
So, depending on your DE, you can always watch a system monitor to see what starts to seize up, i.e. abnormally high memory/CPU usage. I’ve watched shit seize up and kill the system in real time in the system monitor. Sucks, but at least I knew what it was.
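A graphical monitor freezes along with everything else, so it can help to also write memory readings to disk while reproducing the problem. This is only a sketch: `mem_available_pct` is a made-up helper, and the log path in the usage comment is an arbitrary choice.

```shell
# Sketch: percentage of RAM the kernel considers available.
# mem_available_pct is a hypothetical helper name.
mem_available_pct() {
    # $1: path to a meminfo-style file (defaults to /proc/meminfo)
    awk '/^MemTotal:/     { total = $2 }
         /^MemAvailable:/ { avail = $2 }
         END { if (total > 0) printf "%d\n", avail * 100 / total }' \
        "${1:-/proc/meminfo}"
}

# Usage: log a reading every 2 seconds, flushing to disk each time so
# the last lines survive a hard freeze (path is an example only):
#   while sleep 2; do
#       echo "$(date '+%F %T') avail=$(mem_available_pct)%" >> "$HOME/memwatch.log"
#       sync
#   done
```

If the last readings before a freeze show available memory near zero, that points toward memory exhaustion; if they look healthy, the cause is more likely elsewhere.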

And even if you know what it is, it’s hard to do something about it. The world of segfaults and debugging is a nightmare, even once you finally pinpoint this on an app…

Recurring hard reboots whenever something seizes up is an awful way to live, and I’ve been there.

Listen to all the advice you’ve got so far. Update everything.

If it’s a Flatpak, AppImage, or AUR package, all bets are off: that is outside the purview of the EndeavourOS curators. If it’s a native package (or packages), that is also relevant.

If this were me, I’d uninstall all your “heavy stuff”. Once it’s uninstalled, go on a search-and-destroy hunt for leftover folders, then reinstall. Bye-bye configs and settings, but if you suspect the right one you might get some pain-free months.

Still, my gut says to disregard this whole awful reply, since it really seems like hardware degradation: I’ve lost a drive or two and observed similar behavior.

The problem with that ^ is that all the S.M.A.R.T. hardware health tools are BS and they will always say everything is near death, even new stuff.

Kernel stuff is tricky, and you are absolutely right about “specific”. I wonder if zram would be better?

I’ve seen swap creation do miracles.

2 Likes

I started having problems with rtkit calling for a “canary” and then locking everything… I’m back on the LTS kernel for the time being without any problems (so far)… Check your messages… Try in a terminal: sudo journalctl -e -t rtkit-daemon, or to cast a wider net: sudo journalctl -S "2024-01-30 07:10:00" (put the time you want to monitor from inside the quotes, i.e. date, then time on the 24-hour clock).
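The `-S` (since) option pairs with `-U` (until), so you can bracket just the minutes around a freeze instead of scrolling from the start time to now. A minimal sketch; `journal_window` is a made-up wrapper name and the timestamps in the usage line are placeholders:

```shell
# Sketch: show journal entries between two timestamps.
# journal_window is a hypothetical wrapper; extra arguments are passed
# straight through to journalctl (e.g. -t IDENTIFIER or -p err).
journal_window() {
    start="$1"
    end="$2"
    shift 2
    # -S/--since and -U/--until bound the time range
    journalctl -S "$start" -U "$end" --no-pager "$@"
}

# Usage (may need sudo to see system-wide messages):
#   journal_window "2024-01-30 07:10:00" "2024-01-30 07:20:00" -t rtkit-daemon
```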

I catch most of my “problems” that way…

I’m going to go back to the zen kernel with the next update to see if the problem is gone…

4 Likes

I forgot you could scope it down like that. Brilliant! I love learning. This will give OP a great view into the freezes.

Linux is so schizo as well and I love that:
x11 v wayland debates
init v systemd debates
foss v proprietary
and my favorite
current v LTS (or zen)

I’ve never tried the LTS switch just because I f**k things up just by moving my chair or clearing my throat. Living with the LTS itself is a scientific control because you dodge 80% of the nonsense linux-current people get. I am always intrigued by that. Not my thread to hijack but I hear zen is another universe:)

1 Like

Well…the LTS is a “baseline” to see if newer stuff is a problem…I keep the regular kernel, zen & LTS…switch as needed…

I’ll do a sudo journalctl -S "2024-01-30 07:10:00" (change the date/time as needed) to see if there is an “offender” that makes sense, and then drill down with sudo journalctl -e -t "whatever the process is" to see how often it’s happening…
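To spot a likely “offender” faster, the wide-net output can be tallied by process name before drilling down. This is a rough sketch that assumes journalctl's default short output format (timestamp, host, then `name[pid]:`); `count_by_identifier` is a made-up helper, and continuation lines of multi-line messages may be miscounted.

```shell
# Sketch: count journal lines per identifier (process name).
# count_by_identifier is a hypothetical helper; it reads lines in
# journalctl's default "short" format on stdin.
count_by_identifier() {
    awk 'NF >= 5 {
             id = $5
             sub(/\[[0-9]+\]:$/, "", id)   # strip a trailing "[pid]:"
             sub(/:$/, "", id)             # or a bare trailing ":"
             count[id]++
         }
         END { for (id in count) printf "%d %s\n", count[id], id }' |
        sort -rn
}

# Usage:
#   sudo journalctl -S "2024-01-30 07:10:00" --no-pager | count_by_identifier | head
```

Whatever tops the list is then a reasonable candidate for the `-t` drill-down described above.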

2 Likes

Thanks to you all. That is very good troubleshooting advice.
The problem was the BIOS. I would never have expected that, and I don’t like the idea of BIOS updates, but once it was done, no crashes at all.

1 Like
