Hard shutdown with CPU errors shown during boot

Lucian · April 29, 2024, 5:33am

It seems that whenever I run a CPU-intensive task, my computer sometimes does a hard shutdown. How do I get crash logs to figure out what’s wrong? The crash occurs on both the lqx and zen kernels, and it continued after I disabled some automatic overclocking settings in my BIOS.

winnyace · April 29, 2024, 6:07am

I don’t exactly know a lot, but I will try to point you to some possible directions for this issue.

EndeavourOS has a logging tool available. The command for it is $ eos-log-tool --enable-journal. It should open a GUI app where you can select what logs you want to send.
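For a quick look without the GUI, the kernel messages can also be filtered for hardware error lines directly. This is a minimal Python sketch, assuming the system uses the systemd journal. `kernel_log` and `hardware_error_lines` are hypothetical helper names, not part of any EndeavourOS tool. The boot offset is a parameter because, going by the thread title, the errors may only be reported during the boot that follows the crash.

```python
import subprocess


def kernel_log(boot: int = 0) -> str:
    """Return kernel messages for one boot (0 = current, -1 = previous)."""
    result = subprocess.run(
        ["journalctl", "-k", f"--boot={boot}", "--no-pager"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def hardware_error_lines(log_text: str) -> list[str]:
    """Keep only the lines the kernel tagged as hardware errors."""
    return [line for line in log_text.splitlines() if "[Hardware Error]" in line]
```

Something like `print("\n".join(hardware_error_lines(kernel_log())))` would then show just the machine check lines, if there are any.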

It’s also possible that your CPU isn’t dying, but merely shutting down due to overheating. Check your CPU temperatures. htop can be set up to show the CPU temperature for every core. Press F2 and on the first page, press the Left arrow key and then scroll down until you find the option Also show CPU temperature. It should work out of the box.
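If you want to see which sensor each reading comes from, the kernel exposes them under /sys/class/hwmon, with values in thousandths of a degree Celsius. Here is a rough Python sketch, assuming that standard hwmon layout. `read_temperatures` is a made-up helper name, and which chips and labels show up depends on the hardware and loaded drivers (on many AMD systems the CPU sensor appears under the chip name k10temp).

```python
from pathlib import Path


def read_temperatures(hwmon_root="/sys/class/hwmon"):
    """Return (chip, label, degrees_celsius) for each readable temperature sensor."""
    root = Path(hwmon_root)
    if not root.is_dir():
        return []
    readings = []
    for chip_dir in sorted(root.glob("hwmon*")):
        name_file = chip_dir / "name"
        chip = name_file.read_text().strip() if name_file.exists() else chip_dir.name
        for input_file in sorted(chip_dir.glob("temp*_input")):
            # A sensor may have an optional human-readable label next to it.
            label_file = input_file.with_name(input_file.name.replace("_input", "_label"))
            label = label_file.read_text().strip() if label_file.exists() else input_file.name
            try:
                millidegrees = int(input_file.read_text().strip())
            except (OSError, ValueError):
                continue  # the file exists but the sensor could not be read
            readings.append((chip, label, millidegrees / 1000.0))
    return readings
```

Printing the returned tuples while the game is loading should make it clearer which measurement a given overlay is showing.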

Lucian · April 29, 2024, 6:11am

Apr 28 23:26:58 _hostname_ kernel: mce: [Hardware Error]: Machine check events logged
Apr 28 23:26:58 _hostname_ kernel: [Hardware Error]: System Fatal error.
Apr 28 23:26:58 _hostname_ kernel: fbcon: Taking over console
Apr 28 23:26:58 _hostname_ kernel: [Hardware Error]: CPU:4 (17:71:0) MC5_STATUS[-|UE|MiscV|AddrV|PCC|TCC|SyndV|-|-|-]: 0xbea0000000000108
Apr 28 23:26:58 _hostname_ kernel: [Hardware Error]: Error Addr: 0x0000749fedcb59ce
Apr 28 23:26:58 _hostname_ kernel: [Hardware Error]: IPID: 0x000500b000000000, Syndrome: 0x000000004d000000
Apr 28 23:26:58 _hostname_ kernel: [Hardware Error]: Execution Unit Ext. Error Code: 0
Apr 28 23:26:58 _hostname_ kernel: [Hardware Error]: cache level: RESV, tx: GEN, mem-tx: GEN

I’ll try to keep an eye on my temps. The one in MangoHud does get up to 75°C, but I don’t know which sensor it is measuring.

Lucian · April 29, 2024, 6:42am

Could this be a memory issue? The address seems to suggest that but I can’t figure out what the error code means.
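For what it’s worth, the flags the kernel printed can be reproduced from the raw MC5_STATUS value. Below is a small Python sketch, assuming the bit positions from the Linux kernel’s x86 MCE definitions apply to this CPU. `decode_mc_status` is a hypothetical helper, and the width of the extended error code field is an assumption. UC means the error was uncorrected (the kernel prints it as UE) and PCC means the processor context is considered corrupt, which would fit a hard reset. This only unpacks the flags; it does not by itself say whether the fault lies in memory, the CPU, or power delivery.

```python
# Bit positions taken from the Linux kernel's x86 MCE definitions
# (assumption: they apply unchanged to this CPU family).
STATUS_BITS = {
    "Val": 63, "Over": 62, "UC": 61, "En": 60,
    "MiscV": 59, "AddrV": 58, "PCC": 57,
    "TCC": 55, "SyndV": 53, "Deferred": 44, "Poison": 43,
}


def decode_mc_status(status: int) -> dict:
    """Split a raw machine check status value into flags and error codes."""
    flags = [name for name, bit in STATUS_BITS.items() if (status >> bit) & 1]
    return {
        "flags": flags,
        "error_code": status & 0xFFFF,                 # low 16 bits
        "extended_error_code": (status >> 16) & 0x3F,  # assumed 6-bit field
    }


print(decode_mc_status(0xBEA0000000000108))
```

The set flags line up with the UE|MiscV|AddrV|PCC|TCC|SyndV list in the log, and the extended error code comes out as 0, matching the "Ext. Error Code: 0" line.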

Lucian · April 29, 2024, 7:08am

To add to the issue, the crash often occurred while loading a game in Dragon’s Dogma 2, during the load screen. There wouldn’t have been time for the CPU to heat up to dangerous temperatures.

flyingcakes · April 29, 2024, 7:17am

Try monitoring the memory usage next time you reproduce the issue.

Normally, in case of a memory issue, the system should slow down terribly but shouldn’t “crash” or shut down immediately.

Lucian · April 29, 2024, 7:29am

Would it result in a shutdown if a memory stick came loose? I gave them a little push and haven’t had issues since, but I need more time to confirm it.

flyingcakes · April 29, 2024, 7:38am

I don’t think so. It can cause other problems, but a shutdown isn’t one of them afaik.

Since you mention this happens only during CPU-intensive tasks, can you confirm that the power supply unit (PSU) you’re using is capable enough and is working fine? A faulty PSU can cause issues similar to the ones you listed, i.e. when the system needs more power, it fails.

If you recently added new hardware to the system (like a new graphics card) that needs extra power, it might be exceeding what the PSU can support.

Lucian · April 29, 2024, 7:59am

It’s occurring during gaming, so in a situation where all resources are being used. I have not modified my system recently; the last upgrade was a year or two ago (new RAM sticks), and the PC was built in 2020. The PSU is only 500W, but this issue started just a few days ago.

Lucian · April 29, 2024, 8:33am

I double-checked to make sure my RAM is seated properly, and memtest only gave a single bad result. Hopefully that solves my issues; the only other thing I can think of is a bad connection to the GPU. Mine is connected via a ribbon cable.