Need help with troubleshooting sporadic reboot on a new desktop

I’d like to ask for some help on what to do next.

It’s my first time building a new desktop. I finished putting all the pieces together and installed EOS on Aug. 9th. I’ve been experiencing sporadic hard-reboots.

Here is my hardware info created through hw-probe: https://linux-hardware.org/?probe=296f4130cd

The problem

  • I experienced two reboots on the installer ISO while it was downloading packages for the install. I didn’t pay proper attention to them because my mind was occupied with the download taking so long. (That was a bug in the ISO hot-patch, which got fixed a few days later AFAIK.) I was AFK when the second reboot happened, and the ISO booted into the first option (non-NVIDIA, so I guess nouveau?). I installed EOS with that, which took like 4 hours lol, but that’s ok.
  • After the installation, the reboots have happened randomly in both TTY and Hyprland. I haven’t been able to find any specific pattern in when they happen, including CPU/GPU workload, temperatures, networking, disk I/O, etc.
  • When a reboot happens, it happens in an instant. There is no freezing or stuttering.
Output of `last reboot`
reboot   system boot  6.10.6-arch1-1   Mon Aug 26 16:23   still running
reboot   system boot  6.10.6-arch1-1   Mon Aug 26 15:52 - 16:21  (00:29)
reboot   system boot  6.10.6-arch1-1   Sat Aug 24 23:00 - 23:37  (00:36)
reboot   system boot  6.10.6-arch1-1   Sat Aug 24 16:46 - crash  (06:13)
reboot   system boot  6.10.6-arch1-1   Sat Aug 24 12:03 - crash  (04:43)
reboot   system boot  6.10.6-arch1-1   Sat Aug 24 10:51 - crash  (01:12)
reboot   system boot  6.10.6-arch1-1   Sat Aug 24 10:46 - crash  (00:04)
reboot   system boot  6.10.6-arch1-1   Sat Aug 24 01:10 - 01:15  (00:05)
reboot   system boot  6.10.6-arch1-1   Fri Aug 23 22:42 - crash  (02:28)
reboot   system boot  6.10.6-arch1-1   Fri Aug 23 16:38 - 17:04  (00:26)
reboot   system boot  6.10.6-arch1-1   Fri Aug 23 15:38 - 16:38  (00:59)
reboot   system boot  6.10.6-arch1-1   Fri Aug 23 13:14 - crash  (02:24)
reboot   system boot  6.10.6-arch1-1   Thu Aug 22 20:35 - 13:14  (16:38)
reboot   system boot  6.10.6-arch1-1   Thu Aug 22 13:04 - crash  (07:31)
reboot   system boot  6.10.6-arch1-1   Wed Aug 21 15:17 - 01:49  (10:32)
reboot   system boot  6.10.6-arch1-1   Wed Aug 21 13:22 - 15:14  (01:51)
reboot   system boot  6.10.5-arch1-1   Tue Aug 20 13:50 - 00:59  (11:09)
reboot   system boot  6.10.5-arch1-1   Mon Aug 19 20:31 - 00:35  (04:04)
reboot   system boot  6.10.5-arch1-1   Mon Aug 19 18:48 - 19:52  (01:03)
reboot   system boot  6.10.5-arch1-1   Mon Aug 19 12:23 - crash  (06:24)
reboot   system boot  6.10.5-arch1-1   Sat Aug 17 13:43 - 15:28  (01:44)
reboot   system boot  6.10.5-arch1-1   Fri Aug 16 13:45 - 01:10  (11:25)
reboot   system boot  6.10.4-arch2-1   Thu Aug 15 22:47 - 03:04  (04:16)
reboot   system boot  6.10.4-arch2-1   Thu Aug 15 13:14 - crash  (09:33)
reboot   system boot  6.10.4-arch2-1   Thu Aug 15 13:11 - 13:13  (00:02)
reboot   system boot  6.10.4-arch2-1   Wed Aug 14 13:38 - 00:44  (11:06)
reboot   system boot  6.10.4-arch2-1   Tue Aug 13 22:19 - 23:04  (00:45)
reboot   system boot  6.10.4-arch2-1   Tue Aug 13 22:01 - 22:18  (00:17)
reboot   system boot  6.10.4-arch2-1   Tue Aug 13 17:54 - 17:58  (00:04)
reboot   system boot  6.10.4-arch2-1   Tue Aug 13 14:44 - crash  (03:10)
reboot   system boot  6.10.4-arch2-1   Tue Aug 13 11:42 - crash  (03:01)
reboot   system boot  6.10.3-arch1-2   Mon Aug 12 16:16 - 00:28  (08:11)
reboot   system boot  6.10.3-arch1-2   Mon Aug 12 15:50 - 16:15  (00:24)
reboot   system boot  6.10.3-arch1-2   Sun Aug 11 18:33 - 01:10  (06:36)
reboot   system boot  6.10.3-arch1-2   Sat Aug 10 18:08 - 01:19  (07:11)
reboot   system boot  6.10.3-arch1-2   Sat Aug 10 00:59 - 01:02  (00:03)
reboot   system boot  6.10.3-arch1-2   Fri Aug  9 23:01 - crash  (01:57)

wtmp begins Fri Aug  9 23:01:13 2024
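
A log like this is easier to compare across kernels once it is tallied. Below is a minimal Python sketch, not a definitive tool: it assumes the output format shown above, and the names `LINE_RE`, `parse_sessions`, and `crashes_per_kernel` are made up for illustration. Note that `last` prints "crash" when it finds no clean shutdown record for that boot, so the count covers hard resets but cannot say what caused them.

```python
import re
from collections import Counter

# Matches finished sessions such as:
#   reboot   system boot  6.10.6-arch1-1   Sat Aug 24 16:46 - crash  (06:13)
# The "still running" line has no " - end (duration)" part and is skipped.
LINE_RE = re.compile(
    r"^reboot\s+system boot\s+(?P<kernel>\S+)\s+"
    r"(?P<start>\w{3} \w{3}\s+\d{1,2} \d{2}:\d{2})\s+-\s+"
    r"(?P<end>\S+)\s+"
    r"\((?:(?P<d>\d+)\+)?(?P<h>\d+):(?P<m>\d{2})\)\s*$"
)


def parse_sessions(text):
    """Return (kernel, crashed, uptime_minutes) for each finished session."""
    sessions = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # blank lines, "still running", "wtmp begins ..."
        minutes = int(m["d"] or 0) * 1440 + int(m["h"]) * 60 + int(m["m"])
        sessions.append((m["kernel"], m["end"] == "crash", minutes))
    return sessions


def crashes_per_kernel(sessions):
    """Count sessions that ended without a clean shutdown, per kernel."""
    return Counter(kernel for kernel, crashed, _ in sessions if crashed)


# A few lines copied from the output above, used as sample input.
SAMPLE = """\
reboot   system boot  6.10.6-arch1-1   Mon Aug 26 16:23   still running
reboot   system boot  6.10.6-arch1-1   Sat Aug 24 16:46 - crash  (06:13)
reboot   system boot  6.10.6-arch1-1   Sat Aug 24 01:10 - 01:15  (00:05)
reboot   system boot  6.10.3-arch1-2   Fri Aug  9 23:01 - crash  (01:57)
"""

if __name__ == "__main__":
    sessions = parse_sessions(SAMPLE)
    for kernel, count in crashes_per_kernel(sessions).items():
        print(kernel, count)
```

To run it on the full log, pass the text of `last reboot` to `parse_sessions` instead of `SAMPLE`. Uptime before each crash is in the third field, which helps check whether crashes cluster at short or long uptimes.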

Things I’ve tried so far

  • journalctl -e -b -1 does not show anything out of the ordinary. Sometimes, the last entry is over 30 minutes before the reboot.
  • I’ve tried tweaking some BIOS settings. I found out that my MoBo was setting a wrong SPD profile for my RAM, so I set it to a non-XMP baseline profile (4800MHz, 40-40-40-77, 1.1V). I also turned off “Dynamic Boost” for the RAM.
  • I’ve also tried turning off turbo boost for my Intel CPU, turning on “Power Loading”, which keeps additional load on the PSU when a lot of parts are idle, and changing the power button so it only powers off when long-pressed for 4 seconds.
  • I installed the intel-ucode package update on Aug. 15th and my MoBo BIOS update on Aug. 19th, both regarding the Intel CPU microcode bug that requests too much voltage.
  • Ran memtest86+ for almost 5 hours: 4 consecutive passes with no errors.
  • Ran stress using s-tui for 10 minutes each, once with sqrt() only and once again combining all other options. Didn’t notice anything wrong including temperature.
  • Ran vkmark with default settings. Didn’t notice anything wrong including temperature.
  • Ran unigine-valley. The screen was like 1 fps probably b/c I haven’t figured out the right Hyprland settings, but the benchmark result was 71.5 - 617.5 fps with 319.2 fps on average.
  • Tried running the demo of Tekken 8 from Steam. Ran fine with the CPU turbo boost both on and off.
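
Since the journal’s last entry can be 30 minutes or more before a reset, a denser on-disk record might help narrow down the moment of failure. The following is a rough Python sketch of a heartbeat logger, offered as one possible aid rather than a proven diagnostic: it appends a timestamp and temperature readings every few seconds and calls `fsync` so each line is pushed to disk before the next one is written. The function names, the log format, and the 5-second interval are all assumptions for illustration, and it assumes the machine exposes readings under `/sys/class/thermal/thermal_zone*/temp` (in millidegrees Celsius); if it does not, the temperature list is simply empty.

```python
import glob
import os
import time
from datetime import datetime, timezone


def read_temps_millideg(pattern="/sys/class/thermal/thermal_zone*/temp"):
    """Return readable thermal-zone values in millidegrees C (may be empty)."""
    temps = []
    for path in sorted(glob.glob(pattern)):
        try:
            with open(path) as f:
                temps.append(int(f.read().strip()))
        except (OSError, ValueError):
            continue  # zone not readable or not numeric; skip it
    return temps


def write_heartbeat(log_path, temps):
    """Append one timestamped line and force it to disk before returning."""
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    line = f"{stamp} temps_mC={','.join(map(str, temps))}\n"
    with open(log_path, "a") as f:
        f.write(line)
        f.flush()             # Python buffer -> kernel
        os.fsync(f.fileno())  # kernel page cache -> storage device
    return line


def run(log_path, interval_s=5.0, max_beats=None):
    """Write heartbeats until max_beats is reached (forever if None)."""
    beats = 0
    while max_beats is None or beats < max_beats:
        write_heartbeat(log_path, read_temps_millideg())
        beats += 1
        time.sleep(interval_s)


# Example (hypothetical path): run("heartbeat.log", interval_s=5.0)
```

After a crash, the last line of the log gives the reset time to within one interval, along with the last temperatures seen. It cannot distinguish a PSU fault from a CPU or board fault on its own.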

Things I’m considering

  • Maybe the PSU is faulty, given that the system journal doesn’t seem to have time to flush to the file system? Maybe I should look into what people call RMA? But I’ve also read that PSU problems tend to involve a few seconds of freezing or other malfunctions.
  • Maybe I should try using the nouveau driver? Though I’m not sure if this is actually a driver problem.

What I’d like to ask

I’m hoping someone can see something I’m not seeing, or can provide general guidance on approaching this kind of problem. I’ve been doing my best searching the internet for information, but now I feel like I should get some help from people more experienced with computer hardware.

I’m aware that my problem might not be specific to EndeavourOS, but I’m posting the question here because I don’t really know where else I could go. If you happen to know a place that might be a better fit, I’d be happy to hear about it.

Welcome @jameekim! :wave: :sunglasses: :enos_flag:

Given your concern that it may be PSU or GPU driver related, perhaps remove your GeForce RTX 4060Ti and see if you can reproduce the issue. You would need to ensure the integrated UHD Graphics 770 is enabled.

The reduced power draw would possibly place the PSU back in a more stable state.

If there is no issue with the GPU removed, perhaps we might have a closer look at the PSU and GPU.

What PSU are you using?

Thank you for the reply! I’ll try that out and see if it works. It’ll probably take a day or two for the reboot to happen if it comes back. :smiling_face_with_tear:

My PSU is Leadex III Gold 750W from Super Flower (product page: https://www.super-flower.com.tw/en/products/leadex-iii-gold-750w).

That PSU gets the “Tier A - High End” rating here. The only downside I can note is that this PSU is not ATX 3.0 rated, which your GPU would have benefited from. That’s not a deal-breaker, so I’m not suggesting you change it; it just would have been nice.

Your PSU spec says:

Comply With ATX 12V. v2.32 & EPS 12V. v2.92 Specification

I see. Thanks for the info.

I experienced another reboot about an hour ago. I had removed the NVIDIA GPU after adjusting system settings like KMS and checking the system was picking up the iGPU.

Another probe in case you want to take a look: https://linux-hardware.org/?probe=1ea143160c
I made it right after the reboot.

Welcome @jameekim

Edit: It would be helpful if you could post the hardware output with inxi -Faz | eos-sendlog and post the URL from this command.

Sure, here is the url: https://0x0.st/Xtt_.txt

@jameekim, you recently built this system and you describe it as your “new desktop”. Are you able to confirm the origin of your CPU? Was it purchased brand new, never before used?

I can see you have updated the BIOS to try and address the Intel CPU permadeath issues. I just want to check this wasn’t a previously owned part (being last-gen), with the potential to have a now permanent fault.

I’ve only used laptops before, so I bought all components new. I believe I purchased the CPU from Memory Express. It came in a box with a seal that I opened. I’ve been keeping the box just in case.

I would use the default settings for the UEFI BIOS. How did you install Hyprland? Did you use one of the install scripts?

I’m actually hesitant on using the default settings for the RAM because the model I have is DDR5-5200 but the BIOS set it to 5600 by default. Is it something that I shouldn’t worry too much about? Other settings, I’ve been switching back and forth. I can definitely try the default settings again for those.

I chose the no DE option in the installer and used the tty at first. I installed hyprland and other wayland stuff from the arch repo. I believe I also experienced a reboot or two while in tty.

Edit: I just want to mention just to be sure we’re on the same page that initially I had all BIOS settings at default. I started tweaking those after experiencing the reboots a few times.

Have you tried another desktop such as KDE? That may help show whether it’s really hardware or the Hyprland setup.

Take this with a grain of salt but have you tried another kernel yet?
If you’re using the linux kernel package, try the linux-lts kernel.

I actually haven’t tried it because the reboots also happened with the installer ISO. I could try KDE for sure, but I was also thinking that I’d see some kind of log if it was a software problem. But I’m also aware that anything’s possible lol. I’ll put using KDE into my things-to-try list.

Are you using the latest ISO?

Edit: Taking 4 hours for the install bothers me too. :wink:

I haven’t tried it, but that’s definitely something I could try. Thanks for reminding me of that option. I almost forgot that was a thing :sweat_smile:

I haven’t tried using one since installing EOS. It was the latest version when I used it. Do you think it would be worth trying out a new one?

And it’s all good with the installation time lol. I just think it’s one of those fun events that happen with all software things. :smile:

Yeah the fact that it’s not presenting any useful information in your journal is a shame. The system saying it has no issue, while there clearly is, is a bit :face_with_spiral_eyes:

I can understand why it would at least seem to be hardware.

Perhaps try reseating your RAM. Also confirm you have it in the correct slots. Check page 12 of the manual to confirm.

With the hardware you have, I would definitely make sure you are using the latest ISO.

How did you install the heat protection of the CPU? If that’s not done right, the CPU can easily overheat.