Unrecoverable system freezes under stress?

I am on an Xfce install. Since my computer is a laptop with an Nvidia GPU with Optimus, I use optimus-manager to use my Nvidia GPU rather than hybrid or Intel graphics. All this information may or may not be related to the problem.

When playing games, everything works fine until I hit a wall of stuttering and lag that gets worse until the system freezes completely within a few seconds. Even if I manage to run xkill in time, the system still freezes. These freezes only happen when the system is under stress; I can do simple things like web browsing for hours. Just now my system crashed when I ran mprime to test my CPU.

I see a segfault in journalctl when it happens. Here is the journal from my last boot, which ended in a crash:
https://clbin.com/CfzsU
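
For anyone wanting to pull the same kind of information from their own journal, here is a minimal sketch. The helper name `crash_lines` and the pattern list are assumptions of mine, not something from the linked log; `journalctl -b -1` selects the previous boot.

```shell
#!/bin/sh
# crash_lines is a hypothetical helper, not part of systemd.
# It reads log text on stdin and keeps lines that often matter after a freeze:
# segfaults, out-of-memory kills, and Nvidia kernel driver messages (NVRM / Xid).
crash_lines() {
    grep -iE 'segfault|out of memory|oom-killer|NVRM|Xid'
}

# Typical use on a systemd machine (messages from the previous boot):
#   journalctl -b -1 --no-pager | crash_lines
```

The pattern list is only a starting point; a hard freeze may leave nothing useful in the journal at all.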

System info:
https://clbin.com/7pU0w

What is causing the system to completely die and how do I fix this?

Could be thermal throttling. Did you also check the temperatures?
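
One rough way to check temperatures from a terminal, assuming the kernel exposes thermal zones under /sys/class/thermal (values there are in millidegrees Celsius). The helper names `millideg_to_c` and `print_temps` are made up for this sketch.

```shell
#!/bin/sh
# millideg_to_c: hypothetical helper; converts a sysfs reading such as 45000 to "45.0".
millideg_to_c() {
    awk -v m="$1" 'BEGIN { printf "%.1f\n", m / 1000 }'
}

# print_temps: hypothetical helper; prints every thermal zone found under a
# base directory (defaults to /sys/class/thermal) as "<type>: <temp> C".
print_temps() {
    base="${1:-/sys/class/thermal}"
    for zone in "$base"/thermal_zone*; do
        # Skip zones without a readable temp file (also covers an unmatched glob).
        [ -r "$zone/temp" ] || continue
        printf '%s: %s C\n' "$(cat "$zone/type" 2>/dev/null)" \
            "$(millideg_to_c "$(cat "$zone/temp")")"
    done
}

# Typical use while a game or mprime is running:
#   print_temps
```

Tools such as `sensors` (from lm_sensors) or `nvidia-smi` report similar readings if they are installed.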

Just FYI, I had hardware with hybrid graphics and the same Nvidia GPU without that problem, also using Xfce. But every machine is different. I also did not partition with Btrfs, but I don’t think that should be the problem. Did you allow overclocking in the BIOS?

You could also try booting with only the dedicated GPU and blacklisting Intel. Optimus is just crappy. Never had a great experience.

I have had previous installs on this machine running correctly, but if the nvidia stuff is causing the problem, I would think it would be due to me configuring something incorrectly, since there are so many places nvidia configs can end up. I checked my temperatures and don’t see anything out of the ordinary. I haven’t done any overclocking stuff.

From my own experience, I generally don’t need to do much config beyond installing the Nvidia driver.

It would be good to let us know how you configured Nvidia. Did you just install the driver when you booted the ISO and installed EOS, or did you install the driver afterwards? And did you change any additional configs, and if so, which file (post its content)?

Just in case I normally use:

sudo nvidia-installer-dkms

Following this guide and additional arch wiki links if necessary.

https://discovery.endeavouros.com/nvidia/nvidia-installer/2021/03/

And Optimus guide in case you missed it.

https://discovery.endeavouros.com/nvidia/optimus-manager-for-nvidia/2021/03/

I still highly recommend trying the dedicated GPU only (and blacklisting Intel; you can try it and revert afterwards, and if you are interested we can go through that). The whole hybrid graphics thing with Nvidia is a crap show in general. I will never ever commit to this type of hardware in the future. Nvidia is proprietary and makes our life hard on Linux…

I used
optimus-manager --switch nvidia
which is what I normally do, and this worked on past (vanilla arch) installs.

However, before that I did do what you did following this for lightdm
https://discovery.endeavouros.com/nvidia/optimus-switch-another-solution-for-optimus-laptops/2021/04/

But I actually used its uninstall script because I wasn’t quite sure if my GPU was being utilized or not. So if its installation and uninstallation left any dirty configs somewhere, that could be related. I’m still not certain if this is Nvidia related or not, as my GPU does seem to be working in games and stuff, but I could try blacklisting Intel if that is the issue.

Ok! Would be nice if you can post the result of this: pacman -Q | grep nvidia

@ricklinux may be able to help with checking the config files, he is the hardware pro. I can never remember where these are located and what to change :grin:

If you go the blacklist route, I highly recommend first switching off the Intel GPU in the BIOS (or switching to dedicated only), then blacklisting Intel from GRUB by pressing e at the boot menu.

If that works better it is possible to add this to grub to make it persistent. However, no hybrid mode in that case.
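
A rough sketch of what the persistent version could look like. The thread never names the module or the exact parameter, so `modprobe.blacklist=i915` (i915 being the Intel graphics kernel module) is my assumption, and `add_kernel_param` is a hypothetical helper that only prints the edited text instead of touching any file. For the one-off test, the same parameter would be appended to the `linux` line after pressing e.

```shell
#!/bin/sh
# add_kernel_param: hypothetical helper. Reads the contents of a GRUB defaults
# file on stdin and prints it with one extra parameter appended inside the
# quotes of GRUB_CMDLINE_LINUX_DEFAULT. It does not check for duplicates.
add_kernel_param() {
    sed -E "s/^(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*)\"/\1 $1\"/"
}

# Preview the change without modifying anything:
#   add_kernel_param modprobe.blacklist=i915 < /etc/default/grub
# After editing /etc/default/grub by hand, regenerate the GRUB config:
#   sudo grub-mkconfig -o /boot/grub/grub.cfg
```

On laptops where the internal display is wired to the Intel GPU, this can leave you without a picture, which is one more reason to try it once from the GRUB menu before making it permanent.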

Another option that comes to mind is to switch between the LTS and current kernels and see if that makes a difference. These can be installed easily via the welcome app, or from the terminal if you feel comfortable.

Here are the nvidia packages I have installed:

lib32-nvidia-utils 515.76-1
nvidia 515.76-2
nvidia-hook 1.0-1
nvidia-inst 1.2-1
nvidia-installer-common 1.3-1
nvidia-installer-db 2.5.8-1
nvidia-installer-dkms 3.5-1
nvidia-prime 1.0-4
nvidia-settings 515.76-1
nvidia-utils 515.76-1

(I also have optimus-manager 1.4-4)

I’ll see about doing the blacklist stuff when I wake up tomorrow, since I am already only ever using nvidia mode on optimus-manager.

I may be wrong, but it seems you have a lot of additional packages, for example nvidia-prime and nvidia-hook. I will check on my laptop, but I suggest making sure you have the same packages that get installed via nvidia-dkms. In principle these should include the following, with matching versions:

lib32-nvidia-utils xxx version
nvidia-dkms xxx version
nvidia-installer-db xxx version
nvidia-installer-dkms xxx version
nvidia-settings xxx version
nvidia-utils xxx version
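
A quick way to compare an installed set against a list like this one, as a sketch. `missing_packages` is a hypothetical helper; `pacman -Qq` prints installed package names only, one per line.

```shell
#!/bin/sh
# missing_packages: hypothetical helper.
# stdin     = installed package names, one per line (e.g. from `pacman -Qq`)
# arguments = package names you expect to be installed
# Prints each expected package that is not in the installed list.
missing_packages() {
    installed=$(cat)
    for pkg in "$@"; do
        # -x matches whole lines only, so "nvidia" does not satisfy "nvidia-dkms".
        printf '%s\n' "$installed" | grep -qxF -- "$pkg" || printf '%s\n' "$pkg"
    done
}

# Typical use with the list above:
#   pacman -Qq | missing_packages lib32-nvidia-utils nvidia-dkms \
#       nvidia-installer-db nvidia-installer-dkms nvidia-settings nvidia-utils
```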

Also btw welcome to the forum!!! :grin:

This is what I have with my Nvidia GTX 1060, but it is a desktop.

[ricklinux@eos-xfce ~]$ pacman -Qs nvidia
local/egl-wayland 2:1.1.11-2
    EGLStream-based Wayland external platform
local/lib32-libvdpau 1.5-1
    Nvidia VDPAU library
local/libvdpau 1.5-1
    Nvidia VDPAU library
local/libxnvctrl 515.76-1
    NVIDIA NV-CONTROL X extension
local/nvidia-dkms 515.76-1
    NVIDIA drivers - module sources
local/nvidia-hook 1.0-1
    pacman hook for nvidia
local/nvidia-installer-common 1.3-1
    Common scripts for nvidia-installer-dkms and nvidia-inst
local/nvidia-installer-db 2.5.8-1
    Database for the script to setup nvidia drivers in EndeavourOS
local/nvidia-installer-dkms 3.5-1
    Script to setup nvidia drivers (dkms version) in EndeavourOS
local/nvidia-settings 515.76-1
    Tool for configuring the NVIDIA graphics driver
local/nvidia-utils 515.76-1
    NVIDIA drivers utilities
[ricklinux@eos-xfce ~]$ 

Edit: This is running on xfce

Could nvidia-prime conflict?

Possibly? I would not have prime and Optimus installed together.

Edit: There is optimus-switch and optimus-manager

Edit: I think optimus-switch uses prime?

https://discovery.endeavouros.com/nvidia/optimus-switch-another-solution-for-optimus-laptops/2021/04/

I think I would try one or the other and see what works for the hardware. Maybe one works better than the other?

Edit: I think optimus-manager is easier to set up.

Seems that the Nvidia install or configs need to be cleaned up. Especially, don’t use different methods at the same time as they are not compatible…

I would also note this about optimus-manager:

  • Custom Xorg config : optimus-manager works by auto-generating a Xorg configuration file and putting it into /etc/X11/xorg.conf.d/. If you already have custom Xorg configuration files at that location or at /etc/X11/xorg.conf , it is strongly advised that you remove anything GPU-related from them to make sure that they do not interfere with the GPU switching process.
  • Nvidia-generated Xorg config : Similarly, if you have ever used the nvidia-xconfig utility or the Save to X Configuration File button in the Nvidia control panel, a Xorg config file may have been generated at /etc/X11/xorg.conf . It is highly recommended to delete it before trying to switch GPUs.
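
To see whether any such files exist, something like the following sketch could help. `gpu_xorg_configs` is a hypothetical helper and the pattern is my guess at what counts as GPU-related; note that optimus-manager's own auto-generated file will show up in the results too.

```shell
#!/bin/sh
# gpu_xorg_configs: hypothetical helper; lists files under the given paths
# whose contents mention a GPU driver (case-insensitive match).
gpu_xorg_configs() {
    grep -rliE 'nvidia|intel|modesetting' "$@" 2>/dev/null
}

# Typical use:
#   gpu_xorg_configs /etc/X11/xorg.conf /etc/X11/xorg.conf.d
```

It only lists candidates; read each file before removing anything from it.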

nvidia-dkms is not installed either. You need to run the dkms installer… Perhaps it is a good idea to remove all the nvidia drivers and start again from scratch, following the EOS wiki I posted above, with sudo nvidia-installer-dkms. The reason is that I believe it will keep working across future kernel updates, and if you have the LTS kernel, it will build the drivers for that as well…

Yes … A lot of nvidia users tend to use the nvidia settings and it creates an xorg file which you don’t want to have.

[ricklinux@eos-xfce ~]$ cat /etc/X11/xorg.conf
cat: /etc/X11/xorg.conf: No such file or directory
[ricklinux@eos-xfce ~]$ 

Edit: If a basic Xorg config file is required, it should be located here:

/etc/X11/xorg.conf.d/20-nvidia.conf

Edit2: I have no conf files

[ricklinux@eos-xfce ~]$ cat /etc/X11/xorg.conf.d/20-nvidia.conf
cat: /etc/X11/xorg.conf.d/20-nvidia.conf: No such file or directory
[ricklinux@eos-xfce ~]$ 

Edit3: I have a desktop GTX 1060 card and I am dual booting with Windows 11 using grub. I have not had any of the issues that other users have had with nvidia or grub on this machine. :man_shrugging:

Hi, sorry for the late reply, but I did reinstall EndeavourOS to clean up whatever config mess I had, and also because I wanted to switch to i3 anyway. As I was reinstalling I noticed I only had half the RAM I was supposed to have, which explains the freezing and why it happened so easily. I just had to pop both sticks out and put them back in haha. I am now using optimus-manager in permanent nvidia mode and everything works. Also, thanks for the tip about the nvidia settings thing, because I always have to force full composition pipeline to stop screen tearing, but I never knew where that config was actually supposed to go.
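
For anyone landing here with similar freezes, a quick sanity check of how much RAM the kernel actually sees, as a minimal sketch. `mem_total_gib` is a hypothetical helper that reads /proc/meminfo-style text, where MemTotal is reported in kB.

```shell
#!/bin/sh
# mem_total_gib: hypothetical helper; reads /proc/meminfo-style text on stdin
# and prints the MemTotal value converted from kB to GiB, one decimal place.
mem_total_gib() {
    awk '/^MemTotal:/ { printf "%.1f\n", $2 / 1024 / 1024 }'
}

# Typical use:
#   mem_total_gib < /proc/meminfo
```

The number will be a little below the installed amount because the kernel and firmware reserve some memory, but a value near half of what should be there points at a stick that is not detected. `free -h` shows the same total, and `sudo dmidecode --type memory` lists what is detected in each slot.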