Cold boot crash after switch to nvidia-open

After switching to nvidia-open yesterday, I used my system all day without a hitch. Everything seemed better, but I have no metrics to back that up. This morning, however, I was greeted with this lovely message:

Apr 22 08:37:29 eos-host kernel: BUG: kernel NULL pointer dereference, address: 000000000000001c
Apr 22 08:37:29 eos-host kernel: #PF: supervisor read access in kernel mode
Apr 22 08:37:29 eos-host kernel: #PF: error_code(0x0000) - not-present page
Apr 22 08:37:29 eos-host kernel: PGD 0 P4D 0 
Apr 22 08:37:29 eos-host kernel: Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
Apr 22 08:37:29 eos-host kernel: CPU: 12 UID: 0 PID: 1009 Comm: Xorg.wrap Tainted: G           O       6.14.2-lqx1-1-lqx #1
Apr 22 08:37:29 eos-host kernel: Tainted: [O]=OOT_MODULE
Apr 22 08:37:29 eos-host kernel: Hardware name: ASUS System Product Name/PRIME B760M-A AX, BIOS 1656 04/18/2024
Apr 22 08:37:29 eos-host kernel: RIP: 0010:nv_audio_dynamic_power+0xbd/0x130 [nvidia]
Apr 22 08:37:29 eos-host kernel: Code: d2 74 96 48 8b 82 a8 01 00 00 48 81 c2 a0 01 00 00 48 39 d0 75 14 eb 81 0f 1f 44 00 00 48 8b 40 08 48 39 d0 0f 84 6f ff ff ff <83> 78 1c 03 75 ed 48 8b 78 20 48 83 bf 58 03 00 00 00 0f 84 57 ff
Apr 22 08:37:29 eos-host kernel: RSP: 0018:ffffc90003f26d70 EFLAGS: 00010207
Apr 22 08:37:29 eos-host kernel: RAX: 0000000000000000 RBX: ffff888125620050 RCX: ffff888100e07c38
Apr 22 08:37:29 eos-host kernel: RDX: ffff8881808599a0 RSI: ffffc90003f26ce8 RDI: ffff888101e3e100
Apr 22 08:37:29 eos-host kernel: RBP: ffffc90003f26d88 R08: 0000000000000000 R09: 0000000000000000
Apr 22 08:37:29 eos-host kernel: R10: ffffffffa0ca8250 R11: ffffffffa0ca8290 R12: ffffc90003f26f70
Apr 22 08:37:29 eos-host kernel: R13: ffffc90003f26e38 R14: ffffffffa0ca7f00 R15: ffffffffa071c200
Apr 22 08:37:29 eos-host kernel: FS:  0000791908bd0740(0000) GS:ffff88885fc00000(0000) knlGS:0000000000000000
Apr 22 08:37:29 eos-host kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 22 08:37:29 eos-host kernel: CR2: 000000000000001c CR3: 0000000221bae003 CR4: 0000000000f72ef0
Apr 22 08:37:29 eos-host kernel: PKRU: 55555554
Apr 22 08:37:29 eos-host kernel: Call Trace:
Apr 22 08:37:29 eos-host kernel:  <TASK>
Apr 22 08:37:29 eos-host kernel:  subdeviceCtrlCmdOsUnixAudioDynamicPower_IMPL+0x27/0x2b [nvidia]
Apr 22 08:37:29 eos-host kernel:  ? resControl_IMPL+0x1bd/0x1d0 [nvidia]
Apr 22 08:37:29 eos-host kernel:  ? serverControl+0x48e/0x5c0 [nvidia]
Apr 22 08:37:29 eos-host kernel:  ? _rmapiRmControl+0x55c/0x810 [nvidia]
Apr 22 08:37:29 eos-host kernel:  ? rmapiControlWithSecInfo+0x7a/0x140 [nvidia]
Apr 22 08:37:29 eos-host kernel:  ? rmapiControlWithSecInfoTls+0x79/0xe0 [nvidia]
Apr 22 08:37:29 eos-host kernel:  ? _nv04ControlWithSecInfo+0x8d/0xa0 [nvidia]
Apr 22 08:37:29 eos-host kernel:  ? _nv04ControlWithSecInfo+0x8d/0xa0 [nvidia]
Apr 22 08:37:29 eos-host kernel:  ? Nv04ControlKernel+0x62/0x70 [nvidia]
Apr 22 08:37:29 eos-host kernel:  ? nvkms_call_rm+0x46/0x80 [nvidia_modeset]
Apr 22 08:37:29 eos-host kernel:  ? nvRmApiControl+0x5b/0x70 [nvidia_modeset]
Apr 22 08:37:29 eos-host kernel:  ? RmSetELDAudioCaps+0xb6/0x170 [nvidia_modeset]
Apr 22 08:37:29 eos-host kernel:  ? nvHdmiDpEnableDisableAudio+0xd8/0x380 [nvidia_modeset]
Apr 22 08:37:29 eos-host kernel:  ? KickoffProposedModeSetHwState+0xdc7/0xf00 [nvidia_modeset]
Apr 22 08:37:29 eos-host kernel:  ? nvSetDispModeEvo+0x135a/0x4300 [nvidia_modeset]
Apr 22 08:37:29 eos-host kernel:  ? Flip+0xe0/0xe0 [nvidia_modeset]
Apr 22 08:37:29 eos-host kernel:  ? nvKmsIoctl+0xe6/0x220 [nvidia_modeset]
Apr 22 08:37:29 eos-host kernel:  ? nvkms_ioctl_from_kapi_try_pmlock+0x60/0xb0 [nvidia_modeset]
Apr 22 08:37:29 eos-host kernel:  ? ApplyModeSetConfig+0x513/0xd00 [nvidia_modeset]
Apr 22 08:37:29 eos-host kernel:  ? drm_gem_plane_helper_prepare_fb+0x11/0x1f0 [drm_kms_helper]
Apr 22 08:37:29 eos-host kernel:  ? nv_drm_atomic_apply_modeset_config+0x4d7/0x820 [nvidia_drm]
Apr 22 08:37:29 eos-host kernel:  ? nv_drm_atomic_commit+0xe9/0x410 [nvidia_drm]
Apr 22 08:37:29 eos-host kernel:  ? drm_atomic_check_only+0x5c6/0xa20 [drm]
Apr 22 08:37:29 eos-host kernel:  ? drm_atomic_commit+0xab/0xe0 [drm]
Apr 22 08:37:29 eos-host kernel:  ? __pfx___drm_printfn_info+0x10/0x10 [drm]
Apr 22 08:37:29 eos-host kernel:  ? drm_client_modeset_commit_atomic.constprop.0+0x1b2/0x1f0 [drm]
Apr 22 08:37:29 eos-host kernel:  ? drm_client_modeset_commit_locked+0x51/0x180 [drm]
Apr 22 08:37:29 eos-host kernel:  ? drm_client_modeset_commit+0x21/0x40 [drm]
Apr 22 08:37:29 eos-host kernel:  ? drm_fb_helper_lastclose+0x45/0x80 [drm_kms_helper]
Apr 22 08:37:29 eos-host kernel:  ? drm_fbdev_client_restore+0xd/0x20 [drm_client_lib]
Apr 22 08:37:29 eos-host kernel:  ? drm_client_dev_restore+0x6a/0x100 [drm]
Apr 22 08:37:29 eos-host kernel:  ? drm_release+0xfb/0x110 [drm]
Apr 22 08:37:29 eos-host kernel:  ? __fput+0xe2/0x2b0
Apr 22 08:37:29 eos-host kernel:  ? __x64_sys_close+0x8d/0x120
Apr 22 08:37:29 eos-host kernel:  ? do_syscall_64+0x4b/0x140
Apr 22 08:37:29 eos-host kernel:  ? entry_SYSCALL_64_after_hwframe+0x76/0x7e
Apr 22 08:37:29 eos-host kernel:  </TASK>
Apr 22 08:37:29 eos-host kernel: Modules linked in: bridge stp llc af_packet nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables cmac algif_hash algif_skcipher af_alg bnep nct6775 nct6775_core hwmon_vid ext4 mbcache nls_utf8 jbd2 nls_cp437 vfat fat snd_sof_pci_intel_tgl snd_sof_pci_intel_cnl snd_sof_intel_hda_generic soundwire_intel soundwire_cadence snd_sof_intel_hda_common snd_soc_hdac_hda snd_sof_intel_hda_mlink snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp snd_sof snd_sof_utils snd_soc_acpi_intel_match snd_soc_acpi_intel_sdca_quirks soundwire_generic_allocation snd_soc_acpi soundwire_bus snd_soc_sdca rtw89_8852be snd_soc_avs rtw89_8852b snd_hda_codec_realtek intel_rapl_msr snd_soc_hda_codec rtw89_8852b_common intel_rapl_common snd_hda_ext_core snd_hda_codec_generic intel_uncore_frequency snd_hda_scodec_component intel_uncore_frequency_common
Apr 22 08:37:29 eos-host kernel:  intel_tcc_cooling rtw89_pci snd_soc_core x86_pkg_temp_thermal intel_powerclamp snd_hda_codec_hdmi coretemp snd_compress ac97_bus rtw89_core snd_pcm_dmaengine snd_hda_intel kvm_intel btusb snd_intel_dspcfg spi_nor iTCO_wdt btrtl snd_intel_sdw_acpi intel_pmc_bxt mac80211 btintel asus_nb_wmi eeepc_wmi mtd iTCO_vendor_support spd5118 snd_hda_codec asus_wmi kvm btbcm snd_hda_core btmtk platform_profile sparse_keymap wmi_bmof cfg80211 pcspkr snd_hwdep efi_pstore i2c_i801 r8169 bluetooth spi_intel_pci i2c_smbus snd_pcm spi_intel libarc4 i2c_mux i2c_nvidia_gpu i2c_designware_platform i2c_ccgx_ucsi realtek snd_timer ses i2c_designware_core rfkill enclosure snd crc16 ccp scsi_transport_sas soundcore mousedev intel_pmc_core joydev pmt_telemetry pmt_class tpm_crb intel_vsec tpm_tis acpi_pad acpi_tad tpm_tis_core mei_me mei nfsd auth_rpcgss nfs_acl lockd grace nfs_localio sunrpc fuse loop dm_mod nfnetlink zram tpm libaescfb ecdh_generic rng_core ip_tables x_tables crc32c_generic uas usb_storage usbhid nvidia_drm(O)
Apr 22 08:37:29 eos-host kernel:  drm_client_lib nvidia_modeset(O) polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 sha256_ssse3 sha1_ssse3 drm_ttm_helper aesni_intel ttm crypto_simd cryptd agpgart drm_kms_helper intel_lpss_pci intel_lpss idma64 video wmi pinctrl_alderlake pkcs8_key_parser nvidia_uvm(O) vfio_pci vfio_pci_core vfio_iommu_type1 vfio iommufd btrfs blake2b_generic xor raid6_pq nvidia(O) drm crypto_user dmi_sysfs
Apr 22 08:37:29 eos-host kernel: CR2: 000000000000001c
Apr 22 08:37:29 eos-host kernel: ---[ end trace 0000000000000000 ]---
Apr 22 08:37:29 eos-host kernel: RIP: 0010:nv_audio_dynamic_power+0xbd/0x130 [nvidia]
Apr 22 08:37:29 eos-host kernel: Code: d2 74 96 48 8b 82 a8 01 00 00 48 81 c2 a0 01 00 00 48 39 d0 75 14 eb 81 0f 1f 44 00 00 48 8b 40 08 48 39 d0 0f 84 6f ff ff ff <83> 78 1c 03 75 ed 48 8b 78 20 48 83 bf 58 03 00 00 00 0f 84 57 ff
Apr 22 08:37:29 eos-host kernel: RSP: 0018:ffffc90003f26d70 EFLAGS: 00010207
Apr 22 08:37:29 eos-host kernel: RAX: 0000000000000000 RBX: ffff888125620050 RCX: ffff888100e07c38
Apr 22 08:37:29 eos-host kernel: RDX: ffff8881808599a0 RSI: ffffc90003f26ce8 RDI: ffff888101e3e100
Apr 22 08:37:29 eos-host kernel: RBP: ffffc90003f26d88 R08: 0000000000000000 R09: 0000000000000000
Apr 22 08:37:29 eos-host kernel: R10: ffffffffa0ca8250 R11: ffffffffa0ca8290 R12: ffffc90003f26f70
Apr 22 08:37:29 eos-host kernel: R13: ffffc90003f26e38 R14: ffffffffa0ca7f00 R15: ffffffffa071c200
Apr 22 08:37:29 eos-host kernel: FS:  0000791908bd0740(0000) GS:ffff88885fc00000(0000) knlGS:0000000000000000
Apr 22 08:37:29 eos-host kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 22 08:37:29 eos-host kernel: CR2: 000000000000001c CR3: 0000000221bae003 CR4: 0000000000f72ef0
Apr 22 08:37:29 eos-host kernel: PKRU: 55555554
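
For anyone comparing this against their own trace, here is a minimal Python sketch that pulls the faulting symbol, the fault address (CR2) and RAX out of an oops. The function name `summarize_oops` and the dictionary keys are made up for illustration. The `field_offset` value rests on an assumption: the marked bytes `<83> 78 1c 03` in the Code line appear to decode to a compare against a 32-bit value at offset 0x1c from RAX, so with RAX at zero the read lands on address 0x1c, which matches CR2.

```python
import re

# Three lines copied from the trace above (timestamp and hostname trimmed).
SAMPLE_OOPS = """\
kernel: RIP: 0010:nv_audio_dynamic_power+0xbd/0x130 [nvidia]
kernel: RAX: 0000000000000000 RBX: ffff888125620050 RCX: ffff888100e07c38
kernel: CR2: 000000000000001c CR3: 0000000221bae003 CR4: 0000000000f72ef0
"""

# "RIP: <segment>:<symbol>+0x<offset>/0x<size> [<module>]"
RIP_RE = re.compile(
    r"RIP: [0-9a-f]{4}:(?P<symbol>[\w.]+)\+0x(?P<offset>[0-9a-f]+)"
    r"/0x(?P<size>[0-9a-f]+)(?: \[(?P<module>\w+)\])?"
)
# Only the two registers needed here: RAX and CR2 (the faulting address).
REG_RE = re.compile(r"\b(?P<name>RAX|CR2): (?P<value>[0-9a-f]{16})\b")


def summarize_oops(log_text):
    """Return the faulting symbol, module, CR2 and RAX from an x86-64 oops.

    Hypothetical helper for illustration; it only looks at the first
    occurrence of each field.
    """
    rip = RIP_RE.search(log_text)
    if rip is None:
        raise ValueError("no RIP line found")

    regs = {}
    for match in REG_RE.finditer(log_text):
        # Keep the first value seen; the kernel repeats the register dump
        # after the "end trace" marker.
        regs.setdefault(match.group("name"), int(match.group("value"), 16))
    if "RAX" not in regs or "CR2" not in regs:
        raise ValueError("RAX or CR2 missing from the register dump")

    return {
        "symbol": rip.group("symbol"),
        "module": rip.group("module"),
        "offset": int(rip.group("offset"), 16),
        "fault_address": regs["CR2"],
        "rax": regs["RAX"],
        # Assumption: the faulting access is RAX-relative (see the lead-in).
        "field_offset": regs["CR2"] - regs["RAX"],
    }


summary = summarize_oops(SAMPLE_OOPS)
print(summary)
```

If that reading is right, the driver followed a pointer that was NULL inside `nv_audio_dynamic_power` while X was setting up HDMI/DP audio during the modeset. That is a driver-side bug rather than something specific to the lqx kernel.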

Thinking it might be related to the linux-lqx kernel I’m trying, I rebooted with the linux-lts kernel; same thing, the system crashed on boot. After switching back to the closed-source nvidia drivers, I was able to boot my system again (cold and warm, multiple tests).

My system hardware:

System:
  Kernel: 6.14.2-lqx1-1-lqx arch: x86_64 bits: 64 compiler: gcc v: 14.2.1
    clocksource: tsc avail: acpi_pm parameters: audit=0 intel_pstate=disable
    amd_pstate=disable BOOT_IMAGE=/@/boot/vmlinuz-linux-lqx
    root=UUID=bdd04bd5-f872-4915-ab50-3d3429048bbf rw rootflags=subvol=@
    nowatchdog nvme_load=YES nvidia-drm.modeset=1 loglevel=3 intel_iommu=on
    iommu=pt rd.driver.pre=vfio-pci zswap.enabled=0
  Desktop: Xfce v: 4.20.1 tk: Gtk v: 3.24.48 wm: xfwm4 v: 4.20.0
    with: xfce4-panel tools: xfce4-screensaver dm: LightDM v: 1.32.0
    Distro: EndeavourOS base: Arch Linux
Machine:
  Type: Desktop System: ASUS product: N/A v: N/A serial: N/A
  Mobo: ASUSTeK model: PRIME B760M-A AX v: Rev 1.xx serial: <filter>
    part-nu: SKU uuid: 2b0940ce-c3ae-e8ae-9e8b-e89c256a7592
    UEFI: American Megatrends v: 1812 date: 01/21/2025
CPU:
  Info: model: 13th Gen Intel Core i5-13400F socket: LGA1700 bits: 64
    type: MST AMCP arch: Raptor Lake gen: core 13 level: v3 note: check
    built: 2022+ process: Intel 7 (10nm) family: 6 model-id: 0xBF (191)
    stepping: 2 microcode: 0x38
  Topology: cpus: 1x dies: 1 clusters: 7 cores: 10 threads: 16 mt: 6 tpc: 2
    st: 4 smt: enabled cache: L1: 864 KiB desc: d-4x32 KiB, 6x48 KiB; i-6x32
    KiB, 4x64 KiB L2: 9.5 MiB desc: 6x1.2 MiB, 1x2 MiB L3: 20 MiB
    desc: 1x20 MiB
  Speed (MHz): avg: 2501 min/max: 800/2501 boost: enabled
    base/boost: 4059/4600 scaling: driver: acpi-cpufreq governor: performance
    volts: 1.2 V ext-clock: 100 MHz cores: 1: 2501 2: 2501 3: 2501 4: 2501
    5: 2501 6: 2501 7: 2501 8: 2501 9: 2501 10: 2501 11: 2501 12: 2501
    13: 2501 14: 2501 15: 2501 16: 2501 bogomips: 79872
  Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx
  Vulnerabilities:
  Type: gather_data_sampling status: Not affected
  Type: ghostwrite status: Not affected
  Type: itlb_multihit status: Not affected
  Type: l1tf status: Not affected
  Type: mds status: Not affected
  Type: meltdown status: Not affected
  Type: mmio_stale_data status: Not affected
  Type: reg_file_data_sampling mitigation: Clear Register File
  Type: retbleed status: Not affected
  Type: spec_rstack_overflow status: Not affected
  Type: spec_store_bypass mitigation: Speculative Store Bypass disabled via
    prctl
  Type: spectre_v1 mitigation: usercopy/swapgs barriers and __user pointer
    sanitization
  Type: spectre_v2 mitigation: Enhanced / Automatic IBRS; IBPB:
    conditional; RSB filling; PBRSB-eIBRS: SW sequence; BHI: BHI_DIS_S
  Type: srbds status: Not affected
  Type: tsx_async_abort status: Not affected
Graphics:
  Device-1: NVIDIA AD107 [GeForce RTX 4060] vendor: ZOTAC driver: nvidia
    v: 570.144 alternate: nouveau,nvidia_drm non-free: 550/565.xx+
    status: current (as of 2025-01) arch: Lovelace code: AD1xx
    process: TSMC n4 (5nm) built: 2022+ pcie: gen: 1 speed: 2.5 GT/s lanes: 8
    link-max: gen: 4 speed: 16 GT/s ports: active: none off: DP-2,DP-3
    empty: DP-1,HDMI-A-1 bus-ID: 01:00.0 chip-ID: 10de:2882 class-ID: 0300
  Device-2: NVIDIA TU116 [GeForce GTX 1660 SUPER] vendor: Dell
    driver: vfio-pci v: N/A alternate: nouveau,nvidia_drm,nvidia
    non-free: 550/565.xx+ status: current (as of 2025-01; EOL~2026-12-xx)
    arch: Turing code: TUxxx process: TSMC 12nm FF built: 2018-2022 pcie:
    gen: 3 speed: 8 GT/s lanes: 4 link-max: lanes: 16 bus-ID: 06:00.0
    chip-ID: 10de:21c4 class-ID: 0300
  Display: x11 server: X.Org v: 21.1.16 compositor: xfwm4 v: 4.20.0 driver:
    X: loaded: nvidia unloaded: modesetting alternate: fbdev,nouveau,nv,vesa
    gpu: nvidia,nvidia-nvswitch display-ID: :0.0 screens: 1
  Screen-1: 0 s-res: 3840x1080 s-dpi: 96 s-size: 1017x286mm (40.04x11.26")
    s-diag: 1056mm (41.59")
  Monitor-1: DP-2 note: disabled pos: left model: Acer XF270H B
    serial: <filter> built: 2019 res: mode: 1920x1080 hz: 144 scale: 100% (1)
    dpi: 82 gamma: 1.2 size: 598x336mm (23.54x13.23") diag: 686mm (27")
    ratio: 16:9 modes: max: 1920x1080 min: 640x480
  Monitor-2: DP-3 mapped: DP-4 note: disabled pos: primary,right
    model: Acer XF270H B serial: <filter> built: 2019 res: mode: 1920x1080
    hz: 144 scale: 100% (1) dpi: 82 gamma: 1.2 size: 598x336mm (23.54x13.23")
    diag: 686mm (27") ratio: 16:9 modes: max: 1920x1080 min: 640x480
  API: EGL v: 1.5 hw: drv: nvidia nouveau drv: nvidia platforms: device: 0
    drv: nvidia device: 1 drv: nouveau device: 2 drv: swrast gbm: drv: nvidia
    surfaceless: drv: nvidia x11: drv: nvidia inactive: wayland
  API: OpenGL v: 4.6.0 compat-v: 4.5 vendor: nvidia mesa v: 570.144
    glx-v: 1.4 direct-render: yes renderer: NVIDIA GeForce RTX 4060/PCIe/SSE2
    memory: 7.81 GiB
  Info: Tools: api: eglinfo,glxinfo de: xfce4-display-settings
    gpu: nvidia-settings,nvidia-smi x11: xdpyinfo, xprop, xrandr
Audio:
  Device-1: Intel Raptor Lake High Definition Audio vendor: ASUSTeK
    driver: snd_hda_intel v: kernel alternate: snd_soc_avs,snd_sof_pci_intel_tgl
    bus-ID: 00:1f.3 chip-ID: 8086:7a50 class-ID: 0403
  Device-2: NVIDIA AD107 High Definition Audio vendor: ZOTAC
    driver: snd_hda_intel v: kernel pcie: gen: 4 speed: 16 GT/s lanes: 8
    bus-ID: 01:00.1 chip-ID: 10de:22be class-ID: 0403
  Device-3: NVIDIA TU116 High Definition Audio vendor: Dell driver: vfio-pci
    alternate: snd_hda_intel pcie: speed: Unknown lanes: 63 link-max: gen: 3
    speed: 8 GT/s bus-ID: 06:00.1 chip-ID: 10de:1aeb class-ID: 0403
  API: ALSA v: k6.14.2-lqx1-1-lqx status: kernel-api
    tools: alsactl,alsamixer,amixer
  Server-1: PipeWire v: 1.4.2 status: n/a (root, process) with:
    1: pipewire-pulse status: active 2: wireplumber status: active
    3: pipewire-alsa type: plugin 4: pw-jack type: plugin
    tools: pactl,pw-cat,pw-cli,wpctl
Network:
  Device-1: Realtek RTL8125 2.5GbE vendor: ASUSTeK driver: r8169 v: kernel
    pcie: gen: 2 speed: 5 GT/s lanes: 1 port: 6000 bus-ID: 04:00.0
    chip-ID: 10ec:8125 class-ID: 0200
  IF: enp4s0 state: down mac: <filter>
  Device-2: Realtek RTL8852BE PCIe 802.11ax Wireless Network
    vendor: AzureWave driver: rtw89_8852be v: kernel pcie: gen: 1
    speed: 2.5 GT/s lanes: 1 port: 5000 bus-ID: 05:00.0 chip-ID: 10ec:b852
    class-ID: 0280
  IF: wlan0 state: up mac: <filter>
  IF-ID-1: virbr0 state: down mac: <filter>
  Info: services: NetworkManager, nfsd, smbd, systemd-timesyncd,
    wpa_supplicant
Bluetooth:
  Device-1: IMC Networks Bluetooth Radio driver: btusb v: 0.8 type: USB
    rev: 1.0 speed: 12 Mb/s lanes: 1 mode: 1.1 bus-ID: 1-14:8 chip-ID: 13d3:3571
    class-ID: e001 serial: <filter>
  Report: btmgmt ID: hci0 rfk-id: 0 state: up address: <filter> bt-v: 5.2
    lmp-v: 11 status: discoverable: no pairing: no class-ID: 6c0104
Drives:
  Local Storage: total: 5.51 TiB used: 1.6 TiB (29.1%)
  ID-1: /dev/nvme0n1 maj-min: 259:0 vendor: Western Digital
    model: WD Blue SN580 1TB size: 931.51 GiB block-size: physical: 512 B
    logical: 512 B speed: 63.2 Gb/s lanes: 4 tech: SSD serial: <filter>
    fw-rev: 281010WD temp: 36.9 C scheme: GPT
  SMART: yes health: PASSED on: 264d 22h cycles: 611
    read-units: 89,125,387 [45.6 TB] written-units: 40,923,536 [20.9 TB]
  ID-2: /dev/sda maj-min: 8:0 vendor: SanDisk model: SSD PLUS 480GB
    family: Marvell based SSDs size: 447.13 GiB block-size: physical: 512 B
    logical: 512 B sata: 3.2 speed: 6.0 Gb/s tech: SSD serial: <filter>
    fw-rev: 7100 temp: 29 C scheme: GPT
  SMART: yes state: enabled health: PASSED on: 7d 21h cycles: 151
  ID-3: /dev/sdb maj-min: 8:16 vendor: SanDisk model: SDSSDH3512G
    size: 476.94 GiB block-size: physical: 512 B logical: 512 B sata: 3.3
    speed: 6.0 Gb/s tech: SSD serial: <filter> fw-rev: 7000 temp: 28 C
    scheme: GPT
  SMART: yes state: enabled health: PASSED on: 3y 262d 2h cycles: 1662
    read: 42.1 MiB written: 14 MiB
  ID-4: /dev/sdc maj-min: 8:32 vendor: Western Digital
    model: WD40EZRZ-00GXCB0 family: Blue size: 3.64 TiB block-size:
    physical: 4096 B logical: 512 B type: USB rev: 3.0 spd: 5 Gb/s lanes: 1
    mode: 3.2 gen-1x1 sata: 3.1 speed: 6.0 Gb/s tech: HDD rpm: 5400
    serial: <filter> fw-rev: 3002 drive-rev: 80.00A80 temp: 39 C scheme: GPT
  SMART: yes state: enabled health: PASSED on: 1y 82d 19h cycles: 8007
  ID-5: /dev/sdd maj-min: 8:48 vendor: SanDisk model: Ultra size: 57.28 GiB
    block-size: physical: 512 B logical: 512 B type: USB rev: 2.1 spd: 480 Mb/s
    lanes: 1 mode: 2.0 tech: N/A serial: <filter> fw-rev: 1.00 scheme: MBR
  SMART Message: Unknown USB bridge. Flash drive/Unsupported enclosure?
Partition:
  ID-1: / raw-size: 929.51 GiB size: 929.51 GiB (100.00%)
    used: 288.39 GiB (31.0%) fs: btrfs block-size: 4096 B dev: /dev/nvme0n1p4
    maj-min: 259:2
  ID-2: /boot/efi raw-size: 2 GiB size: 1.99 GiB (99.61%)
    used: 486.3 MiB (23.8%) fs: vfat block-size: 512 B dev: /dev/nvme0n1p1
    maj-min: 259:1
  ID-3: /home raw-size: 929.51 GiB size: 929.51 GiB (100.00%)
    used: 288.39 GiB (31.0%) fs: btrfs block-size: 4096 B dev: /dev/nvme0n1p4
    maj-min: 259:2
  ID-4: /var/log raw-size: 929.51 GiB size: 929.51 GiB (100.00%)
    used: 288.39 GiB (31.0%) fs: btrfs block-size: 4096 B dev: /dev/nvme0n1p4
    maj-min: 259:2
Swap:
  Kernel: swappiness: 180 (default 60) cache-pressure: 100 (default) zswap: no
  ID-1: swap-1 type: zram size: 15.57 GiB used: 0 KiB (0.0%) priority: 100
    comp: lz4 avail: lzo-rle,lzo,lz4hc,zstd,deflate,842 max-streams: 16
    dev: /dev/zram0
Sensors:
  System Temperatures: cpu: 37.5 C mobo: 29.0 C gpu: nvidia temp: 37 C
  Fan Speeds (rpm): fan-1: 989 fan-2: 924 fan-3: 1015 fan-4: 1009
    gpu: nvidia fan: 37%
Info:
  Memory: total: 32 GiB available: 31.14 GiB used: 7.04 GiB (22.6%)
  Processes: 506 Power: uptime: 12m states: freeze,mem,disk suspend: deep
    avail: s2idle wakeups: 0 hibernate: platform avail: shutdown, reboot,
    suspend, test_resume image: 12.39 GiB services: power-profiles-daemon,
    upowerd, xfce4-power-manager Init: systemd v: 257 default: graphical
    tool: systemctl
  Packages: pm: pacman pkgs: 1495 libs: 393 tools: pacseek,yay Compilers:
    clang: 19.1.7 gcc: 14.2.1 Shell: Sudo (sudo) v: 1.9.16p2 default: Bash
    v: 5.2.37 running-in: xfce4-terminal inxi: 3.3.37

I updated the BIOS today (after the crashes), so the version listed above (1812) is the latest one; the crash log still shows the older 1656. I just don’t understand why a cold boot would fail but a warm one would not.

Have you tried using nvidia-open-dkms (this also needs the kernel headers)?

Sorry, I should have clarified. I had to use nvidia-open-dkms due to the non-standard kernel.

OK, thanks.
Maybe the lqx kernel isn’t compatible with nvidia-open. Did you try the default kernel?

No, but I did try the LTS kernel and it crashed as well.

The same incompatibility might exist with the lts kernel.
NVIDIA recommends its open driver for all newer GPUs.

Well, I guess I’m stuck then, as the lqx kernel is noticeably better when gaming. I think you’re on to something with the stock kernel, though. My two VMs with GTX 1660 GPUs cold-booted just fine today with the mainline kernel.

I’ll keep an eye on this.

UPDATE: I started having some major graphics issues after the last kernel update. I also noticed that the nvidia-utils packages were still on 570.133, while the driver itself was at 570.144. I tried nvidia-open-dkms again after updating the kernel to 6.14.3, and now everything seems to be back to normal.

I’m not sure why the entire group of nvidia packages was not updated at the same time, but I think the version mismatch is related to my issue. So far, my first cold-boot test was a success.
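
A quick way to spot this kind of split is to group the driver packages by upstream version. Below is a minimal Python sketch that does this on text in the `name version` format that `pacman -Q` prints. The helper names, the default package list and the sample listing are assumptions for illustration; only the 570.133 and 570.144 versions come from this thread, and the `-1` package release numbers are placeholders.

```python
from collections import defaultdict

# Illustrative sample in "name version" form; the pkgrel suffixes are made up.
SAMPLE_PACMAN_Q = """\
nvidia-open-dkms 570.144-1
nvidia-utils 570.133-1
nvidia-settings 570.133-1
xorg-server 21.1.16-1
"""

# Assumed list of packages that normally share the driver version.
DRIVER_PACKAGES = (
    "nvidia-open-dkms",
    "nvidia-utils",
    "lib32-nvidia-utils",
    "nvidia-settings",
    "opencl-nvidia",
)


def upstream_version(pkgver):
    """Strip an optional 'epoch:' prefix and the '-pkgrel' suffix."""
    without_epoch = pkgver.split(":", 1)[-1]
    return without_epoch.rsplit("-", 1)[0]


def group_driver_versions(listing, packages=DRIVER_PACKAGES):
    """Map each upstream version to the driver packages installed at it."""
    groups = defaultdict(list)
    for line in listing.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or unexpected lines
        name, pkgver = parts
        if name in packages:
            groups[upstream_version(pkgver)].append(name)
    return dict(groups)


def is_mismatched(groups):
    """True when the driver packages are spread over more than one version."""
    return len(groups) > 1


groups = group_driver_versions(SAMPLE_PACMAN_Q)
for version, names in sorted(groups.items()):
    print(version, ", ".join(names))
print("mismatch" if is_mismatched(groups) else "all packages agree")
```

On a real system the listing would come from `pacman -Q` instead of the sample string. More than one version in the result means the kernel module and the userspace libraries are out of step, which is the situation described above.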


I also read about problems Nvidia had very recently with the drivers. Likely related.

Yeah, I think the 575 drivers will fix a lot of lingering problems, especially related to slow memory burn. I wanted to test it with the latest 6.15-rc kernel, but I can’t get any NVIDIA driver version to build against it.