Crash, related to GPU, only in Linux, not Windows

EDIT: For anyone saying it’s hardware - why doesn’t it also crash in Windows when I’m gaming for several hours straight multiple nights a week?

I split this off from my other thread “Random lag in interface” since the issue has evolved into a full-blown crash, and the logs seem to indicate it’s related to the GPU. I really want to know if this could be a software issue instead of a hardware issue.

If I attempt this without the video card (e.g. with a different one, or just using onboard video) and the crash goes away, how can I be sure it’s not just a software issue with this particular kind of video card? I have not had these crashes in Windows, and that’s where I really hit the graphics card hard and do all my gaming. Whereas in Linux, this crash just happens when I’m browsing, or typing a LibreOffice document, or just doing nothing much at all - scrolling through a web page or filling out a form.

I’ve cleaned the contacts on the video card using contact cleaner and a toothbrush, both the pins and the PCI-e socket.

Here’s the error. I’m up to my 4th crash today and the output logs all look very similar to this (these logs are obtained after hard resetting and running the command: journalctl -b -1 | tail -n400)

May 30 17:18:12 domarius-endeavouros rtkit-daemon[1075]: Supervising 9 threads of 6 processes of 1 users.
May 30 17:18:34 domarius-endeavouros kernel: NVRM: GPU at PCI:0000:01:00: GPU-2cb4308a-b22b-7d66-7401-2fea800cec5d
May 30 17:18:34 domarius-endeavouros kernel: NVRM: Xid (PCI:0000:01:00): 62, 00011fab 00012007 00011b38 000159fb 00015e06 00013e17 00000011 00000000
May 30 17:18:40 domarius-endeavouros kernel: NVRM: Xid (PCI:0000:01:00): 119, Timeout after 6s of waiting for RPC response from GPU0 GSP! Expected function 76 (GSP_RM_CONTROL) (0x20800a56 0x5c).
May 30 17:18:40 domarius-endeavouros kernel: NVRM: GPU0 GSP RPC buffer contains function 76 (GSP_RM_CONTROL) and data 0x0000000020800a56 0x000000000000005c.
May 30 17:18:40 domarius-endeavouros kernel: NVRM: GPU0 RPC history (CPU -> GSP):
May 30 17:18:40 domarius-endeavouros kernel: NVRM:     entry function                   data0              data1              ts_start           ts_end             duration actively_polling
May 30 17:18:40 domarius-endeavouros kernel: NVRM:      0    76   GSP_RM_CONTROL        0x0000000020800a56 0x000000000000005c 0x0006365538dda64c 0x0000000000000000          y
May 30 17:18:40 domarius-endeavouros kernel: NVRM:     -1    76   GSP_RM_CONTROL        0x00000000c3700104 0x0000000000000014 0x0006365538a098b3 0x0006365538a099c1    270us  
May 30 17:18:40 domarius-endeavouros kernel: NVRM:     -2    76   GSP_RM_CONTROL        0x00000000c3700104 0x0000000000000014 0x0006365538a095a0 0x0006365538a098a7    775us  
May 30 17:18:40 domarius-endeavouros kernel: NVRM:     -3    76   GSP_RM_CONTROL        0x00000000c3700104 0x0000000000000014 0x0006365538a07e09 0x0006365538a0838f   1414us  
May 30 17:18:40 domarius-endeavouros kernel: NVRM:     -4    76   GSP_RM_CONTROL        0x00000000c3700104 0x0000000000000014 0x0006365538a0780c 0x0006365538a07df8   1516us  
May 30 17:18:40 domarius-endeavouros kernel: NVRM:     -5    76   GSP_RM_CONTROL        0x00000000c3700104 0x0000000000000014 0x0006365538a05844 0x0006365538a05a03    447us  
May 30 17:18:40 domarius-endeavouros kernel: NVRM:     -6    76   GSP_RM_CONTROL        0x00000000c3700104 0x0000000000000014 0x0006365538a054a6 0x0006365538a05836    912us  
May 30 17:18:40 domarius-endeavouros kernel: NVRM:     -7    76   GSP_RM_CONTROL        0x00000000c3700104 0x0000000000000014 0x0006365538a034d1 0x0006365538a037c2    753us  
May 30 17:18:40 domarius-endeavouros kernel: NVRM: GPU0 RPC event history (CPU <- GSP):
May 30 17:18:40 domarius-endeavouros kernel: NVRM:     entry function                   data0              data1              ts_start           ts_end             duration during_incomplete_rpc
May 30 17:18:40 domarius-endeavouros kernel: NVRM:      0    4130 RECOVERY_ACTION       0x0000000000000000 0x0000000000000000 0x0006365538dda616 0x0006365538dda618      2us  
May 30 17:18:40 domarius-endeavouros kernel: NVRM:     -1    4102 OS_ERROR_LOG          0x0000000000000000 0x0000000000000000 0x0006365538dda60a 0x0006365538dda615     11us  
May 30 17:18:40 domarius-endeavouros kernel: NVRM:     -2    4128 GSP_POST_NOCAT_RECORD 0x0000000000000003 0x0000000000011fab 0x0006365538dda608 0x0006365538dda609      1us  
May 30 17:18:40 domarius-endeavouros kernel: NVRM:     -3    4128 GSP_POST_NOCAT_RECORD 0x0000000000000005 0x00000000057f1344 0x0006365527b00c69 0x0006365527b00c6c      3us  
May 30 17:18:40 domarius-endeavouros kernel: NVRM:     -4    4128 GSP_POST_NOCAT_RECORD 0x0000000000000005 0x00000000057f1344 0x00063655252ec88d 0x00063655252ec88f      2us  
May 30 17:18:40 domarius-endeavouros kernel: NVRM:     -5    4128 GSP_POST_NOCAT_RECORD 0x0000000000000005 0x00000000057f1344 0x00063655219f62d8 0x00063655219f62da      2us  
May 30 17:18:40 domarius-endeavouros kernel: NVRM:     -6    4128 GSP_POST_NOCAT_RECORD 0x0000000000000005 0x00000000057f1344 0x00063655210ddf51 0x00063655210ddf53      2us  
May 30 17:18:40 domarius-endeavouros kernel: NVRM:     -7    4128 GSP_POST_NOCAT_RECORD 0x0000000000000005 0x00000000057f1344 0x000636550fd909b6 0x000636550fd909b9      3us  
May 30 17:18:40 domarius-endeavouros kernel: CPU: 7 UID: 0 PID: 442 Comm: nv_queue Tainted: P           OE      6.14.6-arch1-1 #1 9658fd36a89bb82f508ea2dcbad8e1444239d436
May 30 17:18:40 domarius-endeavouros kernel: Tainted: [P]=PROPRIETARY_MODULE, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
May 30 17:18:40 domarius-endeavouros kernel: Hardware name: System manufacturer System Product Name/Z170-AR, BIOS 3801 03/14/2018
May 30 17:18:40 domarius-endeavouros kernel: Call Trace:
May 30 17:18:40 domarius-endeavouros kernel:  <TASK>
May 30 17:18:40 domarius-endeavouros kernel:  dump_stack_lvl+0x5d/0x80
May 30 17:18:40 domarius-endeavouros kernel:  _nv013207rm+0x508/0x5b0 [nvidia fefee9cf1f3102d1e82e2e2fe2b949b7e3997694]
May 30 17:18:40 domarius-endeavouros kernel:  _nv013118rm+0x74/0x330 [nvidia fefee9cf1f3102d1e82e2e2fe2b949b7e3997694]
May 30 17:18:40 domarius-endeavouros kernel:  _nv051942rm+0x49f/0x7f0 [nvidia fefee9cf1f3102d1e82e2e2fe2b949b7e3997694]
May 30 17:18:40 domarius-endeavouros kernel:  _nv029397rm+0x106/0x150 [nvidia fefee9cf1f3102d1e82e2e2fe2b949b7e3997694]
May 30 17:18:40 domarius-endeavouros kernel:  _nv044125rm+0xa0/0xf0 [nvidia fefee9cf1f3102d1e82e2e2fe2b949b7e3997694]
May 30 17:18:40 domarius-endeavouros kernel:  _nv044127rm+0x43/0x50 [nvidia fefee9cf1f3102d1e82e2e2fe2b949b7e3997694]
May 30 17:18:40 domarius-endeavouros kernel:  _nv012794rm+0xe8/0x180 [nvidia fefee9cf1f3102d1e82e2e2fe2b949b7e3997694]
May 30 17:18:40 domarius-endeavouros kernel:  ? _nv012794rm+0xa7/0x180 [nvidia fefee9cf1f3102d1e82e2e2fe2b949b7e3997694]
May 30 17:18:40 domarius-endeavouros kernel:  ? __pfx__main_loop+0x10/0x10 [nvidia fefee9cf1f3102d1e82e2e2fe2b949b7e3997694]
May 30 17:18:40 domarius-endeavouros kernel:  rm_execute_work_item+0x13e/0x1f0 [nvidia fefee9cf1f3102d1e82e2e2fe2b949b7e3997694]
May 30 17:18:40 domarius-endeavouros kernel:  os_execute_work_item+0x68/0x90 [nvidia fefee9cf1f3102d1e82e2e2fe2b949b7e3997694]
May 30 17:18:40 domarius-endeavouros kernel:  _main_loop+0x90/0x150 [nvidia fefee9cf1f3102d1e82e2e2fe2b949b7e3997694]
May 30 17:18:40 domarius-endeavouros kernel:  ? __pfx__main_loop+0x10/0x10 [nvidia fefee9cf1f3102d1e82e2e2fe2b949b7e3997694]
May 30 17:18:40 domarius-endeavouros kernel:  kthread+0xec/0x230
May 30 17:18:40 domarius-endeavouros kernel:  ? __pfx_kthread+0x10/0x10
May 30 17:18:40 domarius-endeavouros kernel:  ret_from_fork+0x31/0x50
May 30 17:18:40 domarius-endeavouros kernel:  ? __pfx_kthread+0x10/0x10
May 30 17:18:40 domarius-endeavouros kernel:  ret_from_fork_asm+0x1a/0x30
May 30 17:18:40 domarius-endeavouros kernel:  </TASK>
May 30 17:18:40 domarius-endeavouros kernel: NVRM: Xid (PCI:0000:01:00): 154, GPU recovery action changed from 0x0 (None) to 0x1 (GPU Reset Required)
May 30 17:18:50 domarius-endeavouros kernel: NVRM: Xid (PCI:0000:01:00): 119, pid=1796, name=Renderer, Timeout after 6s of waiting for RPC response from GPU0 GSP! Expected function 76 (GSP_RM_CONTROL) (0x20800a6a 0x0).
May 30 17:18:56 domarius-endeavouros kernel: NVRM: Xid (PCI:0000:01:00): 119, pid=1107, name=QSGRenderThread, Timeout after 6s of waiting for RPC response from GPU0 GSP! Expected function 10 (FREE) (0xbeea0403 0x0).
May 30 17:19:02 domarius-endeavouros kernel: NVRM: Rate limiting GSP RPC error prints for GPU at PCI:0000:01:00 (printing 1 of every 30).  The GPU likely needs to be reset.
May 30 17:19:15 domarius-endeavouros rtkit-daemon[1075]: Supervising 9 threads of 6 processes of 1 users.
May 30 17:19:15 domarius-endeavouros rtkit-daemon[1075]: Supervising 9 threads of 6 processes of 1 users.
May 30 17:20:08 domarius-endeavouros kernel: NVRM: Xid (PCI:0000:01:00): 16, Head 00000001 Count 0000acf3
May 30 17:20:14 domarius-endeavouros kernel: NVRM: Xid (PCI:0000:01:00): 16, Head 00000002 Count 0000acb0
May 30 17:20:20 domarius-endeavouros kernel: NVRM: Xid (PCI:0000:01:00): 16, Head 00000003 Count 0000ad28
May 30 17:20:21 domarius-endeavouros rtkit-daemon[1075]: Supervising 9 threads of 6 processes of 1 users.
May 30 17:20:21 domarius-endeavouros rtkit-daemon[1075]: Supervising 9 threads of 6 processes of 1 users.
May 30 17:21:07 domarius-endeavouros systemd[1]: Starting Cleanup of Temporary Directories...
May 30 17:21:08 domarius-endeavouros systemd[1]: systemd-tmpfiles-clean.service: Deactivated successfully.
May 30 17:21:08 domarius-endeavouros systemd[1]: Finished Cleanup of Temporary Directories.
May 30 17:23:02 domarius-endeavouros kernel: NVRM: Xid (PCI:0000:01:00): 119, pid=1107, name=CPMMListener, Timeout after 6s of waiting for RPC response from GPU0 GSP! Expected function 10 (FREE) (0xbeeda0b5 0x0).
May 30 17:23:32 domarius-endeavouros kernel: NVRM: Xid (PCI:0000:01:00): 16, Head 00000001 Count 0000acf7
May 30 17:23:38 domarius-endeavouros kernel: NVRM: Xid (PCI:0000:01:00): 16, Head 00000002 Count 0000acb4
May 30 17:23:44 domarius-endeavouros kernel: NVRM: Xid (PCI:0000:01:00): 16, Head 00000003 Count 0000ad2c
May 30 17:24:27 domarius-endeavouros kernel: INFO: task joplin:3094 blocked for more than 122 seconds.
May 30 17:24:27 domarius-endeavouros kernel:       Tainted: P           OE      6.14.6-arch1-1 #1
May 30 17:24:27 domarius-endeavouros kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 30 17:24:27 domarius-endeavouros kernel: task:joplin          state:D stack:0     pid:3094  tgid:3094  ppid:3067   task_flags:0x400040 flags:0x00000006
May 30 17:24:27 domarius-endeavouros kernel: Call Trace:
May 30 17:24:27 domarius-endeavouros kernel:  <TASK>
May 30 17:24:27 domarius-endeavouros kernel:  __schedule+0x401/0x1350
May 30 17:24:27 domarius-endeavouros kernel:  schedule+0x27/0xf0
May 30 17:24:27 domarius-endeavouros kernel:  schedule_preempt_disabled+0x15/0x30
May 30 17:24:27 domarius-endeavouros kernel:  rwsem_down_write_slowpath+0x1e5/0x680
May 30 17:24:27 domarius-endeavouros kernel:  ? ____sys_recvmsg+0x96/0x210
May 30 17:24:27 domarius-endeavouros kernel:  down_write+0x5a/0x60
May 30 17:24:27 domarius-endeavouros kernel:  os_acquire_rwlock_write+0x2b/0x40 [nvidia fefee9cf1f3102d1e82e2e2fe2b949b7e3997694]
May 30 17:24:27 domarius-endeavouros kernel:  _nv049912rm+0x10/0x40 [nvidia fefee9cf1f3102d1e82e2e2fe2b949b7e3997694]
May 30 17:24:27 domarius-endeavouros kernel:  _nv051352rm+0x284/0x360 [nvidia fefee9cf1f3102d1e82e2e2fe2b949b7e3997694]
May 30 17:24:27 domarius-endeavouros kernel:  _nv053381rm+0x54/0xd0 [nvidia fefee9cf1f3102d1e82e2e2fe2b949b7e3997694]
May 30 17:24:27 domarius-endeavouros kernel:  _nv053306rm+0x185/0x470 [nvidia fefee9cf1f3102d1e82e2e2fe2b949b7e3997694]
May 30 17:24:27 domarius-endeavouros kernel:  _nv051309rm+0x171/0x300 [nvidia fefee9cf1f3102d1e82e2e2fe2b949b7e3997694]
May 30 17:24:27 domarius-endeavouros kernel:  _nv051310rm+0x5c/0x90 [nvidia fefee9cf1f3102d1e82e2e2fe2b949b7e3997694]
May 30 17:24:27 domarius-endeavouros kernel:  _nv015000rm+0x23/0x30 [nvidia fefee9cf1f3102d1e82e2e2fe2b949b7e3997694]
May 30 17:24:27 domarius-endeavouros kernel:  ? get_page_from_freelist+0x33f/0x1640
May 30 17:24:27 domarius-endeavouros kernel:  _nv015022rm+0x4f/0x90 [nvidia fefee9cf1f3102d1e82e2e2fe2b949b7e3997694]
May 30 17:24:27 domarius-endeavouros kernel:  _nv013437rm+0xc8/0x120 [nvidia fefee9cf1f3102d1e82e2e2fe2b949b7e3997694]
May 30 17:24:27 domarius-endeavouros kernel:  _nv000728rm+0x60/0x70 [nvidia fefee9cf1f3102d1e82e2e2fe2b949b7e3997694]
May 30 17:24:27 domarius-endeavouros kernel:  _nv000648rm+0x31/0x40 [nvidia fefee9cf1f3102d1e82e2e2fe2b949b7e3997694]
May 30 17:24:27 domarius-endeavouros kernel:  _nv000778rm+0x486/0xe00 [nvidia fefee9cf1f3102d1e82e2e2fe2b949b7e3997694]
May 30 17:24:27 domarius-endeavouros kernel:  rm_ioctl+0x7f/0x400 [nvidia fefee9cf1f3102d1e82e2e2fe2b949b7e3997694]
May 30 17:24:27 domarius-endeavouros kernel:  nvidia_unlocked_ioctl+0x528/0x8d0 [nvidia fefee9cf1f3102d1e82e2e2fe2b949b7e3997694]
May 30 17:24:27 domarius-endeavouros kernel:  __x64_sys_ioctl+0x94/0xc0
May 30 17:24:27 domarius-endeavouros kernel:  do_syscall_64+0x7b/0x190
May 30 17:24:27 domarius-endeavouros kernel:  ? __sys_recvmsg+0x9a/0xe0
May 30 17:24:27 domarius-endeavouros kernel:  ? __count_memcg_events+0xb0/0x150
May 30 17:24:27 domarius-endeavouros kernel:  ? count_memcg_events.constprop.0+0x1a/0x30
May 30 17:24:27 domarius-endeavouros kernel:  ? handle_mm_fault+0x1b6/0x2c0
May 30 17:24:27 domarius-endeavouros kernel:  ? do_user_addr_fault+0x36c/0x640
May 30 17:24:27 domarius-endeavouros kernel:  ? irqentry_exit_to_user_mode+0x2c/0x1b0
May 30 17:24:27 domarius-endeavouros kernel:  entry_SYSCALL_64_after_hwframe+0x76/0x7e
May 30 17:24:27 domarius-endeavouros kernel: RIP: 0033:0x7c0b6ae5fecd
May 30 17:24:27 domarius-endeavouros kernel: RSP: 002b:00007fffe391b380 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
May 30 17:24:27 domarius-endeavouros kernel: RAX: ffffffffffffffda RBX: 00007fffe391b580 RCX: 00007c0b6ae5fecd
May 30 17:24:27 domarius-endeavouros kernel: RDX: 00007fffe391b580 RSI: 00000000c0b8464a RDI: 0000000000000016
May 30 17:24:27 domarius-endeavouros kernel: RBP: 00007fffe391b3d0 R08: 00007fffe391b580 R09: 00007fffe391b594
May 30 17:24:27 domarius-endeavouros kernel: R10: 0000051401783880 R11: 0000000000000246 R12: 0000000000000016
May 30 17:24:27 domarius-endeavouros kernel: R13: 00007fffe391b594 R14: 0000000068395ca6 R15: 00007fffe391b3e0
May 30 17:24:27 domarius-endeavouros kernel:  </TASK>
May 30 17:26:08 domarius-endeavouros kernel: NVRM: Xid (PCI:0000:01:00): 16, Head 00000001 Count 0000acfa
May 30 17:26:14 domarius-endeavouros kernel: NVRM: Xid (PCI:0000:01:00): 16, Head 00000002 Count 0000acb7
May 30 17:26:20 domarius-endeavouros kernel: NVRM: Xid (PCI:0000:01:00): 16, Head 00000003 Count 0000ad2f
May 30 17:26:30 domarius-endeavouros kernel: INFO: task nvidia-modeset/:438 blocked for more than 122 seconds.
May 30 17:26:30 domarius-endeavouros kernel:       Tainted: P           OE      6.14.6-arch1-1 #1
May 30 17:26:30 domarius-endeavouros kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 30 17:26:30 domarius-endeavouros kernel: task:nvidia-modeset/ state:D stack:0     pid:438   tgid:438   ppid:2      task_flags:0x208040 flags:0x00004000
May 30 17:26:30 domarius-endeavouros kernel: Call Trace:
May 30 17:26:30 domarius-endeavouros kernel:  <TASK>
May 30 17:26:30 domarius-endeavouros kernel:  __schedule+0x401/0x1350
May 30 17:26:30 domarius-endeavouros kernel:  ? __schedule+0x409/0x1350
May 30 17:26:30 domarius-endeavouros kernel:  schedule+0x27/0xf0
May 30 17:26:30 domarius-endeavouros kernel:  schedule_timeout+0xbf/0x100
May 30 17:26:30 domarius-endeavouros kernel:  __down_common+0x103/0x240
May 30 17:26:30 domarius-endeavouros kernel:  ? __pfx__main_loop+0x10/0x10 [nvidia_modeset 2181e8cb5814ec4ada887364c5ea7ddb123e105c]
May 30 17:26:30 domarius-endeavouros kernel:  down+0x47/0x60
May 30 17:26:30 domarius-endeavouros kernel:  nvkms_kthread_q_callback+0xae/0x160 [nvidia_modeset 2181e8cb5814ec4ada887364c5ea7ddb123e105c]
May 30 17:26:30 domarius-endeavouros kernel:  _main_loop+0x90/0x150 [nvidia_modeset 2181e8cb5814ec4ada887364c5ea7ddb123e105c]
May 30 17:26:30 domarius-endeavouros kernel:  ? __pfx__main_loop+0x10/0x10 [nvidia_modeset 2181e8cb5814ec4ada887364c5ea7ddb123e105c]
May 30 17:26:30 domarius-endeavouros kernel:  kthread+0xec/0x230
May 30 17:26:30 domarius-endeavouros kernel:  ? __pfx_kthread+0x10/0x10
May 30 17:26:30 domarius-endeavouros kernel:  ret_from_fork+0x31/0x50
May 30 17:26:30 domarius-endeavouros kernel:  ? __pfx_kthread+0x10/0x10
May 30 17:26:30 domarius-endeavouros kernel:  ret_from_fork_asm+0x1a/0x30
May 30 17:26:30 domarius-endeavouros kernel:  </TASK>
May 30 17:26:30 domarius-endeavouros kernel: INFO: task Xorg:929 blocked for more than 122 seconds.
May 30 17:26:30 domarius-endeavouros kernel:       Tainted: P           OE      6.14.6-arch1-1 #1
May 30 17:26:30 domarius-endeavouros kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 30 17:26:30 domarius-endeavouros kernel: task:Xorg            state:D stack:0     pid:929   tgid:929   ppid:910    task_flags:0x400100 flags:0x00000006
May 30 17:26:30 domarius-endeavouros kernel: Call Trace:
May 30 17:26:30 domarius-endeavouros kernel:  <TASK>
May 30 17:26:30 domarius-endeavouros kernel:  __schedule+0x401/0x1350
May 30 17:26:30 domarius-endeavouros kernel:  schedule+0x27/0xf0
May 30 17:26:30 domarius-endeavouros kernel:  schedule_preempt_disabled+0x15/0x30
May 30 17:26:30 domarius-endeavouros kernel:  rwsem_down_read_slowpath+0x25e/0x4c0
May 30 17:26:30 domarius-endeavouros kernel:  ? __memcg_slab_free_hook+0xf7/0x140
May 30 17:26:30 domarius-endeavouros kernel:  down_read+0x48/0xb0
May 30 17:26:30 domarius-endeavouros kernel:  os_acquire_rwlock_read+0x2b/0x40 [nvidia fefee9cf1f3102d1e82e2e2fe2b949b7e3997694]
May 30 17:26:30 domarius-endeavouros kernel:  _nv049910rm+0x10/0x40 [nvidia fefee9cf1f3102d1e82e2e2fe2b949b7e3997694]
May 30 17:26:30 domarius-endeavouros kernel:  _nv051352rm+0x2c4/0x360 [nvidia fefee9cf1f3102d1e82e2e2fe2b949b7e3997694]
May 30 17:26:30 domarius-endeavouros kernel:  _nv053381rm+0x54/0xd0 [nvidia fefee9cf1f3102d1e82e2e2fe2b949b7e3997694]
May 30 17:26:30 domarius-endeavouros kernel:  _nv053312rm+0xa0/0x500 [nvidia fefee9cf1f3102d1e82e2e2fe2b949b7e3997694]
May 30 17:26:30 domarius-endeavouros kernel:  _nv015043rm+0x424/0x680 [nvidia fefee9cf1f3102d1e82e2e2fe2b949b7e3997694]
May 30 17:26:30 domarius-endeavouros kernel:  _nv051326rm+0x69/0xd0 [nvidia fefee9cf1f3102d1e82e2e2fe2b949b7e3997694]
May 30 17:26:30 domarius-endeavouros kernel:  _nv013401rm+0x83/0xa0 [nvidia fefee9cf1f3102d1e82e2e2fe2b949b7e3997694]
May 30 17:26:30 domarius-endeavouros kernel:  _nv000631rm+0x5e/0x70 [nvidia fefee9cf1f3102d1e82e2e2fe2b949b7e3997694]
May 30 17:26:30 domarius-endeavouros kernel:  rm_kernel_rmapi_op+0x167/0x273 [nvidia fefee9cf1f3102d1e82e2e2fe2b949b7e3997694]
May 30 17:26:30 domarius-endeavouros kernel:  nvkms_call_rm+0x4a/0x80 [nvidia_modeset 2181e8cb5814ec4ada887364c5ea7ddb123e105c]
May 30 17:26:30 domarius-endeavouros kernel:  _nv003116kms+0x42/0x50 [nvidia_modeset 2181e8cb5814ec4ada887364c5ea7ddb123e105c]
May 30 17:26:30 domarius-endeavouros kernel:  ? _nv002787kms+0x52/0xb0 [nvidia_modeset 2181e8cb5814ec4ada887364c5ea7ddb123e105c]
May 30 17:26:30 domarius-endeavouros kernel:  ? _nv003010kms+0xd0/0x190 [nvidia_modeset 2181e8cb5814ec4ada887364c5ea7ddb123e105c]
May 30 17:26:30 domarius-endeavouros kernel:  ? _nv000555kms+0x1b3/0x210 [nvidia_modeset 2181e8cb5814ec4ada887364c5ea7ddb123e105c]
May 30 17:26:30 domarius-endeavouros kernel:  ? _nv000088kms+0x30/0x30 [nvidia_modeset 2181e8cb5814ec4ada887364c5ea7ddb123e105c]
May 30 17:26:30 domarius-endeavouros kernel:  ? nvKmsIoctl+0xf7/0x270 [nvidia_modeset 2181e8cb5814ec4ada887364c5ea7ddb123e105c]
May 30 17:26:30 domarius-endeavouros kernel:  ? nvkms_unlocked_ioctl+0x110/0x180 [nvidia_modeset 2181e8cb5814ec4ada887364c5ea7ddb123e105c]
May 30 17:26:30 domarius-endeavouros kernel:  ? __x64_sys_ioctl+0x94/0xc0
May 30 17:26:30 domarius-endeavouros kernel:  ? do_syscall_64+0x7b/0x190
May 30 17:26:30 domarius-endeavouros kernel:  ? syscall_exit_to_user_mode+0x37/0x1c0
May 30 17:26:30 domarius-endeavouros kernel:  ? do_syscall_64+0x87/0x190
May 30 17:26:30 domarius-endeavouros kernel:  ? do_syscall_64+0x87/0x190
May 30 17:26:30 domarius-endeavouros kernel:  ? do_syscall_64+0x87/0x190
May 30 17:26:30 domarius-endeavouros kernel:  ? syscall_exit_to_user_mode+0x37/0x1c0
May 30 17:26:30 domarius-endeavouros kernel:  ? syscall_exit_to_user_mode+0x37/0x1c0
May 30 17:26:30 domarius-endeavouros kernel:  ? syscall_exit_to_user_mode+0x37/0x1c0
May 30 17:26:30 domarius-endeavouros kernel:  ? do_syscall_64+0x87/0x190
May 30 17:26:30 domarius-endeavouros kernel:  ? do_syscall_64+0x87/0x190
May 30 17:26:30 domarius-endeavouros kernel:  ? do_syscall_64+0x87/0x190
May 30 17:26:30 domarius-endeavouros kernel:  ? irqentry_exit_to_user_mode+0x2c/0x1b0
May 30 17:26:30 domarius-endeavouros kernel:  ? entry_SYSCALL_64_after_hwframe+0x76/0x7e
May 30 17:26:30 domarius-endeavouros kernel:  </TASK>
May 30 17:26:30 domarius-endeavouros kernel: INFO: task joplin:3094 blocked for more than 245 seconds.
May 30 17:26:30 domarius-endeavouros kernel:       Tainted: P           OE      6.14.6-arch1-1 #1
May 30 17:26:30 domarius-endeavouros kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 30 17:26:30 domarius-endeavouros kernel: task:joplin          state:D stack:0     pid:3094  tgid:3094  ppid:3067   task_flags:0x400040 flags:0x00000006
May 30 17:26:30 domarius-endeavouros kernel: Call Trace:
May 30 17:26:30 domarius-endeavouros kernel:  <TASK>
May 30 17:26:30 domarius-endeavouros kernel:  __schedule+0x401/0x1350
May 30 17:26:30 domarius-endeavouros kernel:  schedule+0x27/0xf0
May 30 17:26:30 domarius-endeavouros kernel:  schedule_preempt_disabled+0x15/0x30
May 30 17:26:30 domarius-endeavouros kernel:  rwsem_down_write_slowpath+0x1e5/0x680
May 30 17:26:30 domarius-endeavouros kernel:  ? ____sys_recvmsg+0x96/0x210
May 30 17:26:30 domarius-endeavouros kernel:  down_write+0x5a/0x60
May 30 17:26:30 domarius-endeavouros kernel:  os_acquire_rwlock_write+0x2b/0x40 [nvidia fefee9cf1f3102d1e82e2e2fe2b949b7e3997694]
May 30 17:26:30 domarius-endeavouros kernel:  _nv049912rm+0x10/0x40 [nvidia fefee9cf1f3102d1e82e2e2fe2b949b7e3997694]
May 30 17:26:30 domarius-endeavouros kernel:  _nv051352rm+0x284/0x360 [nvidia fefee9cf1f3102d1e82e2e2fe2b949b7e3997694]
May 30 17:26:30 domarius-endeavouros kernel:  _nv053381rm+0x54/0xd0 [nvidia fefee9cf1f3102d1e82e2e2fe2b949b7e3997694]
May 30 17:26:30 domarius-endeavouros kernel:  _nv053306rm+0x185/0x470 [nvidia fefee9cf1f3102d1e82e2e2fe2b949b7e3997694]
May 30 17:26:30 domarius-endeavouros kernel:  _nv051309rm+0x171/0x300 [nvidia fefee9cf1f3102d1e82e2e2fe2b949b7e3997694]
May 30 17:26:30 domarius-endeavouros kernel:  _nv051310rm+0x5c/0x90 [nvidia fefee9cf1f3102d1e82e2e2fe2b949b7e3997694]
May 30 17:26:30 domarius-endeavouros kernel:  _nv015000rm+0x23/0x30 [nvidia fefee9cf1f3102d1e82e2e2fe2b949b7e3997694]
May 30 17:26:30 domarius-endeavouros kernel:  ? get_page_from_freelist+0x33f/0x1640
May 30 17:26:30 domarius-endeavouros kernel:  _nv015022rm+0x4f/0x90 [nvidia fefee9cf1f3102d1e82e2e2fe2b949b7e3997694]
May 30 17:26:30 domarius-endeavouros kernel:  _nv013437rm+0xc8/0x120 [nvidia fefee9cf1f3102d1e82e2e2fe2b949b7e3997694]
May 30 17:26:30 domarius-endeavouros kernel:  _nv000728rm+0x60/0x70 [nvidia fefee9cf1f3102d1e82e2e2fe2b949b7e3997694]
May 30 17:26:30 domarius-endeavouros kernel:  _nv000648rm+0x31/0x40 [nvidia fefee9cf1f3102d1e82e2e2fe2b949b7e3997694]
May 30 17:26:30 domarius-endeavouros kernel:  _nv000778rm+0x486/0xe00 [nvidia fefee9cf1f3102d1e82e2e2fe2b949b7e3997694]
May 30 17:26:30 domarius-endeavouros kernel:  rm_ioctl+0x7f/0x400 [nvidia fefee9cf1f3102d1e82e2e2fe2b949b7e3997694]
May 30 17:26:30 domarius-endeavouros kernel:  nvidia_unlocked_ioctl+0x528/0x8d0 [nvidia fefee9cf1f3102d1e82e2e2fe2b949b7e3997694]
May 30 17:26:30 domarius-endeavouros kernel:  __x64_sys_ioctl+0x94/0xc0
May 30 17:26:30 domarius-endeavouros kernel:  do_syscall_64+0x7b/0x190
May 30 17:26:30 domarius-endeavouros kernel:  ? __sys_recvmsg+0x9a/0xe0
May 30 17:26:30 domarius-endeavouros kernel:  ? __count_memcg_events+0xb0/0x150
May 30 17:26:30 domarius-endeavouros kernel:  ? count_memcg_events.constprop.0+0x1a/0x30
May 30 17:26:30 domarius-endeavouros kernel:  ? handle_mm_fault+0x1b6/0x2c0
May 30 17:26:30 domarius-endeavouros kernel:  ? do_user_addr_fault+0x36c/0x640
May 30 17:26:30 domarius-endeavouros kernel:  ? irqentry_exit_to_user_mode+0x2c/0x1b0
May 30 17:26:30 domarius-endeavouros kernel:  entry_SYSCALL_64_after_hwframe+0x76/0x7e
May 30 17:26:30 domarius-endeavouros kernel: RIP: 0033:0x7c0b6ae5fecd
May 30 17:26:30 domarius-endeavouros kernel: RSP: 002b:00007fffe391b380 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
May 30 17:26:30 domarius-endeavouros kernel: RAX: ffffffffffffffda RBX: 00007fffe391b580 RCX: 00007c0b6ae5fecd
May 30 17:26:30 domarius-endeavouros kernel: RDX: 00007fffe391b580 RSI: 00000000c0b8464a RDI: 0000000000000016
May 30 17:26:30 domarius-endeavouros kernel: RBP: 00007fffe391b3d0 R08: 00007fffe391b580 R09: 00007fffe391b594
May 30 17:26:30 domarius-endeavouros kernel: R10: 0000051401783880 R11: 0000000000000246 R12: 0000000000000016
May 30 17:26:30 domarius-endeavouros kernel: R13: 00007fffe391b594 R14: 0000000068395ca6 R15: 00007fffe391b3e0
May 30 17:26:30 domarius-endeavouros kernel:  </TASK>
May 30 17:27:38 domarius-endeavouros kernel: NVRM: Xid (PCI:0000:01:00): 119, Timeout after 6s of waiting for RPC response from GPU0 GSP! Expected function 76 (GSP_RM_CONTROL) (0x20800a6a 0x0).
May 30 17:27:44 domarius-endeavouros kernel: nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000c57e:6:0:0x00000065
May 30 17:27:50 domarius-endeavouros kernel: nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000c57e:2:0:0x00000065
May 30 17:27:52 domarius-endeavouros kernel: nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000c57e:6:0:0x0000ffff
May 30 17:27:53 domarius-endeavouros plasmashell[1107]: QRhiGles2: Context is lost.
May 30 17:27:53 domarius-endeavouros kernel: nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000c57e:2:0:0x0000ffff
May 30 17:27:53 domarius-endeavouros plasmashell[1107]: Graphics device lost, cleaning up scenegraph and releasing RHI
May 30 17:27:53 domarius-endeavouros kwin_x11[1082]: kwin_scene_opengl: A graphics reset attributable to the current GL context occurred.
May 30 17:28:18 domarius-endeavouros electron[1320]: Failed to connect to proxy
May 30 17:28:43 domarius-endeavouros electron[1320]: Failed to connect to proxy
May 30 17:29:08 domarius-endeavouros electron[1320]: Failed to connect to proxy
May 30 17:29:33 domarius-endeavouros element-desktop[1320]: [1320:0530/172933.675015:ERROR:libnotify_notification.cc(50)] notify_notification_show: domain=400 code=24 message="Timeout was reached"
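With four crashes in one day, a quick way to check whether the logs really "all look very similar" is to tally the NVRM Xid codes from each boot. What follows is only a rough sketch: the function name count_xid_codes and the regular expression are my own assumptions based on the line format visible in the log above, not anything provided by the NVIDIA driver or journalctl.

```python
import re
from collections import Counter

# Assumed line format, taken from the log above, e.g.:
#   "... kernel: NVRM: Xid (PCI:0000:01:00): 119, Timeout after 6s ..."
XID_RE = re.compile(r"NVRM: Xid \((?P<device>[^)]+)\): (?P<code>\d+),")


def count_xid_codes(lines):
    """Return a Counter mapping each Xid code (int) to how often it appears."""
    counts = Counter()
    for line in lines:
        match = XID_RE.search(line)
        if match:
            counts[int(match.group("code"))] += 1
    return counts


# Three lines copied from the log above, used as sample input.
sample = [
    "May 30 17:18:34 domarius-endeavouros kernel: NVRM: Xid (PCI:0000:01:00): 62, 00011fab 00012007",
    "May 30 17:18:40 domarius-endeavouros kernel: NVRM: Xid (PCI:0000:01:00): 119, Timeout after 6s of waiting for RPC response from GPU0 GSP!",
    "May 30 17:20:08 domarius-endeavouros kernel: NVRM: Xid (PCI:0000:01:00): 16, Head 00000001 Count 0000acf3",
]

for code, n in sorted(count_xid_codes(sample).items()):
    print(f"Xid {code}: {n}")
```

To use it on a real boot, the output of journalctl -b -1 could be saved to a file and its lines passed to count_xid_codes; comparing the tallies across the crashed boots would show whether the same codes (here 62, 119, 154 and 16) turn up every time.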

If your GPU output is not set to 60 Hz, try that and see if the crashes stop.

I think this is a hardware problem; I see a lot of NVRM errors (memory errors of the GPU itself).

But to be sure, do a new install on this hardware and test again; if the result is the same, your GPU is at fault.
Or try this GPU in another computer; if the crashes move to the other PC, your GPU is at fault.

I would say it’s not a hardware fault because I boot into Windows to play games late into the night for several hours at a time, multiple times a week, I’m running the card a lot harder while I’m gaming, and I’ve NEVER had a crash from Windows. In Linux I can’t get more than 10 mins at the moment without a crash, and all I’m doing is documents and emails - and I’m laughing right now because this bloody thing crashed (in Linux) right in the middle of typing this post! Good thing it saves the draft!

Have you got adaptive sync on? Try turning that off.

Really? Come on man.

Well, that is indeed interesting, I’ll admit.
Maybe in Linux it could be a driver issue? Those NVRM errors are coming from something else if it isn’t the hardware itself.

Yes, really. I just finished another 2 hours of gaming in Windows, with no issues. In Linux, it crashes nearly every 10 mins just browsing the net or checking emails. Instead of asking me to “come on”, could you explain how this could still be a hardware issue that magically doesn’t happen when I switch to Windows and use more of the GPU?

My brother said he heard there was a recent Nvidia driver issue in Linux, and I think I saw some mentions of that in my searching. Maybe I should roll back the drivers (not sure how to do that yet).

There’s always someone with Nvidia driver issues somewhere. That’s no surprise, and not really anything new. Sometimes it’s the driver itself. Sometimes it’s a driver/Wayland issue.

As a side note: Just because something works perfectly in Windows does not necessarily mean that it will do the same in Linux. All you can do is troubleshoot the issue at hand.

I understand there’s a non-zero chance that a hardware fault could be triggered in one scenario and not another, but we’re talking about the card being used the same way in both environments (just rendering the OS, browsing with Firefox in both cases) and in the environment it’s not crashing in, it’s actually being worked harder. I would expect a hardware fault to show up when more features of the card are being used, but it’s the other way around. And the fact that there’s “always Nvidia driver issues on Linux” feels like that’s the more likely possibility.

You may be right. But since your issue of a system crash every 10 minutes does not seem to be happening to all (or even many) users, I can only speculate that it’s something specific to your particular machine. Meaning it could be a driver, hardware, or something else.

If your “crashing every 10 minutes” was even remotely something common, it would be all over the web via searches.

EDIT: I know that doesn’t help with your issue. I’m just out of any ideas at all. If I were in your position and wanted to keep EOS, I’d just do a clean, fresh install. But that’s just me.

It might be a small subset of users though. Not everyone with the specific combo of hardware and drivers will be complaining about it, or in this specific forum.

Anyway, shortly after I stopped posting here about it, I went through another few rounds of system updates - and lo and behold, I have not had any strange behaviour whatsoever since.

I just came back to say this since I figured it’s been long enough: about 1.5 months since I last posted and ran the updates, and no issues.

I won’t flag my post as the solution though, because there is no concrete evidence of a cause. But for me this is pretty sufficient to just move on with my life.