Frequent Crashing over Months

dcl · April 5, 2024, 2:17am

Good day all! I have finally thrown in the towel and decided that I need to ask for help, because what I’ve been doing has resolved a whole lot of nothing.

I am currently experiencing three types of “crashes” that almost always occur while I’m playing games, but the games themselves don’t appear to be the ultimate cause of the crash. The one that irritated me the most from a few months back was Starfield, and I pored over tons and tons of forums and articles and the like trying to find suggestions. I don’t remember much of what I tried back then to get it working, but ultimately it never fully worked. Sometimes I could get a good six hours in a single sitting, other times it’d crash within 20 minutes of launch. I did usually have htop going at the time and I did not notice anything significant while playing.

Most recently, the game has been Warhammer 40K: Rogue Trader. I have been playing for 40 hours or so over the last three to four weeks. I did not have an issue until about three days ago. Ever since then, I can’t go more than an hour in game without a crash.

The crashes take three different forms:

1. The most common lately begins randomly and starts with an audio and video stutter. Audio and video only play for a few milliseconds every second, and it’s as if both are paused in between these stutters. It will last anywhere from 30 seconds to 5 minutes. Sometimes I have enough time to save and exit; other times the stutter is so bad and ends so quickly that I can’t react. Afterwards, the screen goes black and the computer restarts.
2. Another very common crash happened twice in the fourth set of logs below. The screen will freeze but audio continues playing normally, and after between 15 seconds and 3 minutes, the screen will go completely black (or will flicker a few times before going black), and then I’m back at my LightDM login screen. This will usually repeat itself until I restart.
3. The least common crash, which I’ve still seen more than once, is similar to the second, except that rather than the screen just freezing, the screen distorts and odd colored pixels cover it. When this happens, it never recovers, and I have to manually shut off the computer via the power button.

I don’t know what to do. I’m at the absolute end of my rope and I think I’m looking at having to compromise my morals and return to that awful, awful Microsoft OS, or purchase a new computer (which I certainly don’t have the funds for). I have included a ton of logs below, but if there’s anything else I can do and provide, please let me know! Any help is greatly appreciated, I will be forever in your debt for helping me resolve this beyond frustrating issue!

Hardware: https://0x0.st/Xioa.txt
JournalCTL: https://0x0.st/XioM.txt
A full print out of all of the logs from the log tool after the crash today: https://0x0.st/XioM.txt
Another full print out from another day filled with crashes, I can’t recall the exact date: https://0x0.st/XioM.txt
And a few of what I believe to be relevant logs from a while back that the same issues were occurring on: https://0x0.st/XioM.txt

anthony93 · April 5, 2024, 3:57am

Apr 04 21:39:45 dcl-endeavouros kernel: sr 3:0:0:0: [sr0] tag#10 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
Apr 04 21:39:45 dcl-endeavouros kernel: sr 3:0:0:0: [sr0] tag#10 Sense Key : Not Ready [current] 
Apr 04 21:39:45 dcl-endeavouros kernel: sr 3:0:0:0: [sr0] tag#10 Add. Sense: Medium not present - tray closed
Apr 04 21:39:45 dcl-endeavouros kernel: sr 3:0:0:0: [sr0] tag#10 CDB: Read(10) 28 00 00 00 00 00 00 00 08 00
Apr 04 21:39:45 dcl-endeavouros kernel: I/O error, dev sr0, sector 0 op 0x0:(READ) flags 0x80700 phys_seg 4 prio class 0
Apr 04 21:39:45 dcl-endeavouros kernel: sr 3:0:0:0: [sr0] tag#11 unaligned transfer
Apr 04 21:39:45 dcl-endeavouros kernel: I/O error, dev sr0, sector 0 op 0x0:(READ) flags 0x0 phys_seg 2 prio class 0
Apr 04 21:39:45 dcl-endeavouros kernel: Buffer I/O error on dev sr0, logical block 0, async page read
Apr 04 21:39:45 dcl-endeavouros kernel: Buffer I/O error on dev sr0, logical block 1, async page read
Apr 04 21:39:45 dcl-endeavouros kernel: sr 3:0:0:0: [sr0] tag#12 unaligned transfer
Apr 04 21:39:45 dcl-endeavouros kernel: I/O error, dev sr0, sector 2 op 0x0:(READ) flags 0x0 phys_seg 2 prio class 0
Apr 04 21:39:45 dcl-endeavouros kernel: Buffer I/O error on dev sr0, logical block 2, async page read
Apr 04 21:39:45 dcl-endeavouros kernel: Buffer I/O error on dev sr0, logical block 3, async page read
Apr 04 21:39:45 dcl-endeavouros kernel: sr 3:0:0:0: [sr0] tag#13 unaligned transfer
Apr 04 21:39:45 dcl-endeavouros kernel: I/O error, dev sr0, sector 4 op 0x0:(READ) flags 0x0 phys_seg 2 prio class 0
Apr 04 21:39:45 dcl-endeavouros kernel: Buffer I/O error on dev sr0, logical block 4, async page read
Apr 04 21:39:45 dcl-endeavouros kernel: Buffer I/O error on dev sr0, logical block 5, async page read
Apr 04 21:39:45 dcl-endeavouros kernel: sr 3:0:0:0: [sr0] tag#14 unaligned transfer
Apr 04 21:39:45 dcl-endeavouros kernel: I/O error, dev sr0, sector 6 op 0x0:(READ) flags 0x0 phys_seg 2 prio class 0
Apr 04 21:39:45 dcl-endeavouros kernel: Buffer I/O error on dev sr0, logical block 6, async page read
Apr 04 21:39:45 dcl-endeavouros kernel: Buffer I/O error on dev sr0, logical block 7, async page read

These entries hint at a faulty drive. Are you using a CDROM device? Because that’s usually what /dev/sr0 represents.
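
For anyone wanting to check how often these errors show up and for which device, here is a rough sketch. It assumes a systemd journal and kernel messages shaped like the ones quoted above; the function name count_io_errors is made up for illustration.

```shell
# Hypothetical helper: count kernel "I/O error, dev X" lines per device.
# Reads journal text on stdin and prints "<device> <count>" per line.
count_io_errors() {
  grep -oE 'I/O error, dev [A-Za-z0-9]+' \
    | awk '{ print $4 }' \
    | sort \
    | uniq -c \
    | awk '{ print $2, $1 }'
}

# Example (kernel messages from the current boot):
#   journalctl -k -b 0 --no-pager | count_io_errors
```

If only sr0 shows up, the errors are confined to the optical drive rather than the disks holding the system.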

Can you post the output of

$ lsblk

dcl · April 5, 2024, 4:29am

No problem! Here is that output.

NAME          MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
sda             8:0    0   1.8T  0 disk 
└─hdd-hdd--lv 254:1    0   3.6T  0 lvm  /hdd
sdb             8:16   0   1.8T  0 disk 
└─hdd-hdd--lv 254:1    0   3.6T  0 lvm  /hdd
sdc             8:32   0 465.8G  0 disk 
└─ssd-ssd--lv 254:0    0 931.5G  0 lvm  /ssd
sdd             8:48   0 465.8G  0 disk 
└─ssd-ssd--lv 254:0    0 931.5G  0 lvm  /ssd
sr0            11:0    1  1024M  0 rom  
nvme0n1       259:0    0   1.8T  0 disk 
├─nvme0n1p1   259:1    0    16G  0 part [SWAP]
├─nvme0n1p2   259:2    0   500G  0 part /
├─nvme0n1p3   259:3    0   1.3T  0 part /home
└─nvme0n1p4   259:4    0   512M  0 part /efi
nvme1n1       259:5    0 465.8G  0 disk 
└─nvme1n1p1   259:6    0 465.8G  0 part /nvme

I do have a DVD (or maybe Blu-Ray? can’t remember) drive but there’s nothing in it at the moment.

In total, I have two NVMe drives, two SSDs (that I have in an LVM) and two HDDs (also in an LVM). I’ve never actually paid much attention to that sr0…

anthony93 · April 5, 2024, 4:31am

Does the crash still occur if you physically disconnect the CDROM device from the system?

dcl · April 5, 2024, 4:32am

I’ve never tried! Didn’t ever think the CDROM would be causing problems. I’m about to head to bed for the night, but I’ll disconnect it once I shut it down, then test it tomorrow after work and report back. Thanks so much for the quick response!

dcl · April 6, 2024, 12:21am

I thought that, perhaps, you’d nailed it on the first try, but after about two hours of gameplay, the second type of crash (screen freeze, to black screen, to login screen) occurred. Here’s what I think are the relevant logs from the log tool!

https://0x0.st/XiAP.txt

I have not restarted yet, so hopefully all other relevant logs that you may need I will still have readily available.

ricklinux · April 6, 2024, 1:32am

@dcl
I would at least update the UEFI firmware (BIOS) to version 10.10. There is also a beta version after that one for 2024. That’s where I would start. Then, what else did you set up, and what packages did you install for the GPU besides running on amdgpu? Do you have vulkan-radeon installed? Have you set up hardware acceleration?

https://wiki.archlinux.org/title/AMDGPU

dcl · April 6, 2024, 1:39am

I never thought I’d get the ricklinux responding to my post!

I will work on getting those BIOS updates running now. I’ve had very bad experiences with BIOS updates in the past, so I tend to avoid them like the plague.

As far as what I’ve set up, I haven’t done a whole lot of tinkering. I recall I added something in the xconfig, I think, to stop screen tearing, but beyond that, most stuff should be pretty close to the standard install. I have not set up hardware acceleration yet (or if I did, I don’t remember it).

Packages installed related to anything AMD (obtained by using grep to find anything related to “amd”, “radeon”, “mesa”, and “vulkan”):

amd-ucode 20240312.3b128b60-1
lib32-amdvlk 2024.Q1.3-2
xf86-video-amdgpu 23.0.0-2
lib32-vulkan-icd-loader 1.3.279-1
lib32-vulkan-radeon 1:24.0.4-2
vulkan-headers 1:1.3.279-1
vulkan-icd-loader 1.3.279-1
vulkan-radeon 1:24.0.4-2
vulkan-tools 1.3.269-1
lib32-mesa 1:24.0.4-2
mesa 1:24.0.4-2
mesa-utils 9.0.0-3
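
The exact command isn’t shown above, but a rough equivalent might look like the sketch below. It assumes an Arch-based system where pacman -Q prints one "name version" pair per line; filter_gpu_pkgs is a made-up name.

```shell
# Hypothetical helper: keep only package lines that mention GPU-related terms.
# Reads a package listing on stdin, matching case-insensitively.
filter_gpu_pkgs() {
  grep -Ei 'amd|radeon|mesa|vulkan'
}

# Example on an Arch-based system:
#   pacman -Q | filter_gpu_pkgs
```

Because the match is a plain substring search, it can also pick up unrelated packages whose names happen to contain one of those terms.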

ricklinux · April 6, 2024, 1:46am

The BIOS update should be very straightforward using Instant Flash.

https://www.asrock.com/support/BIOSIG.asp?cat=BIOS10

dcl · April 6, 2024, 2:26am

Well, either it was not as simple as ASRock suggests, or that BIOS version is incompatible with my hardware. The computer is not turning on at all now…

dcl · April 6, 2024, 2:47am

To clarify in more specific, technical terms… The computer is receiving power, the motherboard receives power (the LEDs all come on), the fans all start spinning, but it never enters BIOS. After about 20-30 seconds, everything shuts off, and then it tries to boot again after about 5 seconds.

That MoBo has a built-in display that shows error codes (I think it’s called Dr. Debug), but it stays off for the entire process.

I have tried clearing the CMOS; that was about all I could think of to try and fix it.

ricklinux · April 6, 2024, 3:01am

That’s not good.

Edit: Not sure why that would happen. I have done BIOS updates for many years and never ever have I had a failure or an issue. On my Ryzen I have done 14 updates since this board was put out. I’ve also done 13 on my Intel board. Strange.

Edit: This is not what I was expecting. It should be a simple update process.

ricklinux · April 6, 2024, 3:13am

Did you disconnect the power when doing this?

ricklinux · April 6, 2024, 3:17am

Are you sure you let it finish the BIOS update? It should have shut off and rebooted automatically while flashing, completed, and then rebooted to the BIOS screen or desktop. Hopefully you didn’t interrupt the flashing process?

Edit: I guess my only suggestion would be to disconnect the hard drives and ssd drives and try booting it.

anthony93 · April 6, 2024, 3:57am

Unfortunately, none of the logs provided are helpful. What we need is the journal entries.

If you haven’t rebooted,

$ journalctl -b 0

If you’ve rebooted once,

$ journalctl -b -1
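
Since full journals get long, here is a minimal sketch for saving only the error-level messages from one boot to a file that can be uploaded. It assumes systemd’s journalctl is available; save_boot_errors and the output file name are made up for illustration.

```shell
# Hypothetical helper: dump error-and-worse messages from one boot to a file.
# $1 = boot offset (0 = current boot, -1 = previous boot), $2 = output file.
save_boot_errors() {
  journalctl -b "$1" -p err --no-pager > "$2"
}

# Example: save errors from the boot before this one.
#   save_boot_errors -1 crash-boot.txt
```

Running journalctl --list-boots first shows which offsets exist, which helps if there has been more than one reboot since the crash.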

dcl · April 6, 2024, 4:02am

I did let it go all the way through the flash process, power was never disconnected. The instant flash process went all the way to 100%, it gave me a message that it was completed and was ready to restart. Upon restart, that’s when it began the process described above.

dcl · April 6, 2024, 4:03am

In my original posting, I did include my journal logs. I did not grab them this time, and now unfortunately the computer is considerably more dead in the water.

ricklinux · April 6, 2024, 4:06am

Sorry this has happened. I don’t know what to think. Normally the updates should not cause this.

Edit: Normally the BIOS updates add support for newer processors and provide a new AMD AGESA and other updates. I did some further digging on ASRock’s site, and it looks like they don’t show support for all the BIOS versions on the X470 boards. It’s only when you get into the X570 versions and some others.

https://www.asrock.com/support/cpu.asp?s=AM4&u=614

There should be a way to flash it to a previous version. The chart shows it only has support for a lower version than what you had also. I’ve never had this on any motherboards I’ve ever used. They always support the latest Bios.

ricklinux · April 6, 2024, 4:30am

Maybe you can contact ASRock and ask them, if this BIOS version isn’t compatible, how you can recover or flash to an older version?

https://tw.asrock.com/events/tsd.asp

Jonodros · April 6, 2024, 6:54am

This is definitely odd since I updated about 100 BIOSes at work the past year. None ever had a failure or issue except a few old models that failed to boot because BIOS settings reverted back from UEFI to legacy boot or vice-versa. Easy fix by accessing BIOS and turning setting back on. I’ve personally had a hiccup with my Asus motherboard where secure boot was turned on after BIOS update when settings reset. Also easy fix by changing setting. Maybe check that your BIOS boot settings are correct.

Also, did you exclude temp and hardware issues? Try running a memtest and stress test if you can manage to get your PC started up again.