Not sure what’s been happening over the last two weeks or so, but suddenly half of my reboots end in a black screen that stays unresponsive until I do a hard reboot. Sometimes it only makes it to the screen asking me which kernel I want to load, and other times it gets as far as asking me to enter my encrypted drive password. The only solution I have is to keep rebooting, and after a few tries it suddenly works again. I cannot think of anything I would’ve done as a user to cause this.
Is this experienced by anyone else? Is it a known issue? What can I do about it?
Run lsblk and check which drive holds the root partition (MOUNTPOINT /). The drive at the top of that tree (not the partition itself) is the system drive. Use this (/dev/sda, /dev/nvme0n1 or whatever it is)
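For anyone who wants to script the lookup described above, here is a minimal sketch. The function name root_disk is made up for illustration, and it assumes that lsblk in raw mode (-r) lists each disk before the partitions and mapped devices that sit on it, as the normal tree view does. Plain lsblk and reading the tree by eye works just as well.

```shell
# Hypothetical helper: read "NAME TYPE MOUNTPOINT" rows on stdin and print
# the disk that sits above whatever is mounted at /.
# Assumption: rows arrive in tree order (disk first, then its children).
root_disk() {
    disk=""
    while read -r name type mountpoint; do
        # Remember the most recent whole disk we have seen.
        if [ "$type" = "disk" ]; then
            disk="$name"
        fi
        # The first row mounted at / belongs to that disk.
        if [ "$mountpoint" = "/" ]; then
            printf '/dev/%s\n' "$disk"
            return 0
        fi
    done
    return 1
}

# On a live system:
#   lsblk -rno NAME,TYPE,MOUNTPOINT | root_disk
```

With an encrypted root, the row mounted at / is the unlocked crypt device rather than a partition, which is why the sketch walks down from the disk instead of stripping a partition suffix.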
As @emk2203 seems to suspect, this may be caused by a disk problem.
So I suggest you back up all important data on that disk to an external drive as the first thing.
A good SSD brand, brand new, and the only errors are Invalid Field in Command: it looks like the SSD is healthy, but something else in the system is causing problems.
Faulty RAM? Faulty connector or controller? Difficult to say. Run memtest86+ from the boot menu to be sure that the RAM is OK.
One further avenue to explore could be nvme self-test-log /dev/nvme0 to try to see the errors in more detail. I don’t know whether it will prove insightful enough to justify the work.
It’s sudo nvme device-self-test /dev/nvme0n1 -s 1
followed by sudo nvme self-test-log --dst-entries 1 -v /dev/nvme0
And if you post again, post diagnostics as text, not image.
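The two commands above can be wrapped so the result ends up in a text file that is easy to paste into a post. This is only a sketch: run, nvme_short_selftest and DRY_RUN are made-up names, the 120 second wait is an assumption based on short self-tests normally finishing within about two minutes, and the smart-log and error-log calls are extra nvme-cli subcommands that were not mentioned in this thread.

```shell
# Hypothetical helper: with DRY_RUN=1, print the command instead of running it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical wrapper around the commands from this thread.
#   $1 = namespace device, e.g. /dev/nvme0n1
#   $2 = controller device, e.g. /dev/nvme0
nvme_short_selftest() {
    ns="$1"
    ctrl="$2"
    # Start a short device self-test (-s 1), as posted above.
    run sudo nvme device-self-test "$ns" -s 1
    # Assumed wait; adjust if the log still shows the test in progress.
    run sleep 120
    # Show the newest self-test result in verbose form, as posted above.
    run sudo nvme self-test-log --dst-entries 1 -v "$ctrl"
    # Additions not from the thread: general health and error history.
    run sudo nvme smart-log "$ctrl"
    run sudo nvme error-log "$ctrl"
}

# Preview only:
#   DRY_RUN=1 nvme_short_selftest /dev/nvme0n1 /dev/nvme0
# Real run, saved as text for posting:
#   nvme_short_selftest /dev/nvme0n1 /dev/nvme0 > nvme-report.txt 2>&1
```

Running it with DRY_RUN=1 first shows exactly what would be executed with sudo before anything touches the drive.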
If I were you, I would move the SSD to a different system and put a different SSD in the problem system. And then test both systems under load.
I’ll have to find the time to do all this as I am super busy with work, but I can say for sure that all my equipment is brand new, so I would be surprised if there’s suddenly an issue. I mean it’s possible, but what I’m getting at is that it is all brand-new, reputable, name-brand hardware. I could be wrong, but I suspect it’s something to do with the OS and an update at some point. The reason I say this is that I had a similar issue previously that magically disappeared, and now it appears to have resurfaced, but worse than before.
What do you mean by ‘anything else’? Windows only? On the system level we are talking about, there shouldn’t be a difference between Linux distributions if they run a similar kernel. NVMe access always goes through the same driver.
Linux and Windows are different, though. Windows is better at hiding flaws in the system.
For the sake of knowing for sure, I am going to install a different distro for a day and run several reboot tests to see if I can replicate it. If it doesn’t happen, then wouldn’t it mean it’s not the hardware?
Turns out you were right. I swapped both the SSD and the RAM at the same time and that problem is now gone. Only now I am unsure if the problem was the SSD or the RAM. Which do you think it most likely was? Thanks for your help btw.
I had the same thing happen to me years ago with a new build, and it was the RAM. My issues only surfaced after some hours of use, so somewhat different, but I would try RAM first.
There’s a reason why manufacturers make compatibility lists for RAM but rarely for other components.
Never mind. I spoke too soon. It suddenly happened again today. I’m going to RMA this Intel NUC. Well, at least you helped me figure out it’s faulty while still within the warranty period. Hopefully the fault really is in this unit and it doesn’t also happen with a replacement.