Getting bit hard by the grub issue. Older hardware, ALL boot devices fail even after applying the fix

I’ve had the grub issue on all 4 of my systems. On all 4 systems, I’ve run through the motions to chroot and fix grub. I’m extremely familiar with the process; I can do this :poop: blindfolded by now …

I’m dealing with it for a 5th time on a family’s machine, but this time it is behaving very differently. It’s older hardware (2nd gen Intel).

When the grub bug happened on this machine, ALL devices failed to boot, which means I can’t even chroot over live USB. If the EndeavourOS drive was connected to the motherboard, I couldn’t boot to any hard disks, any USB, or even DVD – it would go straight into BIOS no matter what. Because of these weird boot issues, I was suspecting corrupt firmware or something. Cleared CMOS, reinstalled UEFI, etc. Nothing working. I wanted to try wiping the disk, but I couldn’t boot into anything … every reboot jumped straight into the bios.

I detached the EOS disk from the system and hooked it up to an external USB HDD reader connected to my laptop, wiped it in gparted, then hooked it back up to the main system. Yay, I can boot to USB finally! I installed EOS to the freshly wiped disk from live ISO. That went fine. Yay!

Not so fast! Grub isn’t done #$!&ing with me yet …

After a fresh installation of the latest EOS ISO directly from the website that I downloaded today, the grub issue happened immediately on first reboot after the installer finished. I can once again no longer boot to HDD or USB or anything else.

I detached the hard drive again, plugged it back into my laptop again, chrooted again, and fixed grub… again… then plugged it back into the main computer. Yay! It finally works! I’m finally into my freshly installed, brand new EOS desktop.

You don’t think I got off that easy did you? The spawn of Satan isn’t done with me yet.

I rebooted the freshly installed EOS to make sure everything was running smooth … and it goes right back to the #$!&ing BIOS. I can’t boot into anything. AGAIN.

If I hook the EOS disk up to my laptop and go through the motions to fix grub again, it will work, and then it will stop working again when I reboot.

I don’t know what to do at this point. Has a similar problem been reported by others?

edit: you’ve got to be kidding me… now my laptop is booting into the grub command line

I switched over to systemd-boot and like magic all the grub issues went away. I didn’t need grub for any specific reason so it was easy for me to switch. If you need grub for a certain reason (dual boot, grub-btrfs, etc), it may be a bit harder to switch, but alternatives like systemd-boot, rEFInd, etc. offer less of a headache and might be worth looking into if you haven’t already.

Just tried rEFInd … it’s doing the same thing now.

Looks like grub took down another. My issue sounds a little similar to what @limotux posted about here. Just like limo, the system was working just fine before, and now it can’t even boot to USB sticks. It’s doing something to cause this, and based on a sample size of two, it seems to be happening to older systems.

I do not think this is the same issue as what happened with grub that day …
Because if you run grub-install properly, it will not happen again afterwards… only if the versions of the installed grub files and the grub package have changed in a way that makes them incompatible… It could happen if you install from an outdated ISO offline and then update the system.

It could be an issue with the way you run grub-install: it may be creating multiple entries in NVRAM, which is now full… or the NVRAM has bad entries. Either can make the bootloader fail to load (so the machine never gets past the firmware).
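One way to check for piled-up entries is to count how often each label appears in the efibootmgr listing. This is only a rough sketch: `count_boot_labels` is a made-up helper name, and it assumes entry lines look like `Boot0000* label` with any device path separated from the label by a tab.

```shell
#!/bin/sh
# Summarise UEFI boot entries by label so duplicates stand out.
# Reads efibootmgr output on stdin. Hypothetical helper, not part of efibootmgr.
count_boot_labels() {
    # Keep only entry lines ("Boot" + four hex digits), drop the number and
    # the optional "*" marker, cut off a tab-separated device path if present,
    # then count identical labels, most frequent first.
    grep -E '^Boot[0-9A-Fa-f]{4}\*?[[:space:]]' \
        | sed -E 's/^Boot[0-9A-Fa-f]{4}\*?[[:space:]]+//' \
        | cut -f1 \
        | sort | uniq -c | sort -rn
}

# On a machine booted in UEFI mode with efibootmgr installed:
#   efibootmgr | count_boot_labels
```

A label with a count above 1 would be a candidate duplicate, but it is worth reading the full `efibootmgr -v` output before deciding anything is stale.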

Could it be that Grub is adding new entries to nvram but never removing any old ones, causing them to pile up? On the older PC mentioned in the original post (family’s computer), we had tested out multiple distros before settling on EOS for that system. I’ll need to check efibootmgr a little later to see what’s in there… but on my laptop, which started having new grub problems yesterday, I did notice that there are some old KDE Neon entries from when I installed that long ago but replaced with EOS.

The commands I run on every system to fix “the grub issue” are sudo grub-install --target=x86_64-efi --efi-directory=/boot/efi && sudo grub-mkconfig -o /boot/grub/grub.cfg via arch-chroot.
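For reference, the whole chroot sequence around those two commands might look roughly like the sketch below. It only prints the steps so they can be reviewed first. The partition names (`/dev/sdX2` for root, `/dev/sdX1` for the ESP) are placeholders, `grub_fix_steps` is a made-up helper name, and a plain root filesystem is assumed (a btrfs install with subvolumes would need extra mount options).

```shell
#!/bin/sh
# Print the chroot repair sequence for review before running any of it.
# $1 = root partition, $2 = EFI system partition (check both with `lsblk -f`).
grub_fix_steps() {
    cat <<EOF
sudo mount $1 /mnt
sudo mount $2 /mnt/boot/efi
sudo arch-chroot /mnt grub-install --target=x86_64-efi --efi-directory=/boot/efi
sudo arch-chroot /mnt grub-mkconfig -o /boot/grub/grub.cfg
sudo umount -R /mnt
EOF
}

# Placeholder device names; substitute the real ones.
grub_fix_steps /dev/sdX2 /dev/sdX1
```

Running the commands through `arch-chroot /mnt <command>` is equivalent to entering the chroot first and typing them there.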

On my laptop, I fixed the grub issue a couple weeks ago, and it’s been working perfectly fine ever since. Then all of a sudden yesterday (presumably since an update?) I boot it up and it boots directly into grub rescue. Any attempt to load EOS reboots back to UEFI. When I try to boot EOS from grub rescue (set root=(hd0,gpt1), set prefix=(hd0,gpt1)/boot/grub, insmod normal, normal), it boots directly into the UEFI as soon as I execute the normal command.

On BOTH of these machines, I can chroot and run the “grub fix”, get into the operating system exactly one time, and the next time I reboot the OS it goes into bios. Both machines, same exact issue.

What logs would be helpful to diagnose, aside from efibootmgr -v?

Series of events for both systems, for the record.

“Old pc”:

  • Tested out a few distros
  • Finally settled on EOS
  • Running fine for a few weeks, updates no problem
  • (unrelated) An Nvidia driver update broke the system, tried a bunch of things to fix it
  • During troubleshooting, eventually decided to reinstall EOS
  • Installed EOS from latest ISO (downloaded yesterday, installed w/ online method)
  • Rebooted after installation from live ISO, and it boots into bios
  • Chroot, fix grub issue, and it boots back into EOS just fine
  • Reboot EOS, it boots back into bios - rinse and repeat

Laptop:

  • Installed two years ago
  • Got hit with grub issue 2 weeks ago
  • Chroot, fix grub issue, everything is working fine for weeks
  • Boot up yesterday, goes directly to grub rescue
  • Chroot, fix grub issue, and it boots back into EOS just fine
  • Reboot EOS, it boots back into bios - rinse and repeat

The only difference between the two systems, is that the “Old PC” is entirely unable to boot from any device - including USB - so I have to transplant the hard drive to a different machine to chroot and fix grub.

this can give a hint… and you can remove unused entries with it too …
NVRAM is indeed limited in size. And they can wear out too.
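Removing a stale entry with efibootmgr comes down to `-b <number> -B`. A cautious sketch: `delete_boot_entry` is a made-up wrapper that only prints the command unless `APPLY=1` is set, and the entry number `0003` in the usage line is purely an example.

```shell
#!/bin/sh
# Build the efibootmgr command that deletes one boot entry.
# Prints the command by default; set APPLY=1 to actually run it (needs root).
delete_boot_entry() {
    # Entry numbers are four hex digits, e.g. the "0003" in "Boot0003".
    case "$1" in
        [0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f]) ;;
        *)
            echo "expected a four-digit hex boot number, got: $1" >&2
            return 1
            ;;
    esac
    if [ "${APPLY:-0}" = "1" ]; then
        sudo efibootmgr -b "$1" -B
    else
        echo "efibootmgr -b $1 -B"
    fi
}

# Example (dry run): delete_boot_entry 0003
```

Double-check the number against `efibootmgr -v` first, since deleting the wrong entry can leave the machine without a working default.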

I’ll get logs soon, thanks. Will also see about removing some unneeded entries.

Since nothing changed with my laptop’s boot entries in the last two weeks, the nvram suddenly filling up yesterday for no apparent reason makes no sense, especially when it coincidentally happens on the same day, at the same time, that I’m troubleshooting a different system for the identical boot issue. The possibility that both systems’ nvram filled up beyond capacity, or wore out, on exactly the same day, causing exactly the same boot issue, also seems ludicrous. Something more is definitely going on… just not sure what yet. :thinking:

It could also be one little thing you have done the same way on each of the machines.
Because there is no general issue causing this for all devices.

But I see that, because of the past grub issue, users have been tinkering a lot with the firmware/BIOS and NVRAM; usually no one does this that intensively … And this was causing strange issues all over the place, for example resetting the setup for CSM/legacy boot or secure boot… or in a lot of cases adding additional entries to the device menu in the NVRAM, rendering it unusable, or leaving a default boot entry that was not working, etc.
Some users were writing grub to the MBR of a GPT drive too, which can also cause issues or render the FAT32 ESP partition unstable…

And in a lot of cases the solution is way simpler than finding it in the first place … sad, but from experience.

It could also be the BIOS battery; have you ever changed that one? Not all motherboards have one, notebooks for example, which have it hardwired into the NVRAM block… But if you could give me the info about the boards, I could check that and research more specifically for your hardware …

inxi -Fxxc0z | eos-sendlog