Full transparency on the GRUB issue - Updated 2022-08-29

I hope this is an anomaly and we can still keep using grub. This is probably a con of using a bleeding edge distro. This time the bleeding is huge and there is no simple solution. I hope that important things like grub in the future aren’t rolled out to the normal repos without extensive testing in the testing repos with several users giving feedback. I just think this issue is somehow overlooked by arch devs. I saw that the maintainer of grub is in charge of 218 packages in arch repos. I think important packages like grub have to be tested more rigorously before putting them in stable repos.

A small list of packages this dev is responsible for:

archlinux-keyring, dhcp, cronie, curl, git, gimp, intel-ucode, libssh, parted, rsync, virtualbox, xorg-server, yad
5 Likes

There are ~13,000 packages in the Arch repos. How many packagers do you think there are? 218 packages isn’t even that many for one of the packagers to be handling compared to the total number.

Packages are built into the testing repos and given time to be tested; if no issues are reported, they get moved forward. With a rolling distro, I don’t think you can expect more than that.

3 Likes

That’s fair. So grub is a thing we use that covers everything and lets us down a lot. It’s like your jack-of-all-trades uncle who isn’t really good at anything other than showing up.

Regardless, I meant the rest of it. :grin:

Imagine being the haskell maintainer :wink:

4 Likes

I’m in the fortunate position that I hadn’t updated before seeing the reports of this. I’ve now updated, run grub-install and rebooted successfully :sweat_smile: (It might be that my system wasn’t affected, but I didn’t intend on finding out).

This is close to a nightmare scenario - an upstream bug that causes systems to fail to boot and that is difficult to replicate consistently. Thank you to @dalto and all the :enos: team for all your work on this so far.

:enos_flag:

2 Likes

dalto,

There are also some strange lines in the UEFI boot entries; look carefully at the efibootmgr output.

This is before installing grub 322, running grub-install and rebooting (UEFI, no Secure Boot, AMD, Gigabyte).

It starts with:

sudo efibootmgr
BootCurrent: 0000
Timeout: 1 seconds
BootOrder: 0000,0006,0007,0008,0009,0005,000A,000B,000C,000D
Boot0000* manjaro	HD(1,GPT,1a168083-0494-48f8-8235-3b4308a4bb4a,0x800,0x64000)/File(\EFI\MANJARO\GRUBX64.EFI)
Boot0001* UEFI OS|HD(1,GPT,b0920a5a-88ab-405d-a405-62ac77736cb5,0x800,0x100000)/File(\EFI\BOOT\BOOTX64.EFI)
Boot0002* UEFI OS HD(1,GPT,bb9434bc-d41f-f842-9828-86f7cac2f09b,0x22,0x10fde)/File(\EFI\BOOT\BOOTX64.EFI)
Boot0003* UEFI OS HD(1,GPT,4f593557-5e27-47c1-8e42-5506db7268f8,0x800,0x101000)/File(\EFI\BOOT\BOOTX64.EFI)
Boot0004* UEFI OS HD(1,GPT,d7598c43-0200-4880-9d6d-25d95fec6981,0x800,0x101800)/File(\EFI\BOOT\BOOTX64.EFI)
Boot0005* UEFI OS HD(1,GPT,1a168083-0494-48f8-8235-3b4308a4bb4a,0x800,0x64000)/File(\EFI\BOOT\BOOTX64.EFI)

From what I can see, the UEFI motherboard always wants an entry whose name begins with “UEFI …”; if none is present, it creates duplicate entries with that name (maybe as a fallback).

After a reboot and creating entries with a grub ID name, I get:

Timeout: 1 seconds
BootOrder: 0000,0006,0007,0008,0009,0005,000A,000B,000C,000D
Boot0000* manjaro	HD(1,GPT,1a168083-0494-48f8-8235-3b4308a4bb4a,0x800,0x64000)/File(\EFI\MANJARO\GRUBX64.EFI)0000424f
Boot0001* UEFI OS|HD(1,GPT,b0920a5a-88ab-405d-a405-62ac77736cb5,0x800,0x100000)/File(\EFI\BOOT\BOOTX64.EFI)0000424f
Boot0002* UEFI OS HD(1,GPT,bb9434bc-d41f-f842-9828-86f7cac2f09b,0x22,0x10fde)/File(\EFI\BOOT\BOOTX64.EFI)
Boot0003* UEFI OS HD(1,GPT,4f593557-5e27-47c1-8e42-5506db7268f8,0x800,0x101000)/File(\EFI\BOOT\BOOTX64.EFI)0000424f
Boot0004* UEFI OS HD(1,GPT,d7598c43-0200-4880-9d6d-25d95fec6981,0x800,0x101800)/File(\EFI\BOOT\BOOTX64.EFI)0000424f
Boot0005* UEFI OS	HD(1,GPT,1a168083-0494-48f8-8235-3b4308a4bb4a,0x800,0x64000)/File(\EFI\BOOT\BOOTX64.EFI)0000424f
Boot0006* EndvQtil	HD(1,GPT,d7598c43-0200-4880-9d6d-25d95fec6981,0x800,0x101800)/File(\EFI\ENDVQTIL\GRUBX64.EFI)
Boot0007* EndvI3	HD(1,GPT,b0920a5a-88ab-405d-a405-62ac77736cb5,0x800,0x100000)/File(\EFI\ENDVI3\GRUBX64.EFI)
Boot0008* EndvXfce	HD(1,GPT,bb9434bc-d41f-f842-9828-86f7cac2f09b,0x22,0x10fde)/File(\EFI\ENDVXFCE\GRUBX64.EFI)
Boot0009* EndvMate	HD(1,GPT,4f593557-5e27-47c1-8e42-5506db7268f8,0x800,0x101000)/File(\EFI\ENDVMATE\GRUBX64.EFI)
Boot000A* UEFI OS	HD(1,GPT,b0920a5a-88ab-405d-a405-62ac77736cb5,0x800,0x100000)/File(\EFI\BOOT\BOOTX64.EFI)0000424f
Boot000B* UEFI OS	HD(1,GPT,bb9434bc-d41f-f842-9828-86f7cac2f09b,0x22,0x10fde)/File(\EFI\BOOT\BOOTX64.EFI)0000424f
Boot000C* UEFI OS	HD(1,GPT,4f593557-5e27-47c1-8e42-5506db7268f8,0x800,0x101000)/File(\EFI\BOOT\BOOTX64.EFI)0000424f
Boot000D* UEFI OS	HD(1,GPT,d7598c43-0200-4880-9d6d-25d95fec6981,0x800,0x101800)/File(\EFI\BOOT\BOOTX64.EFI)0000424f

I created entries 6, 7, 8 and 9 with grub-install.
The UEFI motherboard created entries A, B, C and D itself after the reboot.
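
For anyone who wants to check a listing like this for duplicates, here is a minimal Python sketch. It groups entries by partition GUID and loader path, so entries such as 0001 and 000A that point at the same \EFI\BOOT\BOOTX64.EFI show up together. The regular expression, the function name find_duplicates and the shortened SAMPLE listing are assumptions based on the output above, not part of efibootmgr itself.

```python
import re
from collections import defaultdict

# Hypothetical pattern for the entry lines shown in this thread, e.g.
#   Boot0005* UEFI OS HD(1,GPT,<guid>,0x800,0x64000)/File(\EFI\BOOT\BOOTX64.EFI)0000424f
ENTRY_RE = re.compile(
    r"^Boot(?P<num>[0-9A-Fa-f]{4})\*?\s+(?P<label>.*?)[\s|]*"
    r"HD\(\d+,GPT,(?P<guid>[0-9a-fA-F-]+),[^)]*\)/File\((?P<path>[^)]*)\)"
)

# Shortened copy of the listing above (raw string, so backslashes stay literal).
SAMPLE = r"""
Timeout: 1 seconds
BootOrder: 0000,0006,0007,0008,0009,0005,000A,000B,000C,000D
Boot0000* manjaro HD(1,GPT,1a168083-0494-48f8-8235-3b4308a4bb4a,0x800,0x64000)/File(\EFI\MANJARO\GRUBX64.EFI)0000424f
Boot0001* UEFI OS|HD(1,GPT,b0920a5a-88ab-405d-a405-62ac77736cb5,0x800,0x100000)/File(\EFI\BOOT\BOOTX64.EFI)0000424f
Boot0005* UEFI OS HD(1,GPT,1a168083-0494-48f8-8235-3b4308a4bb4a,0x800,0x64000)/File(\EFI\BOOT\BOOTX64.EFI)0000424f
Boot0007* EndvI3 HD(1,GPT,b0920a5a-88ab-405d-a405-62ac77736cb5,0x800,0x100000)/File(\EFI\ENDVI3\GRUBX64.EFI)
Boot000A* UEFI OS HD(1,GPT,b0920a5a-88ab-405d-a405-62ac77736cb5,0x800,0x100000)/File(\EFI\BOOT\BOOTX64.EFI)0000424f
"""


def find_duplicates(listing: str) -> dict:
    """Return {(guid, path): [entry numbers]} for targets listed more than once."""
    groups = defaultdict(list)
    for line in listing.splitlines():
        match = ENTRY_RE.match(line.strip())
        if match:
            # Compare case-insensitively: GUIDs lower-cased, FAT paths upper-cased.
            key = (match["guid"].lower(), match["path"].upper())
            groups[key].append(match["num"].upper())
    return {key: nums for key, nums in groups.items() if len(nums) > 1}


for (guid, path), nums in find_duplicates(SAMPLE).items():
    print(f"{path} on {guid}: entries {', '.join(nums)}")
```

To use it on a real machine, paste the text printed by sudo efibootmgr in place of SAMPLE. The sketch only reports duplicates; it does not delete or reorder anything.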

Also, where do all these 0000424f hex codes come from?
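
Where the value comes from is not answered here, but what it contains is easy to check. A minimal Python sketch (the helper name printable is made up for illustration) that converts the trailing hex to bytes and shows the printable ASCII characters: 0000424f is two zero bytes followed by the letters B and O. That efibootmgr is printing extra data stored with the boot entry after the device path is an assumption, not something confirmed in this thread.

```python
def printable(hex_text: str) -> str:
    """Decode a hex string and replace non-printable bytes with dots."""
    raw = bytes.fromhex(hex_text)
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)


print(printable("0000424f"))  # prints ..BO
```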

There have been too many changes in the grub git repository on Savannah since 2.06 (15 months) that are not fully tested.
See the commits:
https://git.savannah.gnu.org/cgit/grub.git

Something I haven’t seen mentioned yet that I think is worth mentioning is the interaction of this bug with an encrypted initramfs.

My L9520 had an encrypted initramfs. I got the “back to firmware” after entering the correct password.

It so happens that the reason I had taken it home was precisely to reinstall with an unencrypted initramfs (but an otherwise encrypted /, of course; I want it that way for hardware acceleration, the keymap hook, etc., and I fear no evil maid). So I didn’t troubleshoot and just reinstalled. And it worked. I checked: I have r322 on the new system.

What’s strange to me is that some posts I’ve read implied that people having this problem had an unencrypted system… so it seems that even given specific hardware, it’s not easy to predict if you’ll have problems, as it’s very configuration-specific.

That makes sense. The failure occurs when it is processing your grub.cfg. It can’t read that until it decrypts your volume. So it asks for the password, decrypts and then fails.

The issue doesn’t seem to be related to encryption.

Ok, but then why does it not fail now that I have an unencrypted /boot? Ohhhh I think I just answered my own question: because the re-installation ran grub-install with the new version, whereas the update didn’t?

Yes, exactly.

1 Like

I just installed rEFInd when this happened. I had been meaning to anyway. The last time I tested it, it just gave an out-of-range error or something like that.

I just got bit by this myself out of the blue but I blamed myself because I went mucking around with Secure Boot in my BIOS! :joy: I had applied the updates first, rebooted, then went in the BIOS to double check something. Once there, I noticed XMP wasn’t loaded and fiddled with the Fast Boot/Secure Boot section. After saving the BIOS and rebooting, I got a black screen. Rebooted and pressed F11 to boot override and got a text based Grub error that efibootx64 couldn’t be found. Went back into BIOS to undo everything I did and grub kept chainloading back into my BIOS! :unamused:

Whipped out the USB stick to reinstall and noticed the link to the forums on the live session’s greeter. Saw the topic on Grub and the thread on how to arch-chroot into my SSD to reinstall. That was painless as all hell. Saved me a lot of time as I prepared to slink back to Debian or Mint. The documentation here kept me on Arch! :+1:

7 Likes

Seriously, the speed at which the community here posted instructions for a workaround is nothing short of amazing. Normally, an update that breaks a system so badly would make me hate the OS, but the fact that I could fix this by myself (well, following instructions anyway), and that I learned a few new things in the process, makes me love Linux and EOS even more!

11 Likes

Oh, I agree, the local folks were amazing in their response.

3 Likes

For some strange reason the problem didn’t appear right after the update, but a few days later. Because it was urgent, I restored via timeshift, which gave me more problems than I expected: it messed up the BIOS entries a bit, and at boot it gave me “restore sparse file not allowed at boot”, but at least there was a working boot entry. Finally, after the system started, I fixed it with “grub-install”.

Yes! Unless it happens at an inconvenient moment, I usually quite enjoy fixing my system. Almost every time I learn something new.

3 Likes

Thanks for the information, @dalto, and thank you for your work troubleshooting GRUB xx5-1, xx5-2. :smiley:

1 Like

You know, when I read the bug report on the GRUB issue, it appeared to be a unique failure: it had not happened before because it was triggered by calls that were added or changed. It apparently did not show up in Testing because, it seems, the testers either did not have the same setups or didn’t use GRUB at all.

I’m just glad a solution was provided before I install Arch on this machine. I still use GRUB and would have deemed it a failure on my part, had it happened booting a brand new install. LOL! :smiley:

The usual suspect for me used to be networkmanager. :rofl:

1 Like

It shouldn’t happen in that case. If you install grub fresh, it should work fine. The issue occurs because of mismatched grub versions: the version of the package on your system differs from the version you had when you last ran grub-install. On a new install, you would get the new version in both places.

5 Likes