Kernel panic on cold boot

Alien · November 16, 2022, 10:44pm

Hey everyone. I switched to EndeavourOS after using Debian testing for a week and a half (my longest consistent period ever on a Linux machine; I’m a newcomer who finally decided to make the transition from Windows).

Without further ado: while installing EndeavourOS two days ago, I chose to install the LTS kernel during setup. I’m very afraid of Arch and Arch-based distros because of the crazy number of online posts I’ve read from people who moved away due to instability, PCs not booting after updates, and whatnot, so I figured that having the LTS kernel available might be good practice.

On installation day there were no problems at all, apart from SDDM hanging on shutdown (KDE Wayland), which I realized can be avoided by shutting down through the GUI.

The next day I got up, booted my PC, and got a kernel panic message. After looking online I came across a post where someone’s machine didn’t boot after connecting an Ethernet cable or something like that. So I looked for devices connected to my PC that hadn’t been plugged in the day before and spotted an old Wi-Fi adapter my brother was trying out, so I thought maybe that was it. Unplugged it, restarted, and the PC worked like a charm. It was literally too good to be true.

Almost true. Later that same day (yesterday), I shut down my PC and went out for four to five hours or so. When I came back and fired up the PC, there it was, the kernel panic again. I just sat at the screen searching through loads of topics online. People are getting the same crash as me:

"initramfs unpacking failed: ZSTD-compressed data is corrupt"

but it’s the second part “ZSTD-compressed data is corrupt” that’s different for them.

Long story short, after two hours of surfing the web I stumbled upon a Reddit post from someone on Arch Linux who had the same problem. A user asked them about a tool called “mkinitcpio”. I had no idea what it was, so I looked it up and learned that it creates the initial ramdisk environment that supposedly handles stuff for init. Then I remembered that I had manually chosen the LTS kernel during the EOS installation, so I thought maybe that was it. Perhaps the installer forgot to run the tool in question? At this point I just hit the reset button on my PC and, once again, the PC sprang to life. I did catch a glimpse of the error at the very beginning of boot, but there was no kernel panic.

I ran the command

sudo mkinitcpio -P

after the computer started, then shut down my PC entirely and waited for five minutes or so to try to reproduce the problem. Sadly, that worked: the kernel panic wasn’t solved. I slammed restart once again, and my PC booted fine on the reboot.
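
A side note on checking the rebuilt images: below is a minimal sketch, assuming the initramfs images in /boot are plain zstd streams (the mkinitcpio default on Arch-based systems around that time) and that the zstd tool is installed. The function name check_initramfs_images is made up for illustration. It only tests the files as stored on disk, so a clean result does not rule out corruption that happens in memory while the kernel unpacks the image.

```shell
#!/bin/sh
# Sketch: test every initramfs image in a directory for zstd corruption.
# Assumption: the images are plain zstd-compressed archives.
check_initramfs_images() {
    dir="${1:-/boot}"
    failed=0
    for img in "$dir"/initramfs-*.img; do
        [ -e "$img" ] || continue      # the glob matched nothing
        # zstd -t decompresses in memory only and reports integrity errors
        if zstd -tq "$img" 2>/dev/null; then
            echo "OK      $img"
        else
            echo "CORRUPT $img"
            failed=1
        fi
    done
    return "$failed"
}
```

From a root shell, after running mkinitcpio -P, call `check_initramfs_images /boot`. A non-zero exit status means at least one image failed the test.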

My brain was fried at this point, so I just went to sleep. The next day (today) I went to college and didn’t touch my PC in the morning. However, I was struck by an idea: maybe I should use the other kernel that comes preinstalled with EOS? Perhaps that would fix the problem entirely? I think it did for a single boot, then the problem recurred, but now with a different error (I think?).

[Image: 5ec03a9f-7f9a-409b-93d4-8a96bd725b56]
[Image: 19606bf6-61f7-453c-927f-1cf72a46923d]

The two images I uploaded should give better information on the two errors I currently get whenever I do a cold boot. A reboot just starts the system no matter which kernel I pick. This is frustrating, to be honest, and I’d be extremely sad if it’s a hardware problem, given that I bought my current PC build only a year or so ago. That’s it.

Any help is appreciated and thank you for making it this far into the post.

manuel · November 16, 2022, 10:59pm

Welcome to the forum!

If possible, use the eos-log-tool to show some logs (show the returned URL here).

Interesting logs would be

EndeavourOS install log
journal of the failing boot (the second latest boot)

for starters.

Alien · November 16, 2022, 11:08pm

Thank you for the fast reply.

https://0x0.st/oIW_.txt

pebcak · November 16, 2022, 11:12pm

The error message says something to the effect that the shared library libcrypt.so.2 cannot be found. This library is provided by the package libxcrypt.
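
To confirm that on the installed system, here is a small sketch using pacman’s -Qo query (which package owns this file). The path /usr/lib/libcrypt.so.2 is an assumption based on the usual Arch layout, and check_library is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: check that a shared library exists and ask pacman who owns it.
check_library() {
    lib="${1:-/usr/lib/libcrypt.so.2}"   # assumed default path
    if [ ! -s "$lib" ]; then
        # -s fails when the file is missing or has zero length
        echo "missing or empty: $lib"
        return 1
    fi
    pacman -Qo "$lib"                    # prints the owning package
}
```

If the file is reported as missing or empty, reinstalling libxcrypt from the chroot would be the obvious next thing to try.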

I would use the live usb and chroot into the installed system.
Once in the chroot, I would update the system fully. And then run the mkinitcpio -P command.

Read through the articles linked below and choose the appropriate chroot method for your boot mode (UEFI vs. Legacy/MBR) and file system.

https://discovery.endeavouros.com/?s=chroot

If you are unsure how to chroot, post the output of the following command and indicate which partition holds your system (root partition):

sudo parted -l

In the live session, connect to the Internet.

In chroot:

Update your system fully: sudo pacman -Syu
Rebuild initrds: mkinitcpio -P
Quit chroot: exit

and reboot. See if this will resolve the issue.

If you get any error messages at any step, post them on the forum.

manuel · November 16, 2022, 11:44pm

Can you tell us which partitions you used for EndeavourOS?
And if you used the same partitions you had with Debian, did you format them before installing EndeavourOS?

Alien · November 17, 2022, 12:05am

I went down the arch-chroot route, did everything in the guide, and followed your steps, then shut down the live environment and waited for a while before starting the PC up again. I still saw a glimpse of the ZSTD corruption message before the packages loaded, but thankfully no kernel panic this time. Hopefully things will be clearer tomorrow if the kernel no longer panics.

Alien · November 17, 2022, 12:08am

I used the same hard disk and yeah, I did a complete wipe of it during the partitioning step of the EndeavourOS installation. Below is my sudo parted -l output.

Model: ATA ADATA SU650 (scsi)
Disk /dev/sda: 120GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags: 

Number  Start   End    Size   File system  Name  Flags
 1      2097kB  317MB  315MB  fat32              boot, esp
 2      317MB   120GB  120GB  ext4         root


Model: ATA ST9500325AS (scsi)
Disk /dev/sdb: 500GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags: 

Number  Start   End    Size   File system  Name                Flags
 3      16.8MB  500GB  500GB  ntfs         LDM data partition


Model: ATA ST500DL001 HD503 (scsi)
Disk /dev/sdc: 500GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags: 

Number  Start   End    Size   File system  Name                  Flags
 1      1049kB  500GB  500GB  ntfs         Basic data partition  msftdata

ATA ADATA SU650 (/dev/sda) is where my root directory is. The two other hard disks are both used for personal data storage; I brought them along on the Windows-to-Linux transition journey.
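
With that layout, the chroot repair suggested earlier in the thread could be sketched roughly as follows. This is only an illustration: the function names run and repair_from_live_usb are made up, the ESP mount point /boot/efi is an assumption (check /etc/fstab on the installed system for the real one), and it is meant to be run as root in the live session. Setting DRY_RUN prints the commands instead of executing them.

```shell
#!/bin/sh
# Print the command when DRY_RUN is set, otherwise execute it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Sketch: mount the installed system, update it, rebuild the initramfs images.
repair_from_live_usb() {
    root_part="$1"                 # e.g. /dev/sda2 (ext4 root)
    esp_part="$2"                  # e.g. /dev/sda1 (fat32 ESP)
    esp_mount="${3:-/boot/efi}"    # assumption: where the ESP is mounted
    target="${4:-/mnt}"

    run mount "$root_part" "$target" || return 1
    run mount "$esp_part" "$target$esp_mount" || return 1
    run arch-chroot "$target" pacman -Syu || return 1
    run arch-chroot "$target" mkinitcpio -P || return 1
    run umount -R "$target"
}
```

A dry run such as `DRY_RUN=1; repair_from_live_usb /dev/sda2 /dev/sda1` shows the exact commands first, which is a cheap way to double-check the partition names before anything is mounted.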

Alien · November 17, 2022, 12:53pm

This solved it! No more kernel panics for me. I shut down the PC after applying the fix proposed through arch-chroot, and when I woke up the next morning the boot was fine, then I shut it down immediately. Almost five hours later I booted it again and it was fine. However, let it be known that the error

ZSTD-compressed data is corrupt

still appears, just without kernel panics. I don’t know whether the “problem” (whatever it was) is completely fixed while that error is still around, so I don’t know if I should mark your post as the solution yet. Please elaborate on that persistent error if you can.

KDen · November 17, 2022, 1:56pm

A quick Google search on that “ZSTD-compressed data is corrupt” message seems to indicate it could be a lot of things. Several results say it could be hardware. I didn’t search very long, but I saw a couple of people say it was RAM issues for them. Check your memory speed and timings. If it’s overclocked with XMP, try taking it back to stock; XMP has often had issues. Run a memtest. If you have more than one module, try them one at a time in the correct slots.

pebcak · November 17, 2022, 4:37pm

I also did a quick search, and from what I could find I couldn’t come to an unambiguous conclusion as to what the root cause might be.

There is also this thread in the forum:

Admittedly it is related to another issue. However the error message is the same and the conclusion of the OP seems to point to RAM configuration.

Stephane · November 17, 2022, 5:48pm

It may be OpenSSL; see
https://forum.manjaro.org/t/pacman-error-while-loading-shared-libraries-libcrypto-so-3/127001

Alien · November 17, 2022, 11:40pm

The beast strikes again. Same initramfs error and same missing file.

[Image: 3f77f81a-815e-4192-8ce9-1f08b139ff60]

petsam · November 18, 2022, 12:33am

IMHO, issues that happen only on cold boot, or only on reboot, have to do with the hardware.
Software (devs) may create some workarounds to deal with them, but hardware/firmware is where they are coming from.

Alien · November 18, 2022, 12:59pm

I just ran memtest86+ from a USB stick. I halted the test halfway because after 2 passes it had already logged 106+ errors, so I guess this is the problem. After searching online, though, I realized it might not be the RAM stick itself but perhaps the RAM slot on the motherboard. I pray it’s dust accumulation; I’ll clean my PC and re-run the test later.

A question still nags at me: why didn’t I run into any problems on Windows or Debian? Does it have to do with how Arch and Arch-based systems boot?

KDen · November 18, 2022, 1:35pm

Memory can go bad. I would suspect the sticks before jumping to the board. When I moved from Windows to Linux I had something similar happen, with random crashes. I memtested the memory and got a bunch of errors. It happens. In my case the memory couldn’t handle the built-in XMP profile, so I disabled that and got no more errors. So check the memory speed and timings first. If that doesn’t work, go through the modules one at a time in the correct slot.
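
On the Linux side, the module speeds the firmware reports can be read with dmidecode before and after changing the XMP setting. Here is a rough sketch, assuming dmidecode is installed and that its output uses the field names Locator, Speed, and Configured Memory Speed (older versions print Configured Clock Speed instead). summarize_memory_speeds is a hypothetical helper that reads dmidecode output on stdin.

```shell
#!/bin/sh
# Sketch: print one line per memory slot with reported and configured speed.
# Input: the text output of `dmidecode --type memory` on stdin.
summarize_memory_speeds() {
    awk -F': *' '
        /^[ \t]*Locator:/ { slot = $2 }     # slot name, e.g. DIMM_A1
        /^[ \t]*Speed:/   { speed = $2 }    # speed reported for the module
        /^[ \t]*Configured (Memory|Clock) Speed:/ {
            printf "%s: speed %s, configured %s\n", slot, speed, $2
        }
    '
}
```

Typical use would be `sudo dmidecode --type memory | summarize_memory_speeds`. If the configured value drops after disabling XMP in the BIOS, the profile really was active before.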

Alien · November 18, 2022, 1:52pm

Can you elaborate more on what XMP is exactly? Also you said “built in XMP profile”, does Linux natively over/underclock the memory?

KDen · November 18, 2022, 2:06pm

XMP is basically memory overclocking. You’d change it in the BIOS.

ricklinux · November 18, 2022, 2:11pm

It’s basically a preset overclock setting of your memory based on what you have installed.

manuel · November 18, 2022, 2:42pm

Because updating the system (sudo pacman -Syu) already helped, maybe some packages are still corrupted?

If so, then re-installing most packages, something like

sudo pacman -Syyu $(pacman -Qqen)

might help too. It will take some time…
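
Before reinstalling everything, pacman’s own file check can narrow down which packages look damaged. This is a sketch, assuming the English summary lines that pacman -Qkk prints (“name: N total files, M altered files”); list_altered_packages is a made-up helper name. As far as I know this check compares properties such as size, permissions, and modification time against the package metadata, and edited config files can also show up as altered, so treat the result as a hint only.

```shell
#!/bin/sh
# Sketch: list packages for which `pacman -Qkk` reports altered files.
# Input: the stdout of `LC_ALL=C pacman -Qkk` on stdin.
list_altered_packages() {
    awk '/ altered files?$/ && $(NF-2) != 0 {
        sub(/:$/, "", $1)   # strip the trailing colon from the package name
        print $1
    }'
}
```

For example, `LC_ALL=C pacman -Qkk 2>/dev/null | list_altered_packages` prints one package name per line, and that shorter list could then be reinstalled instead of every package.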

petsam · November 18, 2022, 3:12pm

Yes (from personal experience). Not only Arch, but with Linux (kernel) in general AFAIK.