Diagnose and fix update issue

Hey all,
the last update completely destroyed my system, and I’m not sure it’s the 5.15 kernel, because I also have the LTS kernel installed (which stays at 5.10.xxx in this update) and I get the same issue with it too.
Basically, when I reboot after the update, systemd fails to load

systemd-modules-load.service

and my system doesn’t start. It asks for the root password and drops me into a root shell, from which I can actually do things and even switch to my user, but where I have no internet and no way to bring it up (all interfaces except the loopback one are missing, probably related to the error mentioned above).
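One generic thing worth ruling out when systemd-modules-load.service fails and every interface except lo is missing is whether a module tree exists for the running kernel: if /usr/lib/modules has no directory matching the release that uname -r reports, no drivers can be loaded. The bash sketch below only checks for that directory; it is not a diagnosis drawn from the logs in this thread, and check_modules_for_kernel is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: check whether a module tree exists for a given kernel release.
# If the running kernel has no matching /usr/lib/modules/<release>,
# systemd-modules-load.service cannot load any drivers, which would also
# explain missing network interfaces.

# check_modules_for_kernel (hypothetical helper) MODULES_ROOT RELEASE
# Prints "ok: <dir>" and returns 0, or prints "missing: <dir>" and returns 1.
check_modules_for_kernel() {
  local modules_root="$1" release="$2"
  local dir="$modules_root/$release"
  if [ -d "$dir" ]; then
    echo "ok: $dir"
    return 0
  fi
  echo "missing: $dir"
  return 1
}

# Example use from the emergency root shell on the broken system:
#   check_modules_for_kernel /usr/lib/modules "$(uname -r)"
#   ls /usr/lib/modules                              # installed module trees
#   journalctl -b -u systemd-modules-load.service    # this boot's unit log
```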

I reverted to a previous snapshot, so now I’m on my not-yet-updated but working system, but I would like to investigate why this is happening (I think it could also be useful to someone else).

Here are a lot of logs taken from both the working system and the updated, broken one:

I haven’t seen anything that points me to a solution other than reverting and waiting for a new point release, but maybe some of you with more experience will spot something in them.

ip link state for working system

journalctl -xeu systemd-modules-load working

journalctl -xb working

service status working

checkupdates output

ip status after update

journal log after update

another journal log after update

service status after update

There is no guide for an unknown issue. If you were able to revert and have a working system, then what? Updates don’t normally just destroy your whole system. If, after an update, you weren’t able to boot either the LTS kernel or the regular kernel, then that is odd.

I know this is odd, and it has never happened to me before on eOS or Arch; that’s why I posted it, to see whether, with someone’s help, it’s possible to understand what the actual issue is and eventually fix it.

It’s also very reproducible on my machine: I’ve tried the update 3 times and every time it ended with the same problem… I really don’t know what to think, or what to check beyond the logs I’ve already given.

Did pacman show any errors or warnings during the update? Were any pacnew files created? You can check in /var/log/pacman.log.
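A minimal bash sketch of that check, assuming the default log location and GNU grep; scan_pacman_log is a hypothetical helper, and the log line quoted in the comment is the usual form of pacman’s .pacnew warning.

```bash
#!/usr/bin/env bash
# Sketch: show pacman log lines that mention warnings, errors or .pacnew files.
# pacman typically logs a new config file as e.g.
#   "warning: /etc/pacman.d/mirrorlist installed as /etc/pacman.d/mirrorlist.pacnew"

# scan_pacman_log (hypothetical helper) [LOGFILE]
scan_pacman_log() {
  local log="${1:-/var/log/pacman.log}"
  grep -iE 'warning|error|\.pacnew' "$log"
}

# Example use:
#   scan_pacman_log | tail -n 50               # most recent matches
#   find /etc -name '*.pacnew' 2>/dev/null     # pacnew files still on disk
```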

No pacman errors; I checked the second time around. Also, the only pacnew I saw was for the mirrorlist, so nothing related, I think.

Check for any failed services prior to the update: systemctl --failed
Create a script to upgrade the packages one by one and test after each. I’m guessing you’re already using btrfs or something similar, so use pacman hooks to create a pre/post snapshot pair around each package upgrade.
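The one-package-at-a-time approach could look roughly like the bash sketch below. It assumes a snapper config named root (snapper-timeline.service shows up later in the thread), checkupdates from pacman-contrib, and sync databases refreshed with pacman -Sy beforehand; package_names and upgrade_one are hypothetical helper names. Upgrading packages individually is a partial upgrade, which Arch does not support, so treat this strictly as a bisecting tool on a system you can roll back.

```bash
#!/usr/bin/env bash
# Sketch: upgrade pending packages one at a time, each wrapped in a snapper
# pre/post snapshot pair, to find which package breaks the boot.
# WARNING: this is a deliberate partial upgrade, which Arch does not
# support; use it only for diagnosis on a system you can roll back.
# Assumes: a snapper config named "root", checkupdates from pacman-contrib,
# and sync databases refreshed beforehand with "pacman -Sy".

# Turn checkupdates lines ("name oldver -> newver") into package names.
package_names() {
  awk 'NF { print $1 }'
}

# Upgrade one package between a pre and a post snapshot.
upgrade_one() {
  local pkg="$1" pre
  # --print-number (-p) makes snapper print the new snapshot's number.
  pre=$(snapper -c root create -t pre -p -d "before $pkg") || return 1
  # --needed skips a package that is already up to date (e.g. pulled in
  # earlier as a dependency of another package).
  pacman -S --needed --noconfirm "$pkg" || return 1
  snapper -c root create -t post --pre-number "$pre" -d "after $pkg"
}

main() {
  local pkg
  for pkg in $(checkupdates | package_names); do
    echo "upgrading $pkg"
    upgrade_one "$pkg" || { echo "failed on $pkg" >&2; return 1; }
    # Stop here to reboot and test; rerunning continues with what is left.
    read -r -p "Enter: next package, Ctrl-C: stop and test. "
  done
}

# Only act when asked explicitly, e.g.: sudo bash one-by-one.sh run
if [ "${1:-}" = run ]; then
  main
fi
```

If snap-pac is installed, its pacman hooks already create a pre/post snapshot pair for every pacman transaction, which is what the hook suggestion describes; in that case the two snapper calls in upgrade_one would only produce duplicate pairs and can be dropped.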

Hopefully this can narrow down the issue.

Could you show your /etc/fstab?

The only result is:
snapper-timeline.service loaded failed failed Timeline of Snapper Snapshots

# /etc/fstab: static file system information.
#
# Use 'blkid' to print the universally unique identifier for a device; this may
# be used with UUID= as a more robust way to name devices that works even if
# disks are added and removed. See fstab(5).
#
# <file system>             <mount point>  <type>  <options>  <dump>  <pass>
UUID=AAA5-AE9A                            /efi           vfat    umask=0077 0 2
UUID=23539706-dc93-4182-bc08-9f77a17e75d8 /              btrfs   subvol=/@,defaults,noatime,space_cache,autodefrag,compress=zstd 0 1
UUID=23539706-dc93-4182-bc08-9f77a17e75d8 /home          btrfs   subvol=/@home,defaults,noatime,space_cache,autodefrag,compress=zstd 0 2
UUID=23539706-dc93-4182-bc08-9f77a17e75d8 /var/cache     btrfs   subvol=/@cache,defaults,noatime,space_cache,autodefrag,compress=zstd 0 2
UUID=23539706-dc93-4182-bc08-9f77a17e75d8 /var/log       btrfs   subvol=/@log,defaults,noatime,space_cache,autodefrag,compress=zstd 0 2
tmpfs  

Could you try removing space_cache from the BTRFS entries?
This is currently merged into BTRFS and used by default, and it seems to cause issues if it is also set in fstab.
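To preview the change before touching the file, the option can be stripped with sed. A minimal bash sketch assuming GNU sed (for the \b word boundary) and option lists written like the ones above; strip_space_cache is a hypothetical helper, and an entry whose only option is space_cache is left alone.

```bash
#!/usr/bin/env bash
# Sketch: remove space_cache / space_cache=v1 / space_cache=v2 from fstab
# option lists read on stdin; nospace_cache and other options are kept.
# Requires GNU sed (\b word boundaries in -E mode).
strip_space_cache() {
  sed -E 's/,space_cache(=v[12])?\b//g; s/\bspace_cache(=v[12])?,//g'
}

# Example use (as root):
#   strip_space_cache < /etc/fstab                       # preview only
#   cp /etc/fstab /etc/fstab.bak
#   strip_space_cache < /etc/fstab.bak > /etc/fstab      # apply
#   findmnt -no OPTIONS /     # options actually in effect after a reboot
```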

OK, I’ve removed it. So now I update and reboot?

You’ll need to do that to test whether this was causing the issue.
But there’s no guarantee that it is the solution.

No problem, I’ll do it and report back.

Aren’t you loading new modules with the LTS kernel instead of the current one? I would think kernel modules should go with the kernel they come with, not an older one?! I don’t understand how, as a Tester, you can use LTS instead of current, as that is not testing new/updated software… If I were a tester, I would be using the current kernel with updated software, since you are using a rolling distro. Also, most people would be using ext4, not BTRFS?? What are you really testing??

Ehmm. Sorry, but I didn’t understand what you mean. I’m having an issue with a new update and trying to understand what’s wrong with my system. Like a lot of people, I use an LTS kernel alongside the main one, because generally, when there are these kinds of problems, booting the LTS instead of the main one solves them.

There is so much wrong with this post that I don’t even know where to begin.

Let’s start with:

From the very first line in the first post of this thread:

You do understand what the word “also” means, right?

As the title beside his/her name clearly states, the user is a tester for the Dev ISO, the development version of what will be the next released ISO. The user may or may not be using the testing ISO on his/her main system. That is immaterial.

Perhaps you are not aware of what the next ISO will have included?

Sadly, that didn’t solve the issue.

If it makes sense to you, boot into a live USB, chroot into your system and reinstall your kernels and their headers.
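A rough bash sketch of that procedure, assuming the subvolume layout from the fstab above (/@, /@cache, /@log, ESP at /efi) and that arch-chroot (arch-install-scripts) is available on the live medium. The device paths are hypothetical placeholders that must be replaced, and kernel_packages only recognises the official Arch kernels (linux, linux-lts, linux-zen, linux-hardened) and their headers.

```bash
#!/usr/bin/env bash
# Sketch: from a live USB, mount the installed btrfs system and reinstall
# its kernels and headers inside a chroot. Run as root on the live system.
# HYPOTHETICAL devices: set ROOT_DEV / EFI_DEV to your real partitions
# (lsblk -f shows them). Subvolumes and mount points follow the fstab above.
ROOT_DEV="${ROOT_DEV:-/dev/nvme0n1p2}"
EFI_DEV="${EFI_DEV:-/dev/nvme0n1p1}"

# Keep only the official Arch kernels and their headers from a package list.
kernel_packages() {
  grep -E '^linux(-lts|-zen|-hardened)?(-headers)?$'
}

mount_and_reinstall() {
  mount -o subvol=/@ "$ROOT_DEV" /mnt || return 1
  mount -o subvol=/@cache "$ROOT_DEV" /mnt/var/cache
  mount -o subvol=/@log "$ROOT_DEV" /mnt/var/log
  mount "$EFI_DEV" /mnt/efi
  local pkgs
  pkgs=$(arch-chroot /mnt pacman -Qq | kernel_packages)
  # Word splitting of $pkgs is intended: one argument per package.
  # shellcheck disable=SC2086
  arch-chroot /mnt pacman -S --noconfirm $pkgs
}

# Only act when asked explicitly, e.g.: bash reinstall-kernels.sh run
if [ "${1:-}" = run ]; then
  mount_and_reinstall
fi
```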

@ParanoidNemo
I would like to see the output of

sudo dmesg | eos-sendlog
inxi -Faz --no-host | eos-sendlog

Edit: I also agree with @pebcak

Edit2: I have btrfs set up, I still have the space_cache setting, and I haven’t had an issue. I don’t typically run LTS kernels unless I have to, which is rare.

You do realize they ALSO update the LTS kernel, right?

And as testers, we generally try everything we possibly can.

Every (online) installation always comes immediately with the most up-to-date software available.

I have no idea what most people do. And to be honest, I have no idea what a lot of what you were saying means.

I think you got confused by the badge @ParanoidNemo has:
[screenshot of the badge]
It has nothing to do with this thread about an issue he’s having :wink: He is only part of the ISO testing group (part of development).
