Unofficial ZFS on root installation instructions

I do have it in VirtualBox :slight_smile: Will try.

Weird thing to mention: the system refuses to show me the UUID of the ext4 FS for /boot :frowning:
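For reference, the usual ways to list the UUIDs look roughly like this (the device name below is just a placeholder for whatever holds /boot):

# Show filesystem type, label and UUID for every block device
lsblk -f
# Query the device directly; blkid sometimes shows what lsblk does not
sudo blkid /dev/md/boot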

So with the current /boot on MD RAID1, consisting of one disk partition (now disconnected) and one MD RAID0 member, it booted. There was some delay after reaching the ZFS import target, but eventually it asked for the ZFS password, and after commenting out sda’s /boot/efi entry in /etc/fstab in emergency mode it booted normally.

Yes, I’m not sure how the UUIDs are mapped either. Given that 1.2 metadata sits at +4 kB into the volume, I’m assuming the filesystem superblock starts maybe at +8 kB, or starts at +0 B and is only 4 kB in size; I did try to find that out a long time ago, but never got a definitive answer I was happy with. So it seems that with 1.2 metadata the md device and the filesystem are somewhat commingled. When I have a boot device under an md device, I select metadata=1.0 so it sits at the end of the disk, and the boot loader can, at worst, pretend that the md device doesn’t exist and just assume ext2/ext4, and it should work anyway, or at least well enough to get grub.cfg and the initrd/kernel loaded.
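For what it’s worth, a rough sketch of how such a /boot array gets created (the device names here are just placeholders, adjust to your layout):

# metadata 1.0 puts the md superblock at the end of the partition, so the
# bootloader can treat the member as a plain ext2/ext4 filesystem
mdadm --create /dev/md/boot --level=1 --raid-devices=2 --metadata=1.0 /dev/sda2 /dev/sdb2
mkfs.ext4 /dev/md/boot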

Not sure what you mean by /boot on md1raid on 1 disk (discon) + one md0raid member?

Seems like md1raid and md0raid are incompatible? Or are you just using the md0raid partition as a chunk on disk and not striping it anywhere else, so it isn’t really md0raid, just a partition? What does cat /proc/mdstat show?

So are you closer to having a working configuration with mirrored bpool?

Not sure what you mean by /boot on md1raid on 1 disk (discon) + one md0raid member?

Two HDDs in RAID0 form one member of the RAID1 (/boot) built on top of it, and the /dev/sda SSD is the second member of the RAID1. /dev/sda is now disconnected, so the RAID1 is currently in a degraded state:

[root@nbpg0603vm ~]# cat /proc/mdstat 
Personalities : [raid0] [raid1] 
md123 : active raid1 md127[1]
      1045504 blocks super 1.2 [2/1] [_U]
      bitmap: 1/1 pages [4KB], 65536KB chunk

md124 : active raid1 md125[1]
      1045504 blocks super 1.2 [2/1] [_U]
      bitmap: 1/1 pages [4KB], 65536KB chunk

md125 : active raid0 sdb3[1] sda3[0]
      1046528 blocks super 1.2 64k chunks
      
md126 : active raid0 sdb5[1] sda5[0]
      7330752 blocks super 1.2 64k chunks
      
md127 : active raid0 sda4[0] sdb4[1]
      1046528 blocks super 1.2 64k chunks
      
unused devices: <none>

I am no closer to having a mirrored bpool consisting of /dev/sda3 as one member and the RAID0 /dev/md/boot as the second, as this causes GRUB to fall into rescue mode. Correcting the prefix= value during every boot is not comfortable, so I will probably stick with having /boot on MD RAID1 and not on a mirrored bpool. They promised to release GRUB 2.11 last year and GRUB 2.12 this year, yet the latest version available is still 2.06, heavily (and differently) patched by every distro’s maintainers…
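For reference, the kind of thing that has to be typed at the grub rescue> prompt looks roughly like this; the partition is only an example, use whichever one actually holds /boot/grub:

set prefix=(hd0,gpt2)/grub
set root=(hd0,gpt2)
insmod normal
normal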

I would need to do testing to be sure, but I think there are some questionable assumptions here.

The first is that the faster disk will get a high enough percentage of the reads that performance will be minimally impacted. It will likely get a higher percentage of the reads, but I suspect that percentage won’t be high enough to avoid a substantial impact on performance.

The second is in relation to the read/write behavior on a desktop. I don’t know what the read/write ratio is exactly, but modern applications and DEs often use a lot of disk caching. That will not only impact your read/write ratio, it may also impact the responsiveness of those applications when their writes are slow.
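If you want to check the actual read/write split rather than guess, the standard tools give per-vdev and per-disk numbers, for example:

# Per-vdev read/write operations and bandwidth, refreshed every 5 seconds
zpool iostat -v 5
# Per-device utilisation (sysstat), to see how reads are actually distributed across disks
iostat -x 5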

I have read it, and according to his tests the latest I/O scheduler works (balances load) among VDEVs, not inside VDEVs, so this exercise was in vain :man_facepalming:. I will have to insert MD RAID1 between MD RAID0 and the zpool, so for /boot the zpool is a waste of effort (but I will try it anyway :slight_smile: )

Better to be better informed.

I would just set up bpool to be mirrored across the SSD and one or both HDDs; triple mirrors work, and it’s not like /boot is I/O intensive, but it will screw up your symmetry a bit. Better to keep the GRUB side of things as simple as possible, and your previous geometry was too complex. After the kernel has booted you can get more complex, like your RAID0 setup, although I still think you are putting a layer of software between ZFS and the medium that it will not be expecting, and that, like hardware RAID controllers, can lead to nasty surprises with data inconsistencies.
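A minimal sketch of what I mean, with placeholder partition names; the compatibility=grub2 pool property keeps the pool limited to features GRUB can read:

# Three-way mirrored boot pool: one SSD partition plus one partition on each HDD
zpool create -o ashift=12 -o compatibility=grub2 \
    -O mountpoint=/boot \
    bpool mirror /dev/sda3 /dev/sdb2 /dev/sdc2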

I have enough questions about the disparity in performance between NVMe and an SSD/SATA mirror for my desktop, let alone SSD and HDD. But getting my desktop to dual NVMe would require adding another M.2 socket into the space, some supporting passives, and some experience with SMT soldering, so it’s not something I’m going to try in a hurry, but I have seen pictures of it being done.

I was able to follow these instructions for a system over a year ago and it’s still running well. (ZFS encrypted root, however, was an unsuccessful time sink).

I’m trying to repeat it for a new system, and I’m lost on this thread and where things stand.

I’m at the same point as this. I tried following the thread from that point, but it seems like the exact issue at hand is still up in the air, and frankly I don’t follow a lot of the finer details. Are there working instructions right now?

I had some good success with setting up my system with ZFS on Root and using ZFSBootMenu as the bootloader. ZBM out of the box also enables using ZFS native dataset encryption for your data and system.

The critical trick that finally got me through the installation issues was to ignore the ZFS-on-root support in the Calamares installer, but still make the config modifications to include the ZFS modules in the OS, install to a non-ZFS filesystem, set up the ZFS dataset scaffolding in a different location (according to ZFSBootMenu), then copy the installed OS to the ZFS datasets.

I will include my notes, but be warned that they are not refined, cleaned up, or guaranteed to work without some interpretation.

The rough sequence of steps is:

  1. Install ZFS in live environment
  2. Prepare Zpool disk
  3. Download ZFSBootMenu EFI and install to ESP
  4. Modify Calamares config to include zfs modules and service setup
  5. Install EOS to second disk
  6. Copy system files to Zpool disk
These are the rough notes of the commands that were run:
sudo pacman -Syu --noconfirm # Takes too long but should be done in real scenario
sudo pacman -Syu zfs-dkms zfs-utils paru

# Format disk
sudo fdisk /dev/sda
	g   # Create GPT
	n   # New partition, enter +300M for size (last sector)
	t   # Change partition type, type 1 = EFI
	n   # New partition, use all default to fill rest of disk with Linux filesystem partition
	w   # Write changes and exit

sudo fdisk -l   # Verify changes

# Format partitions
sudo mkfs.fat -F 32 /dev/sda1

##### Zpool #####
# Load zfs kernel module if not already
sudo modprobe zfs
# This step can be done with or without compression and encryption
sudo zpool create -f -o ashift=12         \
             -O acltype=posixacl       \
             -O relatime=on            \
             -O xattr=sa               \
             -O dnodesize=legacy       \
             -O normalization=formD    \
             -O mountpoint=none        \
             -O canmount=off           \
             -O devices=off            \
             -R /mnt                   \
             -O compression=lz4        \
             -O encryption=aes-256-gcm \
             -O keyformat=passphrase   \
             -O keylocation=prompt     \
             zroot /dev/disk/by-id/_id-to-partition-partx_

sudo su
zfs create -o mountpoint=none zroot/data
zfs create -o mountpoint=none zroot/ROOT
zfs create -o mountpoint=/ -o canmount=noauto zroot/ROOT/default
zfs create -o mountpoint=/home zroot/data/home
# Simplifying by removing these
#zfs create -o mountpoint=/var -o canmount=off     zroot/var
#zfs create zroot/var/log
#zfs create -o mountpoint=/var/lib -o canmount=off zroot/var/lib
#zfs create zroot/var/lib/libvirt
#zfs create zroot/var/lib/docker

zpool set bootfs=zroot/ROOT/default zroot
# Create zpool cache
zpool set cachefile=/etc/zfs/zpool.cache zroot
# Copy cache to target system
mkdir -p /mnt/etc/zfs
cp /etc/zfs/zpool.cache /mnt/etc/zfs/zpool.cache

##### Bootloader #####
mkdir /boot
mount /dev/sda1 /boot
mkdir -p /boot/EFI/zbm

# There is a package for an EFI binary of zfsbootmenu, which I think could save us having to build it ourselves, unless some custom build options are needed in our case
# EFI package requires ESP to be mounted
paru -Sy zfsbootmenu-efi-bin
paru -Sy efibootmgr   # Creates boot entry in UEFI of mobo

efibootmgr -c -d /dev/sda -p 1 -L "ZFSBootMenu" -l '\EFI\zbm\zfsbootmenu-release-vmlinuz-x86_64.EFI'

zfs set org.zfsbootmenu:commandline="rw" zroot/ROOT

##### Initrd #####
vi /etc/dracut.conf.d/zfs.conf
# ZFS is a self-scrubbing and self-healing system; fsck is redundant and slows down boot
# no_fsck=yes
# add_dracutmodules+=" zfs "
# omit_dracutmodules+=" btrfs "
dracut --hostonly --no-hostonly-cmdline /boot/initramfs-linux.img


##### Enable auto import of pool
systemctl enable zfs.target
systemctl enable zfs-import-cache.service
systemctl enable zfs-mount.service
systemctl enable zfs-import.target
Calamares modifications

We want to add zfs support, enable the services

# modules/pacstrap.conf
<   - nano
>   - zfs-dkms

# modules/services..
   - name: "zfs.target"
     action: "enable"
   - name: "zfs-import-cache.service"
     action: "enable"
   - name: "zfs-mount.service"
     action: "enable"
   - name: "zfs-import.target"
     action: "enable"

Calamares Installation

Run installer, install to a second disk, bootloader steps can be ignored because we are using ZFSBootMenu and just copying system files.

Post Installation

Prepare the ZFS destination and mount it under /mnt
Mount the other disk at /targ
# Trailing slash on /targ/ copies its contents directly into /mnt
sudo rsync -aAXv --exclude={"/dev/*","/proc/*","/sys/*","/tmp/*","/run/*","/mnt/*","/media/*","/lost+found"} /targ/ /mnt

mount --bind /proc /mnt/proc && mount --bind /sys /mnt/sys && mount --bind /dev /mnt/dev && mount --bind /run /mnt/run
arch-chroot /mnt
vi /etc/dracut.conf.d/zfs.conf
dracut --force --hostonly --no-hostonly-cmdline /boot/initramfs-linux.img 6.5.8-arch1-1
# fallback.img was not updated
vi /etc/fstab # remove entries

Galileo update: Does not create ZFS filesystem, cannot install.

Log:
2023-12-23 - 14:58:14 [6]: … - details:
Create filesystem zfs on partition '/dev/nvme0n1p4'
Job: Delete file system on '/dev/nvme0n1p4'
Command: wipefs --all /dev/nvme0n1p4
Job: Create file system 'zfs' on partition '/dev/nvme0n1p4'

…and that’s it. Developers force users to use a crystal ball.

These instructions are from Apollo. There have been a lot of changes since then and they probably don’t work anymore. They would need to be updated/rewritten for Galileo.

If I do not use manual partitioning and instead choose to wipe the entire disk with auto-partitioning according to partition.conf, it works. So I put the desired partition layout into partition.conf instead of using the GUI.

After killing all gpg-agent processes and running zpool export -a before rebooting at the end of the installation, I also had to edit the kernel line and remove the second root= option (the one using UUID), and then it booted :slight_smile:

Updated steps for installing Galileo using the online installation method with root on encrypted ZFS:

  1. In a terminal window, get the Linux kernel version of the EndeavourOS installation image:
uname -r

(my version was 6.6.1-arch1-1) and download the matching kernel package from https://archive.archlinux.org/packages/l/linux.
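For example (assuming curl is available in the live environment; the file name matches the pacman -U step below):

curl -LO https://archive.archlinux.org/packages/l/linux/linux-6.6.1.arch1-1-x86_64.pkg.tar.zst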

  2. Install the package:
sudo pacman -U ./linux-6.6.1.arch1-1-x86_64.pkg.tar.zst

in my case.
It is needed for a successful zfs-dkms installation.

  3. Then install the zfs-dkms package and load the zfs module:
sudo pacman -Sy zfs-dkms
sudo modprobe zfs
  4. Add zfs-dkms to the basePackages array in /etc/calamares/modules/pacstrap.conf:
basePackages:
  - zfs-dkms
  - base
  - base-devel
  5. Extend the availableFileSystemTypes array in /etc/calamares/modules/partition.conf with zfs:
availableFileSystemTypes:  ["ext4","btrfs","zfs"]

and update your desired partition layout (manual partitioning is broken for ZFS).
(I updated:

efiSystemPartition:     "/boot/efi"

efiSystemPartitionSize:     512MiB

efiSystemPartitionName:     EFI

and

swapPartitionName:      SWAP

and

partitionLayout:
    - name: "EOSBOOT"
      filesystem: "ext4"
      mountPoint: "/boot"
      size: 1GiB
    - name: "EOS"
      filesystem: "unknown"
      mountPoint: "/"
      size: 100%

(see https://github.com/calamares/calamares/blob/calamares/src/modules/partition/partition.conf)).

  6. Add zfs to the exec section of /etc/calamares/settings_online.conf between partition and mount:
- exec:
  - hardwaredetect
  - partition
  - zfs
  - mount
  7. Add these entries to the units section of /etc/calamares/modules/services-systemd.conf:
   - name: "zfs-import-cache.service"
     mandatory: false
   - name: "zfs-mount.service"
     mandatory: true
   - name: "zfs.target"
     mandatory: true
   - name: "zfs-import.target"
     mandatory: true
  8. Update /etc/calamares/modules/zfs.conf according to your desired ZFS layout and options (I added -o autotrim=on -O dnodesize=auto -O normalization=formD options to poolOptions, changed poolName and added some datasets).
    Be sure to specify canMount=noauto for additional datasets, otherwise ZFS will mount them over the live installation environment and the installation will fail.
    NOTE: There is a bug in systemd (https://issues.redhat.com/browse/RHEL-17164) which does not allow /usr to be a separate filesystem.

  9. In /etc/calamares/scripts/chrooted_cleaner_script.sh, add export ZPOOL_VDEV_NAME_PATH=1 before line 503 so it looks like:

    if [ -z "$NEW_USER" ] ; then
        _c_c_s_msg error "new username is unknown!"
    fi

    export ZPOOL_VDEV_NAME_PATH=1
    _check_install_mode
    _virtual_machines
    _clean_up
  10. Now run the installation and on the disk setup page choose to erase the whole disk, select the swap size (I chose suspend) and zfs as the filesystem.
    Select encryption at the bottom of the page. This will result in ALL partitions being encrypted, in my case including /boot.

  11. When the installation finishes, do NOT reboot!
    For all datasets (except the root dataset), change their canmount property to on with zfs set canmount=on <dataset>, for example:

zfs set canmount=on zpeosroot/ROOT/eos/home
  12. Then in a terminal window run sudo killall gpg-agent and then zpool export -a, otherwise you won’t be able to boot (ZFS will refuse to import the pool because it was not cleanly exported).

  13. When booting the new installation, press e to edit the kernel command line and remove the second root=... option (the one with UUID=…), otherwise it won’t boot (a possible permanent fix is sketched after this list).
    During boot, enter passwords using the English keyboard layout.
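Side note, not verified on Galileo: if the duplicate root=UUID=... entry comes from the generated GRUB configuration, it should be possible to remove it permanently instead of editing the line on every boot, roughly like this (assuming GRUB is the bootloader and the entry lives in /etc/default/grub):

# Remove the stray root=UUID=... from GRUB_CMDLINE_LINUX / GRUB_CMDLINE_LINUX_DEFAULT
sudoedit /etc/default/grub
# Regenerate grub.cfg with the corrected command line
sudo grub-mkconfig -o /boot/grub/grub.cfg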

You can skip this step. The error you get at the end of zfs-dkms install can be ignored. It is a hook that doesn’t matter in that situation.

So far every time I’ve run dracut I’ve had to forcefully import the zpool upon boot :frowning:

When you installed, did you add the zfshostid module? If not, do you have /etc/hostid generated?

Also, how are you running dracut?

No, I didn’t add the zfshostid module. The last time, I generated the hostid via zgenhostid and copied it into the still-mounted new installation. The first dracut rebuild then didn’t break it, so I guess the hostid was the issue.
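For reference, that hostid step from the live environment looked roughly like this (assuming the new installation is still mounted at /mnt):

# Generate /etc/hostid in the live environment (no-op if it already exists)
zgenhostid
# Copy it into the new installation so the initramfs and the booted system agree
cp /etc/hostid /mnt/etc/hostid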

I don’t even see any such module. It seems it is installed in module-setup.sh of 90zfs.

That is a Calamares module. Something you would add to settings.conf.

It copies the generated hostid into the new installation.