Comprehensive Linux system recovery - from broken OS or hardware failure to a fully functional system.
This guide is written for EndeavourOS but can be easily adapted for any Arch based system, and less easily for any other Linux based system. The concepts are the same irrespective of distribution.
Written from the perspective of a beginner to copy-on-write filesystems, this was originally intended as a personal reference but hopefully someone may find this updated version useful. Feel free to open an issue or PR for corrections.
- Fast incremental snapshots of whole system that can be restored easily in the event of a broken OS.
- Offsite backup of snapshots that can be restored easily in the event of hardware failure.
This setup should not be used in production, it was originally intended for a personal laptop without RAID and that means without the self healing feature offered by the Btrfs filesystem.
- Btrfs (“Butter” FS): CoW filesystem with super fast snapshots
- snapper: automatic creation & management of btrfs snapshots
- snap-pac: automatic creation of pre/post btrfs snapshots on package changes when using pacman (Arch package manager)
- grub-btrfs: include btrfs snapshots in grub (bootloader) menu options
- borg: backup program that supports deduplication, compression, and client side encryption - used for offsite backup and restore
- borgmatic: automatic creation & management of borg backups
Much of this was implemented by referring to the ArchWiki (Arch Linux documentation), this would not have been possible without their excellent documentation and its contributors. Thanks also goes to EndeavourOS forum members who were kind enough to clear up some doubts and the OS/package maintainers.
Sources not listed previously:
- Btrfs ArchWiki: https://wiki.archlinux.org/title/Btrfs
- Snapper ArchWiki: https://wiki.archlinux.org/title/snapper
For ease of setup, this guide assumes you’re
root. Be careful and read twice before executing any commands!
Disk and partitions:
[root@eos-laptop ~]# parted -l Model: ATA KINGSTON SV300S3 (scsi) Disk /dev/sda: 120GB Sector size (logical/physical): 512B/512B Partition Table: gpt Disk Flags: Number Start End Size File system Name Flags 1 2097kB 539MB 537MB fat32 boot, esp 2 539MB 111GB 110GB btrfs root legacy_boot 3 111GB 120GB 9449MB linux-swap(v1) swap
[root@eos-laptop ~]# lsblk /dev/sda NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS sda 8:0 0 111.8G 0 disk ├─sda1 8:1 0 512M 0 part /boot/efi ├─sda2 8:2 0 102.5G 0 part /home │ /var/log │ /var/cache │ / └─sda3 8:3 0 8.8G 0 part [SWAP]
It’s not required to have the exact aforementioned layout, but knowing the disk/partition topology would make following this guide easier.
We’re naturally interested in the btrfs partition:
Get the properties including UUID (universally unique identifier) of
[root@eos-laptop ~]# blkid /dev/sda2 /dev/sda2: UUID="63745996-340f-4bda-98b1-63af36e6cdae" UUID_SUB="c52df0d3-929a-4f2e-b980-5035036125e0" BLOCK_SIZE="4096" TYPE="btrfs" PARTLABEL="root" PARTUUID="a1969063-479c-3042-9a63-68560d5f5e23"
Btrfs (“Butter” FS) is a modern CoW (copy-on-write) filesystem with a lot of interesting features like checksumming for both data and metadata that can be used for finding and correcting errors.
Since we’re not using RAID (with mirroring), our setup can only find errors and not fix it automatically. We can always restore from the offsite backups if the original ever gets corrupted beyond repair.
For the purpose of this guide it’s important to understand Btrfs subvolume and CoW concepts.
- A Btrfs subvolume is an independently mountable POSIX filetree and not a block device (and cannot be treated as one). Most other POSIX filesystems have a single mountable root, Btrfs has an independent mountable root for the volume (top level subvolume) and for each subvolume; a Btrfs volume can contain more than a single filetree, it can contain a forest of filetrees. A Btrfs subvolume can be thought of as a POSIX file namespace.
- The top-level subvolume (with Btrfs id 5) (which one can think of as the root of the volume) can be mounted, and the full filesystem structure will be seen at the mount point; alternatively any other subvolume can be mounted (with the mount options subvol or subvolid, for example subvol=@home) and only anything below that subvolume will be visible at the mount point. For example when we map the subvolume
/homedirectory, only the contents of subvolume
@homewill be visible at the mount point
Here’s the filesystem layout we’ll be implementing:
<subvol> <mountpoint> @ / @home /home @log /var/log @cache /var/cache @snapshots /.snapshots
Listing current subvolumes:
[root@eos-laptop ~]# btrfs subvolume list / ID 276 gen 1629 top level 5 path @ ID 277 gen 1629 top level 5 path @home ID 278 gen 1629 top level 5 path @log ID 279 gen 910 top level 5 path @cache
Manually mount the btrfs partition to see what’s inside:
# Mount partition /dev/sda2 at /mnt, here we're explicitly mentioning the type as btrfs. It would probably work without it as well! [root@eos-laptop ~]# mount -t btrfs /dev/sda2 /mnt [root@eos-laptop ~]# cd /mnt [root@eos-laptop mnt]# ls @ @cache @home @log [root@eos-laptop mnt]# ls @ bin boot dev etc home lib lib64 mnt opt proc root run sbin srv sys tmp usr var [root@eos-laptop mnt]# ls @cache ldconfig man pacman private [root@eos-laptop mnt]# ls @home pavin # Unmount [root@eos-laptop ~]# umount /mnt
EndeavourOS creates these subvolumes automatically when choosing to install on a btrfs partition, we’ll create the
@snapshots subvolume manually later.
Whew! That was a handful to digest, but don’t worry if you don’t immediately get it, you’ll understand it better once we implement it.
- The CoW operation is used on all writes to the filesystem.
- This makes it much easier to implement lazy copies, where the copy is initially just a reference to the original, but as the copy (or the original) is changed, the two versions diverge from each other in the expected way.
This is why snapshots can be created so fast and easily in btrfs. A snapshot subvolume is simply a reference to another subvolume.
Update repos and upgrade packages:
pacman -S snapper snap-pac grub-btrfs borg borgmatic
Create a snapper config for root subvolume mounted at
snapper -c root create-config /
Modify the snapper config, the defaults are probably fine.
You may create other snapper configs as needed, for example for
/home but it will not be covered here as we’re backing up
Enable systemd timer to have snapper create a snapshot on each boot:
systemctl enable snapper-boot.timer
Backup non-btrfs partition
/boot (EFI system partition) on pacman kernel updates
[Trigger] Operation = Upgrade Operation = Install Operation = Remove Type = Path Target = usr/lib/modules/*/vmlinuz [Action] Depends = rsync Description = Backing up /boot... When = PostTransaction Exec = /usr/bin/rsync -a --delete /boot /.bootbackup
Snapper automatically creates a
.snapshots directory inside the directory it’s snapshotting.
For the above config for root it will be at
But we don’t need it as we’ll create an empty directory of same name/path for our
@snapshots subvolume to be mounted at.
.snapshots directory snapper created at
rm -rI /.snapshots
Create the directory anew
Mount btrfs partition and create new subvolume
mount -t btrfs /dev/sda2 /mnt btrfs subvolume create /mnt/@snapshots umount /mnt
Automatically mount the
@snapshots subvolume at
# /etc/fstab: static file system information. # # Use 'blkid' to print the universally unique identifier for a device; this may # be used with UUID= as a more robust way to name devices that works even if # disks are added and removed. See fstab(5). # # <file system> <mount point> <type> <options> <dump> <pass> UUID=E199-B52A /boot/efi vfat umask=0077 0 2 UUID=63745996-340f-4bda-98b1-63af36e6cdae / btrfs subvol=/@,defaults,noatime,space_cache,autodefrag,compress=lzo 0 1 UUID=63745996-340f-4bda-98b1-63af36e6cdae /home btrfs subvol=/@home,defaults,noatime,space_cache,autodefrag,compress=lzo 0 2 UUID=63745996-340f-4bda-98b1-63af36e6cdae /var/cache btrfs subvol=/@cache,defaults,noatime,space_cache,autodefrag,compress=lzo 0 2 UUID=63745996-340f-4bda-98b1-63af36e6cdae /var/log btrfs subvol=/@log,defaults,noatime,space_cache,autodefrag,compress=lzo 0 2 UUID=831cc5b9-7c99-459c-8b25-0db765c5ce3b swap swap defaults,noatime 0 0 tmpfs /tmp tmpfs defaults,noatime,mode=1777 0 0 # Snapshots UUID=63745996-340f-4bda-98b1-63af36e6cdae /.snapshots btrfs subvol=/@snapshots,defaults,noatime,space_cache,compress=lzo 0 0
Mount all partitions listed in
Check if everything works by creating a manual snapshot:
snapper -c root create -c number -d 'Test snap'
- The first
-cflag is for specifying the config name, which is
root. To list all snapper configs:
- The second
-cflag is for specifying the cleanup algorithm to use, which is
number. Snapshots older than the number limit specified in
/etc/snapper/configs/rootwould be pruned
-dflag specifies a description for the snapshot
List the snapshots:
snapper -c root list
List all subvolumes:
[root@eos-laptop mnt]# btrfs subvolume list / ID 276 gen 1987 top level 5 path @ ID 277 gen 1987 top level 5 path @home ID 278 gen 1987 top level 5 path @log ID 279 gen 1987 top level 5 path @cache ID 290 gen 1987 top level 5 path @snapshots ID 291 gen 349 top level 290 path @snapshots/1/snapshot
We can see the snapshot we just created manually.
Get property of subvolume:
[root@eos-laptop mnt]# btrfs property get /.snapshots/1/snapshot ro=true [root@eos-laptop mnt]# btrfs property get / ro=false
By default, snapper creates read-only snapshots. This is preferred as it would prevent us from accidentally modifying any snapshots, one of which may be our last chance at restoring to a functional system state.
To restore from a snapshot, we create a read-write snapshot from the read-only snapshot (shown later).
Generate grub configuration manually once:
grub-mkconfig -o /boot/grub/grub.cfg
To automatically update grub menu entries on detecting changes in
/.snapshots directory, the
grub-btrfs package provides a systemd script:
systemctl enable grub-btrfs.path
Power on or reboot the system.
From the snapshot list, boot into the last functional snapshot. Make a mental note of the snapshot number.
The system should boot without issues as
/var/cacheare all read-write, only
/will be read-only.
If the system does not boot from the snapshot, boot from a live USB.
Make sure we have booted into a read-only snapshot:
[root@eos-laptop mnt]# btrfs property get / ro=true
The aforementioned step is crucial, do not proceed if it’s not a read-only snapshot.
Optional: If you don’t know the correct snapshot number to restore, read the
info.xml file associated with each snapshot created by snapper:
As an example, we will proceed with restoration of root
@) from snapshot number
Mount the btrfs partition to
mount -t btrfs /dev/sda2 /mnt
mv /mnt/@ /mnt/@.broken
Create a new
@ subvolume from our chosen snapshot:
btrfs subvolume snapshot /.snapshots/33/snapshot /mnt/@
By default, btrfs creates read-write snapshot. To create a read-only snapshot, specify the
Unmount and reboot into the restored system:
umount /mnt reboot
Once rebooted, delete the
mount -t btrfs /dev/sda2 /mnt btrfs subvolume delete /mnt/@.broken umount /mnt
Optionally, scrub the filesystem (verifies all data & metadata checksums):
btrfs scrub start / btrfs scrub status /
Boot from a live USB
List all block devices:
lsblk parted -l
Check the unmounted btrfs filesystem:
btrfs check /dev/sda2
Repairing an unmounted btrfs filesystem (DANGEROUS OPTION, see man page for
btrfs check --repair /dev/sda2
Install borg and create user on remote server with SSH access (in this case it’s a Debian server):
apt install borgbackup adduser borg --disabled-password
From local machine:
mkdir -p /root/.ssh cd /root/.ssh ssh-keygen -t rsa
Copy the public key to remote server location:
From local machine:
ssh email@example.com mkdir eos-laptop
In the previous command,
eos-laptoprefers to local machine’s hostname and
remote-server.tldrefers to remote server’s FQDN
Initialize borg repository from local machine:
borg init -e repokey firstname.lastname@example.org:/home/borg/eos-laptop
Make sure to securely backup the borg repo password as well as the private key created earlier.
Generate borgmatic config from local machine:
mkdir -p /etc/borgmatic generate-borgmatic-config -d /etc/borgmatic/config.yaml chmod 600 /etc/borgmatic/config.yaml chown root:root /etc/borgmatic/config.yaml
Edit the borgmatic config file
location: source_directories: - /mnt/@.latest - /email@example.com - /firstname.lastname@example.org - /email@example.com repositories: - firstname.lastname@example.org:/home/borg/eos-laptop storage: encryption_passphrase: "<your-borg-repo-password>" retention: keep_daily: 7 keep_weekly: 4 hooks: before_backup: - mount /dev/sda2 /mnt - btrfs subvolume snapshot -r /mnt/@ /mnt/@.latest - btrfs subvolume snapshot -r /mnt/@home /email@example.com - btrfs subvolume snapshot -r /mnt/@log /firstname.lastname@example.org - btrfs subvolume snapshot -r /mnt/@cache /email@example.com after_backup: - btrfs subvolume delete /mnt/@.latest - btrfs subvolume delete /firstname.lastname@example.org - btrfs subvolume delete /email@example.com - btrfs subvolume delete /firstname.lastname@example.org - umount /mnt on_error: - btrfs subvolume delete /mnt/@.latest - btrfs subvolume delete /email@example.com - btrfs subvolume delete /firstname.lastname@example.org - btrfs subvolume delete /email@example.com - umount /mnt
Validate the borgmatic config file
Perform a backup (this may take a while as it’s a first time sync):
borgmatic --verbosity 1
Setup cron to automatically perform backups.
0 3 * * * root /usr/bin/borgmatic --verbosity 1 --syslog-verbosity 1
Secure cron file:
chmod 600 /etc/cron.d/borgmatic chown root:root /etc/cron.d/borgmatic
List all backups:
Get information about backups:
Break lock if previous backup was interrupted and no borg processes are running:
borgmatic borg break-lock
This section details restoring to a fresh installation of the OS.
Boot into freshly installed system and install borg and borgmatic as stated in the previous section.
List all backups:
Mount the latest backup to check files:
borgmatic mount --archive latest --mount-point /mnt
borgmatic umount --mount-point /mnt
Extract files to
borgmatic extract --archive latest --destination /root --progress
cd /root mv mnt backup
Reboot into live environment
Mount btrfs partition:
mount -t btrfs /dev/sda2 /mnt
Rename existing subvolumes:
cd /mnt mv @ @.fresh mv @home @home.fresh mv @log @log.fresh mv @cache @cache.fresh
Move and rename backup files:
mv /mnt/@.fresh/root/backup/* /mnt mv @.latest @ mv @home.latest @home mv @log.latest @log mv @cache.latest @cache
/boot dir from
cp -a /mnt/@.fresh/boot /mnt/@
/mnt/@/etc/fstab with UUID of new EFI, btrfs and swap partitions:
lsblk blkid /dev/sda1 blkid /dev/sda2 blkid /dev/sda3
Reboot back into OS.
/etc/default/grub with UUID of new swap partition (for hibernation resume) and run:
grub-mkconfig -o /boot/grub/grub.cfg
Here’s a cute picture of Butters from South Park.