A Reminder to Backup All Important Data

I just watched a video about how, many years ago, Steam wiped someone's entire computer, including an external backup drive that was plugged in, all because of a poorly written Bash script. The script in question contained this (including the ominous comment):

# Scary!
rm -rf "$STEAMROOT/"*

Earlier in the script, the variable STEAMROOT was set using some cd and pwd nonsense, which under unforeseen circumstances failed and left the variable empty, so the line above effectively became rm -rf /* and destroyed all user-owned data on the entire system.
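A guard as small as this would have caught the empty variable before it could do any damage. This is a generic sketch, not Valve's actual fix; only the STEAMROOT name and the final rm line come from the script above:

set -u                                                       # abort on use of an unset variable
STEAMROOT="$(cd "${0%/*}" && pwd)" || exit 1                 # bail out if the cd/pwd dance fails
: "${STEAMROOT:?STEAMROOT is empty, refusing to continue}"   # bail out if it is set but empty
[ "$STEAMROOT" != "/" ] || exit 1                            # never, ever operate on /
rm -rf "$STEAMROOT/"*                                        # GNU rm's --one-file-system flag would add one more safety net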

Obviously, this was an awfully written script. But, it came from Valve, so you know it was run on thousands of computers. Maybe you ran it, too.

If such an outrageously bad script could pass Valve's code review and quality assurance, think about the scripts you run almost every day, for example, when you update some package that was built from the AUR. If you are not thoroughly inspecting PKGBUILDs, you're tickling the dragon's tail.

Here is another example of how a space in a third-party script caused someone to lose their /usr directory. Stuff like that happens all the time, rarely due to maliciousness, but often due to incompetence or carelessness.
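To illustrate (echo stands in for rm so this is safe to paste, and the path is made up), a single stray space splits what was meant to be one path into two arguments, the first of which is /usr:

echo rm -rf /usr /lib/some-driver/xorg     # two arguments: "/usr" and "/lib/some-driver/xorg"
echo rm -rf "/usr/lib/some-driver/xorg"    # one argument; a typo inside the quotes would merely fail with a "No such file" error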

Sometimes, even things like updating the kernel can cause data loss. Even on ext4, even with an LTS kernel.

Speaking of incompetence and carelessness, there is also user error. Sooner or later, you will mess up. More times than I'm willing to admit, I have typed in some command, triple-checked everything, hit Enter… and then immediately realised what I had done. The cold sweat, the hands shaking… and all I could utter was: "oh, :poop:"…

Careless mistakes happen, and they will continue to happen. They also happen to much smarter and more experienced people than me.

System cleaning utilities like BleachBit are notorious for often removing more than the user wants. While using such tools on GNU/Linux is generally unnecessary, and thus foolish to put it mildly, there are people who like using them, regardless of the risk.

Finally, there is hardware failure. All hardware eventually fails. All HDDs and SSDs will die, sooner or later. When they do, the data on them will probably be irrecoverable, for all practical purposes. It's not a matter of if, but when. Good drives usually last long enough to become obsolete, but even the best drives fail. Think of all your hardware as expendable.

In any case, whether through your own fault or because of things beyond your control, data loss is a very real possibility.


If you have a good backup, any issue you might have with your system, no matter how bad, is just an annoyance.

Given how storage is fairly cheap, it is downright stupid not to have a backup.

What constitutes a good backup?

Obviously, the more important the data, the more resilient the backup has to be. But in general:

  1. The backup has to consist of multiple copies on different storage. One copy is like zero copies. And if you keep multiple copies on the same drive, you’ll lose all of them when the drive fails. Use different physical drives.

  2. The backup should be physically separate from your computer. A stupid command like rm -rf / will wipe all user-owned files on every mounted drive on your computer. Even unmounting a backup drive is not enough: a power surge or a faulty PSU can destroy all the hardware in your machine. A network drive is not good enough either. Important files should be kept on backup drives that are physically disconnected from any running computer and connected (manually) only when a backup is taken or restored.

  3. The backup should be resistant to being overwritten by corrupted data. By the time you make a backup, a file might already be corrupted without you noticing, in which case you will save a corrupted file to your backup. What is really bad is when you overwrite a good version of the file with a corrupted one. Simply copying stuff to a different location, while infinitely better than no backup at all, is not good enough. Keeping old versions of your backup is a good idea, and an incremental backup where nothing is overwritten is also good (see the sketch after this list). Using a backup solution like Borg is excellent, but keep in mind that the extra complexity can cause issues, too. Of course, avoid any proprietary backup software – proprietary formats are not future-proof by definition.
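As a rough illustration of point 3, here is a minimal snapshot-style backup using plain rsync (Borg achieves the same and more). The destination path is only an example, adjust it to your own backup mount:

DEST="/mnt/backup"                                     # example mount point of the backup drive
NEW="$DEST/$(date +%Y-%m-%d_%H%M)"                     # every run gets its own directory
LAST="$(ls -1d "$DEST"/20* 2>/dev/null | tail -n 1)"   # most recent previous snapshot, if any

if [ -n "$LAST" ]; then
    # --link-dest hard-links unchanged files to the previous snapshot: every run
    # is a full snapshot, nothing is ever overwritten, and disk usage stays low
    rsync -aHAX --link-dest="$LAST" "$HOME/" "$NEW/"
else
    rsync -aHAX "$HOME/" "$NEW/"
fi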

Exactly how you do it is entirely your choice. But if you’re not doing it already, rethink your life and start doing it.


WTF :rofl:

Indeed.

:exploding_head:

If a story like that doesn’t motivate you to keep a good backup, you probably deserve to lose your data.

I hope instead of that script getting executed…whoever wrote it was executed :rofl:


When I used to care enough to do a full backup (I just grsync important stuff regularly now), I would restic .gpg to an external drive that is never plugged in. Are you calling my one copy zero because, if the hardware/mobo takes a dump, I don't have a computer to restore the .gpg to? It's also very likely that went over my head.

Point #2 I am in compliance with.

Do you mean that if something icky got recursively copied to the backup, it would be "overwritten by corrupt data"? Or do you mean overwritten/corrupted as in, for instance, carelessly sharing your external backup HDD with a movie or music library?

Also not covered in your fine tutorial (THANKS BTW for the reminder, I need to Grsync my crap tonight, it's been a while) is when I committed to this nifty Toshiba TB external for all my backup data…
…had a convo with a guy…who was adamant FAT was the devil…
…and versatility (cross-platform) was nonsense…that GPT/ext4 was the only Smart, Correct way to go when formatting your backup disk…
…it was just some guy on the internet…I didn't know him…so of course I followed his advice blindly. What's your take on a FAT-formatted backup disk?

Computers are really performant nowadays… I mean… rm on a decent machine can do a lot of removal in a few milliseconds.

And now all together:

backups

b a c k u p s

BACKUPS


Backup home.

Redundant home.

Don't back up the OS 'cause it's a waste of space, and I can reinstall and be running again in like 30 min.


Bad scenario - one copy on a second drive inside the computer.

Electrical issues or fire can ruin all copies.

Better scenario - one copy on a second drive and one on a separate drive normally disconnected from the computer.

Bonus points if it sits in a locked, sealed fire/waterproof safe outside of backup time.

Best scenario - +1 in another safe at an off-site location.


I have lived and breathed this for a year. One day the lightbulb went off: "don't waste your time with another full backup; setting up a fresh install is really not that stressful once the peripherals are installed."

Exactly. Data is important. The OS is not. Just keep one USB ready at all times and you’re golden.


All my important personal data is already saved by the intelligences, in case I ever need to ask them for a backup :fist:

I do no OS backups; that's a waste of space.


All hail intelligences.


:beer: to intelligences.

Mine is dirt-simple grsync. /home is sensible, but I'm not married to ~/.config anymore. I used to be.

FAT is not a POSIX-compatible filesystem. Every time you copy data to a FAT drive, you lose information about permissions, ownership, etc. Now, that may or may not be a problem. Typically, you only back up the data in your home directory, and most of it has the same permissions. But I wouldn't do that personally, especially since there is zero benefit to doing it; FAT is, in general, a crappy filesystem.
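If you want to see it for yourself, copy any file to the FAT drive and compare the metadata (the destination below is just a placeholder path for a vfat mount):

src="$HOME/.bashrc"
dst="/run/media/$USER/FAT_DRIVE/bashrc-test"          # placeholder path of a vfat mount
cp "$src" "$dst"
stat -c 'mode=%a owner=%U group=%G  %n' "$src" "$dst"
# on vfat, the mode and ownership of the copy come from the mount options
# (fmask/uid/gid), not from the original file, so that information is lost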

While infinitely better than nothing, that's not ideal. Assume you get data corruption in your /home directory. That's fine, you have a backup, no problem. But what if you do not notice that the data is corrupted? Everything seems fine, except some files do not open, but you normally do not open those files frequently enough to notice. So you make a backup by grsyncing over the existing backup. Now you've overwritten good files with corrupt ones, and there is nowhere to restore them from; they are gone.

Speaking of which, personally I always do the following (a rough sketch follows the list):

  1. Full-disk hashsums into a file.
  2. Backup with rsync.
  3. Check the hashes.
  4. Keep the hashes both on the disk and separately, so there is always a reference showing that files are not corrupted unless they were actually changed.
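For reference, a rough sketch of that workflow with sha256sum and example paths (substitute your own hash tool and backup mount):

# 1. hash everything in /home and keep the list
cd "$HOME" && find . -type f -print0 | xargs -0 sha256sum > /tmp/home.sha256

# 2. back up with rsync, and store a copy of the hash list next to the backup
rsync -a "$HOME/" /mnt/backup/home/
cp /tmp/home.sha256 /mnt/backup/home.sha256

# 3. verify the backup against the stored hashes
cd /mnt/backup/home && sha256sum --quiet -c ../home.sha256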

Oh that's funny, I watched this exact video last night :rofl:

I only plug my external backup drive in when I do a backup, and a backup is all I do; then I unmount and unplug. Meaning I don't multitask during a backup. I also tar my backups, not for space, but just for reason (3): in case of some corruption, I can go back to an uncorrupted version.
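For anyone curious, the dated-tar approach can be as simple as this (the mount point is just an example):

DEST="/run/media/$USER/backup"                       # example mount point of the external drive
tar -cpf "$DEST/home-$(date +%Y-%m-%d).tar" -C "$HOME" .
# every run creates a new, dated archive, so an older, uncorrupted
# version of any file is always there to go back to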

Another reason for tarring my backups, lol. If I mess up a file, I can get the old copy back easily enough. It has saved me a few times from my dumb decisions. Of course, there are those instances where I know I made a backup of the file before I started editing, except I didn't, but just like that moment when you find yourself in the kitchen going "What did I come in here for?", you swear you did.

IMHO, at least 2 devices should be used as dedicated backups. Back up the backup.


I had already planned my backup strategy:

  • Thanks to my systemd service, my backup disk is automatically unmounted once the automatic daily incremental backup has fully completed (see the sketch after this list).

  • My custom notification prompts me to decide "Yes" or "No" on the desktop when the backup process is triggered by my systemd timer.
    For example: if my system / data is broken or corrupted after a bad update or due to my own mistake, I click "No" → the backup process does not start and, of course, the backup disk is not mounted. "No" is always the default.

  • I use at least 2 different filesystems on different backup hard drives, because that is safer than many copies of the same filesystem. If a new update with a new bug corrupts one filesystem, there is a good chance that the other, different filesystem is not affected by this bug.

  • I have another backup system that does not need to be mounted: a remote backup via KVM/QEMU with an ssh or http connection that accesses other native physical disks (not virtual disks) on the same computer. That means I have two separate kernels running in parallel.

  • If the backup process fails, I am automatically notified by journalctl-desktop-notification, and then I take care of whatever the problem with the backup is.

  • At least 2 of the different filesystems on my backup disks have built-in checksums that can verify my data.

  • systemd-timer-notify can tell me "Please do not shut down the PC" while the backup process is running.
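For reference, a bare-bones sketch of the service/timer pattern described in the first two points. All unit names, paths, and the backup script are made-up placeholders, not the actual setup:

# /etc/systemd/system/backup-home.service (placeholder)
[Unit]
Description=Daily incremental backup of /home
# pull in the backup disk's mount unit only for the duration of this job
Requires=mnt-backup.mount
After=mnt-backup.mount

[Service]
Type=oneshot
ExecStart=/usr/local/bin/backup-home.sh
# unmount the backup disk afterwards, whether the backup succeeded or not
ExecStopPost=/usr/bin/systemctl stop mnt-backup.mount

# /etc/systemd/system/backup-home.timer (placeholder)
[Unit]
Description=Trigger backup-home.service once a day

[Timer]
OnCalendar=daily
Persistent=true

[Install]
WantedBy=timers.target

Enable it with systemctl enable --now backup-home.timer; the desktop prompt and the failure notification would live in the backup script itself.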

Back I go to Windows. :joy:

In all seriousness, I tend to update an off-site backup once every few gigabytes rather than, say, every week. That is for my personal data. It would still be frustrating if I lost the original data - which is always on a mounted drive - to a bad AUR update.

I have a regularly updated Timeshift backup on a separate drive. In the event of the system being lost, is it easier to get everything back from that Timeshift backup or just with a reinstall and a home folder backup?