Not sure if this is the right place to ask this question, so apologies if it doesn’t belong here (or anywhere!).
An issue I recently had caused me to think I hadn’t backed up for a few weeks so had better do so. I generally use rsync. I was just selecting my default settings when I was struck by the following…
The only reason we back up is in case something happens to the files/disk as a result of some action we take (even if it’s hard drive failure - some action was always going to be the last straw). So I might be editing a spreadsheet and it becomes corrupted and I think “thank goodness I have a back up…”
But backing up files is some action taken on all my files…
If taking some action on files risks damaging them (the reason I have backups at all) then surely backing up all the files represents the biggest risk we take in damaging files.
Even more so with human error. If you’re quite likely to accidentally delete or overwrite some file you really needed and think “thank goodness I have a back up”, are you not just as likely to accidentally click ‘restore’ instead of ‘backup’, or hit go when you meant to hit ‘dry run’, and thereby wreck all your files?
Basically, if something can go wrong when using a computer to manipulate files (the reason we need a back up) then that same thing can go wrong when actually running the back up (which is an instance of using a computer to manipulate files), and backups act on every single file on the computer, so represent a massive risk (relative to working on a single file).
Thus running lots of backups massively increases the chances of something going wrong and thereby needing a backup.
Is there, then, a sensible rate of backing up that balances these risks? Has someone already worked this out?
The question is far too vague for an accurate answer to be provided. If it’s the “rate” (I believe “frequency” is the more accurate term) of backup you’re concerned about, then it would depend on a lot of factors such as:
The preferred method of backup
The current state and condition of your hardware
The build quality of your hardware
The amount of data involved in the backup process.
And so forth.
Now, let’s talk about drive failures, which, I guess, is really the focal point of the discussion.
Manufacturers typically define the “service life” of an SSD by the amount of data that can be written to the drive before the drive craps itself. This number is usually several hundred terabytes (often quoted as “TBW”, terabytes written). There are little tweaks you can perform at the software level to help the life span of an SSD (like periodic TRIM with fstrim, which helps the controller’s wear-levelling), but that’s another topic entirely.
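To put rough numbers on that endurance figure, here is a back-of-the-envelope calculation. The 600 TBW rating and the 50 GB/day write workload are both made-up example values, not the spec of any particular drive:

```shell
# Hypothetical numbers: a drive rated for 600 TBW (terabytes written)
# and a workload that writes 50 GB per day.
tbw=600
gb_per_day=50

# Days until the rated write endurance is used up (1 TB = 1000 GB here):
days=$(( tbw * 1000 / gb_per_day ))
years=$(( days / 365 ))
echo "Rated endurance lasts roughly $days days (~$years years)"
```

On numbers like these, the write-endurance limit is unlikely to be what kills the drive; something else in the machine usually wears out first.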
So how does this relate to the current discussion?
Well, what’s your preferred back-up method? If your backup process mainly involves copying (reading) data on your SSD and then writing it onto an external hard disk, then doing so frequently probably wouldn’t cause any significant wear on your SSD – since you’re just reading data from it. If you’re doing something else that involves a lot of write operations, then that’s a different matter. The point is that frequently writing large amounts of data will wear your SSD faster. So you might want to take this into consideration if you want to balance the risks of backing up.
And how much data is involved in the backup? Are you backing up in small chunks (like the diff between the previous state of the drive and the current state) or are you backing up the whole drive every time? If the amount of data isn’t too large and bandwidth isn’t an issue, you might want to look into a cloud solution. User configuration files usually aren’t an issue since we can use some form of version control to keep track of them. For instance, I keep a local git repo for all my config files. And every time I want to back up, I just run git push and the changes will be pushed to a remote repository on github.
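A minimal sketch of that config-file workflow. The demo below runs in a throwaway directory instead of a real config directory, and it stops short of the actual push, which assumes a remote repository you have already created:

```shell
set -e
# Demo in a throwaway directory; in real life this would be something
# like ~/.config, and the remote would be your own GitHub repository.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email you@example.com   # needed to commit in a fresh repo
git config user.name "Your Name"

echo "set number" > vimrc               # a stand-in config file

# Each time you want to "back up" your configs:
git add -A
git commit -q -m "Snapshot of config files"

# With a remote added (git remote add origin <url>), the backup step
# really is just: git push origin main
git log --oneline
```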
Having said that, the whole point of backing up frequently is that if something does go wrong, you would have retained the latest state of your drive prior to the failure. If you only back up once a year, you only have last year’s version of your files when your system fails. Compare that to the case where you back up once every two weeks.
There are a few ways to make backups more reliable. But you are right, nothing can fully prevent a human error.
Use several external disks for backups. This way you have many similar backups, and if one of the backup disks fails for any reason, you have another that should work.
Preferably keep some of those external disks in more than one trusted location. This helps in case of flood or fire.
Also use online locations for backups (but only if you trust the provider). They make a nice additional layer.
Do backups reasonably often. How often depends on what you are backing up.
If you also need to keep old versions of some files, you probably want version control software. Note that version control works best for text files, and less well for binary files (e.g. photos or videos).
Here are some thoughts. I may have missed something, but others can add to them.
I’ve got my rsync command up on my console right now, just need to hit enter, and I’m thinking…
If I get this wrong (maybe I’ve accidentally brought up the command for a restore, not a backup), or if rsync goes wrong (presumably software goes wrong sometimes), or if my computer crashes badly whilst doing this… I lose all the files changed since the last backup, and possibly the last backup itself (depending on how badly rsync has gone wrong in this hypothetical).
If, on the other hand, I don’t back up and something goes wrong today - some other software breaks, my computer crashes for some other reason… I just lose the files I’ve changed (since I’ve no back up), or the one I’m working on.
It’s hard to press enter… I know I’m going to need to at some point, otherwise I’ll have no backups ever. But at each actual instance of doing a backup, I’m taking a risk that never seems quite outweighed at the time; it’s only outweighed afterwards, when I’m glad I had a backup and nothing did, in fact, go wrong.
I thought others might have experienced the existential backup crisis, but clearly, it’s just me!
First, don’t panic! Making careful preparations before each step is the key.
I’d suggest you simply copy your current backup drive (call it backup1) to another drive (backup2). After this you should have two identical backup drives, which is good already.
Then you can do the backup with rsync (which you are trying to do now) to backup1.
If that fails, you have backup2 available. Try to find out exactly what caused the problem.
When you find the exact cause, fix it.
Then copy from the good backup2 to backup1, and again you have two similar backup drives.
Then repeat the fixed rsync process. Iterate this until your backups are OK, first in backup1, and finally in backup2.
Try to be careful at every step. Always double-check which drive is the source and which is the destination.
And remember to keep calm even if you think or see you made a mistake.
Don’t take panic actions; slow down, take a break, and start again later.
Done it! One backup on a spare internal hard drive, second backup on an external hard drive, no errors (this time!). Now I don’t have to worry about it for another few weeks.
I’m just holding out for the AI backup software where it can check your commands and flags for you and say “I don’t think you mean to do that, do you…?”
Of course, the disadvantage of AI run backups is that they then take over the world and we have to get various robots/revolutionaries from the future to kill it off before it begins, but at least you get a film franchise out of it, so not all bad…
Just put it in a script? Write the script, test its effects to see whether it does what it’s supposed to, and then run that same script every time you need the functionality. This removes most of the opportunity for human error (no retyping the command, no mixing up source and destination).
Data corruption due to hardware failure can happen even during normal usage, not just in the middle of a backup process. If you’re experiencing neurosis before a backup, then you must be going berserk when you’re browsing the web, because your browser writes something to disk every time you load a web page.
Agreed. That’s why the service life of SSDs is usually defined in terms of write volume. Plus, writing to disk involves more CPU operations than reading: other information (directory structure, etc.) has to be updated after every write.
I’m going to outline my backup strategy (using a mix of Vorta/Borg and rsync). First: one backup = no backup.
Here’s my setup:
Backup to local drive repository using Vorta/Borg nightly.
Backup to cloud repository nightly, alongside backup to NAS.
Backup to air-gapped device as a point-in-time rotating backup set: 4 × weekly, 12 × monthly, 4 × yearly (4/12/4). Some people vary this depending on how much redundancy they need.
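For what it’s worth, a 4/12/4 rotation maps directly onto BorgBackup’s prune flags. This sketch only builds and prints the commands rather than running them, and the repository path is hypothetical:

```shell
REPO=/mnt/airgapped/borg-repo   # hypothetical repository path

# Nightly archive creation (Vorta drives borg with calls like this;
# {hostname} and {now} are borg placeholders expanded at run time):
CREATE="borg create $REPO::{hostname}-{now} $HOME/Documents"

# The 4/12/4 rotation maps onto borg prune's retention flags:
PRUNE="borg prune --keep-weekly 4 --keep-monthly 12 --keep-yearly 4 $REPO"

echo "$CREATE"
echo "$PRUNE"
```

Run nightly from cron or a systemd timer, `borg create` takes the snapshot and `borg prune` thins old archives down to the retention scheme above.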
So yes, your thinking is sound, although not hugely clear in terms of what you’re trying to solve. Hardware fails; that’s just a given. It’s not a question of if, but when. So ensuring your strategy has resilience and redundancy is the most important step. Your design needs to work for you: backups in themselves shouldn’t be a point of risk if you set them up correctly so they are automated.
When things go wrong, it’s inevitably precipitated by human error or lack of planning, or both… No one should be backing up with a manual rsync command at 2am on no sleep with fluffy fingers… Good luck on your journey!