Q: best method for copying large # of files

I am replacing a 1TB spinner (internal storage) with a 6TB Seagate NAS drive, and have 201,479 items totalling 800.6 GB (801.0 GB on disk) to copy over. If I go about this the wrong way it could turn into a clumsy mess, and I might not be able to do much with this machine for more than a day.
Any recommendations would be welcome.
BTW, I’m using the ext4 file system for the OS. Would there be a faster, more efficient file system for this internal drive?

Can’t get any better than rsync

I do it like that, with nice progress output:

rsync -aHAX --info=progress2 --exclude='lost+found/' "SOURCE_DIR" "DESTINATION_DIR"

Indeed. rsync is what I would recommend as well.

Sorry for asking here.

I want to back up my disk before reinstalling eos. I used to do backups from a GUI file manager, and it seems it’s time to try something faster. I already have an old backup that needs to be updated, without including config directories/files and while keeping creation/modification timestamps, so my guess is that this command will be fine for syncing to an external HDD:

rsync -av /home/triby/ /media/devicename/backups/triby --exclude={"/home/triby/.*" "/home/triby/.*.*"}

Am I right?

Maybe I’ll try it first with --dry-run to see what happens.

When designing a new rsync command for something I usually do --dry-run first.

@keybreak suggested additionally -HAX which should be considered.

If the external HDD’s file system supports hardlinks then use at least -H which preserves hardlinks.

Also that’s a weird looking exclude pattern to me…but you do you :upside_down_face:

Grsync is a pretty nifty GUI front-end for rsync, if need be.

Thanks for your reply. There are a lot of things that I don’t understand about file systems:

  • It seems like the -A and -X options will be really useful, to keep ownership, permissions and extended attributes
  • I really don’t know if there are hard links in these directories, and my concerns are:
    a. Including -H could affect performance, but that doesn’t really matter
    b. Without -H hard-linked files in the source are treated as though they were separate files, but what would happen to them when restoring?
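
Whether hard links exist at all can be checked before deciding on -H. A minimal sketch, using a throwaway directory with made-up file names in place of the real source (swap in something like /home/triby): find reports regular files whose link count is greater than 1. Without -H, each of those names would arrive in the backup as an independent copy, so a restore gives back separate files with identical content rather than one linked file.

```shell
# Demo in a throwaway directory; the file names are hypothetical.
demo=$(mktemp -d)
echo data  > "$demo/original"
ln "$demo/original" "$demo/hardlink"    # second name for the same inode
echo other > "$demo/single"

# -type f: regular files only; -links +1: link count greater than 1
hardlinked=$(find "$demo" -type f -links +1 | sort)
printf '%s\n' "$hardlinked"
```

If this prints nothing for the real source directory, -H has nothing to preserve there and leaving it in is harmless.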

Where is the clown? :rofl:

That pattern is meant to exclude configuration directories and files, which are almost always hidden (prefixed with a dot). I’ll add Downloads and maybe other directories.

Can you give me some tips to make a better pattern?

It looks nice and I’ll take a look to see what’s the best option, considering that I’ll move from Cinnamon (GTK) to KDE Plasma (Qt).

Naaah, i’m just a :clown_face:

I’ve been searching around and found that maybe just --exclude={"/home/triby/.*"} would be enough to exclude hidden directories and files in my home directory. Is that right?

Permissions and ownerships are taken care of by -a which is an abbreviation of -rlptgoD. Check the rsync man page for what those options exactly mean.

-A: is about preserving ACLs
-X: is about preserving extended attributes.

Thanks for clarification, I always like to learn something new.

Hm, I think in order to exclude hidden directories throughout the directory tree you should use --exclude='.*/'.

Here also --dry-run is your friend. Only Chuck Norris wouldn’t use --dry-run.

Important: Excluding hidden files and directories could mean you exclude important stuff which should be backed up. Example: ~/.config/

But it could be a good idea to exclude the ~/.cache directory.
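
A rough sketch of that narrower approach, run against a throwaway directory so nothing real is touched: exclude only .cache/ and let --dry-run list what would be copied. The temporary paths and file names are made up for the demo; only the rsync options come from this thread.

```shell
# Throwaway source tree standing in for a home directory (hypothetical names).
src=$(mktemp -d); dst=$(mktemp -d)
mkdir -p "$src/.config/app" "$src/.cache/thumbs" "$src/Documents"
echo settings > "$src/.config/app/settings.conf"
echo junk     > "$src/.cache/thumbs/t1.png"
echo notes    > "$src/Documents/notes.txt"

# --dry-run reports what would be transferred and writes nothing.
# '.cache/' ends in a slash, so it matches directories only.
plan=$(rsync -av --dry-run --exclude='.cache/' "$src/" "$dst/")
printf '%s\n' "$plan"
```

The listing should show .config/ and Documents/ but nothing under .cache/, and the destination stays empty until the same command is run without --dry-run.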

Thanks again for your explanation. I really don’t care about config directories and files, because I keep configurations manually for only a few apps. Besides, I’ll change from Cinnamon to KDE Plasma, and a lot of the stuff inside those directories won’t be needed any more.

After @keybreak’s advice and yours, I’ve got this:

rsync -aHAXvvuP /home/triby/ /run/media/triby/e30a029b-e6d2-4968-ab35-7fc669ea51c2/triby --exclude={.*,Downloads/}

What I understood (Did I?) from that is:

  • a Preserve owner (user and group), permissions, timestamps, etc.
  • H Preserve hard links
  • A Preserve ACL
  • X Preserve extended attributes
  • vv More verbosity, combined with u results in messages like /path/filename.ext is uptodate for skipped files
  • u Skip files that are newer on destination
  • P Doesn’t do anything with --dry-run but I guess it should show progress while copying files

The exclude pattern should be comma separated, doesn’t need quotes and just .* is enough to exclude anything hidden (Thanks, @keybreak for comments on that first strange pattern; btw I’m still missing your clown icon).
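
One caveat on the brace syntax: the comma-separated form works because bash expands it into one --exclude argument per item before rsync ever runs. A minimal sketch of that behaviour, assuming bash is installed (brace expansion is not part of POSIX sh, so the sketch calls bash explicitly; the variable names are only for illustration):

```shell
# Comma-separated list: bash expands it into two separate arguments.
multi=$(bash -c 'printf "%s\n" --exclude={.*,Downloads/}')
printf '%s\n' "$multi"

# Single item in braces: no expansion happens, so the braces stay in the
# pattern that rsync would receive.
single=$(bash -c 'printf "%s\n" --exclude={"/home/triby/.*"}')
printf '%s\n' "$single"
```

With only one pattern, plain --exclude='.*' sidesteps the issue, since no braces are involved.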

I’d be very careful with this statement. .config and its sub-folders contain configurations for most things, and the same goes for dot files in your home directory. You may not change them manually, but settings you change in apps might. I highly recommend you always back up hidden files as well. If there are specific exclusions you want to make for some reason, do that.

I fully agree.

BTW, there are other really important hidden directories and files. Here are just some examples that are important to me:

~/.ssh/
~/.vim/
~/.zshrc
~/.zsh_history
~/.zshenv
~/.fetchmailrc
~/.thunderbird/
~/.mozilla/

and more…

@MikeDelta42 has a great point: it’s better to back up everything and then decide on its fate than to forget something :wink:

What do you mean? Like avatar? I’ve always used that one, :clown_face: comes in a form of gif to my post!

I am late to the show but want to express my support for rsync. I have been using this tool for 20+ years and it has never let me down.

I have an alias rsync-copy with my default rsync parameters:

alias rsync-copy='rsync -aAhHxX --progress'

The parameters are easy to remember. They are lower case and upper case of ahx.

--archive, -a            archive mode is -rlptgoD (no -A,-X,-U,-N,-H)
--acls, -A               preserve ACLs (implies --perms)
--human-readable, -h     output numbers in a human-readable format
--hard-links, -H         preserve hard links
--one-file-system, -x    don't cross filesystem boundaries
--xattrs, -X             preserve extended attributes

Actually, for this I would recommend rclone over rsync. You want something multithreaded for this job, and rsync isn’t: unless you set up the parallelism manually, it copies one file at a time. Also, in this case there’s no advantage in rsync’s delta-copy capability, since you’re copying the entire contents of the drive to an empty drive. Other than that, they perform pretty similarly, with checksum checks, resume capability, etc.
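
For anyone who stays with rsync, setting up the parallelism manually usually means running several rsync processes side by side. A minimal sketch, assuming a find and xargs that support -print0, -0 and -P (GNU and BSD versions do); the temporary directories, file names and the job count of 4 are placeholders. One caveat: -H can only preserve hard links within a single rsync run, so links that span different top-level directories would be copied as separate files.

```shell
# Placeholders for the old and new drive, with a few hypothetical files.
src=$(mktemp -d); dst=$(mktemp -d)
mkdir -p "$src/Music" "$src/Photos"
echo a > "$src/Music/a.flac"; echo b > "$src/Photos/b.jpg"; echo c > "$src/top.txt"

# One rsync per top-level entry, at most 4 running at once.
# No trailing slash on {}, so each directory is recreated inside "$dst".
find "$src" -mindepth 1 -maxdepth 1 -print0 |
    xargs -0 -P 4 -I{} rsync -a {} "$dst/"
```

This only helps when the data is spread over several top-level directories. In rclone, the --transfers flag sets how many files are copied in parallel.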

As far as the OS filesystem goes, I don’t think you’d notice a performance difference if you went to something like XFS. BTRFS would probably be a little slower but would get you snapshot capability. Not everyone wants that, though, and there are still reports of issues here and there with the filesystem (I’d never use it on a server, but I’m fine using it on my laptop).

Thanks for your answers, @MikeDelta42 and @manfredlotz. I’m aware of the importance of these directories and files, so I’ll keep them too, just in case.

Sorry, gif, not icon; that’s what I meant to say :partying_face:

Easy indeed. I’ll check out the -x option; it sounds very advanced for my poor knowledge.

Nice suggestion, I’ll take a look at rclone.


Thanks all for your answers and sorry, @tnthomas, for using your thread for my questions.
