Deduplication software for photos

falke · November 25, 2023, 11:36am

Hello,

I’m looking for deduplication software, probably based on hash sums, to tidy up my PC…

Let me explain: I downloaded all sorts of photos from my smartphone to the PC, using Warpinator among other tools. I guess my photos have been downloaded several times, into different folders. In short, it’s a mess.

I’d like to sort it out once and for all. Do you know of any software that can do the job?

Thanks for your tips.

Maras · November 25, 2023, 12:09pm

Hello friend, try fdupes
https://archlinux.org/packages/?q=fdupes
or rmlint, which is said to be faster
https://aur.archlinux.org/packages/rmlint-git

thefrog · November 25, 2023, 12:15pm

If you’re more comfortable with a GUI app, try out fslint

https://aur.archlinux.org/packages?O=0&K=fslint

more about the project

falke · November 25, 2023, 12:49pm

thanks,

I’m installing the GUI version… Whoa, what a long compilation…

pebcak · November 25, 2023, 1:42pm

fdupes -rd /path/to/directory does the job.

For each set of duplicates you decide interactively which file to keep; all the others are deleted.


 -r --recurse            for every directory given follow subdirectories
                         encountered within
 -d --delete             prompt user for files to preserve and delete all
                         others; important: under particular circumstances,
                         data may be lost when using this option together
                         with -s or --symlinks, or when specifying a
                         particular directory more than once; refer to the
                         fdupes documentation for additional information

android · November 25, 2023, 5:37pm

I like using https://github.com/qarmin/czkawka
It has a nice GUI: yay -S czkawka-gui-bin

falke · November 25, 2023, 7:02pm

Really? :smile:

I just tried copying a photo from one directory to another, and the file was not detected as a dupe…

anon93652015 · November 25, 2023, 7:09pm

Another GUI app you can use is XnViewMP. It has a dupes finder that can check not just the name, but also the content of the images.

android · November 26, 2023, 5:00am

What can I say? It works perfectly for me… this sounds like a user error.

Add the path(s) where you want to look for duplicates
Select “Similar Images”
Play around with different algorithms
Press “Search”
…
Profit

EDIT: And if your problem is literally multiple copies of the same exact file, you could also try the “Duplicate Files” option.

falke · November 26, 2023, 12:17pm

Hi, thank you, and thanks to all of you. I will have a look at all this.

I’m currently having a look at rmlint.

I’d just like to make sure that this software uses checksums and not just the name. Is this the case?

Also, the duplicates could well be in different directories, e.g.:

under /home/falke and under /data

But from what I understand of how the example works, it will search in /data and its sub-directories, won’t it?

Maras · November 26, 2023, 1:16pm

You’re welcome friend,
Yes, as you can see from here:

Duplicate: A file that matches the original. Note that depending on rmlint settings, “match” may mean an exact match or just that the files have matching hash values.
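
To illustrate the difference between "matching hash values" and "an exact match", here is a minimal Python sketch of content-based duplicate detection. It is not how rmlint is implemented; the function names (`file_digest`, `find_duplicates`) and the `paranoid` switch are made up for this example. Files are grouped by size, then by SHA-256 digest, and optionally confirmed byte by byte. File names are never compared, which is why a renamed copy is still found.

```python
import filecmp
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(roots, paranoid=True):
    """Return groups (lists of paths) of files with identical content."""
    by_size = defaultdict(list)
    seen = set()  # real paths already visited, so a root given twice is harmless
    for root in roots:
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                path = os.path.join(dirpath, name)
                real = os.path.realpath(path)
                if real in seen or os.path.islink(path) or not os.path.isfile(path):
                    continue
                seen.add(real)
                by_size[os.path.getsize(path)].append(path)

    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[file_digest(path)].append(path)
        for candidates in by_hash.values():
            if paranoid and len(candidates) > 1:
                # Confirm the hash match with a full byte-by-byte comparison.
                first = candidates[0]
                candidates = [first] + [
                    p for p in candidates[1:] if filecmp.cmp(first, p, shallow=False)
                ]
            if len(candidates) > 1:
                groups.append(sorted(candidates))
    return groups
```

Calling `find_duplicates(["/home/falke", "/data"])` only reports groups; it deletes nothing, so it can serve as a dry run.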

For different locations a working example is this:

rmlint --types=duplicates --must-match-tagged --keep-all-tagged <path1> // <path2>

This will find files in path1 which have duplicates (same data content) in path2. It will create a shell script which, if run, will remove the duplicates under path1, leaving only the unique files.
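
If it helps to see the idea behind that command, here is a rough Python sketch of the behaviour described above: list the files under path1 whose content also exists under path2, and never touch path2. It is only an illustration, not rmlint's algorithm; the helper names (`sha256_of`, `walk_files`, `redundant_in`) are invented for this example, and it only reports candidates instead of deleting anything.

```python
import hashlib
import os


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def walk_files(root):
    """Yield regular files (no symlinks) below root."""
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                yield path


def redundant_in(path1, path2):
    """List files under path1 whose content also exists under path2.

    path2 plays the role of the protected side: nothing in it is ever
    reported. Files that are duplicated only inside path1 are ignored.
    """
    protected = {os.path.realpath(p) for p in walk_files(path2)}
    reference = {sha256_of(p) for p in protected}
    return sorted(
        p
        for p in walk_files(path1)
        if os.path.realpath(p) not in protected and sha256_of(p) in reference
    )
```

For example, `redundant_in("/home/falke", "/data")` would list the copies under /home/falke that already exist somewhere under /data. Reviewing that list before deleting anything is the equivalent of a dry run.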

source

falke · November 26, 2023, 2:57pm

hello,

Indeed, rmlint does a great job.
I accidentally launched it without doing a dry run first.

The result is excellent on the photos, but it also removed files among my personal documents, changes that I should have validated one by one…

Fortunately, a good backup of my /home and /data by Borg-backup enabled me to pinpoint the differences. And yes, Borg allows you to make a diff between two successive backups…

Now it’s up to me to restore a number of things.

Too bad rmlint doesn’t keep a log for each launch…

I’ll have to practice using it.

Maras · November 26, 2023, 3:14pm

It writes a report to a .json file.
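
A small Python sketch for reviewing such a report before running anything. The layout is an assumption on my side, so check it against your own file: I assume the report is a JSON array of objects, that duplicate entries carry "type": "duplicate_file", and that they have "path" and "is_original" fields. The function name `planned_removals` is made up for this example.

```python
import json


def planned_removals(report_path):
    """Return paths the report marks as duplicates but not as the kept original.

    Assumed layout (verify against your own report): a JSON array of
    objects, where duplicate entries have "type" == "duplicate_file"
    plus "path" and "is_original" fields.
    """
    with open(report_path, encoding="utf-8") as f:
        entries = json.load(f)
    return [
        entry["path"]
        for entry in entries
        if isinstance(entry, dict)
        and entry.get("type") == "duplicate_file"  # assumed field and value
        and not entry.get("is_original", False)    # assumed field
        and "path" in entry
    ]
```

Printing the result of `planned_removals("rmlint.json")` gives a list you can check one by one before anything is deleted.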

falke · November 26, 2023, 4:07pm

Okay, the bad deletes are restored, thanks to Borg… my savior…

I’ll read up on how to use these JSON files…

falke · November 26, 2023, 7:19pm

Whoa, great!

I have this bad habit of putting everything on my desktop, and only moving part of it to my /data from time to time.

Apparently all duplicates are detected, even if the file has also been partially renamed.

drunkenvicar · November 26, 2023, 11:37pm

As far as I can see, fslint only exists as a Canonical package.

thefrog · November 26, 2023, 11:39pm

It’s in the AUR:

yay -S fslint fslint-gui

drunkenvicar · November 26, 2023, 11:42pm

No sooner had I hit reply than I found the GUI zip version and the AUR package. Perhaps I should read the whole thread first! Thanks.

thanks for asking this. my own personal slobbery has cursed my desktop pc and my grsync backups…

system · November 28, 2023, 11:43pm

This topic was automatically closed 2 days after the last reply. New replies are no longer allowed.