Looking for a tool or script to scan two folders and list subdirectories with similar names

I want to compare subdirectories of two folders to find directories with a similar name. I have looked around for a tool or script to do this, but most are tailored to matching identical names whereas these folders have some parts of their name which differ. An example:

FOLDER A
    BLUE IS THE WARMEST COLOR (2013)
    WILD STRAWBERRIES (1957)
    
FOLDER B
    Blue Is the Warmest Color (2013) (French 2160p AAC 5.1)
    Wild Strawberries (1957) (1080p Swedish AAC 1.0)

With the exception of the casing, the first part of the title of each subdirectory is always identical, but those in folder B have added properties at the end. Since these differ from directory to directory, using something like Thunar’s remove characters, find and replace, or insert/overwrite features does not work.

Is there a tool on Linux to do this? Perhaps matching the first x characters and printing a list of matches?

In the end I want to delete matching titles from folder A, but I’d be content using some manual intervention to do this.
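
A minimal Python sketch of the "match the first x characters" idea, assuming (as described above) that every name in folder A is a case-insensitive prefix of its counterpart in folder B. The function names `subdir_names` and `prefix_matches` are made up for illustration. It only prints candidate pairs and deletes nothing.

```python
from pathlib import Path


def subdir_names(folder):
    """Return the names of the immediate subdirectories of `folder`."""
    return sorted(p.name for p in Path(folder).iterdir() if p.is_dir())


def prefix_matches(names_a, names_b):
    """Pair each name in A with every name in B that starts with it, ignoring case."""
    matches = []
    for a in names_a:
        key = a.casefold()
        for b in names_b:
            if b.casefold().startswith(key):
                matches.append((a, b))
    return matches


# Demo with the example names from the post (no filesystem access needed).
example_a = ["BLUE IS THE WARMEST COLOR (2013)", "WILD STRAWBERRIES (1957)"]
example_b = [
    "Blue Is the Warmest Color (2013) (French 2160p AAC 5.1)",
    "Wild Strawberries (1957) (1080p Swedish AAC 1.0)",
]
for a, b in prefix_matches(example_a, example_b):
    print(f"{a}  <->  {b}")
```

For real folders, something like `prefix_matches(subdir_names("FOLDER A"), subdir_names("FOLDER B"))` should give the list to review by hand before deleting anything from folder A.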

meld

Directory comparison would do?


Tried this earlier; unless I’ve missed something, it doesn’t show folders which aren’t 100% identical. It does list all folders of each directory, but so does my file manager.

Oh OK, I see… The problem is that Meld only has an option to ignore the case of files, not directories, so the directories are all just treated as “New” because they’re “too different” :thinking:

An alternative route, if this is perhaps easier, could be a tool or script to rename each folder to the name of the file inside it, e.g. if the structure is:

Blue Is the Warmest Color (2013) (French 2160p AAC 5.1) [folder]
    Blue Is the Warmest Color (2013).mkv [file]
Wild Strawberries (1957) (1080p Swedish AAC 1.0) [folder]
    Wild Strawberries (1957).mkv [file]

It would become:

Blue Is the Warmest Color (2013)
    Blue Is the Warmest Color (2013).mkv
Wild Strawberries (1957)
    Wild Strawberries (1957).mkv

I could then compare the folder names using the various tools available, like Meld, since their names would now be identical.

I don’t know if this is easier or possible at all, though.

Probably easier, unless I’m missing some other bloated tool. @Kresimir could probably help with that, if he has some scripting time :frog:

I don’t understand the problem, but you can use fzf to find similar but not quite identical filenames.

That would be pretty easy to do with a script. You would have to be careful to only ever have one file in each folder, though, or it could get renamed to something you don’t expect.

You would also need a strategy for handling subfolders.

There are indeed no other files, each subfolder has 1 mkv file. No further subdirectories or other files.

In that case it is trivial. Just loop over all the folders and rename them to whatever the first thing you find inside them is.
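
A rough Python sketch of that loop, assuming each folder holds exactly one file and nothing else. The names `plan_renames` and `apply_renames` are hypothetical. Folders that do not fit the assumption, or whose new name already exists, are skipped rather than guessed at, and the plan can be inspected before anything is renamed.

```python
from pathlib import Path


def plan_renames(root):
    """Return (old_path, new_path) pairs for folders that contain exactly one file."""
    plan = []
    for folder in sorted(Path(root).iterdir()):
        if not folder.is_dir():
            continue
        entries = list(folder.iterdir())
        if len(entries) != 1 or not entries[0].is_file():
            continue  # ambiguous contents: leave for manual review
        # Folder takes the file's name without its extension (e.g. ".mkv").
        target = folder.with_name(entries[0].stem)
        if target != folder and not target.exists():
            plan.append((folder, target))
    return plan


def apply_renames(plan):
    """Carry out the renames produced by plan_renames."""
    for old, new in plan:
        old.rename(new)
```

Printing the result of `plan_renames("FOLDER B")` first and only then calling `apply_renames` on it keeps the manual check in the loop.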

I have two directories containing films. I know there are 100+ duplicates between these directories which I want to identify and remove; however, the naming scheme differs between the two folders. The files and subdirectories in these two folders all contain the title of the film, but those in folder B also have properties added to their naming scheme, like resolution, audio etc.

I could look through them manually, of course, and compare the name of each folder, but it’s ~400 folders and a lot of work.

Example picture of a duplicate:
[image: 2022-03-24-132621_348x181_scrot]

An alternative approach would be to not download the same film more than once. :rofl:

As a side note, how do you decide which one to keep?


Take the smaller of the two directories and create a list of names.

Then use fzf to find the similar names in the other directory, for each item on the list.

If you find a duplicate, delete it in one of the two directories. Then merge everything.

You can do it completely manually, which might be a bit tedious, or you can automatise one or more steps. I wouldn’t go with full automation on this, because you only need to do this once, and you’re likely going to spend more time developing the script than doing it manually. However, parts of it could easily be automated.
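
The "list of names, then search for similar ones" step could be partly automated along these lines. This is only a sketch: it uses Python's standard `difflib` as a stand-in for fzf (a different similarity measure, not fzf's algorithm), the function name `closest_names` is made up, and the `cutoff` of 0.6 is simply difflib's default. It just reports candidates; the deciding and deleting stay manual.

```python
import difflib


def closest_names(short_list, long_list, cutoff=0.6):
    """Map each name in short_list to its closest name in long_list, or None."""
    # Compare case-insensitively, but remember the original spelling.
    folded = {}
    for name in long_list:
        folded.setdefault(name.casefold(), name)
    candidates = list(folded)

    result = {}
    for name in short_list:
        hits = difflib.get_close_matches(name.casefold(), candidates, n=1, cutoff=cutoff)
        result[name] = folded[hits[0]] if hits else None
    return result
```

Names that come back as `None` had no sufficiently similar counterpart. Very long release tags lower the similarity score, so the cutoff may need lowering for some titles.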


It was less work to acquire these in batch, even with the work required to now find duplicates. Most of the duplicates will be identical releases, but since I’m a fool with a borderline OCD quirk of needing to rename the films and remove the release tags, it seems I now have to do a lot of extra manual work comparing them.

I keep the ones which are of acceptable standard for me. I have a decent top level folder structure, which makes it obvious where it came from, so in this case I know I want to keep the ones in folder B.

If I did not know where a film came from, I would run them through MediaInfo.

Rather than write a script, I would first try looking at the directories with Meld. It will fire up with a dual-pane display of both, and you can easily delete (or move, or whatever) from either as desired - that drops your hand workload to perhaps 100? It could be just me though :grin: