Looking for a tool or script to scan two folders and list subdirectories with similar names

Celty · March 24, 2022, 10:10am

I want to compare subdirectories of two folders to find directories with a similar name. I have looked around for a tool or script to do this, but most are tailored to matching identical names whereas these folders have some parts of their name which differ. An example:

FOLDER A
    BLUE IS THE WARMEST COLOR (2013)
    WILD STRAWBERRIES (1957)
    
FOLDER B
    Blue Is the Warmest Color (2013) (French 2160p AAC 5.1)
    Wild Strawberries (1957) (1080p Swedish AAC 1.0)

With the exception of the casing, the first part of the title of each subdirectory is always identical, but those in folder B have added properties at the end. Since these differ from directory to directory, Thunar’s bulk-rename features (remove characters, find and replace, insert/overwrite) do not work.

Is there a tool on Linux to do this? Perhaps matching the first x characters and printing a list of matches?

In the end I want to delete matching titles from folder A, but I’d be content using some manual intervention to do this.

keybreak · March 24, 2022, 10:18am

meld

Directory comparison would do?

Celty · March 24, 2022, 10:27am

Tried this earlier. Unless I’ve missed something, it doesn’t match folders whose names aren’t 100% identical. It does list all folders of each directory, but so does my file manager.

keybreak · March 24, 2022, 10:30am

Oh ok, I see… The problem is that Meld only has an option to ignore the case of files, not dirs, so the dirs are all just treated as “New” because they’re “too different”.

Celty · March 24, 2022, 10:42am

An alternative route, if this is perhaps easier, could be a tool or script to rename folders to the name of the file inside, e.g. if the structure is:

Blue Is the Warmest Color (2013) (French 2160p AAC 5.1) [folder]
    Blue Is the Warmest Color (2013).mkv [file]
Wild Strawberries (1957) (1080p Swedish AAC 1.0) [folder]
    Wild Strawberries (1957).mkv [file]

It would become:

Blue Is the Warmest Color (2013)
    Blue Is the Warmest Color (2013).mkv
Wild Strawberries (1957)
    Wild Strawberries (1957).mkv

I could then compare the folder names using the various tools available, like Meld, since their names would now be identical.

I don’t know if this is easier or possible at all, though.

keybreak · March 24, 2022, 10:50am

Probably easier, unless I’m missing some other bloated tool. @Kresimir could probably help with that, if he has some scripting time.

Kresimir · March 24, 2022, 10:58am

I don’t understand the problem, but you can use fzf to find similar but not quite identical filenames.
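
For a non-interactive take on the same idea, here is a minimal sketch that uses Python’s standard difflib module rather than fzf itself. The function name closest_match and the 0.6 cutoff are assumptions chosen for illustration; the sample names come from the example in the first post.

```python
import difflib


def closest_match(name, candidates, cutoff=0.6):
    """Return the candidate most similar to name (ignoring case), or None."""
    # Map lower-cased names back to their original spelling.
    lowered = {c.lower(): c for c in candidates}
    hits = difflib.get_close_matches(name.lower(), list(lowered), n=1, cutoff=cutoff)
    return lowered[hits[0]] if hits else None


folder_b = [
    "Blue Is the Warmest Color (2013) (French 2160p AAC 5.1)",
    "Wild Strawberries (1957) (1080p Swedish AAC 1.0)",
]
print(closest_match("WILD STRAWBERRIES (1957)", folder_b))
```

A lower cutoff tolerates longer suffixes but also raises the chance of false matches, so the result is best treated as a suggestion to check by eye.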

dalto · March 24, 2022, 11:05am

That would be pretty easy to do with a script. You would have to be careful to only ever have one file in the folder though, or it could get renamed to something you don’t expect.

You would also need a strategy for handling subfolders.

Celty · March 24, 2022, 12:19pm

There are indeed no other files, each subfolder has 1 mkv file. No further subdirectories or other files.

dalto · March 24, 2022, 12:25pm

In that case it is trivial. Just loop over all the folders and rename them to whatever the first thing you find inside them is.
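
A minimal sketch of that loop in Python, assuming each subfolder holds exactly one file as described above. The function name rename_to_contents and the dry_run flag are hypothetical, added so the planned renames can be reviewed before anything is changed; folders that do not contain exactly one regular file are skipped.

```python
from pathlib import Path


def rename_to_contents(parent, dry_run=True):
    """Rename each subfolder of parent after the single file it contains."""
    planned = []
    for folder in sorted(Path(parent).iterdir()):
        if not folder.is_dir():
            continue
        entries = list(folder.iterdir())
        # Skip anything that is not exactly one regular file, to avoid surprises.
        if len(entries) != 1 or not entries[0].is_file():
            continue
        # New folder name is the file name without its extension.
        target = folder.with_name(entries[0].stem)
        # Skip folders already named correctly or whose new name is taken.
        if target == folder or target.exists():
            continue
        planned.append((folder, target))
        if not dry_run:
            folder.rename(target)
    return planned
```

Calling rename_to_contents on folder B with the default dry_run=True only returns the list of (old, new) paths; passing dry_run=False performs the renames.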

Celty · March 24, 2022, 12:27pm

I have two directories containing films. I know there are 100+ duplicates between these directories which I want to identify and remove; however, the naming scheme differs between the two folders. The files and subdirectories in these two folders all contain the title of the film, but those in folder B also have properties added to their naming scheme, like resolution, audio etc.

I could look through them manually, of course, and compare the name of each folder, but it’s ~400 folders and a lot of work.

Example picture of a duplicate:
[image: 2022-03-24-132621_348x181_scrot]

dalto · March 24, 2022, 12:30pm

An alternative approach would be to not download the same film more than once.

As a side note, how do you decide which one to keep?

Kresimir · March 24, 2022, 12:31pm

Take the smaller of the two directories and create a list of names.

Then use fzf to find the similar names in the other directory, for each item on the list.

If you find a duplicate, delete it in one of the two directories. Then merge everything.

You can do it completely manually, which might be a bit tedious, or you can automatise one or more steps. I wouldn’t go with full automation on this, because you only need to do this once, and you’re likely going to spend more time developing the script than doing it manually. However, parts of it could easily be automated.
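
The matching step is one of the parts that could be automated. Below is a rough sketch in Python, assuming (as stated in the first post) that a folder B name always starts with the corresponding folder A name apart from casing. The function names find_prefix_duplicates and list_subdirs are hypothetical. It only prints candidates and deletes nothing.

```python
import os


def find_prefix_duplicates(names_a, names_b):
    """Pair each name in names_a with names in names_b that start with it, ignoring case."""
    matches = {}
    for a in names_a:
        prefix = a.casefold()
        hits = [b for b in names_b if b.casefold().startswith(prefix)]
        if hits:
            matches[a] = hits
    return matches


def list_subdirs(path):
    """Return the names of the immediate subdirectories of path."""
    return sorted(e.name for e in os.scandir(path) if e.is_dir())


# Sample names from the first post; on disk, use list_subdirs() on each folder.
folder_a = ["BLUE IS THE WARMEST COLOR (2013)", "WILD STRAWBERRIES (1957)"]
folder_b = [
    "Blue Is the Warmest Color (2013) (French 2160p AAC 5.1)",
    "Wild Strawberries (1957) (1080p Swedish AAC 1.0)",
]
for a, hits in find_prefix_duplicates(folder_a, folder_b).items():
    print(a, "->", hits)
```

Because the year is part of the prefix, similarly titled films from different years are not paired, but the printed list should still be reviewed before deleting anything from folder A.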

Celty · March 24, 2022, 12:58pm

It was less work to acquire these in batch, even with the work required to now find duplicates. Most of the duplicates will be identical releases, but since I’m a fool with a borderline OCD quirk of needing to rename the films and remove the release tags, it seems I now have to do a lot of extra manual work comparing them.

I keep the ones which are of acceptable standard for me. I have a decent top level folder structure, which makes it obvious where it came from, so in this case I know I want to keep the ones in folder B.

If I did not know where a film came from, I would run them through MediaInfo.

freebird54 · March 25, 2022, 3:39am

Rather than write a script I would first try a look at the directories with meld. It will fire up with a dual-pane display of both, and you can easily delete (or move, or whatever) from either as desired - that drops your hand workload to perhaps 100? It could be just me though