Do you have an elaborate scheme for naming your files?

I was wondering if anyone here has an elaborate scheme for naming their files (documents, movies, music, pictures, …). Linux file systems are quite mature and can handle UTF-8 characters, and basically the only characters forbidden in a file name are the slash / and the NUL byte, so do you even care about it?
Do you always rename files you download from the internet?
Do you use suffixes (.sh for shell scripts, .txt for text files, …)?
Do you care about lowercase and uppercase?
Do you replace whitespace or characters with diacritics?
Do you name your files in English or your native language?

I, for example, do not like whitespace, diacritics and uppercase characters in file names, since they are a little inconvenient to type in the terminal.
If anyone is interested, I made a script to recursively rename everything in one directory. It's good for my movie collection, not so much for source code, where all files relate to each other and renaming them causes all sorts of fun.
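To give a rough idea of what such a clean-up does to a single name, here is a minimal sketch. It is not the script below: `sanitize_name` is a made-up helper, it assumes the separator is one plain character, and it simply replaces non-ASCII bytes with the separator instead of transliterating them.

```shell
#!/bin/bash
# Hypothetical helper, for illustration only: sanitise ONE file name.
# Usage: sanitize_name <name> [separator]
sanitize_name() {
    local name="$1" sep="${2:-_}"
    local base="${name%.*}"          # everything before the last dot
    local ext="${name#"$base"}"      # the last dot and what follows (may be empty)

    # Lowercase ASCII letters, then squeeze every run of bytes that are not
    # a-z or 0-9 into a single separator.
    base="$(printf '%s' "$base" | LC_ALL=C tr 'A-Z' 'a-z' | LC_ALL=C tr -cs 'a-z0-9' "$sep")"
    ext="$(printf '%s' "$ext" | LC_ALL=C tr 'A-Z' 'a-z')"

    # Trim one leading and one trailing separator, if present.
    base="${base#"$sep"}"
    base="${base%"$sep"}"

    printf '%s%s\n' "$base" "$ext"
}

sanitize_name 'My Holiday (2019).AVI'    # prints my_holiday_2019.avi
```

Only the base name is rewritten; the extension is just lowercased, which is also how the script treats it.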

Script
#!/bin/bash

# defaults
loc="$(pwd)"
sep='_'
silent=0
utf8=0
except=')('
lowercase=0
uppercase=0
titlecase=0
serialnum=0

usage() {
    echo "The script is used to standardise file names - without whitespace and diacritics.
It recursively goes through every file and directory name in the selected directory

Usage: $0 [-hd:e:c:S:su]
    -h ........... this help
    -d <val> ..... a directory on which the conversion should be done (default is current directory)
    -e <val> ..... exception characters - do not replace them in the file name - default: ${except}
    -c <val> ..... character case selection (l:lowercase, u:uppercase, t:titlecase, ts:titlecase with sXXeYY numbering, n:no-change) - default: n
    -S <val> ..... separator that replaces whitespace and other symbols - default: ${sep}; if you want '.' write '\.'; don't use $^/\0
    -s ........... silent
    -u ........... do not convert names from utf8 to ascii (e.g. keep diacritic)
    " 1>&2; exit "$1";
}

while getopts "hd:e:c:S:su" arg; do
	case "${arg}" in
		h)
			usage 0
			;;
		d)
			loc="${OPTARG}"
			;;
		e)
			except="${OPTARG}"
			;;
		c)
			chcase="${OPTARG}"
			# reset all case flags, then enable the ones the selected mode needs
			lowercase=0
			uppercase=0
			titlecase=0
			serialnum=0
			case "${chcase}" in
				l)  lowercase=1 ;;
				u)  uppercase=1 ;;
				t)  lowercase=1; titlecase=1 ;;
				ts) lowercase=1; titlecase=1; serialnum=1 ;;
				n)  ;;
				*)
					echo -e "Wrong argument ${chcase} for -c flag\n"
					usage 2
					;;
			esac
			;;
		S)
			sep="${OPTARG}"
			;;
		s)
			silent=1
			;;
		u)
			utf8=1
			;;
		*)
			usage 1
			;;
	esac
done

# fix for filenames with trailing whitespace
IFS=''

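# Feed the find output into the while loop line by line; with IFS empty, names containing spaces are kept intact.
# The longest paths come first, so the contents of a directory are renamed before the directory itself.
# Caveat: file names that contain a newline are not handled.
find "${loc}"               |\
awk '{print length, $0}'    |\
sort -fsnr                  |\
cut -d " " -f2-             |\
while read -r file; do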

    if [[ "${file}" != "${loc}" ]]; then            # do not rename the base directory itself

        # print which file is currently processed
        [[ silent -eq 0 ]] && echo "\"${file}\""

        # get some basic names for further processing
        file_name="${file##*/}"
        file_dir=${file%"${file_name}"*}
        if [[ -d "${file}" ]]; then                     # directories have no extension
            base_name="${file_name}"
        else
            base_name="${file_name%.*}"
        fi
        new_name="${base_name}"                         # temporary working name of the file
        extension="${file_name#"${base_name}"}"         # extension with dot included (empty if there is none)

        # convert to ascii-only characters
        if [[ utf8 -eq 0 ]]; then
            new_name=$(iconv -f utf8 -t ascii//TRANSLIT <<< "${new_name}")
            extension=$(iconv -f utf8 -t ascii//TRANSLIT <<< "${extension}")
        fi

        # extension is always lowercase
        extension=$(perl -CSD -Mutf8 -ne "s,(\pL+),\L\1,gu; print" <<< "${extension}")

        # remove ' character without replacement
        new_name=$(perl -CSD -Mutf8 -ne "s,((?![\0${except}])')+,,gu; print" <<< "${new_name}")

        # convert file name to lowercase characters
        [[ $lowercase -eq 1 ]] && new_name=$(perl -CSD -Mutf8 -ne "s,(\pL+),\L\1,gu; print" <<< "${new_name}")

        # convert file name to title case where important words are uppercase and words like the/a/in/on are lowercase
        [[ $titlecase -eq 1 ]] && new_name=$(perl -CSD -Mutf8 -ne "s,(^|\pN[^\pL\pN]|[^\pL\pN](?!(?:\
                                                                   |the|a|an|on|in|at|of|from|to|since|with|for|as|by|vs|\
                                                                   )(?:[^\pL\pN]|$)))(\pL+),\1\u\2,gu; print" <<< "${new_name}")

        # special case for series sXXeYY formatting
        [[ $serialnum -eq 1 ]] && new_name=$(perl -CSD -Mutf8 -ne "s,(?:[sS](\pN{1\,2})[^\pL\pN]{0\,3}[eE](\pN{1\,2})|(?<!\pN)(\pN{1\,2})[xX-](\pN{1\,2})),s\1\3e\2\4,gu; \
                                                                   s,s(\pN{1})e(\pN+),s0\1e\2,gu; \
                                                                   print" <<< "${new_name}")

        # convert file name to uppercase characters
        [[ $uppercase -eq 1 ]] && new_name=$(perl -CSD -Mutf8 -ne "s,(\pL+),\U\1,gu; print" <<< "${new_name}")

        # replace invalid characters with $sep, remove multiple $sep characters in a row; remove leading and trailing $sep character
        new_name=$(perl -CSD -Mutf8 -ne "s,((?![\0${except}])[^\pL\pN])+,${sep},gu; s,(\A${sep}|${sep}\Z),,gu; print" <<< "${new_name}")

        if [[ "${except}" != '' ]]; then
            new_name=$(perl -CSD -Mutf8 -ne "s,(?:${sep}|\A)+([${except}])(?:${sep}|\Z)+,\1,gu; print" <<< "${new_name}")
            new_name=$(perl -CSD -Mutf8 -ne "s,([${except}])(?:${sep})+([${except}]),\1\2,gu; print" <<< "${new_name}")
        fi

        # add $sep to the end of the base_name to prevent collisions and file overwrites if the new name already exists
        while [[ -e "${file_dir}${new_name}${extension}" && "${file_dir}${new_name}${extension}" != "${file}" ]]; do
            [[ silent -eq 0 ]] && echo "collision - new name:   ${new_name}"
            new_name="${new_name}${sep}"
        done

        # move file to the new destination - a.k.a. rename
        if [[ "${file}" != "${file_dir}${new_name}${extension}" ]]; then
            mv --strip-trailing-slashes "${file}" "${file_dir}${new_name}${extension}"
        fi
    fi
done

# print possible collision names
if [[ silent -eq 0 ]]; then
    echo -e "\npossible collisions:"
    find "${loc}" | grep -Eie "(${sep}|${sep}\..*)$" | sort -Vf
fi
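A side note on the traversal order: the script sorts paths by length so that the contents of a directory are renamed before the directory itself. A minimal sketch of the same idea with `find -depth`, which is not what the script uses; `list_children_first` is a hypothetical name, and `-mindepth` and `-print0` assume GNU or BSD find:

```shell
#!/bin/bash
# Hypothetical helper, for illustration only: list every path below a
# directory so that children are printed before their parent directories.
list_children_first() {
    local dir="$1" path
    # -mindepth 1 skips the starting directory itself,
    # -depth prints the contents of a directory before the directory,
    # -print0 ends each path with a NUL byte instead of a newline.
    find "$dir" -mindepth 1 -depth -print0 |
    while IFS= read -r -d '' path; do
        printf '%q\n' "$path"    # a real renamer would call mv here
    done
}
```

Because the paths are separated by NUL bytes, this variant would also cope with names that contain newlines; `printf '%q'` is only there to make such names visible on one line.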
  • No.
  • Not always, just for bash scripts.
  • No.
  • Nothing special. One word.
  • English.

I use .sh for scripts that do not have #! and are not executable, so they have to be run as

sh script.sh

(same thing for .bash and .zsh). If I make the script executable, then I do not end its name with any of those filename extensions.
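A minimal sketch of the two cases, using made-up file names (`hello.sh` and `hello`) in a temporary directory; it assumes the temporary directory is not mounted noexec:

```shell
#!/bin/bash
# Illustration only: hello.sh and hello are made-up file names.
demo_shebang() {
    local dir
    dir="$(mktemp -d)"

    # No shebang, not executable: the interpreter is named on the command
    # line, and the .sh suffix tells the reader which one to use.
    printf '%s\n' 'echo "run via sh"' > "$dir/hello.sh"
    sh "$dir/hello.sh"

    # Shebang plus executable bit: the script is started by its path,
    # so the suffix is no longer needed.
    printf '%s\n' '#!/bin/sh' 'echo "run directly"' > "$dir/hello"
    chmod +x "$dir/hello"
    "$dir/hello"

    rm -rf "$dir"
}

demo_shebang
```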


There is a utility named detox that does what your script does, only better.

You can install it with:

sudo pacman -S detox

ha :grinning_face_with_smiling_eyes:

Looking at the detox man page, I think I will just stick with my script for the time being. But thanks anyway. I did not know about it (and it is a pretty bad name for an internet search :rofl:).

Yeah, but who in the current year uses search engines? :slight_smile:

You can find the URL of the source code for detox by running:

pacman -Si detox

and you get this:

And yeah, sorry, but it is… :sweat_smile: Sanitising filenames is a surprisingly difficult problem when you take into consideration the most general case. For a specific use, a small Bash script can be great, but a tool that works in general is quite complicated, and is better done in a language like C.
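One illustration of the general case, as a minimal sketch: a file name may legally contain a newline, so anything that reads `find` output line by line sees two entries where there is only one. The helper names `count_by_lines` and `count_by_nul` are made up for this example, and `-mindepth` and `-print0` assume GNU or BSD find.

```shell
#!/bin/bash
# Count the entries below a directory in two ways (hypothetical helper names).
count_by_lines() { find "$1" -mindepth 1 | wc -l; }                        # one entry per line
count_by_nul()   { find "$1" -mindepth 1 -print0 | tr -dc '\0' | wc -c; }  # one entry per NUL byte

demo_dir="$(mktemp -d)"
touch "$demo_dir/one"$'\n'"two.txt"    # a single file with a newline in its name

echo "line-based count: $(count_by_lines "$demo_dir")"
echo "NUL-based count:  $(count_by_nul "$demo_dir")"

rm -rf "$demo_dir"
```

On a GNU system the line-based count comes out as 2 and the NUL-based count as 1, although the directory holds exactly one file.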
