I was wondering whether anyone here has an elaborate scheme for naming their files (documents, movies, music, pictures, …). Linux file systems are quite mature, can handle UTF-8 characters, and basically the only forbidden character in a file name (besides the NUL byte) is the slash /, so do you even care about it?
Do you always rename files you download from the internet?
Do you use suffixes (.sh for shell scripts, .txt for text files, …)?
Do you care about lowercase and uppercase?
Do you replace whitespace or characters with diacritics?
Do you name your files in English or in your native language?
For example, I do not like whitespace, diacritics, or uppercase characters in file names, since they are a little inconvenient when I type the file name in the terminal.
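To make that convention concrete, here is a minimal sketch of what it means for a single name. `normalize_name` is a hypothetical helper for illustration only. It assumes bash 4+ (for `${var,,}`) and a `sed` that supports `-E`, it is ASCII-only (non-ASCII bytes are replaced by the separator rather than transliterated), and it treats everything after the last dot as the extension.

```shell
#!/bin/bash
# Hypothetical helper for illustration: normalize one file name.
# $1 = file name, $2 = separator (default "_"; must not contain "/" or "&",
# because it is pasted into a sed replacement).
normalize_name() {
    local name="$1" sep="${2:-_}"
    local base ext=""
    # split off the extension (text after the last dot), if there is one
    if [[ "${name}" == ?*.* ]]; then
        base="${name%.*}"
        ext=".${name##*.}"
    else
        base="${name}"
    fi
    # lowercase both parts (bash 4+ parameter expansion)
    base="${base,,}"
    ext="${ext,,}"
    # replace every run of characters other than a-z and 0-9 with the separator
    base="$(LC_ALL=C sed -E "s/[^a-z0-9]+/${sep}/g" <<< "${base}")"
    # strip one leading and one trailing separator
    base="${base#"${sep}"}"
    base="${base%"${sep}"}"
    printf '%s%s\n' "${base}" "${ext}"
}

normalize_name "My Holiday Photo (2020).JPG"   # my_holiday_photo_2020.jpg
```

The function only prints the new name, so it is safe to try on a few real file names before renaming anything.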
If anyone is interested, I made a script that recursively renames everything in one directory. It’s good for my movie collection, not so much for source code, where all files relate to each other and renaming them causes all sorts of fun.
Script
#!/bin/bash
# defaults
loc="$(pwd)"
sep='_'
silent=0
utf8=0
except=')('
lowercase=0
uppercase=0
titlecase=0
serialnum=0
usage() {
echo "The script standardises file names - no whitespace and no diacritics.
It recursively goes through every file and directory name in the given directory (the current directory by default).
Usage: $0 [-hd:e:c:S:su]
-h ........... this help
-d <val> ..... a directory on which the conversion should be done (default is current directory)
-e <val> ..... exception characters - do not replace them in the file name - default: ${except}
-c <val> ..... character case selection (l:lowercase, u:uppercase, t:titlecase, ts:titlecase with sXXeYY numbering, n:no-change) - default: n
-S <val> ..... separator that replaces whitespace and other symbols - default: ${sep}; if you want '.' write '\.'; don't use $^/\0
-s ........... silent
-u ........... do not convert names from utf8 to ascii (i.e. keep diacritics)
" 1>&2; exit "$1";
}
while getopts "hd:e:c:S:su" arg; do
case "${arg}" in
h)
usage 0
;;
d)
loc="${OPTARG}"
;;
e)
except="${OPTARG}"
;;
c)
chcase="${OPTARG}"
if [[ "${chcase}" == "l" ]]; then
lowercase=1
uppercase=0
titlecase=0
serialnum=0
elif [[ "${chcase}" == "u" ]]; then
lowercase=0
uppercase=1
titlecase=0
serialnum=0
elif [[ "${chcase}" == "t" ]]; then
lowercase=1
uppercase=0
titlecase=1
serialnum=0
elif [[ "${chcase}" == "ts" ]]; then
lowercase=1
uppercase=0
titlecase=1
serialnum=1
elif [[ "${chcase}" == "n" ]]; then
lowercase=0
uppercase=0
titlecase=0
serialnum=0
else
echo -e "Wrong argument ${chcase} for -c flag\n"
usage 2
fi
;;
S)
sep="${OPTARG}"
;;
s)
silent=1
;;
u)
utf8=1
;;
*)
usage 1
;;
esac
done
# fix for filenames with trailing whitespace
IFS=''
# Feed the while loop from a pipeline: find lists every path, and the paths are
# sorted longest first so that children are renamed before their parent directories.
# Paths containing spaces survive intact; paths containing newlines do not.
find "${loc}" |\
awk '{print length, $0}' |\
sort -fsnr |\
cut -d " " -f2- |\
while IFS= read -r file; do # -r keeps backslashes in file names literal
if [[ "${file}" != "${loc}" ]]; then # do not rename the base directory itself
# print which file is currently processed
[[ silent -eq 0 ]] && echo "\"${file}\""
# get some basic names for further processing
file_name="${file##*/}"
file_dir=${file%"${file_name}"*}
if [[ -d "${file}" ]]; then # a directory has no extension, even if its name contains a dot
base_name="${file_name}"
else
base_name="${file_name%.*}"
fi
new_name="${base_name}" # temporary working name of the file
extension=${file_name##*"${base_name}"} # extension with dot included
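# convert to ascii-only characters (UTF-8 and ASCII are the canonical charset names; //TRANSLIT needs an iconv that supports it, e.g. glibc)
if [[ utf8 -eq 0 ]]; then
new_name=$(iconv -f UTF-8 -t ASCII//TRANSLIT <<< "${new_name}")
extension=$(iconv -f UTF-8 -t ASCII//TRANSLIT <<< "${extension}")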
fi
# extension is always lowercase
extension=$(perl -CSD -Mutf8 -ne "s,(\pL+),\L\1,gu; print" <<< "${extension}")
# remove ' character without replacement
new_name=$(perl -CSD -Mutf8 -ne "s,((?![\0${except}])')+,,gu; print" <<< "${new_name}")
# convert file name to lowercase characters
[[ $lowercase -eq 1 ]] && new_name=$(perl -CSD -Mutf8 -ne "s,(\pL+),\L\1,gu; print" <<< "${new_name}")
# convert file name to title case where important words are uppercase and words like the/a/in/on are lowercase
[[ $titlecase -eq 1 ]] && new_name=$(perl -CSD -Mutf8 -ne "s,(^|\pN[^\pL\pN]|[^\pL\pN](?!(?:\
|the|a|an|on|in|at|of|from|to|since|with|for|as|by|vs|\
)(?:[^\pL\pN]|$)))(\pL+),\1\u\2,gu; print" <<< "${new_name}")
# special case for series sXXeYY formatting
[[ $serialnum -eq 1 ]] && new_name=$(perl -CSD -Mutf8 -ne "s,(?:[sS](\pN{1\,2})[^\pL\pN]{0\,3}[eE](\pN{1\,2})|(?<!\pN)(\pN{1\,2})[xX-](\pN{1\,2})),s\1\3e\2\4,gu; \
s,s(\pN{1})e(\pN+),s0\1e\2,gu; \
print" <<< "${new_name}")
# convert file name to uppercase characters
[[ $uppercase -eq 1 ]] && new_name=$(perl -CSD -Mutf8 -ne "s,(\pL+),\U\1,gu; print" <<< "${new_name}")
# replace invalid characters with $sep, remove multiple $sep characters in a row; remove leading and trailing $sep character
new_name=$(perl -CSD -Mutf8 -ne "s,((?![\0${except}])[^\pL\pN])+,${sep},gu; s,(\A${sep}|${sep}\Z),,gu; print" <<< "${new_name}")
if [[ "${except}" != '' ]]; then
new_name=$(perl -CSD -Mutf8 -ne "s,(?:${sep}|\A)+([${except}])(?:${sep}|\Z)+,\1,gu; print" <<< "${new_name}")
new_name=$(perl -CSD -Mutf8 -ne "s,([${except}])(?:${sep})+([${except}]),\1\2,gu; print" <<< "${new_name}")
fi
# append $sep to new_name until it is unique, to prevent collisions and file overwrites when the new name duplicates an existing one
while [[ -e "${file_dir}${new_name}${extension}" && "${file_dir}${new_name}${extension}" != "${file}" ]]; do
[[ silent -eq 0 ]] && echo "collision - new name: ${new_name}"
new_name="${new_name}${sep}"
done
# move file to the new destination - a.k.a. rename
if [[ "${file}" != "${file_dir}${new_name}${extension}" ]]; then
mv --strip-trailing-slashes -- "${file}" "${file_dir}${new_name}${extension}" # -- protects paths that start with a dash
fi
fi
done
# print possible collision names (names whose base ends with the separator)
if [[ silent -eq 0 ]]; then
echo -e "\npossible collisions:"
find "${loc}" | grep -Eie "(${sep}|${sep}\..*)$" | sort -Vf
fi
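
A note on the traversal order: the script sorts paths by length so that children get renamed before their parent directories. The same ordering can probably be had from find itself. The following is only a sketch of that alternative, not what the script above does: `list_children_first` is a made-up name, and it assumes a find that supports `-mindepth`, `-depth`, and `-print0` (GNU and BSD find both do). Because the paths are NUL-delimited, names containing newlines would survive as well.

```shell
#!/bin/bash
# Hypothetical demo: print every entry below a directory, children first.
list_children_first() {
    local dir="$1" path
    # -mindepth 1 skips the base directory itself
    # -depth lists a directory's contents before the directory
    # -print0 / read -d '' keep odd file names intact
    while IFS= read -r -d '' path; do
        printf '%s\n' "${path}"
    done < <(find "${dir}" -mindepth 1 -depth -print0)
}

# try it on a throwaway directory
demo="$(mktemp -d)"
mkdir -p "${demo}/Season 1"
touch "${demo}/Season 1/Episode 1.mkv"
list_children_first "${demo}"   # the .mkv file is listed before "Season 1"
rm -r "${demo}"
```

In a rename loop, the `printf` line would be replaced by the renaming logic; since a directory is only reached after everything inside it, the paths of its children are still valid when they are processed.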