Hi, I am trying to find a way to scan all my PDFs and move the corrupted ones to another folder (e.g. /home/limo/corruptedpdf).
I want to move them rather than delete them, so I can double-check whether a file is important, or has only a minor error and is still readable.
I couldn't find a way to do this in bulk; the command line is fine with me.
I tried "linux_czkawka_gui.AppImage", which has an option to find corrupted files, but it didn't find any even though I still have some PDFs I can't open!
Thank you.
You can use pdfinfo to check your files and try the answer to this question; just adapt it to move the file to another folder instead of the echo command. (The script below uses pdftotext, which also exits with an error on a PDF it can't read, so either tool works.)
find . -iname '*.pdf' | while IFS= read -r f
do
    if pdftotext "$f" - &> /dev/null; then
        echo "$f was ok"
    else
        mv "$f" "$f.broken"
        echo "$f is broken"
    fi
done
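To adapt that loop so it moves broken files into a quarantine folder instead of renaming them, something like the sketch below should work. The folder names here are demo assumptions (stand-ins for /home/limo/corruptedpdf), and the demo file is created on the spot so the sketch is runnable; the -print0 / read -d '' pairing keeps filenames with spaces intact.

```shell
#!/bin/bash
set -u

src="pdf-demo"            # example source folder (assumption)
dest="$src/corruptedpdf"  # example quarantine folder (assumption)

# Tiny demo setup so the sketch runs as-is: one file that is not a real PDF.
mkdir -p "$src" "$dest"
printf 'not a pdf\n' > "$src/bad file.pdf"

# -print0 with read -d '' handles spaces and odd characters in names.
find "$src" -maxdepth 1 -iname '*.pdf' -print0 |
while IFS= read -r -d '' f; do
    # pdfinfo exits non-zero when it cannot parse the file
    if pdfinfo "$f" >/dev/null 2>&1; then
        echo "OK:     $f"
    else
        echo "BROKEN: $f"
        mv -- "$f" "$dest"/
    fi
done
```

Run it once without mv (echo only) first if you want to preview what would be moved.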
I saved it as findmypdf.sh and made it executable.
I tried changing mv "$f" "$f.broken"; to mv "$f" "$/home/limo/corruptedpf/"; and to "/home/limo/corruptedpf/corrupted.txt", but neither worked.
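The stray "$" in front of "/home/..." is what breaks that attempt: a directory destination is just a plain path, not a variable. A minimal runnable sketch of the corrected mv line, with demo paths standing in for /home/limo/corruptedpdf:

```shell
#!/bin/bash
# Demo names below are assumptions, not your real paths.
dest="mv-demo/corruptedpdf"
mkdir -p "$dest"
printf 'not a pdf\n' > "mv-demo/old scan.pdf"   # stand-in broken file

f="mv-demo/old scan.pdf"
mv -- "$f" "$dest"/    # trailing slash: move the file INTO the folder
```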
I don’t really know.
Some were on Google Drive, pCloud, or Koofr; some were on an old external drive; some were accidentally deleted and then recovered. Many go back to 2000 and the years after.
The strange thing I noticed is that they all have the date 27/6/2022.
The newer files are OK (downloaded directly to the laptop), some old files are OK, and most of the corrupted files are the oldest.
It seems my second run of the script found nothing because the files had already been renamed to "xyz.pdf.broken", so the script no longer matched xyz.pdf.
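If you want to undo that renaming so a fresh scan sees the files as *.pdf again, the suffix can be stripped with shell parameter expansion. A small runnable sketch (the demo folder and file name are assumptions):

```shell
#!/bin/bash
# Demo setup: a stand-in for a file the earlier run renamed.
mkdir -p undo-demo
touch "undo-demo/xyz.pdf.broken"

for f in undo-demo/*.pdf.broken; do
    [ -e "$f" ] || continue        # glob matched nothing: skip
    mv -- "$f" "${f%.broken}"      # ${f%.broken} drops the .broken suffix
done
```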
You just reopened the "programming" world for me. (The last serious programming I did was in 1992/1993, plus a little in 1998 when I was learning M$ Access programming and made a "sort of" office automation program, learning by doing.)
UPDATE:
Learning by doing:
# run from inside /home/limo/corrupted
for f in *.pdf; do
    if ! pdfinfo "$f" &> /dev/null; then
        cp "$f" broken/"$f"
    fi
done
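That loop works; two small safety nets make it harder to trip over: create the target folder first, and skip cleanly when no *.pdf matches the glob. A runnable sketch with demo folder and file names (assumptions standing in for /home/limo/corrupted):

```shell
#!/bin/bash
# Demo setup so the sketch runs as-is.
workdir="update-demo"
mkdir -p "$workdir/broken"
printf 'not a pdf\n' > "$workdir/scan me.pdf"   # deliberately unreadable demo file

cd "$workdir"
for f in *.pdf; do
    [ -e "$f" ] || continue               # glob matched nothing: skip
    if ! pdfinfo "$f" >/dev/null 2>&1; then
        cp -- "$f" broken/"$f"            # copy, so the original stays put
    fi
done
cd ..
```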