Find and Move Corrupted PDF Files

Hi, I am trying to find a way to scan all my PDFs and move the corrupted ones to another folder (e.g. /home/limo/corruptedpdf).

I want to move them so I can double-check whether any of them is an important file, or a file with a simple error that is still readable.

I couldn’t find a way to do this in bulk. The command line is OK with me.
I tried “linux_czkawka_gui.AppImage”, which has an option to find corrupted files, but it did not find any, even though I still have some PDFs I can’t open!
Thank you.

You can use pdfinfo to check your files. Try the answer to this question; just adapt it to move each file to another folder instead of running the echo command.
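As a minimal sketch of that check (not the linked answer itself): pdfinfo exits with a non-zero status when it cannot open a file, so a script can simply test that status. The function name is_pdf_ok and the PDF_CHECKER variable are made up for this illustration, and pdfinfo (part of Poppler) is assumed to be installed.

```shell
#!/bin/bash
# is_pdf_ok FILE: succeeds (exit status 0) when the checker can open FILE.
# PDF_CHECKER is a hypothetical override for this sketch; it defaults to pdfinfo.
is_pdf_ok() {
  "${PDF_CHECKER:-pdfinfo}" "$1" > /dev/null 2>&1
}

# Report on every file passed on the command line.
for f in "$@"; do
  if is_pdf_ok "$f"; then
    echo "OK:     $f"
  else
    echo "BROKEN: $f"
  fi
done
```

Saved as a script (the name checkpdf.sh is only an example), it could be run as ./checkpdf.sh *.pdf. pdfinfo mainly reads the document structure, so a file with damaged page content may still pass; a pdftotext test, which also reads the page content, may catch more.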

ELI5 please!
I copied and pasted this:

find . -iname '*.pdf' | while IFS= read -r f; do
    # pdftotext fails (non-zero exit status) when it cannot read the file
    if pdftotext "$f" - &> /dev/null; then
        echo "$f was ok"
    else
        mv "$f" "$f.broken"
        echo "$f is broken"
    fi
done

I saved it as findmypdf.sh and made it executable.

I tried changing mv "$f" "$f.broken"; to mv "$f" "$/home/limo/corruptedpf/"; and then to "/home/limo/corruptedpf/corrupted.txt".

Not working!

But I could export list of files to a text file:

find . -name '*.pdf' > /home/limo/corruptedpf/corrupted.txt

The accepted answer for that question seems good to me, just use the right syntax to move corrupted files:

for f in *.pdf; do
  if ! pdfinfo "$f" &> /dev/null; then
    mv /current/location/"$f" /new/location
  fi
done

Remember that you can get help by running mv --help or man mv. If you want to change file names, you can use the rename command.
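For files spread over subfolders, the same idea can be combined with find. This is only a sketch under a few assumptions: the function name move_broken_pdfs and the PDF_CHECKER variable are invented for the example, pdfinfo is assumed to be installed, and the destination folder is assumed to live outside the folder being scanned.

```shell
#!/bin/bash
# move_broken_pdfs SRC DEST
# Recursively checks every *.pdf under SRC and moves the files the checker
# rejects into DEST. PDF_CHECKER is a hypothetical override (default: pdfinfo).
move_broken_pdfs() {
  local src=$1 dest=$2 checker=${PDF_CHECKER:-pdfinfo} f
  # Stop early if the checker is missing; otherwise every file would look broken.
  command -v "$checker" > /dev/null 2>&1 || {
    echo "checker not found: $checker" >&2
    return 1
  }
  mkdir -p "$dest" || return 1
  # -print0 with read -d '' keeps names containing spaces or newlines intact.
  find "$src" -type f -iname '*.pdf' -print0 |
    while IFS= read -r -d '' f; do
      if ! "$checker" "$f" > /dev/null 2>&1; then
        # -n: do not overwrite a file that already exists in DEST
        mv -n -- "$f" "$dest/"
        echo "moved: $f"
      fi
    done
}
```

A call could look like move_broken_pdfs . /home/limo/corruptedpf, run from the folder that holds the PDFs. Because of -n, two broken files with the same name in different subfolders will not overwrite each other; the second one simply stays where it is.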


Why are they corrupted? :thinking:


I don’t really know.
Some were on Google Drive, pCloud, or Koofr, some were on an old external drive, and some were accidentally deleted and recovered :rofl: Many date back to 2000 or later.
The strange thing I noticed is that they all have the date 27/6/2022.
The newer files are ok (downloaded directly to laptop), some old files are ok, most corrupted files are the oldest.

#!/bin/bash
for f in *.pdf; do
  if ! pdfinfo "$f" &> /dev/null; then
    mv "$f" /home/limo/corruptedpf/
  fi
done

Nothing happens! The target folder is still the same!
UPDATE:
Manually reviewing the folder, I found some files named “filexyz.pdf.broken”.

That seems to be the result of

mv "$f" "$f.broken";

in my previous post

I will try again moving them to another folder (xyz.pdf.broken to /home/limo/broken/) just to practice Bash Scripting.

It seems my scripts didn’t work because the files had been renamed to “xyz.pdf.broken”, so the *.pdf pattern no longer matched them.
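One way to undo that renaming, so that a *.pdf loop sees the files again, is to strip the suffix with Bash parameter expansion. This is a rough sketch, not something tested on the files in this thread; the function name restore_broken_names is made up for the example.

```shell
#!/bin/bash
# restore_broken_names DIR
# Renames every "name.pdf.broken" under DIR back to "name.pdf".
restore_broken_names() {
  local dir=$1 f
  # Match .pdf in any letter case, followed by the lowercase .broken suffix.
  find "$dir" -type f -name '*.[pP][dD][fF].broken' -print0 |
    while IFS= read -r -d '' f; do
      # ${f%.broken} is the same path with the trailing ".broken" removed
      mv -n -- "$f" "${f%.broken}"
    done
}
```

Calling restore_broken_names . from the folder with the PDFs would turn xyz.pdf.broken back into xyz.pdf. The -n option keeps mv from overwriting a file that already has the target name.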

You just reopened the “programming” world for me. (The last serious programming I did was in 1992/1993, plus a little in 1998 when I was trying to learn M$Access programming and made a “sort of” office automation program - “learning by doing”.)

UPDATE:
Learning by doing:

for f in *.pdf; do
  if ! pdfinfo "$f" &> /dev/null; then
    cp /home/limo/corrupted/"$f" /home/limo/corrupted/broken/"$f"
  fi
done

Worked after a few modifications and trials :partying_face:
