In my attempt to learn more (learning by doing), and following up on my corrupted files problem from another thread, “Find and Move Corrupted PDF Files”, I just worked out a little script to find those corrupted files. I tried “linux_czkawka_gui.AppImage”, which has an option to find corrupted files, but it did not find them, even though I still have some PDFs I can’t open!
I left all “the garbage” in the script (what didn’t work), just for my reference.
You will find it easy to remove the hashes (#) to enable the “mv” or “rm” commands. The commented-out default is to move corrupted files to a newly created folder named after the current date and time.
You will need to edit the folder names to match your user name. I have made an alias “checkpdf” for “checkpdf.sh”.
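If you want to set up the same alias, here is a minimal sketch. It assumes the script is saved as ~/checkpdf.sh and was made executable once with chmod +x ~/checkpdf.sh; the location is my assumption, so adjust it. Put the line in ~/.bashrc (or ~/.zshrc if your shell is zsh) and open a new terminal.

```bash
# ~/.bashrc (or ~/.zshrc if zsh is your shell)
# Assumption: the script lives at ~/checkpdf.sh and is executable (chmod +x ~/checkpdf.sh)
alias checkpdf="$HOME/checkpdf.sh"
```

After that, checkpdf /home/limo/xyz/ runs the script on that folder.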
So, what it currently does:
1- Search a specific folder and its subfolders for corrupted files:
checkpdf /home/limo/xyz/
2- Search the WHOLE machine:
checkpdf
I just noticed that checkpdf without a folder path also searches some odd folders, and I don’t even know what they are (snapshots? system?).
I would appreciate your opinion on whether I should keep it as it is now and let it search snapshots for corrupted files.
I would like to invite you to test it and tell me what you think or how to improve it. I would like to hear from you to learn a bit more.
I am considering doing the same for other file formats that might get corrupted, such as office documents, music…
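One way to extend the idea is to choose a checker by file extension. This is only a sketch under some assumptions: pdfinfo (from poppler) stays the PDF check, modern office files (.docx, .xlsx, .pptx, .odt, .ods, .odp) are ZIP containers so unzip -tq is used as a rough integrity test, and everything else (music included) is reported as UNCHECKED rather than guessing a tool. The function names check_file and report are made up for this example.

```bash
#!/bin/bash
# Sketch: choose a checker by file extension (names are made up for this example).
# Assumptions: pdfinfo (poppler) for PDFs; modern office files are ZIP
# containers, so "unzip -tq" (test the archive quietly) is a rough check.
check_file() {
    local f=$1
    case "${f,,}" in                       # ${f,,} = lowercase copy of the name (bash 4+)
        *.pdf)
            pdfinfo "$f" &>/dev/null || return 1 ;;
        *.docx|*.xlsx|*.pptx|*.odt|*.ods|*.odp)
            unzip -tq "$f" &>/dev/null || return 1 ;;
        *)
            return 2 ;;                    # no checker for this type (music etc.)
    esac
}

# Print one line per file: OK, CORRUPTED or UNCHECKED.
report() {
    local f=$1 status=0
    check_file "$f" || status=$?
    case $status in
        0) echo "$f is OK" ;;
        2) echo "$f is UNCHECKED" ;;
        *) echo "$f is CORRUPTED" ;;
    esac
}
```

report could replace the if/else inside the script’s loop. Keep in mind that unzip -tq only tests the ZIP layer, so an office file whose inner XML is damaged can still pass, and old binary formats like .doc or .xls end up as UNCHECKED.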
Here is my checkpdf.sh
#!/bin/bash
## This is needed if you want to specify another folder
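fldr=$1
## This is needed with the above if you want to specify a specific folder
## (with no argument, cd goes to $HOME and the loop below starts from /)
cd "${fldr:-$HOME}" || exit 1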
### test globstar and shopt
## to make command run in subfolders
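shopt -s globstar nullglob ## nullglob: no loop pass (and no false CORRUPTED line) when nothing matches
x=$(date +"%d-%m-%y-%H-%M-%S") ## Current date and time (day-month-year-hour-minute-second) to name a folder
mkdir "$x" ## Create a folder in the selected folder, named like "DD-MM-YY-HH-MM-SS", to mv broken files to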
## original working
### for f in *.pdf ; do
## new loop below
## below worked on ALL folders from root
### for f in /**/*.pdf ; do
### Try the above with a specific starting folder
## This is WORKING >
for f in "$fldr"/**/*.pdf ; do
## This DID NOT work > for f in *.pdf; do
if ! pdfinfo "$f" &> /dev/null;
then
## I am postponing the mv command below until I have checked subdirectories (DONE NOW). NOW WAITING FOR COMMUNITY TESTING
## mv "$f" "$x"/ ## move into the dated folder created above ($f already holds the full path)
echo "$f" is CORRUPTED
else
echo "$f" is OK
fi
done
## echo "Corrupted PDF in: $x"
dolphin "$x" ## open the dated folder in Dolphin
## xdg-open "$x" ## open the corrupted PDF folder with the default file manager; this printed an error:
## Corrupted PDF in: 11-08-22-Aug-08-1660213868
## kf.service.services: KApplicationTrader: mimeType "x-scheme-handler/file" not found
## kf.service.services: KApplicationTrader: mimeType "x-scheme-handler/file" not found
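A folder name like “11-08-22-Aug-08-1660212789” comes from the format %d-%m-%y-%h-%m-%s: in date, %h is the abbreviated month name (same as %b), %m is the month number and %s is the number of seconds since 1970, so the month appears twice and the time is a raw epoch value. Hours, minutes and seconds are %H, %M and %S. A quick comparison with GNU date, reusing the epoch value from that name (-u prints UTC, so the hour will differ from your local time):

```bash
#!/bin/bash
# Compare two format strings on the same moment (GNU date: -d @EPOCH reads a
# timestamp, -u prints UTC, LC_ALL=C gives English month names).
ts=1660212789   # epoch value taken from the folder name above

# %h = month name, %m = month number, %s = epoch seconds
LC_ALL=C date -u -d "@$ts" +"%d-%m-%y-%h-%m-%s"   # prints 11-08-22-Aug-08-1660212789

# %H = hour, %M = minute, %S = second
LC_ALL=C date -u -d "@$ts" +"%d-%m-%y-%H-%M-%S"   # prints 11-08-22-10-13-09
```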
I ran checkpdf just now; the following is a sample of the output. Some results are clearly for files in /home/ and its subfolders, but there are some folder names I do not recognize:
Are they snapshots? Snapshots of my /home, or of the system?
Are they folders and files created by an app I installed (e.g. LibreOffice)?
Is it safe to delete corrupted PDF files in such folders? Some corrupted files at the very bottom are definitely not mine. Maybe they are corrupted files that came with the system or with apps I installed?
I hope an experienced user can give some feedback.
Thank you.
home/limo/zzz borken 2_3.pdf is OK
/home/limo/zzz borken 2_4.pdf is CORRUPTED
/home/limo/zzz borken 2_5.pdf is CORRUPTED
/proc/10295/cwd/zzz borken 2_1.pdf is OK
/proc/10295/cwd/zzz borken 2_2.pdf is OK
/proc/10295/cwd/zzz borken 2_3.pdf is OK
/proc/10295/cwd/zzz borken 2_4.pdf is CORRUPTED
/proc/10295/cwd/zzz borken 2_5.pdf is CORRUPTED
/proc/10295/task/10295/cwd/zzz borken 2_1.pdf is OK
/proc/10295/task/10295/cwd/zzz borken 2_2.pdf is OK
/proc/10295/task/10295/cwd/zzz borken 2_3.pdf is OK
/proc/10295/task/10295/cwd/zzz borken 2_4.pdf is CORRUPTED
/proc/10295/task/10295/cwd/zzz borken 2_5.pdf is CORRUPTED
/proc/1140/task/1140/cwd/zzz borken 2_2.pdf is OK
/proc/1140/task/1140/cwd/zzz borken 2_3.pdf is OK
/proc/1140/task/1140/cwd/zzz borken 2_4.pdf is CORRUPTED
/proc/785/task/72452/cwd/zzz borken 2_5.pdf is CORRUPTED
/proc/785/task/785/cwd/zzz borken 2_1.pdf is OK
/usr/share/cups/data/secret.pdf is OK
/usr/share/cups/data/standard.pdf is OK
/usr/share/cups/data/topsecret.pdf is OK
/usr/share/cups/data/unclassified.pdf is OK
/usr/share/cups/ipptool/document-a4.pdf is CORRUPTED
/usr/share/cups/ipptool/document-letter.pdf is CORRUPTED
/usr/share/cups/ipptool/testfile.pdf is CORRUPTED
/usr/share/doc/ghostscript/GS9_Color_Management.pdf is OK
/usr/share/doc/graphite2/api/refman.pdf is OK
/usr/share/doc/ijs/ijs_spec.pdf is OK
Snapper snapshots are in the hidden, read-only directory .snapshots, owned by root by default.
Timeshift snapshots are in the read-write directory /run/timeshift/backup/timeshift-btrfs/snapshots, which is mounted automatically when you open Timeshift.
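I think these /proc/* entries come from running processes, including the checkpdf process itself while it runs. /proc is a virtual filesystem provided by the kernel in memory, not files on your disk, and each /proc/<PID>/cwd is a link to that process’s current working directory.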
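A quick way to see what those /proc paths are: on Linux, /proc/<PID>/cwd is a symbolic link to the working directory of process <PID>, so /proc/10295/cwd/zzz borken 2_4.pdf is most likely the same file as /home/limo/zzz borken 2_4.pdf. Any process whose working directory was that folder (a shell, a file manager, and their threads under task/) would show up the same way. A small check using the current shell’s own PID ($$):

```bash
#!/bin/bash
# /proc/<PID>/cwd is a symbolic link to the working directory of process <PID>.
cd "${HOME:-/}" || exit 1
echo "This shell is PID $$"
readlink "/proc/$$/cwd"   # where /proc/$$/cwd points to
pwd -P                    # the same folder, with symlinks resolved
```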
You can delete corrupted PDF files in /home only; whether to do so is your decision.
Do not delete files in /proc and /usr!
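If you still want a whole-machine scan, here is a sketch with find that skips the places mentioned above. The skip list (/proc, /sys, /dev, /run, /usr and any .snapshots folder) is my assumption, so adjust it. find does not follow symbolic links by default, which by itself avoids the /proc/<PID>/cwd duplicates, and -print0 with read -d '' keeps names with spaces, like “zzz borken 2_3.pdf”, intact. list_pdfs and check_all are made-up names.

```bash
#!/bin/bash
# Sketch: list PDFs below a start folder, skipping virtual filesystems and
# snapshot folders. The skip list is an assumption; adjust it for your system.
list_pdfs() {
    local start=${1:-/}
    local root=${start%/}              # "/" becomes "", "/home/limo/" becomes "/home/limo"
    find "${root:-/}" \
        \( -path "$root/proc" -o -path "$root/sys" -o -path "$root/dev" \
           -o -path "$root/run" -o -path "$root/usr" -o -name .snapshots \) -prune \
        -o -type f -iname '*.pdf' -print0 2>/dev/null   # hide "Permission denied" noise
}

# Same OK/CORRUPTED report as checkpdf.sh, fed by list_pdfs.
check_all() {
    local f
    while IFS= read -r -d '' f; do
        if pdfinfo "$f" &>/dev/null; then
            echo "$f is OK"
        else
            echo "$f is CORRUPTED"
        fi
    done < <(list_pdfs "$1")
}
```

For example, check_all / scans the machine without the skipped folders, and check_all /home/limo/xyz/ behaves like checkpdf on that folder.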
Thanks @Zesko
So it is better to always run it as checkpdf /home/path/name.
I will look at modifying the script to run on /home/ only (which, as far as I remember, does not return these strange folders) and to delete the corrupted files there.
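A possible shape for that change, as a sketch only: checkpdf_home (a made-up name) scans $HOME when no folder is given and, instead of deleting, moves corrupted PDFs into a dated broken-YYYY-MM-DD-HH-MM-SS folder only when --move is passed (also a hypothetical choice). Without it, it is a dry run that just prints the report. Moving rather than rm keeps a way back if pdfinfo flags a file that actually opens fine.

```bash
#!/bin/bash
# Sketch: scan $HOME by default; move corrupted PDFs only with --move.
# The --move flag and the broken-<date> folder name are hypothetical choices.
shopt -s globstar nullglob   # ** matches subfolders; no match means no loop pass

checkpdf_home() {
    local move=0
    if [[ $1 == --move ]]; then move=1; shift; fi
    local fldr=${1:-$HOME}
    fldr=${fldr%/}                                   # avoid "//" in printed paths
    local dest="$fldr/broken-$(date +%Y-%m-%d-%H-%M-%S)"
    local f
    for f in "$fldr"/**/*.pdf; do
        [[ $f == "$fldr"/broken-* ]] && continue     # skip earlier quarantine folders
        if pdfinfo "$f" &>/dev/null; then
            echo "$f is OK"
        else
            echo "$f is CORRUPTED"
            if (( move )); then
                mkdir -p "$dest" && mv -n -- "$f" "$dest"/   # -n: never overwrite
            fi
        fi
    done
}
```

Usage: checkpdf_home for a dry run of your home folder, or checkpdf_home --move /home/limo/xyz to move the corrupted files found there. Because of mv -n, a file whose name already exists in the dated folder is not overwritten and stays where it is.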