Is there an app to download all PDFs from a page?

I found a site that has a lot of my favorite magazines in PDF format. Seeing as they are a lot, doing it manually would take ages. Anyone know of an app that will allow me to download them all at once?

Perhaps this Firefox add-on?

Question 1. How to use this extension?

Answer: Say, for example, you have opened a web page with many PDF resources. The most basic workflow is to open the extension popup (via the browser toolbar icon) and click the Load page links button there. This usually fills the resources list with all sorts of links, not only the desired PDFs. So the next step is to filter this list so that only the desired PDFs remain: write pdf in the extension’s text field, and now only PDF-type resources show in the list. If the list is still too large (you want only a smaller subset), you can continue filtering by entering some relevant terms in the next text filter field. Finally, check the desired items individually or all at once and start downloading immediately using the corresponding button (in the bottom right corner).
The downloading items are then visible in the downloads tab of the popup; you can manage this list in the usual manner (pause, resume, open or remove individual items, etc.).

3 Likes

Thanks, I’ll try it now!

1 Like

It’s downloading HTML links but not the PDFs.

1 Like

This is the page, if it helps:
https://archive.org/details/ZX-computing-magazine?&sort=date

I use wget for that purpose. I just copy and paste manually all the links into a file and then run wget on every line in a script.
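
Something along those lines, assuming the links are saved one per line in a file (links.txt is just a placeholder name):

# download every URL listed in links.txt, one per line
while read -r url; do
    wget "$url"
done < links.txt

# or simply let wget read the file itself
wget -i links.txt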

That would take a very long time for these magazines if I were to do that.

How many files are we talking about? Can you give a link to the website?

Are they all linked on the same page? If so, just copy the HTML and extract all the PDF URLs from it using regular expressions. Once you have a list of URLs to download, you can wget them easily and automatically.
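
A rough sketch of what I mean (page.html and pdf-urls.txt are placeholder names, and the pattern assumes absolute links ending in .pdf):

# pull every link ending in .pdf out of the saved HTML, deduplicated
grep -Eo 'https?://[^"]+\.pdf' page.html | sort -u > pdf-urls.txt
# then fetch them all
wget -i pdf-urls.txt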

He already did…

I was just looking at it; they don’t appear to be. You need to drill into each item’s page, and the PDF link is on that detail page.

2 Likes

It looks like they all follow a naming convention. You could probably write some kind of script that extracts the links from the page and then tries to download the files from the download server by modifying the URLs according to a formula.

I am not sure that would be less work than downloading them manually though.

Yeah, it looks like I’m in for manual downloading, which will take hours.
Thanks.

Yeah, that’s going to be a bit more difficult.

It seems you can generate a list of files using

https://archive.org/advancedsearch.php

I haven’t figured out how, yet.

If we can get it to a list of URLs, then it’s easy.

3 Likes

I think I figured it out, just testing a bit…

You should have a solution in a few minutes, I hope.

EDIT:

Here it is.

This command downloads one issue:

wget --no-directories --content-disposition -e robots=off -A.pdf -r http://archive.org/download/ZX-computing-1982-08

Here is a list of all commands for all issues. You can run them one by one, or, if you feel brave, just save them as a script and run them all :slight_smile:

wget --no-directories --content-disposition -e robots=off -A.pdf -r http://archive.org/download/ZX-computing-1982-08
wget --no-directories --content-disposition -e robots=off -A.pdf -r http://archive.org/download/ZX-computing-1987-01
wget --no-directories --content-disposition -e robots=off -A.pdf -r http://archive.org/download/ZX-computing-1985-04
wget --no-directories --content-disposition -e robots=off -A.pdf -r http://archive.org/download/ZX-computing-1984-06
wget --no-directories --content-disposition -e robots=off -A.pdf -r http://archive.org/download/ZX-computing-1986-08
wget --no-directories --content-disposition -e robots=off -A.pdf -r http://archive.org/download/ZX-computing-1984-08
wget --no-directories --content-disposition -e robots=off -A.pdf -r http://archive.org/download/ZX-computing-1986-11
wget --no-directories --content-disposition -e robots=off -A.pdf -r http://archive.org/download/ZX-computing-1982-Sum
wget --no-directories --content-disposition -e robots=off -A.pdf -r http://archive.org/download/ZX-computing-1985-12
wget --no-directories --content-disposition -e robots=off -A.pdf -r http://archive.org/download/ZX-computing-1986-07
wget --no-directories --content-disposition -e robots=off -A.pdf -r http://archive.org/download/ZX-computing-1984-04
wget --no-directories --content-disposition -e robots=off -A.pdf -r http://archive.org/download/ZX-computing-1983-02
wget --no-directories --content-disposition -e robots=off -A.pdf -r http://archive.org/download/ZX-computing-1983-04
wget --no-directories --content-disposition -e robots=off -A.pdf -r http://archive.org/download/ZX-computing-1986-02
wget --no-directories --content-disposition -e robots=off -A.pdf -r http://archive.org/download/ZX-computing-1983-12
wget --no-directories --content-disposition -e robots=off -A.pdf -r http://archive.org/download/ZX-computing-1987-03
wget --no-directories --content-disposition -e robots=off -A.pdf -r http://archive.org/download/ZX-computing-1985-02
wget --no-directories --content-disposition -e robots=off -A.pdf -r http://archive.org/download/ZX-computing-1986-06
wget --no-directories --content-disposition -e robots=off -A.pdf -r http://archive.org/download/ZX-computing-1986-10
wget --no-directories --content-disposition -e robots=off -A.pdf -r http://archive.org/download/ZX-computing-1987-02
wget --no-directories --content-disposition -e robots=off -A.pdf -r http://archive.org/download/ZX-computing-1985-06
wget --no-directories --content-disposition -e robots=off -A.pdf -r http://archive.org/download/ZX-computing-1985-10
wget --no-directories --content-disposition -e robots=off -A.pdf -r http://archive.org/download/ZX-computing-1987-04
wget --no-directories --content-disposition -e robots=off -A.pdf -r http://archive.org/download/ZX-computing-1985-08
wget --no-directories --content-disposition -e robots=off -A.pdf -r http://archive.org/download/ZX-computing-1987-06
wget --no-directories --content-disposition -e robots=off -A.pdf -r http://archive.org/download/ZX-computing-1984-10
wget --no-directories --content-disposition -e robots=off -A.pdf -r http://archive.org/download/ZX-computing-1987-05
wget --no-directories --content-disposition -e robots=off -A.pdf -r http://archive.org/download/ZX-computing-1986-05
wget --no-directories --content-disposition -e robots=off -A.pdf -r http://archive.org/download/ZX-computing-1986-09
wget --no-directories --content-disposition -e robots=off -A.pdf -r http://archive.org/download/ZX-computing-1983-06
wget --no-directories --content-disposition -e robots=off -A.pdf -r http://archive.org/download/ZX-computing-1983-08
wget --no-directories --content-disposition -e robots=off -A.pdf -r http://archive.org/download/ZX-computing-1984-02
wget --no-directories --content-disposition -e robots=off -A.pdf -r http://archive.org/download/ZX-computing-1983-10
wget --no-directories --content-disposition -e robots=off -A.pdf -r http://archive.org/download/ZX-computing-1986-04
wget --no-directories --content-disposition -e robots=off -A.pdf -r http://archive.org/download/ZX-computing-1982-10
wget --no-directories --content-disposition -e robots=off -A.pdf -r http://archive.org/download/ZX-computing-1986-12
wget --no-directories --content-disposition -e robots=off -A.pdf -r http://archive.org/download/ZX-computing-1984-12

There are only 37 files, so it shouldn’t take too long.

How I generated the list...

I went to https://archive.org/advancedsearch.php, and under “Advanced Search returning JSON, XML, and more” in the “Query” field I entered:

collection:ZX-computing-magazine

I increased the number of results per page to 500 (in retrospect, that was unnecessary) and selected CSV as the output format. The resulting file contained:

"identifier"
"ZX-computing-1982-08"
"ZX-computing-1987-01"
"ZX-computing-1984-08"
"ZX-computing-1986-11"
"ZX-computing-1982-Sum"
"ZX-computing-1985-12"
"ZX-computing-1986-07"
"ZX-computing-1985-04"
"ZX-computing-1984-06"
"ZX-computing-1986-08"
"ZX-computing-1984-04"
"ZX-computing-1983-02"
"ZX-computing-1983-04"
"ZX-computing-1986-02"
"ZX-computing-1986-10"
"ZX-computing-1983-12"
"ZX-computing-1987-03"
"ZX-computing-1985-02"
"ZX-computing-1986-06"
"ZX-computing-1987-02"
"ZX-computing-1983-06"
"ZX-computing-1983-08"
"ZX-computing-1985-06"
"ZX-computing-1985-10"
"ZX-computing-1987-04"
"ZX-computing-1985-08"
"ZX-computing-1987-06"
"ZX-computing-1984-10"
"ZX-computing-1987-05"
"ZX-computing-1986-05"
"ZX-computing-1986-09"
"ZX-computing-1986-12"
"ZX-computing-1984-12"
"ZX-computing-1984-02"
"ZX-computing-1983-10"
"ZX-computing-1986-04"
"ZX-computing-1982-10"

Then, with some text-editing magic, I removed the first line and the quotes, and prepended the wget command to each line. A very useful feature for that is block selection in Kate.
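
If you’d rather script that part too, here’s a rough sketch (search.csv is just whatever name the exported file has):

# skip the header line, strip the quotes, and run wget for each identifier
tail -n +2 search.csv | tr -d '"\r' | while read -r id; do
    wget --no-directories --content-disposition -e robots=off -A.pdf -r "http://archive.org/download/$id"
done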

7 Likes

This is most interesting, thank you. :slight_smile: :smiley:

1 Like

That is ingenious, thanks.

1 Like

Although, when downloading, the second one returns this:

bash: n: command not found

The second what?

The second command in the list?

This one?

wget --no-directories --content-disposition -e robots=off -A.pdf -r http://archive.org/download/ZX-computing-1987-01

It works for me…

This should not happen. If you look closely, you’ll see that I am not using the command n anywhere in that script. It’s just wget 37 times, each with a different URL at the end. :man_shrugging:t3:

1 Like

It’s working fine; it was a bad copy-and-paste due to a slight fever.

1 Like

Get well soon. Stay hydrated! :frog:

3 Likes