Solved wget: download web pages based on URL


jindam

New Member
Joined
Nov 13, 2023
* I am trying to download the web pages F1 to F110 (110 pages in total)
from https://www.w3.org/WAI/WCAG22/Techniques/failures/
* I tried the following commands, but none of
them downloads all of the URLs:
Code:
wget -p -k https://www.w3.org/WAI/WCAG22/Techniques/failures/{F1,...F110}
wget -p -k -A '*[F1]' https://www.w3.org/WAI/WCAG22/Techniques/failures/
wget -p -k -A '*F*' https://www.w3.org/WAI/WCAG22/Techniques/failures/
* Please suggest another way.
 


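You're on the right track. The command below worked for me. In bash, a "brace expansion" range is written {1..110}, and the "F" prefix has to sit outside the braces. Your first attempt, {F1,...F110}, is read as a comma-separated list of just two items, F1 and ...F110, not as a range.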
Code:
wget -p -k https://www.w3.org/WAI/WCAG22/Techniques/failures/F{1..110}
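
If you want to see what the shell actually hands to wget before downloading anything, you can preview the expansion. This is only a sketch and it assumes bash (the braces are expanded by the shell, not by wget). The variable names base, urls and attempt are my own and are not anything wget needs.

```shell
# Preview only: nothing is downloaded here. Assumes bash.
base="https://www.w3.org/WAI/WCAG22/Techniques/failures"

# Range form: the F prefix sits outside the braces.
urls=( "$base"/F{1..110} )
echo "count: ${#urls[@]}"
echo "first: ${urls[0]}"
echo "last:  ${urls[${#urls[@]}-1]}"

# The form from the question is a two-item comma list, not a range.
attempt=( {F1,...F110} )
printf 'attempt item: %s\n' "${attempt[@]}"
```

The last two lines of output show why the first command in the question only ever asked for two URLs.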

This creates a folder called www.w3.org inside the folder you launch the command from, and it stores the files in an arrangement that looked rather strange to me (it mirrors the URL path). Dig down into this folder to find the page files, F1 through F110, but note that they are not named with a .html extension, so you may have to force your browser to open them. I would not change this folder structure or the file names, because that is likely to break links to the other pages and to page resources such as CSS files and JavaScript.

Here's where to look in your downloaded folder:
<folder-you-start-from>/www.w3.org/WAI/WCAG22/Techniques/failures/F1
<folder-you-start-from>/www.w3.org/WAI/WCAG22/Techniques/failures/F2
<folder-you-start-from>/www.w3.org/WAI/WCAG22/Techniques/failures/F3
(and so on)
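
The mapping from URL to saved path in the list above can be sketched with a tiny helper. The function name local_path is made up for illustration, and it assumes the plain case seen here: no query string and wget's default folder layout when -p is used.

```shell
# Hypothetical helper: show where a page from the -p -k run ends up,
# relative to the folder wget was started from.
# Assumption: the saved path is simply the URL without its scheme.
local_path() {
  url=$1
  printf '%s\n' "${url#*://}"
}

local_path "https://www.w3.org/WAI/WCAG22/Techniques/failures/F1"
```

If the missing extension bothers you, wget also has an -E (--adjust-extension) option that appends .html to HTML pages when saving. I did not try it on this site, so treat it as something to experiment with in a test folder.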

However, if you use wget without the -p and -k options, as shown below, it downloads only the page file for each of F1 through F110 and places them loose inside the folder you launch the command from. It does not download the CSS, JavaScript, or anything else. You should probably start from a "test" folder if you run it this way, so you don't make a big mess in your home folder, desktop, or wherever you launch the command from.
Code:
wget https://www.w3.org/WAI/WCAG22/Techniques/failures/F{1..110}
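
Since you asked for some other way: here is a sketch that does not rely on brace expansion at all. It writes the 110 URLs to a list file inside a test folder and lets wget read that list with -i (--input-file), saving into the folder with -P (--directory-prefix). The folder name wcag-failures-test and the RUN_DOWNLOAD switch are my own inventions, and the download step is guarded so you can check the list first.

```shell
base="https://www.w3.org/WAI/WCAG22/Techniques/failures"
dest="wcag-failures-test"   # hypothetical test folder name

mkdir -p "$dest"

# Build the URL list, one per line: F1 through F110.
: > "$dest/urls.txt"
i=1
while [ "$i" -le 110 ]; do
  printf '%s/F%s\n' "$base" "$i" >> "$dest/urls.txt"
  i=$((i + 1))
done

# RUN_DOWNLOAD is a made-up switch; set it to 1 to actually fetch the pages.
if [ "${RUN_DOWNLOAD:-0}" = "1" ]; then
  wget -P "$dest" -i "$dest/urls.txt"
fi
```

Look over urls.txt, then run the script again with RUN_DOWNLOAD=1 in front of it. The pages land next to urls.txt inside the test folder instead of in the folder you launched from.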
 