Mirror a Website with Wget
TL;DR: to mirror a site that requires authentication, do it in two steps: log in and save the cookies, then mirror the site using those cookies.
While testing my app, I needed a mirror of the website I’m using. Since the website has form-based authentication, the mirroring process has two parts.
The first part is to log in and save the cookies:
wget --save-cookies cookies.txt \
--keep-session-cookies \
--post-data "login=[LOGIN]&password=[PASSWORD]&module=admission&controller=login&action=logindo&auth_act=1" \
[LOGIN_URL]
The cookies are now saved in cookies.txt. Note that you need to specify --keep-session-cookies: by default wget does not write session cookies to the file, so without this flag the file will be empty.
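As a rough sketch of the login step (not necessarily the exact command used here), the snippet below assembles the wget call with --keep-session-cookies, percent-encodes the credentials so that characters such as & or = do not break the --post-data string, and provides a small check for the saved cookie file. LOGIN_URL, LOGIN_NAME, LOGIN_PASSWORD and the helpers urlencode and count_cookies are made-up names for illustration, and the login= field name assumes the form expects login=[LOGIN]. The command is only printed, not executed.

```shell
#!/usr/bin/env bash
# Sketch of the login step. All names below are placeholders (assumptions).
LOGIN_URL="${LOGIN_URL:-https://example.com/login}"
LOGIN_NAME="${LOGIN_NAME:-user@example.com}"
LOGIN_PASSWORD="${LOGIN_PASSWORD:-p&ss=word}"

# Percent-encode a string for use in --post-data (ASCII input assumed).
urlencode() {
    local LC_ALL=C s="$1" out="" c hex i
    for (( i = 0; i < ${#s}; i++ )); do
        c="${s:i:1}"
        case "$c" in
            [a-zA-Z0-9.~_-]) out+="$c" ;;
            *) printf -v hex '%%%02X' "'$c"; out+="$hex" ;;
        esac
    done
    printf '%s\n' "$out"
}

# Count cookie entries: non-blank lines that are not '#' comments.
count_cookies() {
    awk '!/^#/ && NF { n++ } END { print n + 0 }' "$1"
}

post_data="login=$(urlencode "$LOGIN_NAME")&password=$(urlencode "$LOGIN_PASSWORD")"
post_data+="&module=admission&controller=login&action=logindo&auth_act=1"

login_cmd=(
    wget
    --save-cookies cookies.txt
    --keep-session-cookies        # also write cookies that have no expiry date
    --post-data "$post_data"
    "$LOGIN_URL"
)

# Print the command for inspection; run it with: "${login_cmd[@]}"
printf '%q ' "${login_cmd[@]}"; echo
```

After running the real command, a check such as [ "$(count_cookies cookies.txt)" -gt 0 ] can catch an empty cookie file before the mirror step starts.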
The second part is to actually perform the mirror:
wget --load-cookies cookies.txt \
-l 2 \
-k -nc \
-R css,js,gif -R "*lang=*" -R "*srln=DE" -R "*srln=FR" \
-I /epso/application/account,/epso/application/cv_new \
-D [DOMAIN] \
[URL]
I’ll explain each flag:
-l 2: don’t go overboard with recursion (follow links at most two levels deep)
-k: convert links to relative ones, so they work in the local copy
-nc: don’t re-download files that already exist locally
-R ...: exclude links and files (e.g. don’t download CSS files)
-I ...: restrict downloading to the listed paths
-D ...: download only from this domain
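Putting the flags together, the mirror step could look roughly like the sketch below. START_URL and DOMAIN are placeholders, not the values actually used, and -r is an assumption added here: -l only takes effect during recursive retrieval. The command is printed rather than executed, so it can be checked first.

```shell
#!/usr/bin/env bash
# Sketch of the mirror step. START_URL and DOMAIN are placeholders (assumptions).
START_URL="${START_URL:-https://example.com/epso/application/account/}"
DOMAIN="${DOMAIN:-example.com}"

mirror_cmd=(
    wget
    --load-cookies cookies.txt    # reuse the session saved by the login step
    -r                            # assumption: recursion must be on for -l to matter
    -l 2                          # follow links at most two levels deep
    -k                            # convert links for local viewing
    -nc                           # do not re-download existing files
    -R 'css,js,gif'               # reject these file suffixes
    -R '*lang=*' -R '*srln=DE' -R '*srln=FR'   # reject these name patterns
    -I /epso/application/account,/epso/application/cv_new   # only these directories
    -D "$DOMAIN"                  # only this domain
    "$START_URL"
)

# Print the command for inspection; run it with: "${mirror_cmd[@]}"
printf '%q ' "${mirror_cmd[@]}"; echo
```

Some wget versions warn that -nc and -k cannot be combined and then keep only -k, so the -nc line may need to be dropped depending on the installed version.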
The result is a mirror containing everything that is relevant to me.
Note: Please read the wget manual for limitations on form access.