TL;DR: to mirror a site that requires authentication with wget, do it in two steps: log in and save the cookies, then mirror using those cookies.
In the process of testing my app, I needed a mirror of the website I’m using. Since the website has authentication (via a form), the mirroring process has 2 parts.
The first part is to log in and get the cookies:
wget \
  --keep-session-cookies \
  --save-cookies cookies.txt \
  --post-data "login=[LOGIN]&password=[PASSWORD]&module=admission&controller=login&action=logindo&auth_act=1" \
  https://europa.eu/epso/application/base/index.cfm
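One caveat with the login step: as far as I can tell, wget sends the --post-data string as is, so a login or password containing characters such as & or = has to be percent-encoded first. Below is a minimal sketch of that. The helper names urlencode and login_post_data are my own, the login= field name is an assumption about the form, and the encoder is only meant for ASCII input.

```shell
# Hypothetical helper: percent-encode a string for use in a form body.
# Runs in a subshell, so the LC_ALL change and the variables stay local.
# Only intended for ASCII input.
urlencode() (
    LC_ALL=C
    s=$1
    out=
    while [ -n "$s" ]; do
        rest=${s#?}          # everything after the first character
        c=${s%"$rest"}       # the first character itself
        case $c in
            [A-Za-z0-9._~-]) out=$out$c ;;                   # unreserved: keep
            *) out=$out$(printf '%%%02X' "'$c") ;;           # anything else: %XX
        esac
        s=$rest
    done
    printf '%s\n' "$out"
)

# Hypothetical helper: build the POST body used in the login command.
# $1 = login, $2 = password. The login= field name is an assumption.
login_post_data() {
    printf 'login=%s&password=%s&module=admission&controller=login&action=logindo&auth_act=1\n' \
        "$(urlencode "$1")" "$(urlencode "$2")"
}
```

The result can then be passed to the login command as --post-data "$(login_post_data "$MY_LOGIN" "$MY_PASSWORD")", where MY_LOGIN and MY_PASSWORD are placeholder variables holding your credentials.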
Now the cookies are saved in the cookies.txt file. Note that you need to specify --keep-session-cookies. Otherwise wget does not write session cookies (those without an expiry date), and the file ends up with no cookies in it.
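Before starting the mirror, it can be worth checking that the login really stored something. This is a small sketch, assuming the cookie file uses the Netscape-style layout wget writes ('#' comment lines followed by one tab-separated line per cookie). The function name has_cookies is my own.

```shell
# Hypothetical helper: succeeds if the cookie file contains at least one
# cookie entry, i.e. a non-blank line that is not a '#' comment.
has_cookies() {
    [ -f "$1" ] && grep -v '^#' "$1" | grep -q '[^[:space:]]'
}

# Example use after the login step:
#   has_cookies cookies.txt || echo "login did not store any cookies" >&2
```

An empty result usually means the login failed or --keep-session-cookies was left out.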
The second part is to actually perform the mirror:
wget --load-cookies cookies.txt \
  -r \
  -l 2 \
  -k \
  -nc \
  -R css,js,gif -R "*lang=*" -R "*srln=DE" -R "*srln=FR" \
  -I /epso/application/account,/epso/application/cv_new \
  -Deuropa.eu \
  https://europa.eu/epso/application/cv_new/index.cfm
I’ll explain each flag:
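-r: download recursively, following links
-l 2: limit the recursion depth to 2 levels (don't overdo the recursion!)
-k: convert links in the downloaded pages so they work when browsing the local copy
-nc: don't re-download files that already exist locally
-R ...: reject matching links and files (e.g. don't download CSS files)
-I ...: restrict downloading to these directories
-D ...: follow links only on this domain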
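If the short flags are hard to remember, the same mirror command can be written with wget's long option names. This is a sketch that wraps it in a function. The name mirror_site and the WGET override (which lets you dry-run it with WGET=echo) are my own additions, not part of wget.

```shell
# Same mirror command with long option names.
# WGET is a hypothetical override: run "WGET=echo mirror_site" to print
# the arguments instead of downloading anything.
mirror_site() {
    ${WGET:-wget} --load-cookies cookies.txt \
        --recursive \
        --level=2 \
        --convert-links \
        --no-clobber \
        --reject 'css,js,gif' --reject '*lang=*' --reject '*srln=DE' --reject '*srln=FR' \
        --include-directories=/epso/application/account,/epso/application/cv_new \
        --domains=europa.eu \
        https://europa.eu/epso/application/cv_new/index.cfm
}
```

Calling mirror_site with no arguments runs the real download, provided cookies.txt from the login step is in the current directory.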
The result is a mirror containing everything that is relevant to me.
Note: Please read the wget manual for limitations on form access.