Mirror a Website with Wget


TL;DR: to mirror a site that requires authentication, do it in two steps: log in and save the cookies, then run the mirror with those cookies.

While testing my app, I needed a mirror of the website I’m using. Since the website uses form-based authentication, the mirroring process has two parts.

The first part is to log in and save the cookies.
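A minimal sketch of this login step, assuming a hypothetical login URL and hypothetical form field names (`username`, `password`). None of these values come from the actual site, so substitute whatever your login form really posts:

```shell
# Hypothetical values: replace with your site's login URL and form fields.
LOGIN_URL='https://example.com/login'
POST_DATA='username=alice&password=secret'
COOKIE_FILE='cookies.txt'

# Submit the login form and store the cookies the server sends back,
# including session cookies, in $COOKIE_FILE. The response page itself
# is discarded (-O /dev/null) because only the cookies matter here.
login() {
    wget --save-cookies "$COOKIE_FILE" \
         --keep-session-cookies \
         --post-data "$POST_DATA" \
         -O /dev/null \
         "$LOGIN_URL"
}
```

Calling `login` runs the request. Wrapping the command in a function is only a convenience for reuse; the wget invocation inside it works the same when typed directly.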

Now the cookies are saved in the cookies.txt file. Note that you need to specify --keep-session-cookies: by default wget does not write session cookies to the file, so without this flag the file may end up with no cookies in it.

The second part is to perform the mirror itself.
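A sketch of what the mirror command might look like, reusing cookies.txt from the login step. The start URL, domain, include path, and reject pattern are all hypothetical placeholders chosen for illustration:

```shell
# Hypothetical values for illustration only.
COOKIE_FILE='cookies.txt'
START_URL='https://example.com/docs/'
DOMAIN='example.com'
INCLUDE_DIRS='/docs'
REJECT='css'

# Recursively download START_URL, sending the cookies saved at login
# so that pages behind the login form are reachable.
mirror() {
    wget --load-cookies "$COOKIE_FILE" \
         -r -l 2 -k -nc \
         -R "$REJECT" \
         -I "$INCLUDE_DIRS" \
         -D "$DOMAIN" \
         "$START_URL"
}
```

Calling `mirror` starts the download. One caveat: some wget versions print a warning and ignore -nc when -k is also given, so check the output if you rely on both.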

I’ll explain each flag:

  • -r recursive (d’oh!)
  • -l 2 limit recursion to 2 levels (don’t overdo it!)
  • -k convert links in the downloaded pages so they work locally
  • -nc no-clobber: don’t re-download files that already exist locally
  • -R... reject files matching the given suffixes or patterns (e.g. don’t download CSS files)
  • -I... only follow links into the listed directories
  • -D... only follow links to the listed domains

The result is a mirror containing everything that is relevant to me.

Note: Please read the wget manual for limitations on form access.


