Remove Large Files in Git

Posted in Software, Tutorial

TL;DR: BFG is your friend, for example: java -jar bfg.jar -b 50M myrepo-bfg.git

The other day I committed a rather large file by mistake (a generated movie of the git commit history). This was a bit annoying because all the other developers would suddenly have to check out a 1 GB file :( with no LFS support…

While I was at it, I took the opportunity to clean up the repository, and noticed that the cloned repo was rather huge! So I was presented with several options:

  1. Create a new repo with the cleaned version (i.e. lose the history)
  2. Try to migrate commits selectively
  3. Remove the large file (and other large-ish files) from the history

Initially, I wanted to go with the second option, but making the cuts and patching things up was difficult (well, active development, a bunch of commits in the meantime…). So the third option looked promising. For this, you need the excellent BFG Repo-Cleaner tool.
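Before picking a size threshold, it can help to see which files are actually the big ones. The following is a minimal sketch that uses only stock git plumbing; the function name largest_blobs and the default of 10 entries are my own choices, not part of BFG or the workflow in this post.

```shell
# List the largest blobs anywhere in the history, biggest first.
# Run inside a clone. Output format: "<size in bytes> <path>".
# Usage: largest_blobs [how_many]   (defaults to 10, an arbitrary choice)
largest_blobs() {
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
    sed -n 's/^blob [0-9a-f]* //p' |   # keep blobs only, drop type and hash
    sort -rn |
    head -n "${1:-10}"
}
```

Calling largest_blobs 20 in the repository should give a rough idea of whether something like 50M is a sensible cut-off.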

My workflow was something like this:

  1. git clone --mirror http://.../myrepo.git
  2. git clone --mirror myrepo.git myrepo-bfg.git
  3. java -jar bfg.jar -b 50M myrepo-bfg.git
  4. cd myrepo-bfg.git && git reflog expire --expire=now --all && git gc --prune=now --aggressive
  5. git push origin

i.e.,

  1. Clone the original repo as a mirror
  2. Make a copy (in case I mess up)
  3. Run BFG to remove all files bigger than 50M from the history
  4. Inside the copy, expire the reflog and run a garbage collection to prune the now-unreferenced objects
  5. Push the rewritten history back
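The workflow above can be sketched as a small shell function. This is only an illustration under a few assumptions: the function name strip_large_files and the BFG_JAR variable are made up for this example, the repository URL is passed in as an argument, and bfg.jar is expected in the current directory unless BFG_JAR says otherwise. The final step of pushing the result back is left out here so the rewritten copy can be inspected first.

```shell
# Sketch of the mirror / copy / BFG / gc steps.
# Usage: strip_large_files <repo-url> [size-limit]
strip_large_files() {
  repo_url=$1                    # the URL you normally clone from
  limit=${2:-50M}                # same threshold as in the post
  bfg_jar=${BFG_JAR:-bfg.jar}    # assumed location of the BFG jar

  git clone --mirror "$repo_url" myrepo.git &&         # 1. mirror clone
  git clone --mirror myrepo.git myrepo-bfg.git &&      # 2. working copy
  java -jar "$bfg_jar" -b "$limit" myrepo-bfg.git &&   # 3. strip big blobs
  (
    # 4. drop the old objects for real, inside the copy
    cd myrepo-bfg.git &&
    git reflog expire --expire=now --all &&
    git gc --prune=now --aggressive
  )
}
```

One thing to keep in mind: because the copy was cloned from the local mirror, its origin points at myrepo.git on disk, not at the server, so a push from the copy lands in the local mirror first.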

This way, I got rid of all big files. Cool!
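To see how much was actually saved, one option is to compare the packed size of the untouched mirror with that of the cleaned copy. A minimal sketch, where pack_size is a helper name invented for this example:

```shell
# Print the on-disk pack size of a repository in human-readable form.
# Usage: pack_size <path-to-repo>
pack_size() {
  git -C "$1" count-objects -vH | sed -n 's/^size-pack: //p'
}
```

Running pack_size myrepo.git and then pack_size myrepo-bfg.git should show the difference, provided the garbage collection step has already been run on the copy.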

Important note: This process will permanently delete stuff from git, as if it were never there. USE IT WITH CARE (aka I’m not responsible if you wipe out your company’s code base!)

HTH,

