Description
Right now, the repo size is unnecessarily large.
Output of `git filter-repo --analyze`:
| Unpacked | Packed |
|---|---|
| 844.83MB | 605.33MB |
Folder sizes:
| .git | Worktree | Combined |
|---|---|---|
| 548MB | 318MB | 866MB |
From what I could find, this is caused mostly by large JPG/PNG/WebP images, plus a few large PDF files.
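To double-check which file types are responsible, the blob sizes in history can be totalled per extension. This is only a rough sketch: `sum_by_extension` is a hypothetical helper name, and it adds up uncompressed blob sizes, so the totals will overstate what each type costs inside the packed .git.

```shell
# sum_by_extension: read "<size-in-bytes> <path>" lines on stdin and print
# the total size per file extension in MB, largest first.
# (Hypothetical helper, not part of git or git filter-repo.)
sum_by_extension() {
  awk '{
    size = $1
    path = $0
    sub(/^[0-9]+ /, "", path)        # keep the whole path, spaces included
    n = split(path, parts, "/")
    base = parts[n]                  # file name without directories
    ext = "(none)"
    if (match(base, /\.[^.]+$/) && RSTART > 1)
      ext = tolower(substr(base, RSTART + 1))
    total[ext] += size
  }
  END {
    for (e in total) printf "%.2f\t%s\n", total[e] / 1024 / 1024, e
  }' | sort -rn
}

# Possible usage inside the repo:
#   git rev-list --objects --all |
#     git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
#     awk '$1 == "blob"' | cut -d' ' -f2- | sum_by_extension
```

The first column is MB and the second is the lowercased extension, so `.PNG` and `.png` are counted together.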
Solution
git filter-repo should be used to clean up the history. There are two options: remove only old, deleted, unused blobs with a bash script, or just run `git filter-repo --strip-blobs-bigger-than 5M` to get rid of everything bigger than 5M, including currently active files/blobs. The second option is riskier, and stuff in public/ should be moved to the CDN before doing it, but the payoff will definitely be worth it with the size reduction, because I don't think anyone wants to download a ~548MB git repo just to edit a single file.
This will most likely need someone with force-push permissions, though, and everyone else would need to re-clone the repo afterwards, if I'm correct.
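For whoever ends up doing the force push, the steps would look roughly like the sketch below. The `run` and `rewrite_and_push` helpers, the `repo-rewrite` directory name, and the `DRY_RUN` switch are all made up for illustration. It relies on git filter-repo expecting a fresh clone and removing the `origin` remote after rewriting, which is why the remote is added back before pushing.

```shell
# run: print a command, then execute it unless DRY_RUN=1 (hypothetical helper).
run() {
  printf '+ %s\n' "$*"
  [ "${DRY_RUN:-0}" = 1 ] || "$@"
}

# rewrite_and_push <repo-url> <max-blob-size> [workdir]
# Rewrites history in a fresh clone and force-pushes every branch and tag.
rewrite_and_push() {
  url=$1
  max_size=$2
  workdir=${3:-repo-rewrite}

  # git filter-repo wants a fresh clone, so never reuse a working copy.
  run git clone "$url" "$workdir" || return 1
  run git -C "$workdir" filter-repo --strip-blobs-bigger-than "$max_size" || return 1

  # filter-repo drops the origin remote as a safety measure; add it back.
  run git -C "$workdir" remote add origin "$url" || return 1
  run git -C "$workdir" push --force --all origin || return 1
  run git -C "$workdir" push --force --tags origin
}
```

Calling it with `DRY_RUN=1` set only prints the five commands, which is a cheap way to review the plan before anything is rewritten. Old clones still contain the big blobs, so they should be deleted and re-cloned rather than pulled.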
I tested this bash script, and got the .git down to 386M by only removing unused files:
```shell
# get every filename ever in history from %(rest)
# (note: print $4 keeps only the first word, so this assumes paths contain no spaces)
git rev-list --objects --all | \
git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' | \
awk '/^blob/ {print $4}' | sort -u > all_files.txt
# get currently used files
git ls-tree -r --name-only main | sort > current_files.txt
# get dead files by comparing all files to current files
comm -23 all_files.txt current_files.txt > deleted_files.txt
# strip dead files from history
git filter-repo --invert-paths --paths-from-file deleted_files.txt
```

By just straight up doing `git filter-repo --strip-blobs-bigger-than 3M`, I got the .git size down to 219M.
By combining the bash script with --strip-blobs-bigger-than 3M, I got it down to 159MB, which is obviously a lot better than 548MB.
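One caveat with the script above is that it takes only the fourth whitespace-separated field as the path, so a file name containing spaces gets cut short. A possible variant that keeps whole paths is sketched here. `blob_paths` and `dead_paths` are hypothetical helper names, and the sorting is pinned to the C locale so that `comm` sees both lists in the same order.

```shell
# blob_paths: read "<type> <sha> <size> <path>" lines on stdin and print the
# unique paths of blobs, keeping any spaces inside the path.
blob_paths() {
  awk '$1 == "blob"' | cut -d' ' -f4- | LC_ALL=C sort -u
}

# dead_paths <all_paths_file> <current_paths_file>
# Print paths that appear in history but not in the current tree.
# <all_paths_file> must already be sorted the way blob_paths sorts it.
dead_paths() {
  LC_ALL=C sort -u "$2" | LC_ALL=C comm -23 "$1" -
}

# Possible usage inside the repo:
#   git rev-list --objects --all |
#     git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
#     blob_paths > all_files.txt
#   git -c core.quotePath=false ls-tree -r --name-only main > current_files.txt
#   dead_paths all_files.txt current_files.txt > deleted_files.txt
```

The `core.quotePath=false` setting in the usage comment is there because `git ls-tree` otherwise escapes non-ASCII file names, which would stop them from matching the unescaped names coming from `git rev-list`. It is worth reading through deleted_files.txt by hand before feeding it to `git filter-repo --invert-paths --paths-from-file`.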
Largest files
The ten largest files in history, from running `git rev-list --objects --all | git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' | awk '/^blob/ {print $3, $4}' | sort -u | sort -n | awk '{printf "| %.2fMB | %s |\n", $1/1024/1024, $2}' | tail -10`:
| Size | File |
|---|---|
| 9.96MB | public/jobs/zephyr-group-pic.jpg |
| 10.25MB | public/winter/2.png |
| 12.28MB | public/hc-cdn/7bf19e299e3e8253096906cef8d599c7aedeed09_image.png |
| 13.05MB | public/fiscal-sponsorship/hcb-gource.gif |
| 16.58MB | public/winter/11.gif |
| 22.57MB | public/home/assemble.jpg |
| 22.96MB | public/train_starry_night.png |
| 29.65MB | public/home/outernet-110.jpg |
| 38.72MB | public/philanthropy/hackclub.pdf |
| 40.43MB | public/onboard/first_and_hack_club.pdf |
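The table above covers all of history, so some of those files may already be deleted. To see which big files are still tracked on main, and would therefore need to move to the CDN before a size-based strip, something like the following sketch could help. `large_tracked_files` is a hypothetical helper, and the 5 MiB threshold in the usage comment simply mirrors the 5M suggested earlier.

```shell
# large_tracked_files <min-bytes>
# Filter `git ls-tree -r -l <branch>` output on stdin and print the blobs
# whose size is at least <min-bytes>, as "<size>MB<TAB><path>".
large_tracked_files() {
  awk -v min="$1" -F '\t' '{
    # Before the tab: mode, type, object name, size (padded with spaces).
    n = split($1, meta, " ")
    size = meta[n]
    # Submodule entries have type "commit" and size "-", so skip non-blobs.
    if (meta[2] == "blob" && size + 0 >= min)
      printf "%.2fMB\t%s\n", size / 1024 / 1024, $2
  }'
}

# Possible usage inside the repo:
#   git ls-tree -r -l main | large_tracked_files $((5 * 1024 * 1024)) | sort -n
```

Anything this prints is a file the site may still reference, so removing it from history without moving it elsewhere first would break those references.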