
Remove Future Git History from Dockerfiles #94

Open
ConnorBAdams wants to merge 9 commits into scaleapi:main from ConnorBAdams:connorbadams/purge-git-future-history

ConnorBAdams commented May 5, 2026

Summary

This addresses the problem raised in #93. To summarize:

  • All of the images in the repo contain git history from after the commit each task checks out ("future" history)
  • This makes them vulnerable to 3 variations of reward hacking via git mining:
    1. Future commits on the main branch
    2. Feature branches with future commits, for example origin/dev
    3. Git tags that reference future commits, although these are much rarer to see exploited
  • The fix is to remove future commits after checking out the desired commit

Steps to reproduce and examples are provided in the issue.
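To make the three variations concrete, here is a small self-contained sketch (not the issue's reproduction). It builds a throwaway upstream repo with a hypothetical `dev` branch and `v2.0` tag, clones it, and checks out the base commit the way an image build might; the three lookups at the end each still reach a future commit.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical fixture: identities, file names, branch and tag names are placeholders.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# "Upstream" history: base (buggy) -> reference fix on main (tagged v2.0),
# plus one more commit on a dev branch.
git init -q "$work/upstream"
cd "$work/upstream"
git symbolic-ref HEAD refs/heads/main
echo bug > app.txt
git add app.txt
git commit -q -m "base: buggy"
base=$(git rev-parse HEAD)
echo fixed > app.txt
git commit -q -am "future: reference fix"
git tag v2.0
git checkout -q -b dev
echo wip > dev.txt
git add dev.txt
git commit -q -m "future: dev work"
git checkout -q main

# Image-build stand-in: clone, then check out the task's base commit.
git clone -q "$work/upstream" "$work/repo"
cd "$work/repo"
git checkout -q --detach "$base"

# 1. Future commits on main: the local branch still points past HEAD.
leak_main=$(git show main:app.txt)
# 2. Feature branches with future commits, e.g. origin/dev.
leak_branch=$(git show origin/dev:dev.txt)
# 3. Tags whose commits are not reachable from HEAD.
leak_tags=$(git tag --no-merged HEAD)
# Commits reachable from some ref but not from HEAD are "future" history.
future_count=$(git rev-list --all ^HEAD | wc -l)

echo "main:app.txt = $leak_main"
echo "origin/dev:dev.txt = $leak_branch"
echo "tags not merged into HEAD = $leak_tags"
echo "future commits = $future_count"
```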

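The core fix is:

```shell
# Strip future git history so the agent can't reach the reference fix.
# 1. Drop the remote (ignore the error if there is none).
git remote remove origin 2>/dev/null || true
# 2. Delete every local branch, remote-tracking branch, and tag.
git for-each-ref --format='delete %(refname)' refs/heads refs/remotes refs/tags | git update-ref --stdin
# 3. Delete pseudo-refs that can still record fetched or previous commit IDs.
rm -f .git/FETCH_HEAD .git/ORIG_HEAD
# 4. Expire every reflog entry so nothing keeps the deleted commits reachable.
git reflog expire --expire=now --all
# 5. Delete the now-unreachable objects.
git gc --prune=now
```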

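This:

  1. Removes the origin remote (and its remote-tracking branches), so the upstream URL is not leaked
  2. Removes local branches, remote-tracking branches, and tags
  3. Removes FETCH_HEAD and ORIG_HEAD, which can still record commit IDs (and, for FETCH_HEAD, branch names) from branches that no longer exist; once those commits are pruned, these would be references that fail to resolve
  4. Expires all reflog entries, since the reflogs would otherwise keep the deleted commits reachable and easy to recover
  5. Runs git gc --prune=now, which finally deletes the now-unreachable objects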
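A quick way to see why steps 4 and 5 both matter: deleting refs alone leaves the future commit recoverable through the HEAD reflog and still present in the object store. The sketch below runs the PR's five commands (with `--quiet` added to `git gc`) against a throwaway fixture and probes the future commit after steps 1 to 3, after step 4, and after step 5. The fixture, `probe`, and the other names are hypothetical.

```shell
#!/usr/bin/env bash
set -eu

export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# Hypothetical fixture: a base commit, then a future "reference fix" on main.
git init -q "$work/upstream"
cd "$work/upstream"
git symbolic-ref HEAD refs/heads/main
echo bug > app.txt
git add app.txt
git commit -q -m "base: buggy"
base=$(git rev-parse HEAD)
echo fixed > app.txt
git commit -q -am "future: reference fix"
future=$(git rev-parse HEAD)

git clone -q "$work/upstream" "$work/repo"
cd "$work/repo"
git checkout -q --detach "$base"

# Prints "present" if the future commit object is still in the object store.
probe() {
  if git cat-file -e "$future^{commit}" 2>/dev/null; then echo present; else echo gone; fi
}

# Steps 1-3 of the core fix.
git remote remove origin 2>/dev/null || true
git for-each-ref --format='delete %(refname)' refs/heads refs/remotes refs/tags |
  git update-ref --stdin
rm -f .git/FETCH_HEAD .git/ORIG_HEAD
after_refs=$(probe)
# HEAD@{1} is the clone's checkout of main, i.e. the future commit.
via_reflog=$(git rev-parse -q --verify 'HEAD@{1}' || true)

# Step 4: the commit becomes unreachable, but its object is not deleted yet.
git reflog expire --expire=now --all
after_expire=$(probe)

# Step 5: prune unreachable objects.
git gc --quiet --prune=now
after_gc=$(probe)

echo "after ref deletion: $after_refs (HEAD@{1} = ${via_reflog:-none})"
echo "after reflog expire: $after_expire"
echo "after gc: $after_gc"
```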

Changes

There isn't a script in the repo that generates the Dockerfiles, so I used Claude to put one together that goes through them and adds the required cleanup commands.
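The PR's script isn't reproduced here. As a rough sketch of the idea only, the following appends the cleanup as a final `RUN` step to every `Dockerfile` under a directory. It assumes (hypothetically) one Dockerfile per instance, that each file's final `WORKDIR` is the task repo with the target commit already checked out, and uses a marker comment so reruns are idempotent; `MARKER`, `add_cleanup`, and `update_all` are made-up names.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical marker line used to detect Dockerfiles that were already updated.
MARKER='# strip-future-git-history'

# Append the cleanup RUN step to one Dockerfile unless it is already there.
add_cleanup() {
  if grep -qF "$MARKER" "$1"; then
    return 0
  fi
  printf '\n%s\n' "$MARKER" >> "$1"
  cat >> "$1" <<'EOF'
RUN git remote remove origin 2>/dev/null || true && \
    git for-each-ref --format='delete %(refname)' refs/heads refs/remotes refs/tags | git update-ref --stdin && \
    rm -f .git/FETCH_HEAD .git/ORIG_HEAD && \
    git reflog expire --expire=now --all && \
    git gc --prune=now
EOF
}

# Walk a directory tree and update every file named Dockerfile.
update_all() {
  find "$1" -type f -name Dockerfile | while IFS= read -r f; do
    add_cleanup "$f"
  done
}
```

For example, `update_all path/to/dockerfiles` (a hypothetical path). Appending at the end places the step after any checkout, but a Dockerfile that switches `WORKDIR` away from the repo at the end would need the step placed differently.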

  1. Added the script used to update the Dockerfiles under a new scripts directory in the repo root
    • If we'd rather not keep it, let me know and I'm happy to drop it
  2. Updated all 731 instance Dockerfiles
  3. Reworked the harness to extract the patch and apply it, instead of loading it from git history baked into the image
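The harness change in item 3 amounts to generating the reference patch from full history outside the image and applying it with `git apply` inside. A minimal sketch under stated assumptions: the fixture is hypothetical, and the cleaned image is approximated by a `--no-local --single-branch` clone of a branch at the base commit (the `task` branch and `fix.patch` file are made-up names), which, like the cleaned images, does not contain the fix commit.

```shell
#!/usr/bin/env bash
set -eu

export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# Full upstream history, available only outside the image.
git init -q "$work/upstream"
cd "$work/upstream"
git symbolic-ref HEAD refs/heads/main
echo bug > app.txt
git add app.txt
git commit -q -m "base: buggy"
base=$(git rev-parse HEAD)
echo fixed > app.txt
git commit -q -am "future: reference fix"
fix=$(git rev-parse HEAD)

# Outside the image: extract the reference patch. --binary keeps binary
# changes in a form git apply can reproduce.
git diff --binary "$base" "$fix" > "$work/fix.patch"

# Stand-in for a cleaned image: --no-local forces a real transfer, so only
# objects reachable from the task branch are copied.
git branch task "$base"
git clone -q --no-local --single-branch --no-tags --branch task \
  "$work/upstream" "$work/image"
cd "$work/image"

# Inside the image: the fix commit is absent, but the patch still applies.
if git cat-file -e "$fix^{commit}" 2>/dev/null; then fix_in_image=yes; else fix_in_image=no; fi
git apply --check "$work/fix.patch"
git apply "$work/fix.patch"
echo "fix commit in image: $fix_in_image; app.txt now: $(cat app.txt)"
```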

Testing

I rebuilt the images separately and verified that they work. Using the rebuilt images, I also verified that:

  1. Historical commits are still available for exploration
  2. Future commits are not available
  3. Feature branches with future commits are not available
  4. Git tags are removed
  5. Each repo runs correctly with the repo's harness
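Checks 1 to 4 are easy to script. A minimal sketch, assuming the checker knows from outside the image an older commit plus the IDs of future commits on `main` and on a feature branch. The fixture, tag, and helper names (`check`, `has_commit`, and so on) are hypothetical, and the cleanup is the PR's core fix with `--quiet` added to `git gc`.

```shell
#!/usr/bin/env bash
set -eu

export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# Hypothetical fixture: old -> base -> future on main, tag v2.0 on future,
# and a dev branch with one more commit.
git init -q "$work/upstream"
cd "$work/upstream"
git symbolic-ref HEAD refs/heads/main
echo v0 > app.txt
git add app.txt
git commit -q -m old
old=$(git rev-parse HEAD)
echo bug > app.txt
git commit -q -am base
base=$(git rev-parse HEAD)
echo fixed > app.txt
git commit -q -am future
future=$(git rev-parse HEAD)
git tag v2.0
git checkout -q -b dev
git commit -q --allow-empty -m "dev work"
dev=$(git rev-parse HEAD)
git checkout -q main

git clone -q "$work/upstream" "$work/repo"
cd "$work/repo"
git checkout -q --detach "$base"

# The PR's core fix.
git remote remove origin 2>/dev/null || true
git for-each-ref --format='delete %(refname)' refs/heads refs/remotes refs/tags |
  git update-ref --stdin
rm -f .git/FETCH_HEAD .git/ORIG_HEAD
git reflog expire --expire=now --all
git gc --quiet --prune=now

# Run a predicate and print PASS or FAIL with a label.
check() {
  label=$1
  shift
  if "$@"; then echo "PASS $label"; else echo "FAIL $label"; fi
}
has_commit() { git cat-file -e "$1^{commit}" 2>/dev/null; }
lacks_commit() { ! has_commit "$1"; }
no_branches() { [ -z "$(git for-each-ref refs/heads refs/remotes)" ]; }
no_tags() { [ -z "$(git tag)" ]; }

results=$(
  check "1 history kept" has_commit "$old"
  check "2 future commit gone" lacks_commit "$future"
  check "3 feature-branch commit gone" lacks_commit "$dev"
  check "3 no branches left" no_branches
  check "4 no tags left" no_tags
)
echo "$results"
```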

Related: the fix in Harbor's adapter for SWEBench Pro, harbor-framework/harbor#1593

/closes #93

ConnorBAdams marked this pull request as ready for review May 5, 2026 23:56
greptile-apps (bot, Contributor) commented May 5, 2026

Too many files changed for review. (734 files found, 100 file limit)

ConnorBAdams (Author) commented May 6, 2026

Just realized that the harness actually expects the patch to be in the container; working on a fix and will update when done!

Edit: Done and confirmed working 😄
Edit 2: 35 cases regressed because git apply is not a faithful 1:1 inverse of git diff; fixing that! Fixed!
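The comment doesn't say what the 35 regressions were. One well-known way a `git diff` fails to round-trip through `git apply` is a binary file: without `--binary`, the diff only records that the file differs, and `git apply` refuses it. A small sketch with a hypothetical fixture:

```shell
#!/usr/bin/env bash
set -eu

export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"

# Hypothetical binary file: the NUL byte makes git treat it as binary.
printf 'v1\000' > blob.bin
git add blob.bin
git commit -q -m base
base=$(git rev-parse HEAD)
printf 'v2\000' > blob.bin
git commit -q -am fix
fix=$(git rev-parse HEAD)

# A plain diff only says "Binary files ... differ"; --binary embeds the data.
git diff "$base" "$fix" > "$work/plain.patch"
git diff --binary "$base" "$fix" > "$work/binary.patch"

# Try both patches against the base checkout.
git checkout -q --detach "$base"
if git apply --check "$work/plain.patch" 2>/dev/null; then plain=applies; else plain=fails; fi
if git apply --check "$work/binary.patch"; then binary=applies; else binary=fails; fi
echo "plain diff: $plain; --binary diff: $binary"
```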

jeff-da (Contributor) commented May 11, 2026

@yannis-he could you merge this PR?



Development

Successfully merging this pull request may close these issues.

Git Reward Hacking in SWEBench Pro OSS

2 participants