I use a remote Git repository for version control of a Matlab/Simulink project.
Over time, this repository has grown beyond a few hundred MB, because every committed version of a Simulink file (*.slx, *.sldd, ...) is stored as a binary file.
A shallow clone at depth 1 stays below 50 MB, but it does not let me push new commits back to the remote repository if others have pushed modifications in the meantime.

To reduce the repository size, I would like to automatically delete some older binary files according to the following rules:

  • Keep all current files
  • Keep commit history after a given date or a given commit ID
  • Delete the binary files committed before this date, or replace them with an empty file, if they are no longer in use in HEAD

I tried to use git filter-repo:

git clone --mirror <url> mymirror
cd mymirror
git filter-repo --path-glob "*.slx" --invert-paths --refs HEAD~50 --refs mybranch

The idea was to keep the last 50 commits before HEAD and to remove the .slx files from everything older.
I am not fully happy with this idea, because it might also remove old files that are still used in HEAD but were checked in before the last 50 commits.

The unexpected result was that the whole branch mybranch disappeared from the mirrored repository, and I still find all .slx files in the commits older than the last 50 in all other branches. It looks like --invert-paths acts globally.

How can I achieve this repository cleanup with git filter-repo or any other Git solution?

Asked 2 days ago by Waldi (edited 2 days ago)
  • I was under the distinct impression that you can push new history to/from a shallow base. Just tested the situation you mentioned: git clone --depth 1 file://$PWD `mktemp -d` && cd $_ && echo >file && git add file && git commit -m- file && git push origin HEAD:refs/heads/kilroy – jthill, commented 2 days ago
  • @phd re your deleted answer, yah, I see, filter-branch wants something to call the new history and HEAD will do, so check it out or make a temp ref: git branch WIP @~50; git filter-branch --etc -- WIP – jthill, commented 2 days ago
  • @jthill, my experience was that most of the time pushing from a shallow clone wasn't possible. After further reading, it seems you're right if no one else pushed in between. In my case, colleagues working in parallel have probably pushed, which required git fetch --unshallow before pushing on my side, and that is equivalent to a full clone – Waldi, commented 2 days ago
  • @phd my first attempt would be git replace @~50 WIP && git filter-branch -- WIP~..@ – jthill, commented 2 days ago
  • Or even just git replace --graft @~49 && git filter-branch @ to do it all at once, what was I thinking. – jthill, commented 2 days ago

2 Answers

Both git filter-branch and git filter-repo require a branch. So, to rewrite the old commits, we create a branch pointing to @~50, filter that branch, and then re-parent the newer commits @~50..@ on top of the filtered branch. Rebasing would be impractical because of too many conflicts. Many thanks to @jthill for the helpful guidance on git replace. These commands work for me (I tested with a different wildcard, *.txt):

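# Temporary branch at the cut-off commit (50 commits behind HEAD)
git branch WIP @~50
# Strip *.slx files from the history reachable from WIP only
git filter-repo --path-glob "*.slx" --invert-paths --refs WIP
# Tell Git to use the filtered WIP tip in place of the original @~50
git replace @~50 WIP
# Rewrite the history once more so the replacement becomes permanent
git filter-repo --replace-refs update-no-add
# Drop the temporary branch
git branch -D WIP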
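
Before and after a rewrite like this, it can help to check which .slx blobs are still reachable from some ref but are no longer part of HEAD, since those are the ones that cost space without being in use. The following is only a rough sketch using core Git commands; list_stale_blobs is a hypothetical helper name, and the throwaway demo repository and its file contents are assumptions made for illustration.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: print "<blob-id> <size>" for every blob whose path ends
# in $1, is reachable from some ref, and is not part of the HEAD tree.
list_stale_blobs() {
    ext=$1
    work=$(mktemp -d)
    # Blob IDs referenced by the current HEAD tree.
    git ls-tree -r HEAD | awk '{print $3}' | sort -u > "$work/head"
    # Object IDs reachable from any ref whose path ends in the extension.
    git rev-list --objects --all |
        awk -v ext="$ext" 'NF > 1 && substr($0, length($0) - length(ext) + 1) == ext {print $1}' |
        sort -u > "$work/all"
    # Keep IDs that are in history but not in HEAD, then report blob sizes.
    comm -23 "$work/all" "$work/head" |
        git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize)' |
        awk '$1 == "blob" {print $2, $3}'
    rm -rf "$work"
}

# Demo on a throwaway repository with two versions of one model file.
demo=$(mktemp -d)
cd "$demo"
git init -q
printf 'version 1\n' > model.slx
git add model.slx
git -c user.name=demo -c user.email=demo@example.com commit -q -m "first model"
printf 'version 2, longer\n' > model.slx
git -c user.name=demo -c user.email=demo@example.com commit -q -am "second model"

stale=$(list_stale_blobs .slx)
echo "$stale"
```

In a real repository, only the function is needed: run list_stale_blobs .slx from inside the clone. An empty result for the rewritten range suggests the old binaries are gone from every ref that was inspected.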

Probably the simplest way to keep local clones small when dealing with (effectively) frequently changing media files is to filter out all the large blobs initially and let Git fetch them on demand:

git clone --filter=blob:limit=256k u://r/l

That gets you all the history stored in files smaller than 256 KiB and leaves the rest out. Git goes back to the origin repository for anything it later discovers it needs, for a checkout or any other operation.
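
To see what such a partial clone leaves out, the sketch below builds a throwaway local "server" repository and clones it with a blob size filter. The repository layout, the file names, and the 1k limit are assumptions chosen to keep the demo small. The uploadpack.allowFilter setting is needed here because the demo server is a plain local repository that has to opt in to filtering.

```shell
#!/bin/sh
set -eu

root=$(mktemp -d)
cd "$root"

# Throwaway "server" repository with one small and one large file.
git init -q origin
cd origin
printf 'notes\n' > README
head -c 4096 /dev/zero > model.slx
git add README model.slx
git -c user.name=demo -c user.email=demo@example.com commit -q -m "add files"
# The serving repository must opt in to object filtering.
git config uploadpack.allowFilter true
cd ..

# Partial clone: skip blobs of 1 KiB or more. --no-checkout avoids
# fetching them on demand right away.
git clone -q --no-checkout --filter=blob:limit=1k "file://$root/origin" partial

# Objects that are referenced but were not downloaded are printed with a leading "?".
missing=$(git -C partial rev-list --objects --all --missing=print | grep -c '^?' || true)
echo "missing blobs: $missing"
```

Dropping --no-checkout would make Git fetch the omitted blob immediately, because the working tree needs it. The saving comes from old versions that no checkout ever asks for.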

Tags: git, filter-repo. Title: Cleanup binary files in Git repository before a given date (Stack Overflow)