Git clone and download only the files that need changed and committed

We have a repository with several large SQL files, ranging from 100 MB to 10 GB in size.

I've been trying to set up local cloning so that we only download the SQL file(s) that need to be changed and committed at any given time, instead of downloading all the SQL files even when we only need one.

I've been able to get close with the following commands. They work until I commit my changes; at that point, Git downloads all the files in the current branch.

git clone --filter=blob:none --depth 1 -n --sparse <url>
cd <repoDir>
git sparse-checkout set <fileNeedingChangedAndCommitted>
git restore --staged .
git restore <fileNeedingChangedAndCommitted>
# At this point, the file I need to change is downloaded locally, ready for changes.
# Make changes to the file.
git add <fileNeedingChangedAndCommitted>
git commit -m "test"
# At this point, all other files in the current branch are downloaded, even though they were not changed.
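To see exactly which step fetches blobs, the sequence above can be replayed against a throwaway local repository. This is a sketch under several assumptions: Git 2.23 or newer (for `git restore`), a hypothetical server repo whose three tiny files `a.sql`, `b.sql` and `c.sql` stand in for the real dumps, and a `file://` URL (a plain path would take the local-clone shortcut and ignore the filter). The server-side settings `uploadpack.allowFilter` and `uploadpack.allowAnySHA1InWant` are enabled only so that this local test server honors `--filter` and serves single blobs on demand. Missing blobs are counted with `git rev-list --missing=print`, which prefixes absent objects with `?` and does not fetch them.

```shell
#!/usr/bin/env bash
set -euo pipefail

work=$(mktemp -d)

# Hypothetical "server" repo with three stand-in SQL files.
git init -q "$work/server"
cd "$work/server"
git config user.name "Demo User"
git config user.email demo@example.com
for f in a b c; do printf 'CREATE TABLE %s (id int);\n' "$f" > "$f.sql"; done
git add . && git commit -q -m "Add SQL files"
# Assumption: let this local upload-pack honor --filter and serve single blobs.
git config uploadpack.allowFilter true
git config uploadpack.allowAnySHA1InWant true

# Count objects that are referenced but not present locally (never fetches).
count_missing() {
  git rev-list --objects --all --missing=print | awk '/^\?/ { n++ } END { print n + 0 }'
}

# The question's sequence, with a.sql as the file to change.
cd "$work"
git clone -q --filter=blob:none --depth 1 -n --sparse "file://$work/server" client
cd client
git config user.name "Demo User"
git config user.email demo@example.com
missing_before=$(count_missing)

git sparse-checkout set a.sql
git restore --staged .
git restore a.sql
missing_after_restore=$(count_missing)

printf 'ALTER TABLE a ADD COLUMN note text;\n' >> a.sql
git add a.sql
git commit -q -m "test"
missing_after_commit=$(count_missing)

echo "missing blobs: clone=$missing_before restore=$missing_after_restore commit=$missing_after_commit"
```

Comparing the `restore=` and `commit=` numbers in the final line shows how many blobs the commit step pulled in with your Git version.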

I feel like this should be possible, but maybe I'm misunderstanding the concept of sparse-checkout or missing a step/detail.

Is there any way to download only the files you want to change, and then commit those changes without downloading every file in the current branch?

EDIT: From my testing and the discussion on this question, I came to the conclusion that this isn't possible with Git. Instead, we decided to keep each SQL file on its own orphan branch in the same repo, so each file has its own commit history but still lives in the same repository for organization. This lets us check out only the branches/files we need at any given time, and make changes/commits without downloading the blobs of the other SQL files. This won't work for every situation, but it solves our requirement for now :)
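The orphan-branch layout from the edit can be sketched as follows. The branch names (`sql/orders`, `sql/customers`), the file names and the throwaway `file://` server are hypothetical stand-ins; `git init -b` needs Git 2.28+ and `git switch --orphan` needs Git 2.23+. Each file gets its own root commit, and a `--single-branch` clone only receives objects reachable from that one branch, so the other file's blob never arrives.

```shell
#!/usr/bin/env bash
set -euo pipefail

work=$(mktemp -d)

# Hypothetical shared repo: one orphan branch (own root commit) per SQL file.
git init -q -b sql/orders "$work/server"
cd "$work/server"
git config user.name "Demo User"
git config user.email demo@example.com
printf 'CREATE TABLE orders (id int);\n' > orders.sql
git add orders.sql && git commit -q -m "Add orders.sql"

# --orphan starts a parentless branch; files tracked on sql/orders are removed.
git switch -q --orphan sql/customers
printf 'CREATE TABLE customers (id int);\n' > customers.sql
git add customers.sql && git commit -q -m "Add customers.sql"
customers_blob=$(git rev-parse sql/customers:customers.sql)

# Clone only the branch holding the file to edit (file:// uses the real transport).
cd "$work"
git clone -q --single-branch --branch sql/orders "file://$work/server" orders-only
cd orders-only
git config user.name "Demo User"
git config user.email demo@example.com
if git cat-file -e "$customers_blob" 2>/dev/null; then
  has_customers_blob=yes
else
  has_customers_blob=no
fi

# Edit, commit and push back; customers.sql is never downloaded.
printf 'ALTER TABLE orders ADD COLUMN note text;\n' >> orders.sql
git commit -q -am "Add note column"
git push -q origin sql/orders
```

Pushing to `sql/orders` is accepted here because the demo server has `sql/customers` checked out; a real shared remote would normally be a bare repository.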

Asked Jan 8 at 19:32 by Eschin Tenebrous; edited 2 days ago.
  • It's too late for this advice to help, but next time I'd choose a version control system designed with partial trees as a core feature rather than as something tacked on after the fact; SVN, for example. – Charles Duffy, Jan 8 at 19:43
  • Is it just that Git doesn't handle partial trees very well? From the sparse-checkout docs: "This command is used to create sparse checkouts, which change the working tree from having all tracked files present to only having a subset of those files." Of course, the same documentation also says the command/feature is 'experimental' :) – Eschin Tenebrous, Jan 8 at 21:29
  • Pretty much, yes: Git doesn't handle partial trees very well. The data model was designed assuming everyone would have a full copy of everything for offline use; shallow cloning, sparseness, etc. were added later. It's certainly theoretically possible for Git to do better with the data model it has (by copying hashes it can't validate), at the expense of losing some integrity checking when those features are in use. But when the core developers behind a tool use it differently than you do, that puts you in a bad place with regard to the likelihood of surprises. – Charles Duffy, Jan 8 at 21:55
  • It would be interesting to know whether git commit --quiet avoids downloading all objects. One feature of plain git commit is that it prints a summary of the changes; with --quiet, that summary does not have to be computed. – j6t, 2 days ago
  • I tested --quiet a short while ago, and the commit still caused all the unwanted blobs to download :( – Eschin Tenebrous, 2 days ago

1 Answer

> I feel like this should be possible, but maybe I'm misunderstanding the concept of sparse-checkout

The problem is not sparse checkout; it is --filter=blob:none. This filter prevents downloading all objects at clone time, but Git downloads the necessary objects later, when they are accessed.
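The on-demand fetching described above can be watched directly. This sketch assumes a throwaway `file://` server with hypothetical files `a.sql`, `b.sql` and `c.sql`, and enables `uploadpack.allowFilter` and `uploadpack.allowAnySHA1InWant` on it only so that the local test server supports partial clone. A partial clone records its remote as a promisor (`remote.origin.promisor`, `remote.origin.partialclonefilter`); reading one blob with `git cat-file` then fetches just that blob.

```shell
#!/usr/bin/env bash
set -euo pipefail

work=$(mktemp -d)
git init -q "$work/server"
cd "$work/server"
git config user.name "Demo User"
git config user.email demo@example.com
for f in a b c; do printf 'CREATE TABLE %s (id int);\n' "$f" > "$f.sql"; done
git add . && git commit -q -m "Add SQL files"
# Assumption: let this local upload-pack honor --filter and serve single blobs.
git config uploadpack.allowFilter true
git config uploadpack.allowAnySHA1InWant true

cd "$work"
git clone -q --filter=blob:none -n "file://$work/server" client
cd client

# The partial clone marks origin as a promisor remote that can supply missing objects.
promisor=$(git config remote.origin.promisor)
filter=$(git config remote.origin.partialclonefilter)

count_missing() {
  git rev-list --objects --all --missing=print | awk '/^\?/ { n++ } END { print n + 0 }'
}

missing_before=$(count_missing)
git cat-file -p HEAD:a.sql > /dev/null   # reading the blob triggers a lazy fetch
missing_after=$(count_missing)
echo "promisor=$promisor filter=$filter missing: $missing_before -> $missing_after"
```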

> Is there any way to download only the files you want to change, and then commit those changes without downloading every file

Most probably not. Conceptually, Git stores a copy of the entire working tree in every commit. I say "conceptually" because, technically, Git does its best never to store copies; instead, it saves pointers to existing objects. To construct a commit, Git needs all trees and blobs from the previous commit, so that's what it downloads. With sparse checkout but without the filter, Git would already have all the necessary objects and wouldn't download anything, but then everything must be downloaded up front.
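The point that each commit records the whole tree while reusing unchanged objects can be seen in a plain local repository (no clone involved; the repo and file names are hypothetical). After changing only `a.sql`, the new commit's tree still lists both files, and the entry for the untouched `b.sql` points at the very same blob ID as in the previous commit.

```shell
#!/usr/bin/env bash
set -euo pipefail

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email demo@example.com

printf 'CREATE TABLE a (id int);\n' > a.sql
printf 'CREATE TABLE b (id int);\n' > b.sql
git add . && git commit -q -m "First snapshot"

printf 'ALTER TABLE a ADD COLUMN note text;\n' >> a.sql   # change only a.sql
git commit -q -am "Second snapshot"

# Each commit points at one root tree that lists *all* files...
git cat-file -p HEAD | sed -n 1p      # the "tree <id>" line of the commit object
git ls-tree HEAD                      # both a.sql and b.sql appear
tree_entries=$(git ls-tree HEAD | wc -l | tr -d ' ')

# ...but the unchanged file is a pointer to the same blob, not a new copy.
b_old=$(git rev-parse HEAD~1:b.sql)
b_new=$(git rev-parse HEAD:b.sql)
a_old=$(git rev-parse HEAD~1:a.sql)
a_new=$(git rev-parse HEAD:a.sql)
```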

The bottom line: you can download and use as little as possible locally, but once you're going to commit, Git will need all the objects. So either tolerate Git downloading the required objects on demand, or let Git pre-download everything by removing the filter:

git clone --depth 1 -n --sparse <url>
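The filter-free alternative can be checked the same way. Assuming a throwaway `file://` server with hypothetical files, the clone below still checks nothing out, yet every blob of the tip commit arrives at clone time, so `git rev-list --missing=print` reports nothing missing and a later commit has nothing left to fetch. Older history is cut off by `--depth 1` rather than marked missing. The trade-off is that all SQL files in the latest snapshot are transferred up front.

```shell
#!/usr/bin/env bash
set -euo pipefail

work=$(mktemp -d)
git init -q "$work/server"
cd "$work/server"
git config user.name "Demo User"
git config user.email demo@example.com
for f in a b c; do printf 'CREATE TABLE %s (id int);\n' "$f" > "$f.sql"; done
git add . && git commit -q -m "Add SQL files"
printf 'ALTER TABLE a ADD COLUMN note text;\n' >> a.sql
git commit -q -am "Change a.sql"

# Shallow, sparse, no checkout, but no --filter: the tip's blobs all arrive now.
cd "$work"
git clone -q --depth 1 -n --sparse "file://$work/server" client
cd client
missing=$(git rev-list --objects --all --missing=print | awk '/^\?/ { n++ } END { print n + 0 }')
commits=$(git rev-list --count HEAD)   # history beyond --depth is grafted away
echo "missing objects: $missing, commits: $commits"
```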
