GitTheory

OBSOLETE CONTENT

This wiki has been archived and the content is no longer updated. Please visit git-scm.com/doc for up-to-date documentation.

The DARCS community tends to discuss theory more than Git's authors do, but Git is not without its foundations.

Snapshots

Each commit holds a read-only snapshot of a file system.

The repository's working tree holds a writable snapshot; a better term would simply be "file system". Part of that file system is ignored because of .gitignore and .git/info/exclude. For the rest, there is not yet a good standard for which files should be compared against HEAD, since "git add --all" and "git commit -a" behave differently: the former also stages untracked files, while the latter only picks up changes to files Git already tracks.

There is actually one other file system (or "writable snapshot"). The index file, together with HEAD, represents a file system. Conceptually, the index holds the differences between HEAD and what will become the next commit, so HEAD plus the index file gives a complete picture of that snapshot. (In practice, the index records an entry for every tracked path, so it describes the whole snapshot on its own.) It is a writable snapshot, since the index file can be changed up until "git commit" is run.

NOTE: In a merge conflict, the index file actually holds 3 different snapshots, but 2 of those are just cached files from the two snapshots being merged.

NOTE: There have been multiple proposals to name this other file system. One proposal was to call it "NEXT" and the current working tree "WTREE", so that "git diff" would be equivalent to "git diff WTREE NEXT" and "git diff --cached" could be written as "git diff NEXT HEAD".
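The three file systems can be pictured with a small model. The following is only a sketch, assuming each snapshot is a plain mapping from path to file content; the diff() helper and its old-then-new argument order are choices made for this illustration, not Git's implementation. NEXT and WTREE are the proposed names from the note above.

```python
from types import MappingProxyType

def diff(old, new):
    """Map each path that differs to its (old, new) content; None means absent."""
    paths = sorted(set(old) | set(new))
    return {p: (old.get(p), new.get(p)) for p in paths if old.get(p) != new.get(p)}

# HEAD is a commit, so it is wrapped in a read-only view.
HEAD = MappingProxyType({"a.txt": "one\n", "b.txt": "two\n"})
NEXT = dict(HEAD)    # the index starts out matching HEAD
WTREE = dict(HEAD)   # so does the working tree

WTREE["a.txt"] = "ONE\n"          # edit a file in the working tree
WTREE["b.txt"] = "TWO\n"          # edit another one
NEXT["a.txt"] = WTREE["a.txt"]    # roughly what "git add a.txt" does

staged = diff(HEAD, NEXT)      # plays the role of "git diff --cached"
unstaged = diff(NEXT, WTREE)   # plays the role of "git diff"
print(staged)
print(unstaged)
```

In this model the change to a.txt shows up only in the staged comparison and the change to b.txt only in the unstaged one, which is the sense in which the index is a third, writable snapshot sitting between HEAD and the working tree.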

Edits

In Git, an edit is the difference between two snapshots; it is derived from the snapshots. This differs from DARCS, which stores a concrete representation of edits and then derives the snapshots.
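The two storage directions can be contrasted in a toy model. This is a sketch under strong assumptions: snapshots are path-to-content mappings, an edit is a per-file mapping of (old, new) content, and the helper names diff() and apply() are invented here. It does not reflect how either Git or DARCS actually stores data.

```python
def diff(old, new):
    """Derive an edit from two snapshots: path -> (old, new); None means absent."""
    paths = set(old) | set(new)
    return {p: (old.get(p), new.get(p)) for p in paths if old.get(p) != new.get(p)}

def apply(snapshot, edit):
    """Derive a snapshot by replaying an edit on top of another snapshot."""
    result = dict(snapshot)
    for path, (_, new) in edit.items():
        if new is None:
            result.pop(path, None)   # the edit deletes this path
        else:
            result[path] = new       # the edit creates or rewrites this path
    return result

# Snapshot-first: the snapshots are stored, the edits are derived on demand.
snapshots = [
    {"a.txt": "v1\n"},
    {"a.txt": "v2\n"},
    {"a.txt": "v2\n", "b.txt": "new\n"},
]
edits = [diff(a, b) for a, b in zip([{}] + snapshots, snapshots)]

# Edit-first: the edits are stored, the snapshots are derived by replaying them.
replayed = []
current = {}
for edit in edits:
    current = apply(current, edit)
    replayed.append(current)

print(replayed == snapshots)
```

Both directions describe the same history here; the difference is which representation is treated as primary and which is computed.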

Git theoreticians believe it is better to use snapshots. Snapshots are what people use and a snapshot is known to contain or not contain a particular feature. While a patch file or other concrete representation of an edit is often said to "contain a feature", this can't be the case because that feature may require different changes based on the snapshot it is applied to.

In Git circles, you may hear "An evil merge is a merge that introduces changes that do not appear in any parent." This is, in general, true. However, if you are merging a branch with a feature and the main branch has changed so that additional changes are needed to enable that feature in the merged result, putting those changes into the merge commit should not be considered "evil". The merge commit represents bringing the feature into the main branch and so it should contain those changes.

To put it simply, source control users wish we could add and remove features from a branch. However, we can only add or remove edits. So, we try to have each edit contain a single feature and we use git to move those edits around. When a moved edit doesn't exactly represent a feature, we're required to adjust the edit to have it contain the feature.

Some discussion about edits and their properties is below.

Git can manipulate edits. An "edit" is the set of changes between two specific snapshots. I'll use the notation A:B to represent the edit between two snapshots A and B.

Edits have mathematical properties. It's easy to see that B:A is the "inverse edit" of A:B. The edit A:A is the "empty edit" for snapshot A. An edit A:B can be "split" using a snapshot C to make edits A:C and C:B or, written another way, A:C:B. Likewise, edits A:B and B:C can be "joined" to form A:C.

  • The inverse edit is generated by "git revert".
  • The empty edit can be written by "git commit --allow-empty".
  • Splitting the edit that ends at the working tree is done with "git add".
  • Splitting an edit elsewhere is done with "git rebase --interactive". (See its manual page.)
  • Joining an edit onto the one that ends at HEAD is done with "git commit --amend".
  • Joining edits elsewhere is done with "git rebase --interactive".
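The algebra of edits described above can be written down directly. The following is a minimal sketch, assuming an edit is nothing more than a pair of snapshots and a snapshot is a path-to-content mapping; the Edit type and the function names are hypothetical and only mirror the vocabulary of this page.

```python
from typing import NamedTuple

class Edit(NamedTuple):
    """The edit A:B, stored as its start and end snapshots."""
    start: dict
    end: dict

def inverse(edit):
    """B:A for a given A:B."""
    return Edit(edit.end, edit.start)

def empty(snapshot):
    """The empty edit A:A."""
    return Edit(snapshot, snapshot)

def split(edit, middle):
    """Turn A:B into A:C and C:B for any snapshot C."""
    return Edit(edit.start, middle), Edit(middle, edit.end)

def join(first, second):
    """Turn A:B and B:C into A:C; the edits must meet at the same snapshot."""
    if first.end != second.start:
        raise ValueError("edits do not share a snapshot")
    return Edit(first.start, second.end)

A = {"f.txt": "1\n"}
B = {"f.txt": "2\n"}
C = {"f.txt": "1.5\n"}

edit = Edit(A, B)
first, second = split(edit, C)
print(join(first, second) == edit)           # splitting then joining is a no-op
print(join(edit, inverse(edit)) == empty(A)) # an edit followed by its inverse is empty
```

Because edits are derived from snapshots, all four operations are trivial in this model; the real work in Git lies in choosing the intermediate snapshot, which is what "git add" and "git rebase --interactive" let the user do.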

An edit has a specific start and end snapshot. If you want to do a similar change with a different starting snapshot, you need to "patch"; I'll define a theoretical function patch() that takes an edit and a new starting state and returns the ending snapshot of a new edit. So, patch(A:B, C) may return D, where C:D is a new edit containing a change similar to A:B. I say "may return" because a patch starting at snapshot C might not exist. For example, if the edit A:B moves file "foo.txt" to "bar.txt" and snapshot C does not have a file "foo.txt" or already has a file "bar.txt", then the patch cannot exist. [Note, there can be many definitions of a patch() function. I'm not picking one; I'm just saying one exists.]

  • patch() is most easily seen in "git cherry-pick"
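One possible patch() can be sketched as follows. This is just one of the many definitions the text allows, chosen for simplicity: it works on whole files, and it refuses to apply unless every path touched by the edit looks the same in the new starting snapshot as it did in the original one. The helper changes() and the return value None for "no patch exists" are assumptions of this sketch, and real "git cherry-pick" is considerably more forgiving.

```python
def changes(start, end):
    """Per-path differences of an edit: path -> (old, new); None means absent."""
    paths = set(start) | set(end)
    return {p: (start.get(p), end.get(p)) for p in paths if start.get(p) != end.get(p)}

def patch(edit, target):
    """Return the end snapshot of a similar edit starting at target, or None."""
    start, end = edit
    result = dict(target)
    for path, (old, new) in changes(start, end).items():
        if target.get(path) != old:
            return None          # target differs where the edit expects `old`
        if new is None:
            del result[path]     # the edit removes this path
        else:
            result[path] = new   # the edit creates or rewrites this path
    return result

# The edit A:B moves foo.txt to bar.txt.
A = {"foo.txt": "hello\n"}
B = {"bar.txt": "hello\n"}

with_foo = {"foo.txt": "hello\n", "README": "docs\n"}
without_foo = {"README": "docs\n"}
with_bar = {"foo.txt": "hello\n", "bar.txt": "other\n"}

print(patch((A, B), with_foo))     # the move carries over; README is untouched
print(patch((A, B), without_foo))  # no foo.txt to move, so no patch exists
print(patch((A, B), with_bar))     # bar.txt is already taken, so no patch exists
```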

The final definition concerns reordering edits A:B and B:C. The edits are reorderable if a patch of B:C can be put in front of a patch of A:B and the resulting sequence still ends up at the same final snapshot C. Formally, A:B:C is "reorderable" if there exists A:D:C such that patch(B:C, A) = D and patch(A:B, D) = C.

  • Reordering edits can be done with "git rebase --interactive"
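The formal condition can be checked mechanically once a patch() is fixed. The sketch below assumes a deliberately strict, whole-file patch() (one of many possible definitions, with None meaning "no patch exists"); reorderable() is a hypothetical helper that simply restates the definition above.

```python
def changes(start, end):
    """Per-path differences of an edit: path -> (old, new); None means absent."""
    paths = set(start) | set(end)
    return {p: (start.get(p), end.get(p)) for p in paths if start.get(p) != end.get(p)}

def patch(edit, target):
    """Strict whole-file patch: returns the new end snapshot, or None."""
    start, end = edit
    result = dict(target)
    for path, (old, new) in changes(start, end).items():
        if target.get(path) != old:
            return None
        if new is None:
            del result[path]
        else:
            result[path] = new
    return result

def reorderable(a, b, c):
    """True if A:B:C can be rewritten as A:D:C with the two edits swapped."""
    d = patch((b, c), a)               # apply the second edit first
    if d is None:
        return False
    return patch((a, b), d) == c       # then the first edit must land on C

# Two edits touching different files can be swapped.
A = {"x": "1\n", "y": "1\n"}
B = {"x": "2\n", "y": "1\n"}
C = {"x": "2\n", "y": "2\n"}
print(reorderable(A, B, C))

# Two edits to the same file cannot, under this strict patch().
P = {"x": "1\n"}
Q = {"x": "2\n"}
R = {"x": "3\n"}
print(reorderable(P, Q, R))
```

With a more lenient patch(), for example one that works on hunks rather than whole files, more pairs of edits would count as reorderable.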

Merge Conflicts

The theoretical question here is what should end up in the index file and working tree after a merge conflict.

For simplicity, we'll consider "git cherry-pick" which involves merging a single edit. It's the easiest case.

A cherry-pick is almost a direct application of the theoretical function patch(). We have an edit A:B and we want to move it onto snapshot C. But, as noted earlier, the result of patch() may not exist.

If patch(A:B, C) exists and equals D, then git just writes the snapshot D as the new commit.

But what if patch(A:B, C) does not exist? Git splits A:B! We'll introduce a new state S to get A:S:B. Now, the first edit, A:S, contains all the parts of A:B that can be patch()ed onto state C, and the second edit, S:B, contains all the parts of A:B that cannot be patch()ed onto state C. Obviously, patch(A:S, C) exists and the resulting changes are copied into the index file.

What happens to the unpatchable part in edit S:B? We don't want this change thrown away, because it could be important. We want it presented to the user, who can then fix or dismiss it. So, git writes something reflecting the unpatchable part into files in the working tree and marks the files as "needs review" in the index file. (The index file also caches the SHAs of the files from both parent commits, so that they can be accessed quickly by "git diff".)
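The split into a patchable and an unpatchable part can be sketched in the same toy model. The assumptions are significant: snapshots are path-to-content mappings, conflicts are detected per whole file rather than per hunk, and the function cherry_pick() with its "base"/"ours"/"theirs" labels is invented for this illustration. It shows the shape of the idea, not what Git actually does internally.

```python
def changes(start, end):
    """Per-path differences of an edit: path -> (old, new); None means absent."""
    paths = set(start) | set(end)
    return {p: (start.get(p), end.get(p)) for p in paths if start.get(p) != end.get(p)}

def cherry_pick(edit, target):
    """Split A:B at S, apply A:S onto target, and report the rest for review."""
    start, end = edit
    split_point = dict(start)   # S: A plus every change that applies cleanly
    new_index = dict(target)    # patch(A:S, C): what goes into the index
    needs_review = {}           # the unpatchable part S:B, one entry per path
    for path, (old, new) in sorted(changes(start, end).items()):
        if target.get(path) == old:
            # The target still looks like A here, so this change belongs to A:S.
            for snapshot in (split_point, new_index):
                if new is None:
                    snapshot.pop(path, None)
                else:
                    snapshot[path] = new
        else:
            # The target has diverged; keep all three versions for the user.
            needs_review[path] = {"base": old, "ours": target.get(path), "theirs": new}
    return split_point, new_index, needs_review

A = {"a.txt": "1\n", "b.txt": "1\n"}
B = {"a.txt": "2\n", "b.txt": "2\n"}
C = {"a.txt": "1\n", "b.txt": "changed\n"}

S, index, review = cherry_pick((A, B), C)
print(S)       # a.txt already at its new content, b.txt still as in A
print(index)   # the clean part applied on top of C
print(review)  # b.txt is left for the user to resolve
```

Here the change to a.txt lands in the index, while b.txt is reported with its three candidate versions, mirroring the "needs review" marking described above.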
