WhatIsTheIndex

From Git SCM Wiki
Jump to: navigation, search

OBSOLETE CONTENT

This wiki has been archived and the content is no longer updated. Please visit git-scm.com/doc for up-to-date documentation.

What is the index?

Many a times the subject comes up on the mailing list: "Why keep the index?" or, "The index is a performance trick."

The truth is: the index is a staging area. Every SCM has it, but git shows it to you, and uses it effectively.

A staging area

For those used to CVS or similar: What happens when you do

cvs add some.file
cvs commit some.file

The first command, cvs add some.file does not commit the file, right? Where has it gone? Answer: into the staging area.

With the second command, cvs commit some.file, you can finally commit. But what happens to the other modified files? Are they committed? Answer: no, the last revision is updated with the new version of some.file, "in the staging area", and then committed.

So really, it is neither a new concept, nor an intimidating one. The index comes naturally to you when you say

git add some.file

Now, the file is in the index. And here comes the difference to CVS: once you put something into the index, a simple git diff will only show differences with respect to the indexed version. Which means that you will not see any differences to some.file, once you put it into the index. The reason: git assumes that you know what you are doing. If you put something into the staging area with git add, remove a file with git rm, or put a modified version of a file into the staging area with git update-index some.file, git assumes that you want to commit this state, and will not bother you by showing differences you are most likely not interested in.

One special case exists:

git commit some.file

In this case, git assumes that you want to create a temporary staging area from the tip of the current branch ("HEAD"), update some.file, and commit the resulting state. After you committed that state, the staging area is resurrected as it was before that commit.

This operation ("save the current staging area, construct a new one, commit it, and then restore the staging area") seems a bit illogical, since you would usually expect only one staging area. However, in practice it happens quite often that you slap your forehead, because you forgot to commit something very important. So, just edit the respective files, commit just these, and continue with what you were doing before you hurt yourself (slapping your forehead).

In essence: The index is a staging area for the next commit, but for convenience, passing filenames explicitely to git commit builds a temporary staging area from the latest revision and the current version of the provided files before committing that state.

Merges

Normally, a git user will rarely be exposed to the index, if he/she is not committing a revision. But there is one notable exception: merges.

When you merge the work of others, sometimes conflicts happen. These are put in the index. (Strictly speaking, the whole merge is done inside the index by inserting the current version, the version of the branch-to-be-merged, and the merge base (common ancestor) into the index, and merging them using a three-way-diff. If there are no conflicts, these three entries are collapsed into a single entry. Otherwise the three entries stay there, with the common ancestor being replaced by the result of the merge.)

Again, git is intelligent about what to show you upon a git diff: those entries which merged cleanly are already updated in the staging area. It is unlikely that you want to see these differences right now, because you have to fix up conflicts -- if there are any. So, a git diff will show you a combined diff, i.e. a simultaneous diff of the merged-with-conflicts file against both the current version and the version in the branch-to-be-merged.

Now you know what the index is good for.


Personal tools