GitCommonMistakes

From Git SCM Wiki
Jump to: navigation, search

There are cases where projects migrating to Git from SVN or CVS have made mistakes in the transition which are hard to back out because of git's cryptographically signed commit history.

This page details some common mistakes and pitfalls that projects have experienced when switching to git for version control.

It might make sense to eventually update git or its porcelains to identify some of these cases programmatically, or to update the official documentation to help users avoid these mistakes.

If you wish to add to this list, avoid hypothetical cases and try to cite real repositories in the wild where your mistake has been observed.


Table of contents:

Contents


Why does this matter?

You only get one shot at publishing your repository.

Misattribution of authors can taint the legality of your entire codebase and bring in to question the source of your changesets.

Mismatched commit history is less than useless; it can lead to confusion and may conceal bugs.

You will have to keep your legacy CVS/SVN tree around forever and won't get the benefits that are made possible with the git workflow.

There's usually no going back. This is a feature. Once a git repository is published, attempting to rewrite its history will invalidate:

  • Release/snapshot announcements containing hashes
  • Email archives with gitweb URLs
  • Bugzilla links
  • Every git clone that any developer or user has ever made

It's clear by this point that getting things right to start with is highly desirable.

Mistakes made during repository migration

Some projects have a decade or more of incredibly valuable version history locked up in their existing legacy source code repositories.

In the rush to benefit from the new generation of distributed versioning tools, repository administrators have failed to accurately convert their existing project history accurately to the git format.

Getting off to a good start is important to avoid backlash from committers during the transition, and doing it right will save headache in the long run.

Examples

Monolithic repository

Lack of SubprojectSupport in git, or the import of an old repository, can lead people to keep multiple logical projects in the same repository.

TODO: explain

Cairo: Incorrectly mapped names

http://lists.freedesktop.org/archives/cairo/2006-May/006992.html

TODO: explain mixup between committers with similar initials leading to author misattribution

Xorg: Missing branches

TODO: explain, particularly vendor branches etc.

Cairo: Using a home-rolled or outdated conversion tool

http://gitweb.cairographics.org/?p=cairo;a=commit;h=b913a81c3482f4dc8bc452d91bfaf64b416523f3

This commit highlights a near worst-case scenario where the git log and author have absolutely no relevance to the actual commit. To see for yourself, compare the ChangeLog contents to the information on the gitweb page.

It has been speculated (TODO: link to list here) that the tools provided with git for CVS-to-SVN migration have an inadequate understanding of CVS semantics, particularly branching. For one-time conversions (i.e., when the CVS repository will be switched off after the conversion), cvs2svn/cvs2git migrates the CVS history much more accurately.

http://cvs2svn.tigris.org/cvs2git.html

Mistakes made in day-to-day operation

Git is a joy to use, but there are cases when developers misunderstand the day-to-day operation of a git repository. It's possible to think you are using git properly for months without realizing that you are getting it wrong and have a project history that is less useful than it could have been.

These mistakes are generally less severe than mistakes made when a repository is first initialized or converted from a legacy format, but they are also less obvious and harder to catch.

Examples

Linux: The botched DVB merges

http://www.ussg.iu.edu/hypermail/linux/kernel/0603.2/1961.html You sync the source trees, but the histories doesn't match up.

TODO: explain better

What Linus does to check stuff; I tend to have a two-level safety check. The first is that I just check the diffstat of the result of the pull

This first-level safety check generally catches any serious mistakes, and it's helped me catch cases where the other end did something really stupid by mistake.

--- So this one was caught by the fact that I'm just fairly anal about the history being sensible (as the ACPI people may remember ;), so I usually also do

gitk ORIG_HEAD.. -- Linus on mailinglist

Xorg: Log message conventions not followed

http://lists.freedesktop.org/archives/xorg/2006-June/016127.html

"Apparently, several git tools assume a special formated log message.
We don't follow these conventions yet, and at least git-rebase behaves
strangely if they are not followed." - mhopf

See also: CommitMessageConventions.

TODO: explain, link

Xorg: Author attribution not done properly

http://lists.freedesktop.org/archives/xorg/2006-June/015946.html

For each commit, git keeps track of both the committer and the author or the patch. This is useful in cases like the one in the mail above where you commit a patch on behalf of the author who doesn't have commit access. There are several ways to do this, but for a one-off commit the easiest is to use the --author option to git-commit:

    git commit --author "Name Surname <email@address.com>"

Xorg: Accidental 'origin' branches

http://lists.freedesktop.org/archives/xorg/2006-July/016735.html

Public git repositories should usually not have an 'origin' branch.

TODO: explain how this breaks things, and how to avoid it

Note that newer git is better at this, at least for newly cloned repositories. See "Repository layout and objects transfer" in http://www.kernel.org/pub/software/scm/git/docs/v1.5.0/RelNotes-1.5.0.txt.


Personal tools