There are cases where projects migrating to Git from SVN or CVS have made mistakes in the transition which are hard to back out because of git's cryptographically signed commit history.
This page details some common mistakes and pitfalls that projects have experienced when switching to git for version control.
If you wish to add to this list, avoid hypothetical cases and try to cite real repositories in the wild where your mistake has been observed.
Table of contents:
Why does this matter?
You only get one shot at publishing your repository.
Misattribution of authors can taint the legality of your entire codebase and bring in to question the source of your changesets.
Mismatched commit history is less than useless; it can lead to confusion and may conceal bugs.
You will have to keep your legacy CVS/SVN tree around forever and won't get the benefits that are made possible with the git workflow.
There's usually no going back. This is a feature. Once a git repository is published, attempting to rewrite its history will invalidate:
- Release/snapshot announcements containing hashes
- Email archives with gitweb URLs
- Bugzilla links
- Every git clone that any developer or user has ever made
It's clear by this point that getting things right to start with is highly desirable.
Mistakes made during repository migration
Some projects have a decade or more of incredibly valuable version history locked up in their existing legacy source code repositories.
In the rush to benefit from the new generation of distributed versioning tools, repository administrators have failed to accurately convert their existing project history accurately to the git format.
Getting off to a good start is important to avoid backlash from committers during the transition, and doing it right will save headache in the long run.
Lack of SubprojectSupport in git, or the import of an old repository, can lead people to keep multiple logical projects in the same repository.
Cairo: Incorrectly mapped names
TODO: explain mixup between committers with similar initials leading to author misattribution
Xorg: Missing branches
TODO: explain, particularly vendor branches etc.
Cairo: Using a home-rolled or outdated conversion tool
This commit highlights a near worst-case scenario where the git log and author have absolutely no relevance to the actual commit. To see for yourself, compare the ChangeLog contents to the information on the gitweb page.
It has been speculated (TODO: link to list here) that the tools provided with git for CVS-to-SVN migration have an inadequate understanding of CVS semantics, particularly branching. For one-time conversions (i.e., when the CVS repository will be switched off after the conversion), cvs2svn/cvs2git migrates the CVS history much more accurately.
Mistakes made in day-to-day operation
Git is a joy to use, but there are cases when developers misunderstand the day-to-day operation of a git repository. It's possible to think you are using git properly for months without realizing that you are getting it wrong and have a project history that is less useful than it could have been.
These mistakes are generally less severe than mistakes made when a repository is first initialized or converted from a legacy format, but they are also less obvious and harder to catch.
Linux: The botched DVB merges
http://www.ussg.iu.edu/hypermail/linux/kernel/0603.2/1961.html You sync the source trees, but the histories doesn't match up.
TODO: explain better
What Linus does to check stuff; I tend to have a two-level safety check. The first is that I just check the diffstat of the result of the pull
This first-level safety check generally catches any serious mistakes, and it's helped me catch cases where the other end did something really stupid by mistake.
--- So this one was caught by the fact that I'm just fairly anal about the history being sensible (as the ACPI people may remember ;), so I usually also do
gitk ORIG_HEAD.. -- Linus on mailinglist
Xorg: Log message conventions not followed
"Apparently, several git tools assume a special formated log message. We don't follow these conventions yet, and at least git-rebase behaves strangely if they are not followed." - mhopf
See also: CommitMessageConventions.
TODO: explain, link
Xorg: Author attribution not done properly
For each commit, git keeps track of both the committer and the author or the patch. This is useful in cases like the one in the mail above where you commit a patch on behalf of the author who doesn't have commit access. There are several ways to do this, but for a one-off commit the easiest is to use the --author option to git-commit:
git commit --author "Name Surname <email@example.com>"
Xorg: Accidental 'origin' branches
Public git repositories should usually not have an 'origin' branch.
TODO: explain how this breaks things, and how to avoid it
Note that newer git is better at this, at least for newly cloned repositories. See "Repository layout and objects transfer" in http://www.kernel.org/pub/software/scm/git/docs/v1.5.0/RelNotes-1.5.0.txt.