GitSvnComparison

From Git SCM Wiki
(Redirected from GitSvnComparsion)
Jump to: navigation, search

OBSOLETE CONTENT

This wiki has been archived and the content is no longer updated. Please visit git-scm.com/doc for up-to-date documentation.

Note: This page is currently a work in progress. It started out as a private email to someone who currently uses Subversion. I decided to make it available and try to extend it further. I'll remove this comment when the page is improved.  :) -- Shawn Pearce

See the discussion page for further comments


Although this page is hosted on a Git-specific Wiki it tries to provide a fair and unbiased comparison of Git and Subversion to help prospective users of both tools better evaluate their choices. This page only describes base Subversion and does not discuss the benefits and drawbacks to using SVK, a distributed wrapper around Subversion. It also does not discuss using Git as subversion client.

A summary of differences

  • Git is much faster than Subversion
  • Subversion allows you to check out just a subtree of a repository; Git requires you to clone the entire repository (including history) and create a working copy that mirrors at least a subset of the items under version control.
  • Git's repositories are much smaller than Subversions (for the Mozilla project, 30x smaller)
  • Git was designed to be fully distributed from the start, allowing each developer to have full local control
  • Git branches are simpler and less resource heavy than Subversion's
  • Git branches carry their entire history
  • Merging in Git does not require you to remember the revision you merged from (this benefit was obviated with the release of Subversion 1.5)
  • Git provides better auditing of branch and merge events
  • Git's repo file formats are simple, so repair is easy and corruption is rare.
  • Backing up Subversion repositories centrally is potentially simpler - since you can choose to distributed folders within a repo in git
  • Git repository clones act as full repository backups
  • Subversion's UI is more mature than Git's
  • Walking through versions is simpler in Subversion because it uses sequential revision numbers (1,2,3,..); Git uses unpredictable SHA-1 hashes. Walking backwards in Git is easy using the "^" syntax, but there is no easy way to walk forward.

Git's Major Features Over Subversion

Distributed Nature

Git was designed from the ground up as a distributed version control system. Being a distributed version control system means that multiple redundant repositories and branching are first class concepts of the tool.

In a distributed VCS like Git every user has a complete copy of the repository data stored locally, thereby making access to file history extremely fast, as well as allowing full functionality when disconnected from the network. It also means every user has a complete backup of the repository. Have 20 users? You probably have more than 20 complete backups of the repository as some users tend to keep more than one repository for the same project. If any repository is lost due to system failure only the changes which were unique to that repository are lost. If users frequently push and fetch changes with each other this tends to be a small amount of loss, if any.

In a centralized VCS like Subversion only the central repository has the complete history. This means that users must communicate over the network with the central repository to obtain history about a file. Backups must be maintained independently of the VCS. If the central repository is lost due to system failure it must be restored from backup and changes since that last backup are likely to be lost. Depending on the backup policies in place this could be several human-weeks worth of work.

(Note that even SVK doesn't do quite the same thing as git. SVK downloads a complete history and allows disconnected commits, but there is still a unique "upstream" repository. Two SVK users can't merge with each other and then push the changes to the upstream.)

Access Control

Due to being distributed, you inherently do not have to give commit access to other people in order for them to use the versioning features. Instead, you decide when to merge what from whom.

That is, because subversion controls access, in order for daily checkins to be allowed - for example - the user requires commit access. In git, users are able to have version control of their own work while the source is controlled by the repo owner.

(There exist different mechanisms of control in case you do want to have a repository into which multiple people can push to. -not covered here yet-)

Branch Handling

Branches in Git are a core concept used everyday by every user. In Subversion they are more cumbersome and often used sparingly.

The reason branches are so core in Git is every developer's working directory is itself a branch. Even if two developers are modifying two different unrelated files at the same time it's easy to view these two different working directories as different branches stemming from the same common base revision of the project.

Consequently Git:

  • Tracks the project revision the branch started from - this information is necessary to merge the branch back to trunk
  • Records branch merge events including:
    • author, time and date
    • branch and revision information
    • Changes made on the branch(es) remain attributed to the original authors and the original timestamps of those changes
    • What changes were made to complete the merge. These are attributed to the merging user
    • Why the merge was done (optional; can be supplied by the user).
  • Automatically starts the next merge at the last merge.
    • Knowing what revision was last merged is necessary in order to successfully merge the same branches together again in the future.

This is different to Subversion's handling of branches. As of Subversion 1.5:

  • Automatically tracks the project revision the branch started from.
    • Like Git, Subversion remembers where a branch originated.
  • It's impossible to see only merge related changes.
    • If the merging user had to modify 12 lines of code to complete the merge successfully you can't tell what those 12 lines were, or how those 12 lines differ from the versions on the branches being merged. 'I grant you this point - however, this can be overcome with standards'
  • In Subversion, branches and tags all are copies. Sometimes this is inconvenient, it is easy to check out the whole repository by mistake. Branch path and file path lie in same namespace but they have different semantics - this can be confusing.

Performance (Speed of Operation)

Git is extremely fast. Since all operations (except for push and fetch) are local there is no network latency involved to:

  • Perform a diff.
  • View file history.
  • Commit changes.
  • Merge branches.
  • Obtain any other revision of a file (not just the prior committed revision).
  • Switch branches.

Notes and Commentary:

This section should, ideally include actual comparisons, e.g. load Git code into both Git and SVN. This comparison must include GIT performance exclusively over the internet.

The purpose of subversion is to work on one centralized server. GIT committing locally may be quick, but how is it when committing to a centralized GIT server? And what is the speed committing locally (including user interaction) and committing to a GIT central repository (since both steps are required when pushing changes to the build farm)?

This comparison is highly opinionated. Like comparing apples and salt: both are good for completely different reasons (I wouldn't eat salt like I eat apples, and I wouldn't store apples like I store salt-cured beef).

I do agree GIT has the potential to replace Subversion's engine (both client tools and server, when some missing features are added), but its use would still make it Subversion. (sorry to add more opinions to this page)

Smaller Space Requirements

Git's repository and working directory sizes are extremely small when compared to SVN.

For example the Mozilla repository is reported to be almost 12 Gb when stored in SVN using the fsfs backend. Previously, the fsfs backend also required over 240,000 files in one directory to record all 240,000 commits made over the 10 year project history. This was fixed in SVN 1.5, where every 1000 revisions are placed in a separate directory. The exact same history is stored in Git by only two files totaling just over 420 Mb. This means that SVN requires 30x the disk space to store the same history.

One of the reasons for the smaller repo size is that an SVN working directory always contains two copies of each file: one for the user to actually work with and another hidden in .svn/ to aid operations such as status, diff and commit. In contrast a Git working directory requires only one small index file that stores about 100 bytes of data per tracked file. On projects with a large number of files this can be a substantial difference in the disk space required per working copy.

As a full Git clone is often smaller than a full checkout, Git working directories (including the repositories) are typically smaller than the corresponding SVN working directories. There are even ways in Git to share one repository across many working directories, but in contrast to SVN, this requires the working directories to be colocated.

Line Ending Conversion

Subversion can be easily configured to automatically convert line endings to CRLF or LF, depending on the native line ending used by the client's operating system. This conversion feature is useful when Windows and UNIX users are collaborating on the same set of source code. It is also possible to configure a fixed line ending independent of the native operating system. Files such as a Makefile need to only use LFs, even when they are accessed from Windows. This can be adjusted in a global config and overridden in user configs. Binary files are checked in with a binary flag (like with CVS except that SVN does this almost always automatically) and such never get converted or keyword substituted. Subversion also allows the user to specify line ending conversion on a file-by-file basis. But if the user does not check the binary flag on adding (Subversion prints for every added file whether it recognized it as binary) binary content might get corrupted.

Whilst Git versions prior 1.5.1 never convert files and always assume that every file is opaque and should not be modified. Git 1.5.1 and onwards make [line ending conversion configurable]. Git's advantage over Subversion is that you do not have to manually specify which files this conversion should be applied to, it happens automatically (hence autocrlf).

Subversion's Major Features Over Git

Subversion has some notable features that Git currently doesn't have or will never have.

User Interfaces Maturity

Currently Subversion has a wider range of user interface tools than Git. For example there are SVN plugins available for most popular IDEs. There is a Windows Explorer shell extension. There are a number of native Windows and Mac OS X GUI tools available in ready-to-install packages.

Git's primary user interface is through the command line. There are two graphical interfaces: git-gui (distributed with Git) and qgit, which is making great strides towards providing another feature-complete graphical interface. Also gitk, the graphical history browser, can be more than just a fancy log reader. git-gui and gitk usually work out-of-box for common operating systems, and qgit is being ported to Qt4, which improves its portability. There are some user interface tools in development for Git, namely TortoiseGit, a port of TortoiseSVN. There is also Git Extensions, another explorer shell extension.

Single Repository

Since Subversion only supports a single repository there is little doubt about where something is stored. Once a user knows the repository URL they can reasonably assume that all materials and all branches related to that project are always available at that location. Backup to tape/CD/DVD is also simple as there is exactly one location that needs to be backed up regularly.

Since Git is distributed by nature not everything related to a project may be stored in the same location. Therefore there may be some degree of confusion about where to obtain a particular branch, unless repository location is always explicitly specified. There may also be some confusion about which repositories are backed up to tape/CD/DVD regularly, and which aren't.

What are you trying to argue here? That a central repository is easier to locate/backup? Git can work with a central repository workflow. Git wins here - speed, size (less backup media needed).

Access Controls

Since Subversion has a single central repository it is possible to specify read and write access controls in a single location and have them be enforced across the entire project.

Git can operate with a central repository workflow. Read and write access can be specified at the central repository.

Binary Files

Detection and Properties

Subversion can be used with binary files (it is automatically detected; if that detection fails, you have to mark the file binary yourself). Just like Git.

Only that with Git, the default is to interpret the files as binary to begin with. If you _have_ to have CR+LF line endings (even though most modern programs grok the saner LF-only line endings just fine), you have to tell Git so. Git will then autodetect if a file is text (just like Subversion), and act accordingly. Analogous to Subversion, you can correct an erroneous autodetection by setting a git attribute.

Change Tracking

In an earlier version of git [number?] seemingly minor changes to binary files, such as adjusting brightness on an image, could be different enough that Git interprets them as a new file, causing the content history to split. Since Subversion tracks by file, history for such changes is maintained.

Partial Checkout/Bandwidth Requirements

With Subversion, you can check out just a subdirectory of a repository. This is not possible with Git. For a large project, this means that you always have to download the whole repository, even if you only need the current version of some sub-directory. In times where fast Internet connections are only available in most cities and traffic over mobile internet connections is expensive, git can cost much more time and money in rural areas or with mobile devices.

In other cases, requirements other than the raw repository size provide the motivation for wanting a partial checkout, e.g. access control (you can't restrict read access to part of the repository with Git) or directory layout requirements. There is no general solution for this problem other than to split the original Git repository into multiple repositories, then cloning one of the new repositories.

Git submodules can mitigate but not eliminate these differences. In practice, monolithic Git repositories don't happen, and are broken into many smaller repositories. You then just download the specific repository that you need. This fixes some of the objections in practice, but Subversion still keeps a decided edge here. First, the repository designer must anticipate what divisions people will want. If he or she gets it wrong at first, then it can be a fair bit of tedious effort to break up the repository later (particularly if you want to cut out data from other parts in order to get the bandwidth requirement). This of course assumes that the repository administrator is amenable to this in the first place. :-) Second, managing the separate repositories becomes a hassle if there's even loose coordination betwen them; making changes has more steps. Third, the submodule commands in particular bring out Git's rough UI edges.

Shorter and Predictable Revision Numbers

First, as SVN assigns revision numbers sequentially (starting from 1) even very old projects such as Mozilla have short unique revision numbers (Mozilla is only up to 6 digits in length). Many users find this convenient when entering revisions for historical research purposes. They also find this number easy to embed into their product, supposedly making it easy to determine which sources were used to create a particular executable. However since the revision number is global to the entire repository, including all branches, there is still a question of which branch the revision number corresponds to. (However, see below: strictly speaking, Subversion's ability to hold richer history means that "what branch was this commit on" does not always need to make sense, as a single commit can cross branches.)

Unless the last committed revision is recorded. Since revisions are global for a repository, the last committed revision makes it possible to determine which branch was used

As Git uses a SHA1 to uniquely identify a commit each specific revision can only be described by a 40 character hexadecimal string, however this string not only identifies the revision but also the branch it came from. In practice the first 8 characters tends to be unique for a project, however most users try to not rely on this over the long term. Rather than embedding long commit SHA1s into executables Git users generate a uniquely named tag. This is an additional step, but a simple one.

Secondly, SVN's revision numbers are predictable. If the current commit is 435 the next one will be 436. It's very easy then to go through a few sequential revisions to, e.g. look at differences, revert to an old revision to find when a regression was introduced, etc. Furthermore, without looking up any additional information, you know that commit 436 was done after 435. Similar actions and knowledge from git requires looking at the log.

Git provides shorthand syntax to partially compensate for this by allowing you to add any number of ^ after a revision to indicate how far back to go. e8fa9c^^^..e8fa9c, for instance, would show the history for e8fa9c and it's 3 parent revisions. (However, it does not provide any shorthand syntax for going forward in time.)

Ability to Represent Richer Histories

The benefits of Git's branch and merge handling that are mentioned above come with a downside that can occasionally surface: they slightly restrict your freedom. For instance, it is not possible to commit to two branches at once with Git, but it is in Subversion. There are other restrictions as well; [[1]] has a brief, unfortunately not-terribly-informative discussion of some of the issues that the team working on cvs2git ran into in terms of Git being less flexible than CVS and Subversion.

[[2]] has another brief discussion, including this quote: "CVS allows a branch or tag to be created from arbitrary combinations of source revisions from multiple source branches. It even allows file revisions that were never contemporaneous to be added to a single branch/tag. Git, on the other hand, only allows the full source tree, as it existed at some instant in the history, to be branched or tagged as a unit. Moreover, the ancestry of a git revision makes implications about the contents of that revision. This difference means that it is fundamentally impossible to represent an arbitrary CVS history in a git repository 100% faithfully."

As demonstrated by the large number of large projects that use Git for version control, none of these operations are at all essential; however, it's certainly believable that there are use cases where Git's restrictions are an actual impediment. Perhaps more seriously, these restrictions mean that switching from CVS or Subversion to Git may result in a loss of fidelity in terms of the history of the project.

However, in most cases the extra freedom in creating revisions and tags in Subversion may bring nothing but a possibility of mishandling and confusion.


The Original Email

Provided as reference, until this page is cleaned up.

The key things that I like about Git are:

  • - It's incredibly fast.
    • No other SCM that I have used has been able to keep up with it, and I've used a lot, including Subversion, Perforce, darcs, BitKeeper, ClearCase and CVS.
  • - It's fully distributed.
    • The repository owner can't dictate how I work. I can create branches and commit changes while disconnected on my laptop, then later synchronize that with any number of other repositories.
  • - Synchronization can occur over many media.
    • An SSH channel, over HTTP via WebDAV, by FTP, or by sending emails holding patches to be applied by the recipient of the message. A central repository isn't necessary, but can be used.
  • - Branches are even cheaper than they are in Subversion.
    • Creating a branch is as simple as writing a 41 byte file to disk. Deleting a branch is as simple as deleting that file.
  • - Unlike Subversion branches carry along their complete history.
    • without having to perform a strange copy and walk through the copy. When using Subversion I always found it awkward to look at the history of a file on branch that occurred before the branch was created. from #git: <robinr> spearce: I don't understand one thing about SVN in the page. I made a branch i SVN and browsing the history showed the whole history a file in the branch
  • - Branch merging is simpler and more automatic in Git.
    • In Subversion you need to remember what was the last revision you merged from so you can generate the correct merge command. Git does this automatically, and always does it right. Which means there's less chance of making a mistake when merging two branches together.
  • - Branch merges are recorded as part of the proper history of the
    • repository. If I merge two branches together, or if I merge a branch back into the trunk it came from, that merge operation is recorded as part of the repostory history as having been performed by me, and when. It's hard to dispute who performed the merge when it's right there in the log.
  • - Creating a repository is a trivial operation:
    • mkdir foo; cd foo; git init
    • That's it. Which means I create a Git repository for everything these days. I tend to use one repository per class. Most of those repositories are under 1 MB in disk as they only store lecture notes, homework assignments, and my LaTeX answers.
  • - The repository's internal file formats are incredible simple.
    • This means repair is very easy to do, but even better because it's so simple its very hard to get corrupted. I don't think anyone has ever had a Git repository get corrupted. I've seen Subversion with fsfs corrupt itself. And I've seen Berkley DB corrupt itself too many times to trust my code to the bdb backend of Subversion.
  • - Git's file format is very good at compressing data, despite
    • it's a very simple format. The Mozilla project's CVS repository is about 3 GB; it's about 12 GB in Subversion's fsfs format. In Git it's around 300 MB.

Links


Personal tools