Initial patchbomb comparison with Mercurial
On Mon, Apr 25, 2005 at 07:08:28PM -0700, Linus Torvalds wrote:
> To make an interesting benchmark, try applying the first 200 patches in
> the current git kernel archive. Can you do them three per second? THAT is
> the thing you should optimize for, not checking in huge changes.
Ok, I've optimized for it a bit. This is basically:
hg import -p1 -b ../broken-out `cat ../broken-out | grep -v #`
My benchmark is to apply all 819 patches from -mm3 to 2.6.12-rc:
819/(60+57.195 + 14.068) = 6.239 patches/second user+sys
repository: before 167M after 173M (3.5% growth)
819/(60+11.196+50.144) = 6.750 patches/second user+sys
repository: before 102M after 154M (51% growth)
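The patches/second figures above divide the number of patches by the combined user and system CPU time. A minimal Python sketch of that arithmetic follows; the function name `patches_per_second` is only illustrative, and its arguments mirror the terms of the two quoted sums (with the minute already written as 60 seconds).

```python
def patches_per_second(patches, *cpu_seconds):
    """Return patches applied per second of total CPU time.

    cpu_seconds holds the terms of the quoted sums: user and sys time,
    with any minutes already converted to seconds.
    """
    total = sum(cpu_seconds)
    if total <= 0:
        raise ValueError("total CPU time must be positive")
    return patches / total


# The two runs quoted above, 819 patches each.
first_run = patches_per_second(819, 60, 57.195, 14.068)
second_run = patches_per_second(819, 60, 11.196, 50.144)
print(f"{first_run:.3f} {second_run:.3f} patches/second user+sys")
```

Because only user+sys time is counted, time spent waiting on disk I/O does not show up in these figures.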
Old Bazaar-NG versus Git comparison
The comparison uses linux-2.6.0 and linux-18.104.22.168 as large source trees.
|Operation||bzr (0.7pre) real||bzr (0.7pre) user+sys||git (0.99.9c) real||git (0.99.9c) user+sys|
|initial commit||2m 4.757s||1m22.773s||54.964s||53.016s|
|diff no changes||3m51.918s||9.186s||0.057s||0.056s|
|status no changes||19.711s||16.358s||0.442s||0.458s|
|commit one file||2m 6.685s||35.192s||7.364s||7.366s|
Note that `git-add` adds the contents of a file to the repository rather than only marking the file for a later commit. Note also that the diff is done before any commit; see "Bzr vs git, the sequel".
Note also the discrepancy between real and user+sys times in the original reference: "bzr diff" comparing the imported linux-2.6.0 with linux-22.214.171.124 in the working tree (1m13.869s vs 29.028s), "bzr commit" for the initial commit (2m4.757s vs 1m22.773s), "bzr diff" with no changes (3m51.918s vs 9.186s), and "bzr commit" with a single file change (2m6.685s vs 35.192s).
- Bzr versus git from March 3rd, 2006 on Jo Vermeulen's blog (mentioned on the Talk:Git (software) page on Wikipedia).
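To compare the real and user+sys figures quoted above numerically, the `time`-style strings first have to be converted to seconds. The following Python sketch shows one way to do that; the helper name `to_seconds` and the regular expression are assumptions made for illustration, not part of the original benchmark.

```python
import re

# Matches strings such as "2m 4.757s", "1m22.773s" or "9.186s".
_DURATION = re.compile(r"^(?:(\d+)m\s*)?(\d+(?:\.\d+)?)s$")


def to_seconds(text):
    """Convert a duration like '3m51.918s' into seconds as a float."""
    match = _DURATION.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    minutes = int(match.group(1) or 0)
    return minutes * 60 + float(match.group(2))


# (real, user+sys) pairs for bzr, taken from the note above.
pairs = {
    "initial commit": ("2m4.757s", "1m22.773s"),
    "diff, no changes": ("3m51.918s", "9.186s"),
    "commit one file": ("2m6.685s", "35.192s"),
}
for name, (real, cpu) in pairs.items():
    ratio = to_seconds(real) / to_seconds(cpu)
    print(f"{name}: real is {ratio:.1f}x user+sys")
```

A large ratio suggests that most of the elapsed time was spent waiting (for example on disk I/O) rather than computing.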
- Bzr vs git, the sequel from June 26th, 2008, mentions benchmark with newer versions of git and bzr, and explains some details of the tests.
bzr, git, and hg performance on the Linux tree
This is a compilation of two comparisons of Bazaar-NG (bzr), Git and Mercurial (hg) speed. The "current" git/bzr/hg versions are the ones currently found in Ubuntu 8.04: git (126.96.36.199), hg (0.9.5), and bzr (1.3.1).
Tests were done using the Linux 2.6.0 tarball and the linux-188.8.131.52 tarball (note the short history!).
It is not mentioned how time was measured: whether the figures are user+sys results or wallclock (real) time. The details of the hardware, operating system and filesystem used are not given either.
|Diff (no changes)||0m0.057s||0m0.343s||3m51.918s||0m47.448s||0m1.340s|
|Status (no changes)||0m0.442s||0m1.230s||0m19.711s||0m4.027s||0m1.077s|
"Diff (large)" means the diff between 2.6.0 and 184.108.40.206; "Commit (small)" means committing a state that differs by one line in a single file (MAINTAINERS).
About "Adding files" / "Adding 2.6.0" performance: it was not stated which command was used to add all files for git. Note that `git add .` performs much better than `git add *` (the performance of the latter has since been improved). Note also that `git add` does more than just mark a file as being under version control: it has to add content (blob and tree objects) to the object database and some more information to the index.
Repository sizes after all those commands:
|92 MB||112 MB||179 MB|
For Git and Bazaar, the repository sizes are those after `gc` for git and `pack` for Bazaar; without repacking, the git repository size was 162 MB.
- git/bzr historical performance comparison on May 8, 2008 by Jordan (LaserJock)
- bzr, git, and hg performance on the Linux tree on May 9, 2008 by Jordan (LaserJock), in response to comment in blog post above.
Quilt import comparison of Git and Mercurial
- commit 773 patches (20MB) for 2.6.18-rc1 to -mm2
- 1.8GHz AMD64 laptop, 1.2GB of RAM, freshly formatted ext3 filesystem, data=writeback,noatime
Mercurial (version unknown) (probably latest at that time)
$ hg qpush -a 2.6.18-rc1-mm2
773/(42.511+10.105) = 14.691 patches/second user+sys
$ git-quilt-import 2.6.18-rc1-mm2
773/(60+15.953+30.186) = 7.282 patches/second user+sys
Clone/status/pull comparison of Git and Mercurial
On Tue, May 23, 2006, in the scm next steps post, Jeremy Blosser ran speed benchmarks of hg vs. git. The post contains a fairly large table of results. Here's the summary, where all times are given in seconds and are user+sys times averaged over all results and all cases:
The Git repository has all 4 branches in one repository, while Mercurial (hg) had one repository per branch; `git log` is for git_p4 with only 2 branches.
As can easily be seen, Git is faster than Mercurial in all tested areas except pull/merge, where it is around 4 times slower.
Reference: scm next steps by Jeremy Blosser.
Diff comparison of Git, Mercurial and Bazaar-NG
Tests were made on the whole Mozilla tree, with one-line change to one file. All numbers are best of 3 runs, back to back on the same mostly idle computer.
|Operation||bzr (0.12.0c1)||hg (0.9)||git (220.127.116.11)|
|diff (top level)||16.957||5.600||1.572|
|diff in dom||16.833||5.548||0.136|
|diff changed file||10.472||2.084||0.116|
|diff unchanged file||10.012||2.024||0.088|
Reference: Jst's Blog: bzr/hg/git performance.
Comparison of Bazaar-NG, Darcs, Git and Mercurial
Tests were made using a directory with 266 MB of files, 258 KB of which were text files, with the rest being image files (not very typical but reflects nature of author's repository). Here's a table summarizing the real times reported by time(1):
As you can see, Mercurial (`hg`) was the fastest. The fact that Mercurial outperforms Git is probably caused by the fact that Git just doesn't handle the binary files as well, although there were some patches lately dealing with overly-big blobs.
- Cyclopedia Square: Cutting Edge Revision Control, blog post by Bryan Murdock, on Saturday, March 3, 2007.
- Git benchmark - comparison with Bazaar, Darcs, Git and Mercurial thread at git mailing list.
Clone and pull benchmark of Git and Mercurial for Google Code
This is a synthetic benchmark comparing mainly transfer over the HTTP protocol; note that Mercurial uses hg-serve here, while Git uses an ordinary web server (HTTP is a "dumb" protocol in git).
Repositories were seeded with approximately 1500 files totaling 35 MB of data for the initial "clone" test. A single file in the repository was then changed 50 times, and the clients pulled the updates for the "pull" test. The servers were running in Chicago and the clients in Mountain View (51 ms ping time).
|clone||8.1 sec||178 sec||8.7 sec|
|pull||1.5 sec||18 sec||2.8 sec|
Git was 22 times slower than Mercurial for clone over HTTP, and 12 times slower for pull over HTTP. Times for smart protocols were similar for Git and for Mercurial.
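The slowdown factors quoted above can be reproduced from the table. The Python sketch below assumes that the first column of the table is Mercurial and the second is Git over HTTP; that assignment is inferred from the quoted ratios, since the table header is not shown here, and the function name `slowdown` is illustrative.

```python
def slowdown(slow_seconds, fast_seconds):
    """Return how many times slower the first timing is than the second."""
    if fast_seconds <= 0:
        raise ValueError("reference time must be positive")
    return slow_seconds / fast_seconds


# Assumed column assignment: Mercurial = 8.1 s / 1.5 s,
# Git over HTTP = 178 s / 18 s (inferred from the quoted ratios).
clone_factor = slowdown(178, 8.1)
pull_factor = slowdown(18, 1.5)
print(f"clone: {clone_factor:.1f}x slower, pull: {pull_factor:.1f}x slower")
```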
- Analysis of Git and Mercurial for DVCS support in Google Code.
Repository size benchmarks
--window and --depth parameters
A blog post about Git repack parameters mentions that most repositories should be fine with a window value of 50 for a complete repack. Really large repositories might benefit from a higher value. The --depth parameter becomes insignificant when a large window value is used.
Estimate size of Linux kernel repository
The historic Linux repo currently repacks itself into a ~175MB pack for 63428 commits.
The current Linux repo is ~103MB with a much shorter history (27153 commits).
Given the above we can estimate the size of the kernel repository after x commits as follows:
- slope = (175 - 103) / (63428 - 27153) = approx 2KB per commit
- initial size = 175 - .001985*63428 = 49MB
So the initial kernel commit is about 49MB in size which is coherent with the corresponding compressed tarball. Subsequent commits are 2KB in size on average.
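The estimate above is a straight line through two (commits, pack size) points. A minimal Python sketch of the same calculation follows; the names `fit_line` and `estimate_size_mb` are illustrative, and the model is only as good as the assumption that the pack grows linearly with the number of commits.

```python
def fit_line(point_a, point_b):
    """Return (slope, intercept) of the line through two (x, y) points."""
    (x1, y1), (x2, y2) = point_a, point_b
    slope = (y2 - y1) / (x2 - x1)
    intercept = y1 - slope * x1
    return slope, intercept


# (number of commits, pack size in MB) for the current and historic repos.
slope_mb, initial_mb = fit_line((27153, 103), (63428, 175))


def estimate_size_mb(commits):
    """Estimate the pack size in MB after the given number of commits."""
    return initial_mb + slope_mb * commits


print(f"slope: {slope_mb * 1000:.3f} KB per commit")
print(f"initial size: {initial_mb:.1f} MB")
print(f"estimate after 100000 commits: {estimate_size_mb(100000):.0f} MB")
```

The slope is printed in KB using 1 MB = 1000 KB, matching the rounding used in the list above.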
Pack size vs number of commits of git repository
Fitting a linear function to the size of the pack (created with `git pack-objects`) versus the number of commits (the number of lines returned by `git rev-list $commit`) for the git repository itself (a repository that started from scratch, from two files, rather than an imported project) gives around 1.2 kB per revision (per commit), or 1.35 kB per revision if the index size is taken into account. The size of the (gzipped) checkout is not subtracted. Only the mainline is taken into account.
- slope = 1.25 kB / revision (for git.git repository)
Gains from deltaification: pack size vs size of tarballs
Comparison of an archive of all released Gnumeric versions versus git repository containing all released Gnumeric versions.
- Total size of 172 tar files: 1508026377 bytes.
- Total size of git archive: 139733921 bytes
- Ratio: 10.8
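The ratio above follows directly from the two byte counts. As a quick check, this Python sketch recomputes it and also expresses it as the fraction of space saved; the variable names are illustrative.

```python
tar_bytes = 1508026377  # total size of the 172 tar files
git_bytes = 139733921   # total size of the git archive

ratio = tar_bytes / git_bytes
saved_fraction = 1 - git_bytes / tar_bytes

print(f"ratio: {ratio:.1f}")
print(f"space saved: {saved_fraction:.1%}")
```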
Reference: 10x+ Better Compression Than Gzip blog entry (from January 7th, 2007).
Git, Mercurial, Bazaar repository size benchmark
For each SCM tested, three candidate projects that use that version control system were chosen. The choice is explained in the DVCS Comparison: Meet the candidates blog post.
For all repositories, only a single branch was converted. For all repositories except Samba, this meant the development branch. As Samba has multiple development branches, the v3-3-test branch was chosen. Where possible, the fast-export/import interface was used.
After conversion, a pack command was run for the repositories that support this feature. For Git, this meant `git repack -adf --window=250`; for Bazaar, it meant `bzr pack` followed by removing the obsolete packs. This means that the table below shows the sizes of tightly packed repositories.
Tests were done using Git v18.104.22.168, Bzr v1.5, Hg v1.0
As can be seen from the table, Git really is the most efficient in storing the data. Next up is Mercurial, which also does a nice job. Bazaar is the least efficient by far, taking on average 2.8 times the space of an equivalent Git repository.
- DVCS Comparison: Git, Mercurial, Bazaar Repository Size Benchmark blog entry (from June 1, 2008), from DVCS Comparison blog.
- Some further comments to above blog post on Reddit
Other benchmarks and references
Synthetic benchmark: how system acts under stress (SVK, Darcs, Monotone, Bazaar, Mercurial, Git)
DVCS Round-up: One System to Rule Them All? -- Part 3 by Robert Fendt on Linux Developer Network contains results of two synthetic benchmarks testing how a system acts under stress (number of commits in the repository, or number of files committed). The test system was a VM running Ubuntu 8.10 ("Intrepid"), and the software versions used were SVK 2.0.2, darcs 2.1.0, monotone 0.42, Bazaar 1.10, Mercurial 1.1.2, and Git 1.6.1.
The first test was about dependence on repository growth, measuring commit time and repository size versus the number of commits present in the repository. SVK and darcs show the worst scaling behaviour, with darcs showing the strongest dependence on repository size. Mercurial shows a very weak dependence, with check-in time increasing by 180 milliseconds over the whole range. Git shows almost no dependence at all, with check-in time increasing by 20 milliseconds. Both SVK and darcs use quite a lot of space on the disk, with SVK showing very fast growth. Mercurial's repository size was only about 10 percent larger than the checkout, and Git's was even about 10 percent smaller (note that the Git repository was repacked every 500 commits).
The second test was about dependence on check-in size, measuring commit time versus the number of files added to an empty repository, and the time it took to create a single-file commit on top of the commit from the previous step. Adding 2000 files to an empty repository took SVK nearly 1400 seconds, or over 23 minutes. Among the rest, Darcs was again the slowest with about 5.5 seconds, while Git managed the task in about 300 milliseconds. Monotone took about a second, Bazaar 1.9 seconds, and Mercurial 2.5 seconds. Mercurial and Git showed more or less no dependence of single-file commit time on the size of the previous commit (which is also the number of files in the repository), and the other systems showed only small changes.
See also: DVCS Follow-Up: Managing the Python Repository (PDF) -- a quick test using a 'real' repository.