GitBenchmarks

From Git SCM Wiki

OBSOLETE CONTENT

This wiki has been archived and the content is no longer updated. Please visit git-scm.com/doc for up-to-date documentation.




Speed benchmarks

Initial patchbomb comparison with Mercurial

On Mon, Apr 25, 2005 at 07:08:28PM -0700, Linus Torvalds wrote:
>
> To make an interesting benchmark, try applying the first 200 patches in
> the current git kernel archive. Can you do them three per second? THAT is
> the thing you should optimize for, not checking in huge changes.

Ok, I've optimized for it a bit. This is basically:

 hg import -p1 -b ../broken-out `cat ../broken-out | grep -v #`

My benchmark is to apply all 819 patches from -mm3 to 2.6.12-rc:

hg (Mercurial 0.4b)
real 3m22.075s
user 1m57.195s
sys 0m14.068s

819/(60+57.195 + 14.068) = 6.239 patches/second user+sys
repository: before 167M after 173M (3.5% growth)

git (version unknown, as of April 29, 2005)
real 2m58.568s
user 1m11.196s
sys 0m50.144s

819/(60+11.196+50.144) = 6.750 patches/second user+sys
repository: before 102M after 154M (51% growth)

Reference: Mercurial 0.4b vs git patchbomb benchmark by Matt Mackall, via Git (software) article at Wikipedia.
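The patches-per-second figures above are the patch count divided by user+sys CPU time, with the minutes in the `time` output converted to seconds (hence the `60+` terms). A minimal Python sketch of that arithmetic, assuming durations in the `XmY.ZZZs` format shown above; the helper names are hypothetical and not taken from Matt Mackall's post:

```python
import re


def to_seconds(t: str) -> float:
    """Convert a time(1)-style duration such as '1m57.195s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", t)
    if match is None:
        raise ValueError(f"unrecognized duration: {t!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


def patches_per_second(n_patches: int, user_time: str, sys_time: str) -> float:
    """Throughput relative to CPU time (user+sys), not wall-clock (real) time."""
    return n_patches / (to_seconds(user_time) + to_seconds(sys_time))


def growth_percent(before_mb: float, after_mb: float) -> float:
    """Repository growth relative to its size before the import, in percent."""
    return (after_mb - before_mb) / before_mb * 100


# 819 patches from the benchmark above
print(f"hg:  {patches_per_second(819, '1m57.195s', '0m14.068s'):.3f} patches/s")
print(f"git: {patches_per_second(819, '1m11.196s', '0m50.144s'):.3f} patches/s")
print(f"hg growth:  {growth_percent(167, 173):.1f}%")
print(f"git growth: {growth_percent(102, 154):.1f}%")
```

This reproduces the 6.239 and 6.750 patches/second figures and the 51% growth for git. For Mercurial, 6 MB on 167 MB rounds to 3.6% rather than the 3.5% quoted, which looks like truncation rather than a different measurement.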


Old Bazaar-NG versus Git comparison

Comparison using linux-2.6.0 and linux-2.6.15.4 as a large source tree.

Operation          bzr (0.7pre)              git (0.99.9c)
                   real        user+sys      real        user+sys
init               1.593s      0.187s        0.161s      0.006s
add 2.6.0          31.870s     31.592s       42.121s     35.636s
diff 2.6.15.4      1m13.869s   29.028s       2m26.982s   2m28.000s
initial commit     2m4.757s    1m22.773s     54.964s     53.016s
diff no changes    3m51.918s   9.186s        0.057s      0.056s
status no changes  19.711s     16.358s       0.442s      0.458s
commit one file    2m6.685s    35.192s       7.364s      7.366s

Note that `git-add` adds the contents of a file to the repository, rather than only marking the file for a later commit. Note also that the diff is done before any commit; see "Bzr vs git, the sequel".

Note also the discrepancy between real and user+sys times in the original reference for "bzr diff" comparing the imported linux-2.6.0 with linux-2.6.15.4 in the working tree (1m13.869s vs 29.028s), "bzr commit" for the initial commit (2m4.757s vs 1m22.773s), "bzr diff" with no changes (3m51.918s vs 9.186s) and "bzr commit" with a single-file change (2m6.685s vs 35.192s).

References:


bzr, git, and hg performance on the Linux tree

This is a compilation of two comparisons of Bazaar-NG (bzr), Git and Mercurial (hg) speed. The "current" git/bzr/hg versions are the ones found in Ubuntu 8.04: git (1.5.4.3), hg (0.9.5), and bzr (1.3.1).

Tests were done using the Linux 2.6.0 and linux-2.6.25.2 tarballs (note the short history!).

It is not stated how time was measured, whether as user+sys time or as wall-clock (real) time. The details of the hardware, operating system and filesystem used are not given either.

Operation            git                       bzr                       hg
                     0.99.9c     1.5.4.3       0.7pre      1.3.1         0.9.5
Initialization       0m0.161s    0m0.086s      0m1.593s    0m0.334s      0m0.137s
Adding 2.6.0         0m42.121s   0m14.269s     0m31.870s   0m4.852s      0m2.526s
Commit 2.6.0         ?           0m10.263s     ?           0m43.968s     0m30.890s
Diff (large)         2m26.982s   0m24.425s     1m13.869s   0m51.158s     0m37.846s
Commit 2.6.x         0m54.964s   0m28.468s     2m4.757s    1m8.627s      0m47.948s
Diff (no changes)    0m0.057s    0m0.343s      3m51.918s   0m47.448s     0m1.340s
Status (no changes)  0m0.442s    0m1.230s      0m19.711s   0m4.027s      0m1.077s
Commit (small)       0m7.364s    0m0.397s      2m6.685s    0m9.010s      0m1.913s

"Diff (large)" means the diff between 2.6.0 and 2.6.25.2 (2.6.15.4 for the older git 0.99.9c and bzr 0.7pre results, which come from the comparison above); "Commit (small)" means committing a state that differs by one line in a single file (MAINTAINERS).

About the "Adding files" / "Adding 2.6.0" performance: it was not stated which command was used to add all files with git; note that `git add .` performs much better than `git add *` (the performance of the latter has since improved). Note also that `git add` does more than just mark a file as being under version control: it has to add the content (blob objects) to the object database and record additional information in the index.

Repository sizes after all those commands:

git bzr hg
92 MB 112 MB 179 MB

The Git and Bazaar sizes were measured after `git gc` and `bzr pack`, respectively; without repacking, the Git repository was 162 MB.

References:


Quilt import comparison of Git and Mercurial

  • commit 773 patches (20MB) taking 2.6.18-rc1 to 2.6.18-rc1-mm2
  • 1.8GHz AMD64 laptop, 1.2GB of RAM, freshly formatted ext3 filesystem, data=writeback,noatime

Mercurial (version unknown, probably the latest at that time)

 $ hg qpush -a 2.6.18-rc1-mm2
real 1m18.398s
user 0m42.511s
sys 0m10.105s

773/(42.511+10.105) = 14.691 patches/second user+sys

Git 1.4.1

 $ git-quilt-import 2.6.18-rc1-mm2
real 2m7.701s
user 1m15.953s
sys 0m30.186s

773/(60+15.953+30.186) = 7.283 patches/second user+sys

Reference: Talk by mpm (Matt Mackall), the primary Mercurial author: Towards A Better SCM: Revlogs and Mercurial at the 2006 Ottawa Linux Symposium. Slides: PDF ODP (page 51 of 52).


Clone/status/pull comparison of Git and Mercurial

On Tue, May 23, 2006, in his "scm next steps" post, Jeremy Blosser published speed benchmarks of hg vs. git. The post contains a fairly large table of results. Here is a summary, where all times are in seconds and are user+sys times averaged over all results and all cases:

Tool   clone                          status            log                    pull/merge
       local     ssh       http       test     full     log      log -p        server     workst.
git    9.529s    14.020s   36.820s    0.723s   1.893s   0.380s   20.005s       165.494s   479.750s
hg     53.783s   109.052s  109.305s   1.857s   4.167s   6.020s   (stopped)     48.992s    108.150s

The Git repository had all 4 branches in one repository, while Mercurial (hg) had one repository per branch; the `git log` figures are for git_p4, which has only 2 branches.

As can easily be seen, Git is faster than Mercurial in all tested areas except pull/merge, where it is about 3.4 (server) to 4.4 (workstation) times slower.

Reference: scm next steps by Jeremy Blosser.
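To check the comparison against the summary table, a minimal Python sketch that divides the Mercurial time by the Git time for each column. The values are transcribed from the table, and the names (`hg_over_git` and the constants) are hypothetical, not from the original post:

```python
from typing import Optional

# User+sys seconds from the summary table (hg "log -p" was stopped, so it has no value)
COLUMNS = ["clone local", "clone ssh", "clone http", "status test", "status full",
           "log", "log -p", "pull/merge server", "pull/merge workst."]
GIT = [9.529, 14.020, 36.820, 0.723, 1.893, 0.380, 20.005, 165.494, 479.750]
HG = [53.783, 109.052, 109.305, 1.857, 4.167, 6.020, None, 48.992, 108.150]


def hg_over_git(git_s: float, hg_s: Optional[float]) -> Optional[float]:
    """Ratio hg time / git time: above 1 means git is faster, below 1 means hg is faster."""
    if hg_s is None:
        return None
    return hg_s / git_s


for name, git_s, hg_s in zip(COLUMNS, GIT, HG):
    ratio = hg_over_git(git_s, hg_s)
    if ratio is None:
        print(f"{name:19} no hg result (run stopped)")
    elif ratio >= 1:
        print(f"{name:19} git faster, {ratio:.1f}x")
    else:
        print(f"{name:19} hg faster, {1 / ratio:.1f}x")
```

Every column where both tools have a result favours Git, except the two pull/merge columns, where the ratio drops below 1.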


Diff comparison of Git, Mercurial and Bazaar-NG

Tests were made on the whole Mozilla tree, with a one-line change to one file. All numbers are the best of 3 back-to-back runs on the same, mostly idle computer.

Operation             bzr (0.12.0c1)   hg (0.9)   git (1.4.2.4)
diff (top level)      16.957           5.600      1.572
diff dom/             10.596           2.240      0.140
diff in dom           16.833           5.548      0.136
diff changed file     10.472           2.084      0.116
diff unchanged file   10.012           2.024      0.088

Reference: Jst's Blog: bzr/hg/git performance.


Comparison of Bazaar-NG, Darcs, Git and Mercurial

Tests were made using a directory with 266 MB of files, 258 KB of which were text files, with the rest being image files (not very typical, but it reflects the nature of the author's repository). Here is a table summarizing the real (wall-clock) times reported by time(1):

Tool    initialize   initial       initial     branch/clone   non-conflicting   total
        repository   file import   commit      repository     merge
bzr     0m1.144s     0m0.839s      1m7.836s    0m31.145s      0m1.154s          1m42.118s
darcs   0m0.429s     12m50.321s    0m0.164s    0m5.691s       0m6.717s          13m3.322s
git     0m0.081s     1m1.918s      0m6.679s    1m37.630s      0m1.910s          2m48.218s
hg      0m0.781s     0m0.377s      0m49.015s   0m8.831s       0m0.342s          0m59.356s

As you can see, Mercurial (`hg`) was the fastest. Mercurial outperforming Git here is probably because Git does not handle binary files as well, although there have been some recent patches dealing with overly big blobs.
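A quick consistency check of the table: a minimal Python sketch that converts each row to seconds, adds up the five steps and compares the sum with the reported total. The values are transcribed by hand from the table; `ROWS` is a hypothetical name:

```python
# Real times in seconds from the table, in column order:
# initialize, initial file import, initial commit, branch/clone, merge; then the reported total.
ROWS = {
    "bzr": ([1.144, 0.839, 67.836, 31.145, 1.154], 102.118),
    "darcs": ([0.429, 770.321, 0.164, 5.691, 6.717], 783.322),
    "git": ([0.081, 61.918, 6.679, 97.630, 1.910], 168.218),
    "hg": ([0.781, 0.377, 49.015, 8.831, 0.342], 59.356),
}

for tool, (steps, reported) in ROWS.items():
    total = sum(steps)
    note = "" if abs(total - reported) < 0.0005 else "  (differs from the reported total)"
    print(f"{tool:6} sum of steps {total:8.3f}s, reported {reported:8.3f}s{note}")

fastest = min(ROWS, key=lambda tool: sum(ROWS[tool][0]))
print("fastest overall:", fastest)
```

The bzr, darcs and git rows add up exactly, but the Mercurial steps sum to 59.346s rather than the reported 59.356s, so one figure in that row was probably mistyped in the source; either way Mercurial remains the fastest.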

References:


Clone and pull benchmark of Git and Mercurial for Google Code

This is a synthetic benchmark comparing mainly transfer over the HTTP protocol; note that Mercurial uses hg-serve here, while Git uses an ordinary web server (at the time, HTTP was a "dumb" protocol in git).

Repositories were seeded with approximately 1500 files totaling 35 MB of data for the initial "clone" test. A single file in the repository was then changed 50 times, and the clients pulled the updates for the "pull" test. The servers were running in Chicago and the clients in Mountain View (51 ms ping time).

Operation   hg        git (HTTP)   git (git protocol)
clone       8.1 sec   178 sec      8.7 sec
pull        1.5 sec   18 sec       2.8 sec

Git was 22 times slower than Mercurial for clone over HTTP, and 12 times slower for pull over HTTP. Times for smart protocols were similar for Git and for Mercurial.

Reference:


Repository size benchmarks

--window and --depth parameters

A blog post about Git repack parameters suggests that most repositories should be fine with a window value of 50 for a complete repack. Really large repositories might benefit from a higher value. The --depth parameter becomes insignificant when a large window value is used.
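A complete repack with the suggested window can be scripted. A minimal Python sketch, assuming `git` is on the PATH and is version 1.8.5 or later (for `git -C`). The `-a`, `-d`, `-f`, `--window` and `--depth` options are real `git repack` options; the `full_repack` wrapper is hypothetical, and `--depth` is only passed when given, so Git's default applies otherwise:

```python
import subprocess
from typing import List, Optional


def full_repack(repo: str, window: int = 50, depth: Optional[int] = None,
                run: bool = False) -> List[str]:
    """Build (and optionally run) a full repack command for `repo`.

    -a packs everything referenced into a single pack, -d removes the packs
    made redundant by it, and -f recomputes deltas instead of reusing old ones,
    so the new --window (and --depth, if given) actually take effect.
    """
    cmd = ["git", "-C", repo, "repack", "-a", "-d", "-f", f"--window={window}"]
    if depth is not None:
        cmd.append(f"--depth={depth}")
    if run:
        subprocess.run(cmd, check=True)
    return cmd


print(" ".join(full_repack("path/to/repo")))
```

Pass `run=True` to execute it. Larger windows make the delta search more expensive, so higher values are mainly worth trying on really large repositories.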


Estimate size of Linux kernel repository

The historic Linux repo currently repacks itself into a ~175MB pack for 63428 commits.

The current Linux repo is ~103MB with a much shorter history (27153 commits).

Given the above we can estimate the size of the kernel repository after x commits as follows:

slope = (175 MB - 103 MB) / (63428 - 27153) commits = 72 MB / 36275 commits ≈ 0.001985 MB, i.e. about 2 KB per commit
initial size = 175 MB - 0.001985 MB * 63428 ≈ 49 MB

So the initial kernel commit is about 49 MB in size, which is consistent with the size of the corresponding compressed tarball. Subsequent commits are 2 KB in size on average.

Reference: Nicolas Pitre's email on GitMailingList
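The estimate above is a straight line through two (commit count, pack size) points. A minimal Python sketch of the same calculation, assuming 1 MB = 1000 KB as in the text's conversion of 0.001985 MB to about 2 KB; `estimated_size_mb` is a hypothetical name:

```python
# (number of commits, repacked size in MB) for the two repositories in the text
HISTORIC = (63428, 175.0)
CURRENT = (27153, 103.0)

# Slope of the line through both points, in MB per commit
slope_mb = (HISTORIC[1] - CURRENT[1]) / (HISTORIC[0] - CURRENT[0])
# Extrapolated size at zero commits, i.e. the initial import
initial_mb = HISTORIC[1] - slope_mb * HISTORIC[0]


def estimated_size_mb(commits: int) -> float:
    """Linear estimate of the packed kernel repository size after `commits` commits."""
    return initial_mb + slope_mb * commits


print(f"slope: {slope_mb * 1000:.2f} KB per commit")
print(f"initial size: {initial_mb:.1f} MB")
```

This gives about 1.98 KB per commit and an initial size of about 49.1 MB, and `estimated_size_mb(27153)` returns the 103 MB it was fitted to. As a two-point fit it assumes a constant average commit size, so extrapolating far beyond the measured range is only a rough guide.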


Pack size vs number of commits of git repository

Fitting a linear function to the pack size (pack created with git pack-objects) versus the number of commits (the number of lines returned by git rev-list $commit) for the git repository (a repository that started from scratch with two files, rather than an imported project) gives around 1.2 kB per revision (per commit), or 1.35 kB per revision if the index size is taken into account. The size of the (gzipped) checkout is not subtracted. Only the mainline is taken into account.

slope = 1.25 kB / revision (for git.git repository)

Reference: plot of pack size versus number of commits for git.git (image not available).
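The fit described above is an ordinary least-squares line through (commit count, pack size) pairs. A minimal Python sketch of that fit, assuming the pairs have already been measured (commit counts can come from `git rev-list --count <commit>`, which gives the same number as piping `git rev-list` into `wc -l`). `fit_line` is a hypothetical name, and this is not the script behind the original plot:

```python
from typing import Sequence, Tuple


def fit_line(points: Sequence[Tuple[float, float]]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = intercept + slope * x.

    `points` are (commit count, pack size in bytes) pairs; returns (slope, intercept),
    so the slope is in bytes per commit.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("commit counts must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

Dividing the returned slope by 1000 (or 1024, depending on the convention used) gives a figure comparable to the kB-per-revision slope quoted above.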


Gains from deltaification: pack size vs size of tarballs

Comparison of an archive of all released Gnumeric versions with a git repository containing the same released versions.

  • Total size of 172 tar files: 1508026377 bytes.
  • Total size of git archive: 139733921 bytes
  • Ratio: 10.8

Reference: 10x+ Better Compression Than Gzip blog entry (from January 7th, 2007).


Git, Mercurial, Bazaar repository size benchmark

For each of the SCMs tested, three projects that use that version control system were chosen as candidate repositories. The choice is explained in the DVCS Comparison: Meet the candidates blog post.

Only a single branch was converted for each repository: the development branch for all repositories except Samba, which has multiple development branches, so its v3-3-test branch was chosen. Where possible, the fast-export/fast-import interface was used for the conversion.

After conversion, the pack command was run for the repositories that support this feature. For Git, this meant `git repack -adf --window=250`; for Bazaar, it meant `bzr pack` followed by removing the obsolete packs. The table below therefore shows the sizes of tightly packed repositories.

Tests were done using Git v1.5.5.3, Bzr v1.5 and Hg v1.0.

Repository            Git       Mercurial   Bazaar
Projects using Git:
  Cairo               15MB      24MB        30MB
  Coreutils           29MB      44MB        76MB
  Samba               82MB      146MB       310MB
Projects using Mercurial:
  Octave              22MB      49MB        57MB
  Mozilla             78MB      205MB       255MB
  Dovecot             9MB       14MB        23MB
Projects using Bazaar:
  Emacs               120MB     163MB       300MB
  Mailman             42MB      75MB        73MB
  Pkgconfig           1.1MB     1.3MB       1.8MB
Total                 398MB     721MB       1125MB
Relative              1         1.8         2.8

As can be seen from the table, Git really is the most efficient at storing the data. Next up is Mercurial, which also does a nice job. Bazaar is by far the least efficient, taking in total about 2.8 times the space of the equivalent Git repositories.
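The "Relative" row is a ratio of column totals, so large repositories weigh more heavily than small ones. A minimal Python sketch comparing it with the plain average of per-repository ratios, using sizes transcribed from the table (`SIZES` and the other names are hypothetical):

```python
# Packed sizes in MB: (Git, Mercurial, Bazaar)
SIZES = {
    "Cairo": (15, 24, 30), "Coreutils": (29, 44, 76), "Samba": (82, 146, 310),
    "Octave": (22, 49, 57), "Mozilla": (78, 205, 255), "Dovecot": (9, 14, 23),
    "Emacs": (120, 163, 300), "Mailman": (42, 75, 73), "Pkgconfig": (1.1, 1.3, 1.8),
}
TOOLS = ("Git", "Mercurial", "Bazaar")

# Size-weighted comparison: ratio of column totals (what the "Relative" row shows)
totals = [sum(row[i] for row in SIZES.values()) for i in range(len(TOOLS))]
ratio_of_totals = [total / totals[0] for total in totals]
# Unweighted comparison: every repository counts equally
mean_of_ratios = [sum(row[i] / row[0] for row in SIZES.values()) / len(SIZES)
                  for i in range(len(TOOLS))]

for tool, total, rt, mr in zip(TOOLS, totals, ratio_of_totals, mean_of_ratios):
    print(f"{tool:9} total {total:7.1f} MB, ratio of totals {rt:.2f}, "
          f"mean per-repository ratio {mr:.2f}")
```

The ratio of totals reproduces the 1.8 and 2.8 in the table. The unweighted mean of per-repository ratios is lower, about 1.7 for Mercurial and 2.5 for Bazaar, because the larger Samba and Mozilla repositories have above-average ratios.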

References:


Other benchmarks and references

Synthetic benchmark: how system acts under stress (SVK, Darcs, Monotone, Bazaar, Mercurial, Git)

DVCS Round-up: One System to Rule Them All? -- Part 3 by Robert Fendt on Linux Developer Network contains the results of two synthetic benchmarks testing how a system acts under stress (the number of commits in the repository, or the number of files committed). The test system was a VM running Ubuntu 8.10 (“Intrepid”), and the software versions used were SVK 2.0.2, darcs 2.1.0, monotone 0.42, Bazaar 1.10, Mercurial 1.1.2, and Git 1.6.1.

The first test measured the dependence on repository growth: commit time and repository size versus the number of commits already in the repository. SVK and darcs show the worst scaling behaviour, with darcs showing the strongest dependence on repository size. Mercurial shows a very weak dependence, with check-in time increasing by 180 milliseconds over the whole range. Git shows almost no dependence at all, with check-in time increasing by 20 milliseconds. Both SVK and darcs use quite a lot of disk space, with SVK showing very fast growth. Mercurial's repository was only about 10 percent larger than the checkout, and Git's was even about 10 percent smaller (note that the Git repository was repacked every 500 commits).

The second test measured the dependence on check-in size: commit time versus the number of files added to an empty repository, and the time it took to create a single-file commit after the one from the previous test. Adding 2000 files to an empty repository took SVK nearly 1400 seconds, or over 23 minutes. Among the rest, darcs was again the slowest at about 5.5 seconds, while Git managed the task in about 300 milliseconds. Monotone took about a second, Bazaar 1.9 seconds and Mercurial 2.5 seconds. Mercurial and Git showed more or less no dependence of single-file commit time on the size of the previous commit (which is also the number of files in the repository), and the other systems showed only small changes.

See also: DVCS Follow-Up: Managing the Python Repository (PDF) -- a quick test using a 'real' repository.

