SoC2008Ideas

From Git SCM Wiki
Jump to: navigation, search

Welcome!

Summer of code projects announced for 2008.

Git has been accepted as a mentoring organization for the 2008 edition of Google Summer of Code. This is the second year that the Git community has participated, and we're very excited about the possibilities that this year's GSoC students will bring.

This page contains project ideas culled from the Git user and developer community. You can get started by reading some project descriptions, and the mailing list thread(s) that spawned them. Also consider joining the developer mailing list, or finding us on IRC. Details can be found in GitCommunity.

Don't like a project you see here? Toss your own idea out there, it may be just the thing we have been looking for!

If you are interested, you can also find projects which got accepted last year for Summer of Code 2007 at SoC2007Projects wiki page.

Contents


General Requirements

All projects have the following basic requirements:

  • Unless otherwise stated, projects will require programming in C, as nearly all of Git is written in C, for maximum speed and portability.
  • All materials must be released under the GNU General Public License (GPL), version 2.
  • Individual students shall retain copyright on their works.
  • Projects must be tracked and managed in Git, and published on http://repo.or.cz.
  • Weekly project status reports should be sent to the project's mentors. Each status report should outline what was accomplished that week, any issues that prevented that week's goals from being completed, and your goals for the next week. This will help you to break your project down into manageable chunks, and will also help the project's mentors to better support your efforts.

Interested students are encouraged to read the Advice for GSoC Students Page, as it has excellent suggestions that might help you to pick a project and shape your proposal.

If your proposal is accepted by the Git Development Community you will be expected to work on it full time during the summer. Its cool if you want to take a week off for vacation, but remember that Google is hiring you for the summer to help us improve Git. That should be your focus. Don't expect that you will be able to work on your project for just 10 hours a week and then collect at the end.

If your original proposal doesn't pan out or becomes too much of a challenge, you should work with your mentor to help redefine it. We really want to see every project succeed this summer, as there is a great deal of interest in these projects from within the user community.

Students can apply for the program at the Google Summer of Code website. Please consider reviewing our SoC2008Template and answering its questions as part of your application.

New To Git? New To Open Source Development?

A collection of smaller projects. Pick one or two and get introduced to the open source community!

Make git-merge and helpers Builtins

We have generally found that the C version of a tool is more portable and easier to maintain across systems, not to mention faster.

A lot of scripts have been converted already, but "merge" is the big remaining chunk.

Goal: Replace git-merge and its helpers with C versions.
Language: C
Mentor: Johannes Schindelin (johannes.schindelin@gmx.de)
Suggested by: Johannes Schindelin on gmane

Better Windows support

We have a pretty good Windows support through msysGit, although it has not been merged with "official" Git yet. However, comparing between the Windows and the Linux version on the same machine, Windows' Git is still slower (even if it is way faster than, say, Subversion).

There must be ways to avoid creating many processes, which is the main reason scripts are slow (e.g. git-svn).

NOTE: A good knowledge of Windows performance tuning, and probably of the Windows API in general, is required.

Goal: Make Git on Windows perform faster.
Language: C
Mentor: Johannes Schindelin (johannes.schindelin@gmx.de)
Suggested by: Johannes on this Wiki page

Apply sparse To Fix Errors

Teach sparse how to fix common errors, and then use sparse to actually fix them. Perhaps more of a sparse project than a Git project. The community just wants to see Git improved, if sparse is improved at the same time, double bonus points!  :-)

Goal: Fix existing errors in Git.
Language: C
Mentor: Johannes Schindelin (johannes.schindelin@gmx.de)
Suggested by: Johannes Schindelin on gmane

Restartable Clone

When cloning a large repository (such as KDE, Open Office, Linux kernel) there is currently no way to restart an interrupted clone. It may take considerable time for a user on the end of a small pipe to download the data, and if the clone is interrupted in the middle the user currently needs to start over from the beginning and try again. For some users this may make it impossible to clone a large repository.

Goal: Allow git-clone to automatically resume a previously failed download over the native git:// protocol.
Language: C
Mentor: Shawn Pearce (gsoc@spearce.org)
Suggested by: Shawn Pearce on gmane

Lazy clone / remote alternates

The idea here is to be able to remotely access objects from a network based object server as needed, rather than having them all local. This would make Git more approachable for very large projects, especially in a more corporate LAN-user setting.

Goal: A working lazy clone prototype implementation that could be considered for inclusion, in a nice series of commits (separate branch/fork)
Language: C
Suggested mentor: Jan Holesovsky, who submitted proof-of-concept patch for lazy clone

Partial (subtree) checkout, or its generalization: sparse checkout

The idea is to checkout for example only the Documentation subdirectory, work on it, but still commit the full tree. Some workflows may be better suited to this type of usage than using submodules. Optionally should include partial clone (not needed objects not in repository).

Goal: A working partial checkout prototype implementation, with technical documentation.
Language: C
Suggested mentors:

  • gitzilla (sent proposal),
  • Nguyen Thai Ngoc Duy (pclouds), proposed to implements it

Implement git-submodule using .git file

Rewrite git-submodule, placing the repository for each referenced submodules in the superproject's $GIT_DIR/modules, and use .git as a file (not a directory), which is currently in 'pu', to point at it. This resolves issues related to switching between versions of the superproject which contain/do-not-contain the submodule, as the submodule working directory can be safely created/deleted at will and its objects are safe somewhere within $GIT_DIR/modules.

Goal: Rework git-submodule, keeping submodule repositories safe under $GIT_DIR.
Language: C, Bourne shell
See: gitster's recent remarks on #git on gmane.

Teach git-apply the 3-way merge fallback git-am knows

When git-apply patches file(s) the files may not patch cleanly. In such a case git-apply currently rejects the patch. If it is called from git-am then a 3-way merge is attempted by taking the "index ..." data from the patch and constructing a temporary tree, applying the patch to that, then merging the temporary tree to the current branch.

For various reasons it would be nice to do the entire temporary tree dance directly inside of git-apply, as it can benefit say the git-sequence's "apply" command, speed up git-am processing by forking less, etc.

Goal: Apply a patch that git-apply rejects today, but that "git am -3" can apply cleanly.
Language: C
See: gitster's recent remarks on #git on gmane.

Gitweb caching

Implementing very smart caching in gitweb, to avoid the thundering herd problem on kernel.org whenever a repository gets updated, or at least support for caching engines in the form of generating proper Last-Modified: and ETag: headers, and responding to If-Modified-Since: and If-None-Match: requests, cheaply.

Perhaps becoming the gitweb maintainer could come of it, or at least the gitweb admin for kernel.org (sorely needed).

Goal: At minimum, port kernel.org's caching to mainline (git's) gitweb
Language: Perl, HTML, perhaps JavaScript
Mentors:

  • John 'Warthog9' Hawley (wrote caching for kernel.org's gitweb)

Co-mentors:

  • Petr Baudis (repo.or.cz admin)
  • Jakub Narebski (gitweb contributor)

Suggested mentors:

  • Lars Hjemli (cgit author, git web interface in C, with caching)

Git / Subversion Interoperability

The idea here is implement something in Git that speaks the Subversion protocol on the wire, but uses Git as the backend storage. (This would be like the existing git-cvsserver.)

There are two potential approaches:

  1. git-svnserver
  2. write a backend for Subversion

Goal: To be able to access git repository, at minimum read-only, from a Subversion client, at least svn CLI.
Language: Open for proposal.
Suggested mentors:

  • Eric Wong (git-svn author)
  • Matthias Urlichs (git-svnimport author)
  • Julian Phillips (has a partial python implementation! at http://git.q42.co.uk/w/git_svn_server.git)
  • Subversion project member t.b.c. as a co-mentor

Resources:

Better Emacs integration

Make git mode for Emacs full featured git GUI, and not only commit tool, following ideas of PCL-CVS... and its limitation. Existing is today git.el, which is very limited in features, and DVC (http://download.gna.org/dvc), which still doesn't take full advantage of Git's capabilities. Merging git.el and DVC would probably be a good start. Other git GUIs, like git-gui, gitk, qgit, tig could also be inspiration for features.

Goal: Allow creating and switching branches, examining history, merging, fetching etc. from withing Emacs. Should include modes for git config file format and format-patch patches.
Language: Emacs Lisp
Suggested mentors:

  • Alexandre Julliard
  • David Kagedal


Possible Co-mentor:

  • Matthieu Moy

Larger Projects

The following projects are probably more suited to the junio/senior/graduate students. Pick one and really make it shine.

GitTorrent

This is a P2P method for distributing git repositories, which will allow for faster downloads, lower startup overheads, etc. For use for sites such as git.kernel.org, repo.or.cz, etc. Existing partial implementations exist at http://jonas.nitro.dk/gittorrent/libgtp.git/ and http://git.utsl.gen.nz/VCS-Git-Torrent

Goal: to implement the GTP/0.1 standard
Language: Open for proposal.
Suggested mentors:

  • Jonas Fonseca (designed the original protocol)
  • Johannes Schindelin (interested party)

Implement git-sequencer

git-am, git-rebase, git-rebase -m, and git-rebase -i are all roughly performing the same task, yet do it through almost four different code paths. They are also all written in Bourne shell script, which limits their performance and portability, especially to non-POSIX systems (such as Windows).

The idea of git-sequencer is to take the "todo" file format used by git-rebase -i and extend it to also support applying patches split out of mbox files, and implement the "todo" processor in C. Frontends like git-am, git-rebase, etc. can then setup the "todo" script and pass it to git-sequencer, which does the actual patch application, editing, etc.

Goal: implement a common rebase/am backend.
Language: C
Suggested menors:

  • Shawn Pearce (gsoc@spearce.org)
  • Johannes Schindelin (johannes.schindelin@gmx.de)

Suggested by: Junio C Hamano on kerneltrap

Git Eclipse Plugin

Extend the currently-under-development Eclipse plugin so that it is actually useful for daily development work. The plugin can currently read a Git repository (but not quite fast enough) and can write new history to the repository, but a lot of Eclipse user interfaces still need to be built to allow an end-user to actually interact with peers on a project.

Particularly of note is the plugin's missing functionality to push changes to another repository, and to merge changes locally.

This was also one of Eclipse SoC2007 Ideas (More work on git support).

Language: Java (Eclipse API, SWT API)
Goal: Push changes over native Git protocol, or merge changes locally, or both!.
Mentor: Shawn O. Pearce (gsoc@spearce.org)
Suggested by: Jakub Narebski on gmane
See: EclipsePluginWishlist

Implement pack v4/v5 for higher compression

Although Git's packfile format offers incredible disk space savings over other VCS systems, we already know we can still do better. A prototype design dubbed 'packv4' has been floating around the Git world for almost two years, but has not been fully implemented for inclusion in a shipping release.

At this point the implementation for 'packv4' would need to be rewritten from scratch, as much of the affected parts of the core Git code have been modified to support other enhancements. A student with a strong interest in data compression could really get into this project and take it to places we have never thought possible.

Goal: Save at least 10% on current packv2 for benchmark repositories (linux-2.6, git, mozilla, gcc) and boost runtime ~10%.
Language: C
Suggested mentors:

  • Shawn Pearce (gsoc@spearce.org)


See: on kerneltrap and the entire "understanding version 4 packs" thread on gmane

git-cheetah, a TortoiseCVS Lookalike

There is already a basic "shell extension" providing "Git Bash Here" and "Git Gui Here" functions in the context menu. But it still needs a lot of work, to be as helpful as TortoiseCVS.

And when this work is done, let's enhance the code base to provide the same extension to Dolphin, konqueror, Finder, etc

Language: C/COM.
Goal: Have a fully functional TortoiseCVS lookalike for Git.
Mentor: Johannes Schindelin (johannes.schindelin@gmx.de).
Suggested by: Johannes Schindelin

"Smart" HTTP based GIT Transport

Hosting a publicly available Git repository currently requires a server where a TCP port based service may be operated. This not permitted by most of the web site hosting companies except as part of their "dedicated" server offerings. This project is to implement a CGI based "smart" Git transport for Git's fetch and push operations, and the necessary client code. The completed project should also include support both authenticated and anonymous clients.

Most of what is required for the "smart" Git transport for this project is already implemented by the bundle facilities that Git already supports but changes to the existing bundle code may be necessary. The CGI component should be implemented in in a language that a majority of web site hosting companies permit to be used with their product offerings and should rely the Git suite of programs for most of the repository manipulation.

Language: Perl, C, shell, other
Goal: Server (CGI) and client that supports fetch & push operation for both authenticated & anonymous clients.
Mentor:
Suggested by: Gitzilla

Projects So Large Your Head Will Spin

These are not doable as a GSoC project. But a very focused subpart might, so they are listed here to get those hamster wheels in your brain spinning.

Continue "libification" work

Description copied from SoC2007Ideas:

Make more parts of libgit.a reentrant by removing static variables, avoid dying e.g. when malloc() fails, etc. Most of Git was written with the run-once-and-exit design pattern. This makes it difficult/impossible to embed core Git data access routines within a scripting language such as Perl or Python, or to embed it in a longer-running server such as an Apache module. Existing code must be modified to return errors rather than terminating the process, and memory leaks need to be plugged.

Goal: In addition to the general goal of being able to link libgit.a into external programs that are not exclusively git utilities and have the git code be usable as a well-behaved library, one specific goal given on the SoC2007Ideas page was: Usable Perl binding that a future gitweb.cgi could take advantage of, rather than repeatedly forking Git processes.
Language: C
See: The existing work on libgit-thin may be relevant; gitster's recent remarks on #git on gmane.
Suggested mentors:

  • Shawn Pearce (gsoc@spearce.org)

Other Resources

Other links


Personal tools