SoC2007Ideas

From Git SCM Wiki
Jump to: navigation, search

Welcome!

Git has been accepted as a mentoring organization for the 2007 edition of Google Summer of Code. This is the first year that the Git community has participated, and we're pretty excited about the possibilities that GSoC will bring.

This page contains project ideas culled from the Git user and developer community. You can get started by reading some project descriptions, and the mailing list thread(s) that spawned them. Also consider joining the developer mailing list, or finding us on IRC. Details can be found in GitCommunity.

Don't like a project you see here? Toss your own idea out there, it may be just the thing we have been looking for!

You can find projects which got accepted by Google for Summer of Code 2007 at SoC2007Projects wiki page. The list of projects accepted was taken from GSoC Organization Information for Git Development Community.

General Requirements

All projects have the following basic requirements:

  • Unless otherwise stated, projects will require programming in C, as nearly all of Git is written in C, for maximum speed and portability.
  • All materials must be released under the GNU Public License (GPL), version 2.
  • Individual students shall retain copyright on their works.
  • Projects must be tracked and managed in Git, and published on http://repo.or.cz.
  • Weekly project status reports should be sent to the project's mentors. Each status report should outline what was accomplished that week, any issues that prevented that week's goals from being completed, and your goals for the next week. This will help you to break your project down into manageable chunks, and will also help the project's mentors to better support your efforts.

Interested students are encouraged to read the Advice for GSoC Students Page, as it has excellent suggestions that might help you to pick a project and shape your proposal.

If your proposal is accepted by the Git Development Community you will be expected to work on it full time during the summer. Its cool if you want to take a week off for vacation, but remember that Google is hiring you for the summer to help us improve Git. That should be your focus. Don't expect that you will be able to work on your project for just 10 hours a week and then collect at the end.

If your original proposal doesn't pan out or becomes too much of a challenge, you should work with your mentor to help redefine it. We really want to see every project succeed this summer, as there is a great deal of interest in these projects from within the user community.

Students can apply for the program at the Google Summer of Code website. Please consider reviewing our SoC2007Template and answering its questions as part of your application.

New To Git? New To Open Source Development?

A collection of smaller projects. Pick one or two and get introduced to the open source community!

More Complete Tests

The Git test suite does not cover enough of the code base, there are still plenty of corners that lack automated and repeatable tests. Our current set of tests has served us well by catching many regressions before they are published, but more are needed to maintain high quality releases in the future.

The test suite is currently written in Bourne-shell. Any additions or improvements should also be done in Bourne-shell.
Goal: Significantly extend the Git test suite.
Mentor: Shawn O. Pearce (gsoc@spearce.org), Brian Gernhardt (benji@silverinsanity.com), Johannes Schindelin (johannes.schindelin@gmx.de)
Suggested by: Johannes Schindelin on gmane

More tools Builtin

Convert existing shell and Perl scripts to more portable C. We have generally found that the C version of a tool is more portable and easier to maintain across systems, not to mention faster.

Small cases like git-gc and git-merge are fairly easy, larger cases like git-fetch will require some work. Figure three days for the smaller cases, and a couple of weeks for the larger ones. We currently have 33 scripts up for grabs, making a complete conversion more than a full project. Pick a few to convert, and maybe another small project on the side.
Goal: Replace most core scripts with C versions.
Mentor: Shawn O. Pearce (gsoc@spearce.org), Petr Baudis (pasky@suse.cz), Johannes Schindelin (johannes.schindelin@gmx.de)
Suggested by: Johannes Schindelin on gmane

One Click Windows Install

Make it so that one .exe click-and-install would make the full power of Git available to users running on Windows (2000/XP/2003/Vista). The MinGW port somewhat addresses this need, but much work still remains to make it stable, fully functional, and click-and-installable.

Improvements to git-gui (written in Tcl/Tk) might also be implemented as part of this project, as git-gui is rapidly becoming the user-friendly interface to Git.
Goal: Stable release-candidate-quality one-click-installable Git for Windows.
Mentor: Shawn O. Pearce (gsoc@spearce.org), Johannes Schindelin (johannes.schindelin@gmx.de)
Suggested by: Andy Parkins on gmane

Lightweight Checkout

Otherwise known as the .gitlink idea.

Have a file that would point to the object directory, refs directory, working directory private index file and `HEAD` file. This might also help submodule support (providing where to search for object database, index and refs). This feature is a bit similar to `CVS/Root` in CVS. Users might activate this feature by a new option to git-clone.
Goal: Make it possible to easily use one .git directory with multiple working directories.
Mentor: Shawn O. Pearce (gsoc@spearce.org), Petr Baudis (pasky@suse.cz)
Suggested by: Jakub Narebski on gmane
See: Light-weight checkouts via ".gitlink" thread (started by Josef Weidendorfer)

Apply sparse To Fix Errors

Even better, teach sparse how to fix common errors, and then use sparse to actually fix them. Perhaps more of a sparse project than a Git project. The community just wants to see Git improved, if sparse is improved at the same time, double bonus points!  :-)
Goal: Fix existing errors in Git.
Mentor: Shawn O. Pearce (gsoc@spearce.org), Brian Gernhardt (benji@silverinsanity.com), Johannes Schindelin (johannes.schindelin@gmx.de)
Suggested by: Johannes Schindelin on gmane

Partial/Sparse Checkouts

Where you can checkout for example only Documentation subdirectory, work on it, but commit full tree. Some workflows may be better suited to this type of usage than using submodules. Some core support already exists, but some tricks need to be used to get everything working right. The real work in this project is figuring out how to use those tricks, and using them in the right places.  ;)

Merging is somewhat complicated here as a subdirectory checkout user probably cannot resolve conflicts in parts of the tree that they do not have checked out. These types of merges would probably need to be detected and stopped.
Goal: Make it possible to checkout a subdirectory on a branch, commit to it, and merge with another branch.
Mentor: Shawn O. Pearce (gsoc@spearce.org), Petr Baudis (pasky@suse.cz)
Suggested by: Jakub Narebski on gmane

Larger Projects

The following projects are probably more suited to the junio/senior/graduate students. Pick one and really make it shine.

Subproject/Submodule Support

Much talked about (and even has three different independent implementations, none were ever accepted for release).

Subproject (submodule) support would certainly help using Git for large modular projects like KDE, Mozilla or Linux distributions. A submodule (or subproject) is a part of project (usually one subdirectory) that is a separately SCM managed project (it is SVN externals done right). For example external/separately managed library that other projects uses and probably modifies to meet their own needs.
Goal: A working subproject implementation that is seriously considered for acceptance into a future release. Note that the three existing implementations did not make the cut!
Mentor: Shawn O. Pearce (gsoc@spearce.org), Martin Waitz (tali@admingilde.org), Petr Baudis (pasky@suse.cz)
Suggested by: Raimund Bauer on gmane, Jakub Narebski on gmane
See: SubprojectSupport

Libification

Make more parts of libgit.a reentrant by removing static variables, avoid dying e.g. when malloc() fails, etc. Most of Git was written with the run-once-and-exit design pattern. This makes it difficult/impossible to embed core Git data access routines within a scripting language such as Perl or Python, or to embed it in a longer-running server such as an Apache module. Existing code must be modified to return errors rather than terminating the process, and memory leaks need to be plugged.
Goal: Usable Perl binding that a future gitweb.cgi could take advantage of, rather than repeatedly forking Git processes.
Mentor: Shawn O. Pearce (gsoc@spearce.org), Petr Baudis (pasky@suse.cz)
Suggested by: Jakub Narebski on gmane

Git Eclipse Plugin

Extend the currently-under-development Eclipse plugin so that it is actually useful for daily development work. The plugin can currently read a Git repository (but not fast enough) and can write new history to the repository, but a lot of Eclipse user interfaces still need to be built to allow an end-user to actually commit changes, or transfer them over the network. This is also one of Eclipse SoC2007 Ideas (More work on git support).
Language: Java (Eclipse API, SWT API)
Goal: Commit, create branches, delete branches, switch between branches. Maybe also fetch changes to tracking branches and push changes to remote branches, depending on the student's comfort level with the Eclipse API.
Mentor: Shawn O. Pearce (gsoc@spearce.org)
Suggested by: Jakub Narebski on gmane
See: EclipsePluginWishlist

blame Merge Strategy

A new merge strategy "merge-blame". By running "git-blame -C $ancestor..$other -- $path", you can find $path contains pieces from other paths (it could be a single other path, in which case you found a wholesale rename). Then you can run "git-diff $ancestor..$mine -- $other_path" for each of these paths to find out if our branch changes the parts of them and how. By incorporating these changes you can come up a merged $path that reflects changes made to other paths, in other words, tracking code movements.

This strategy should probably be a function within merge-recursive that is activated only when merge-recursive is invoked as merge-blame (or some perhaps some better name). Further it probably should only be used when a file fails to be merged cleanly, as a "last resort before asking the developer to do it for me" sort of trick. Rationale here is the blame computation is non-trivial and will take some CPU time, so we want to avoid that in the common cases of non-moved code.
Goal: Demonstrate a working merge-blame that can fairly accurately merge changes made to moved code segments.
Mentor: Shawn O. Pearce (gsoc@spearce.org)
Suggested by: Anonymously added to the Wiki; on gmane

True Branch Support In cvsserver

This will make it easier for projects to migrate from CVS to Git. The idea would be to expose the Git branch space as CVS branches (and likewise the Git tag space as CVS tags), permitting existing CVS clients such as TortoiseCVS to more easily access these objects.

git-cvsserver is currently written in Perl and makes heavy use of SQLLite.
Language: Perl, SQL
Goal: Allow CVS clients to work with Git branches more naturally from within the client itself.
Mentor: Martin Langhoff
Suggested by: Martin Langhoff

Large Scale Repository Migration

Continue work in large-scale repository migration. The goal is to demonstrate a successful Mozilla CVS import, based on the work started by Jon Smirl gmane thread.

The conversion tools should take advantage of git-fast-import to actually create and update the Git repository files, as it was designed and built specifically to support these large scale migration efforts.

Conversion tools might be coded in C, Perl, Python, Ruby, etc., as they are not typically considered to be a core part of the Git system that must be available on all platforms "out of the box".
Language: Open for proposal.
Goal: Successfully import the Mozilla CVS repository, including all branches and tags. Bonus points for also being able to (reasonably) compute branch merge points and record these in Git as proper merge commits.
Mentor: Martin Langhoff
Suggested by: Martin Langhoff

Lazy Clone

Aka remote alternates. The idea here is to be able to remotely access objects from a network based object server, rather than having them all local. Users could initialize a working directory by fetching only the objects referenced by the tip commit, and then only fetch needed objects on demand in the future.

This is an extension of the existing "shallow clone" support as we want to cut not just commits off, but whole blobs and trees to. In other words, a client might wind up downloading and caching all commits for the past 3 months, but not the blobs or trees associated with them. Instead it would only have the most recent blobs/trees, such as the tip revision and one or two revisions back.

This is tricky to do right, as git-fsck needs to know what is going on, and many tools assume that if we have a commit, we have all trees/blobs mentioned by that commit.
Goal: A working lazy clone prototype implementation that could be considered for inclusion.
Mentor: Shawn O. Pearce (gsoc@spearce.org), Petr Baudis (pasky@suse.cz)
Suggested by: Jakub Narebski on gmane

gitweb Caching, gitweb Improvements

Bringing together all gitweb implementations. Implementing very smart caching in gitweb, to avoid the thundering herd problem on kernel.org whenever a repository gets updated, or at least support for caching engines in the form of generating proper Last-Modified: and ETag: headers, and responding to If-Modified-Since: and If-None-Match: requests. Perhaps becoming the gitweb maintainer could come of it, or at least the gitweb admin for kernel.org (sorely needed).

Merging interesting and useful features like the client-side commit graph (gitk like clone) such as on http://repo.or.cz/, or the incremental blame support proposed on the Git mailing list would be nice. Another idea is to generate some data (such as what is displayed on the front page) on update time, and saved in a format which can be read quickly by Perl.

This project might also work nicely in parallel with the Libification project above, as gitweb could really benefit from a libified git.
Language: Perl; some HTML, CSS, and JavaScript depending on the actual tasks taken on by the student.
Goal: Improve gitweb performance on large installations like kernel.org; merge some interesting features from other gitweb forks/clones.
Mentor: Shawn O. Pearce (gsoc@spearce.org), Petr Baudis (pasky@suse.cz)
Suggested by: Jakub Narebski on gmane
See: gitweb entry on InterfacesFrontendsAndTools page.

Checkout Hooks

Finish the already started file attribute support. Currently Git only tracks the executable bit of a file, and does not track other interesting data such as owner, group, full permission bits, POSIX ACL, extended attributes, etc. One possible use for a checkout hook would be to restore these values after checking a file out from the repository, and to save these values into an auxiliary metadata upon checking it into the repository.

Another possibility is to use a checkout/checkin hook system to automatically transform OpenOffice document files. Since these are binary ZIP files they do not delta very well within Git packfiles, by uncompressing the ZIP file and storing the raw XML Git can get good delta compression. By recreating the ZIP file during checkout, OpenOffice never knows the difference.

This idea can be extended to other compression and archive formats, too. Sam Vilain made a "git-hash-archive" that transparently stored gzip files, keeping the same content MD-5 hash (though that part was admittedly clinically insane).
Goal: Improved file attribute support capable of saving/restoring auxiliary attributes associated with files and transforming files to a canonical "in repository" format, and back again.
Mentor: Shawn O. Pearce (gsoc@spearce.org)
Suggested by: David Lang, privately suggested by email.

Gittorrent Server and Peer

Implement a server and P2P client/server for the distributed gittorrent specification (see http://gittorrent.utsl.gen.nz/). This project has some interesting benefits for larger projects such as kernel.org, where the mirror servers are currently overloaded trying to serve up all of the Git repositories and other files.

There is already an implementation of parts of the protocol as a C library (currently at http://jonas.nitro.dk/gittorrent/gittorrent.git/) You can use this library, linked with the language of your choice, or re-implement completely in another language. I'd prefer implementations in C, Perl (any version), Java, Ruby, or Python, but please feel free to ask whether I feel up to assisting you in your favorite language. :-)
Language: Open for proposal.
Goal: Hosting of massive projects, massively parallel mirroring.
Mentor: Sam Vilain (samv@catalyst.net.nz)
Suggested by: Sam Vilain

Git / Subversion Interoperability

The idea here is implement something in Git that speaks the Subversion protocol on the wire, but uses Git as the backend storage. (This would be like the existing git-cvsserver.)

There are two potential approaches:

  1. git-svnserver
  2. write a backend for Subversion

In order to do this, you could make use of the reflog support already built into Git. That enables a you to assign a monotonically increasing number to each commit in a repository, along a single branch, or by combining reflogs, along multiple branches.
Language: Open for proposal.
Goal: For projects already using Subversion, they could transparently migrate to using Git (SVN clients, Git server).
Mentor: Open for proposal.
Suggested by: Anand Kumria

HareGIT, a TortoiseCVS Lookalike

Many Windows users are already used to TortoiseCVS. Something like this is clearly lacking for Git.

There is a small proof-of-concept code to make a "Shell Extension", i.e. a DLL which adds a menu entry to the context menu in the "Windows Explorer". This proof-of-concept code is meant to be compiled with the MSYS/MinGW combo, so you only need free tools.
Language: C/COM.
Goal: Have a TortoiseCVS lookalike for Git.
Mentor: Johannes Schindelin (johannes.schindelin@gmx.de), ANYONE ELSE?.
Suggested by: Johannes Schindelin

Other Resources

See also our SoC2007Application.


Personal tools