SoC2010Ideas

From Git SCM Wiki

OBSOLETE CONTENT

This wiki has been archived and the content is no longer updated. Please visit git-scm.com/doc for up-to-date documentation.

Projects touching the core of Git

Unify Pathspec Semantics

There is a longstanding inconsistency in pathspec limiting done by various families of commands. We should unify them.

  • The pathspec limiter that originates from "git diff-tree" and is shared with "git log" is the most limited. It only allows you to specify leading paths and limits by leading substring match (anchored at directory boundaries '/'); no globbing is supported.
  • The pathspec limiter used by "git ls-files -- $path" and shared by "git checkout [treeish] -- $path" is more flexible. In addition to the leading directory match above, it also allows globbing.

The latter unfortunately has at least two different implementations. The one that ls-files and checkout use is in dir.c; grep uses a different implementation that is more advanced and better suited for a caller that can benefit from being able to skip a whole subdirectory.
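
To make the difference between the two limiters concrete, here is a minimal Python sketch of both matching styles. The function names are hypothetical, and the glob matcher uses Python's fnmatch, in which '*' also matches '/'; that is an assumption of this sketch and not a statement about what Git's C code does.

```python
from fnmatch import fnmatchcase


def match_leading(path, spec):
    """diff-tree/log style: leading path match anchored at '/' boundaries."""
    spec = spec.rstrip("/")
    return spec == "" or path == spec or path.startswith(spec + "/")


def match_glob(path, spec):
    """ls-files style: leading path match, or a shell-style glob match."""
    # Assumption: fnmatchcase lets '*' cross '/' boundaries.
    return match_leading(path, spec) or fnmatchcase(path, spec)
```

With these definitions, match_leading("t/t0000-basic.sh", "t/*.sh") is false while match_glob accepts the same pair, which is the gap that "git log 't/*.sh'" runs into.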

Goal: Make "git log 't/*.sh'" work as expected, without introducing unnecessary overhead.
Language: C


Git Submodules Enhancements

Git submodules are architecturally nice, but their user interface and functionality could do with more work. Tasks carried out by tools like repo (used for Android) could be accommodated by core git functionality.

Major bugs:

  • Submodules in a git checkout do not generally behave as regular files do when handled through the git plumbing; it would be nice to fix that. For example, switching between branches with submodules in a different layout should be more transparent.
  • Specifying a symbolic branch name to check out in .gitmodules, and making 'git submodule update' check out that branch by default, would be useful.
  • It is far too easy to create a commit in a supermodule which references a commit that does not yet exist in the submodule repo. This should require forcing.
  • It should be possible to use relative URLs in .gitmodules.
  • Enhancements to tools and CLIs to make working with submodules easier.
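
As an illustration of the relative URL item, the following Python sketch resolves a relative submodule URL against the URL of the superproject. The function name and the resolution rule (relative to the superproject's own URL) are assumptions made for this example; the list above does not specify how resolution should work.

```python
def resolve_submodule_url(super_url, sub_url):
    """Resolve './x' or '../x' against the superproject URL (hypothetical rule)."""
    if not sub_url.startswith(("./", "../")):
        return sub_url  # absolute URLs are returned unchanged
    base = super_url.rstrip("/")
    while True:
        if sub_url.startswith("../"):
            sub_url = sub_url[3:]
            # Drop the last path component of the base.
            # Note: this sketch does not guard against climbing past the host.
            base = base.rsplit("/", 1)[0]
        elif sub_url.startswith("./"):
            sub_url = sub_url[2:]
        else:
            break
    return base + "/" + sub_url
```

For example, a superproject at git://example.com/project/super.git with a submodule URL of ../sub.git would resolve to git://example.com/project/sub.git under this rule.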

Goal: Get one or more of the above major bugs fixed and the changes accepted into git core.
Language: C


Git Fetch/Clone Mirror Support

Git automatic mirroring is a system for making Git repositories distribute automatically. It is the first stage of more general P2P features planned. The idea is that when fetching data from a server, the server can inform the client of mirrors of the repository, which the client can fetch the majority of the data from, then fetch any remaining objects from the main server. This way large Git projects can be served from local mirrors for distributed teams, or large hosts such as git.kernel.org or github can effectively offload bandwidth to closer or more lightly loaded servers.
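
The split between mirror and main server can be pictured as simple set arithmetic over object names. The sketch below is only a model of the idea, with hypothetical names; it says nothing about how the fetch and upload-pack wire protocol would advertise mirrors.

```python
def plan_fetch(wanted, local, mirror_has):
    """Decide which missing objects to take from a mirror and which from origin."""
    missing = set(wanted) - set(local)
    from_mirror = missing & set(mirror_has)   # bulk of the data, served nearby
    from_origin = missing - from_mirror       # whatever the mirror lacks
    return from_mirror, from_origin
```

A mirror that lags behind the main server still helps under this model: only the objects it lacks are requested from the origin.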

More details of the proposed updates are in a wave, which has been exported to: http://exporty-bot.appspot.com/export?waveId=googlewave.com!w%252BX89ak6dHA

Goal: Allow Git fetch (client) and upload-pack (server) commands to list and partially fetch from mirror servers.
Language: C
Possible Mentors: Scott Chacon / Sam Vilain

git sequencer again

There was a "git sequencer" GSoC 2008 project; the student was Stephan Beyer and the mentors were Christian Couder and Daniel Barkalow. This project came up with some working code, but it did not get merged.

The goal of this project is to take some features, ideas and/or code from the former git sequencer project and other related efforts by Christian Couder to port "rebase -i" to C, and to step by step get them properly integrated and merged into Git mainline.

One intermediate step could be to have something like "git cherry-pick A..B" that many people have asked for.
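
The core of such a feature is a todo list that is worked through in order and can be resumed after a conflict. The following Python sketch shows that control flow only; the class, the Conflict exception and the apply callback are hypothetical and are not taken from the 2008 sequencer code.

```python
class Conflict(Exception):
    """Raised by the apply callback when a commit does not apply cleanly."""


class Sequencer:
    def __init__(self, commits):
        self.todo = list(commits)  # commits still to be applied, oldest first
        self.done = []             # commits applied so far

    def run(self, apply):
        """Apply commits in order; return False if stopped by a conflict."""
        while self.todo:
            try:
                apply(self.todo[0])
            except Conflict:
                return False  # the failing commit stays at the head of todo
            self.done.append(self.todo.pop(0))
        return True

    def skip(self):
        """Drop the commit that caused the stop."""
        if self.todo:
            self.todo.pop(0)
```

Calling run() again after the user has resolved the conflict retries the commit at the head of the list, which mirrors the "continue" step of an interactive rebase.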

Goal: Get some git sequencer related features properly merged.
Language: C
Mentor: Christian Couder

Line-level history browser

Output from "git blame" is often not useful, as you do not see the actual changes. It could have been an indentation change, a typo fix, a code movement, or really the origin of the code that you are looking for.

What is needed, then, is a browser similar to "git log", except that it should be able to follow the history of particular line ranges, not whole files.

Like "git log", it should be possible to see the commits, the diffs (restricted to the interesting area), and code moves. It would be mighty cool if it could see code copies, too.
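
One building block for this is mapping a line range in a commit back to the corresponding range in its parent. The sketch below does so with Python's difflib on two in-memory versions of a file; it is an illustration under that assumption, not a description of how a C implementation inside Git would walk diffs. The function name is hypothetical, and ranges are 0-based and half-open.

```python
import difflib


def map_range_to_parent(old_lines, new_lines, start, end):
    """Map new_lines[start:end] back to a range in old_lines.

    Returns (old_start, old_end, touched); touched is True when the
    commit changed something inside the range.
    """
    sm = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    old_start = old_end = None
    touched = False
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag == "delete":
            # A deletion has no lines on the new side; count it only
            # when it falls strictly inside the range.
            overlaps = start < j1 < end
            lo = hi = j1
        else:
            lo, hi = max(j1, start), min(j2, end)
            overlaps = lo < hi
        if not overlaps:
            continue
        if tag == "equal":
            s, e = i1 + (lo - j1), i1 + (hi - j1)
        else:
            touched = True
            s, e = i1, i2
        old_start = s if old_start is None else min(old_start, s)
        old_end = e if old_end is None else max(old_end, e)
    return old_start, old_end, touched
```

Repeating this step from commit to parent, and reporting only the commits where touched is true, gives a history restricted to the interesting area.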

Some of the above functionality can be found in graphical blame browsers such as the sample program blameview from `contrib/` (Perl, Gtk2), "git gui blame" (Tcl/Tk), or the git-age 'blame visualizer' (PyGTK).

Note that this project is about providing a command line program working at line level (and certainly accessing the objects directly for best performance). This is not the same as Giddy (see below), which is a graphical tool working on arbitrary content fragments (and possibly just using Git's pickaxe interface).

Goal: A program that can follow history of line ranges.
Language: C
Mentor: Thomas Rast, Co-Mentor: Jens Lehmann

Add-ons to Git

Merge helper for LaTeX files

Not everything tracked by Git is source code for programs; Git is actually a fantastic tool to collaborate on writing documents in LaTeX format.

Unfortunately, the merge conflicts created by Git's recursive merge are better suited for C code than for LaTeX, as the latter is frequently written such that a single line will produce a whole paragraph.

Therefore, a tool is needed which uses Git's output to present the differences between the two differing versions and the base version in a nice way. The tool should also offer an easy method to pick one version over the other, and to edit the result (and still show the three versions, pointing to the position corresponding to the current cursor position).
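
Because a whole paragraph often sits on one line, a helper could narrow a conflict down to the words that actually differ. The following Python sketch does this for two versions of a line using difflib; the function name is hypothetical and a real tool would also have to take the base version into account.

```python
import difflib


def word_diff(ours, theirs):
    """Return the word-level differences between two versions of a line."""
    a, b = ours.split(), theirs.split()
    sm = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (tag, " ".join(a[i1:i2]), " ".join(b[j1:j2]))
        for tag, i1, i2, j1, j2 in sm.get_opcodes()
        if tag != "equal"
    ]
```

Each returned tuple holds the kind of change plus the differing words from each side, which a graphical tool could highlight inside the paragraph.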

Goal: Provide a graphical tool to help with merge conflicts specifically in LaTeX files.
Language: As preferred by the student, but it should be portable between Linux, MacOSX and Windows.
Mentor: NN

A remote helper for svn

Since the 1.6.6 release, git has had support for 'remote helpers'. These are standalone programs that allow git to talk to foreign SCMs. On the other hand, git has had support for 'git-svn' for quite a while now, providing (somewhat clumsy) support for working with svn repositories with git.

In order to make working with svn repositories as easy as working with regular git repositories, a new remote helper needs to be written that will bridge between svn and git. This can be done either by rewriting the current 'git svn' code (which is written in perl, and somewhat arcane), or by creating a helper from scratch, possibly using the svn bindings directly (although that might result in licensing conflicts).

Implementing a remote helper is fairly simple: it can be as simple as hooking up an implementation of fast-export and fast-import for the target source. Doing so has a lot of advantages, though, since it allows users to natively access the other VCS. That is, at the end of the project, 'git clone svn://example.com' will Just Work. Remote helpers can be implemented in any language (as long as they speak the already defined 'remote-helper' protocol), but unless there is a very compelling reason not to, they should be implemented in C, Perl, or Python.
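
To give a feel for the shape of a helper, here is a minimal Python sketch of its command loop. It assumes the helper reads one command per line and answers each with lines terminated by a blank line; the "capabilities" and "list" commands, the "import" capability and the "?" placeholder value are taken from the remote-helper documentation as the editor recalls it and should be checked against the Git version in use. The serve function and its refs argument are hypothetical.

```python
def serve(commands, out, refs):
    """Answer remote-helper commands read from `commands`, writing to `out`."""
    for line in commands:
        cmd = line.strip()
        if cmd == "":
            break  # this sketch stops at a blank line or end of input
        if cmd == "capabilities":
            # Advertise that refs are delivered as a fast-import stream.
            out.write("import\n\n")
        elif cmd == "list":
            # '?' means the helper cannot name the object yet.
            for name in refs:
                out.write("? %s\n" % name)
            out.write("\n")
        else:
            raise ValueError("unsupported command: " + cmd)
```

A real helper would pass sys.stdin and sys.stdout and would additionally handle the import command by emitting a fast-import stream built from the svn history.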

Goal: Create a remote-helper for svn.
Language: C (preferred), perl, or python.
Mentor: Sverre Rabbelier

Content History Browser

There are plenty of visual commit history browsers, but little has been tried yet with regard to the really innovative things Git allows thanks to its speedy history access, things that were envisioned at the dawn of some of Git's basic design choices. This project should try to create a graphical interface where the content is the primary object (instead of the commits) and which makes it easy to visually follow the history of a certain content fragment (chosen by simple mouse selection, or by auto-selection of a certain code block, ...): how it moved within a single file and between files, when it appeared, and who changed the code block in which way.

Petr Baudis already implemented a simple proof-of-concept tool, Giddy; however, it stopped at providing a trivial point-and-click pickaxe interface and lamely visualizing diffs within the file; it cannot visualize the fragment's "path" through history or follow it across changes within the fragment. Nevertheless, Giddy might be used as a base for this project, or the implementation could start from scratch.
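
For readers unfamiliar with pickaxe: it reports the commits in which the number of occurrences of a given string changes. A minimal Python model of that idea over an in-memory history looks like this; the function name and the (commit id, file content) input format are assumptions of the sketch.

```python
def pickaxe(history, needle):
    """Return ids of commits where the occurrence count of `needle` changes.

    `history` is a list of (commit_id, text) pairs, oldest first.
    """
    hits = []
    previous = 0
    for commit_id, text in history:
        count = text.count(needle)
        if count != previous:
            hits.append(commit_id)
        previous = count
    return hits
```

This also shows the limitation mentioned above: an exact string search loses track of a fragment as soon as the fragment itself is edited.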

The project will involve a combination of routine GUI development, smart UI design and research into proper heuristics to compensate for changes within the fragment. Ideas for future extensions include integration with a 'blame' view of the content, showing the "age" of various code parts as in git-age, caching the movement information, etc.

Goal: A graphical application that makes it easy to select a content fragment and visualize its history and movement, even through smaller changes within the fragment.
Language: Reasonably portable with low startup time; ideally C/C++/Perl/Python.
Possible Mentor: Petr Baudis

Patch management on top of git

Patch management is needed mostly by Linux distributions (Debian, Fedora) to manage changes applied to the original upstream tarball of a packaged piece of software. One popular project that tried to provide this feature is topgit. However, it seems that topgit is not the best solution to the problem and is much too complicated to gain broad adoption. An alternative approach to the subject has been presented in this mail to the git list: rethinking patch management with GIT / topgit. The task of this project would be to work out the details of this idea, provide an implementation and test whether it is usable in distribution packaging.

Language: Perl, Python, C, shellscript
Possible Mentors: NN, possibly Thomas Koch
Suggested by: Thomas Koch

libgit2

Complete libgit2

Git's source code was not written to be reentrant, but to execute a program doing a specific task, and then exit. This is not really useful if you want to access a Git repository using a library rather than executing a program, catching and parsing its output. Further, some companies seem to be put off by the GPL that applies to Git's source code, and would prefer a BSD licensed library to base their product on.

Shawn Pearce already started libgit2, but as of now most functions are not implemented. This project's purpose is to implement most of these functions, and maybe provide language bindings for Python, Ruby, etc.

In order to consider the libgit2 project successful, the library should be able to read and write the git index file as well as reading (and understanding) and writing all kinds of objects and refs (blobs, trees, commits, heads, tags).

Some things are out of scope for the GSoC project, since they're rather specialized areas that not many users would benefit hugely from.
It is not necessary for the library to be able to write packfiles.
It is not necessary for the library to be able to generate patches.
It is not necessary for the library to be able to create merges.

It is necessary for the library to be able to read objects from the loose object store as well as from packs (this is already implemented to some degree).
It is necessary for the library to be able to manipulate the git index file.
It is necessary for the library to be able to manipulate branch heads and tags (including packed refs).
It is necessary for the library to understand the configuration file format and certain configuration parameters from the config file.
It is necessary to properly document each and every publicly available library function with doxygen style comments.
It is necessary to have unit- and function-tests for all non-trivial code. This is a very strict requirement.

This covers about 95% of the everyday functionality that git provides to the majority of its userbase.

In essence, one should be able to write "git fsck", "git add", "git rm", "git commit-tree", "git tag" and "git update-ref" type applications using only library calls to access and manipulate the git repository.
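
As a small illustration of what reading and writing loose objects involves, the Python sketch below encodes and decodes the loose object format: a header of the form "type size" followed by a NUL byte and the content, compressed with zlib and named by the SHA-1 of the uncompressed bytes. The function names are hypothetical and are not part of any libgit2 API.

```python
import hashlib
import zlib


def encode_object(obj_type, data):
    """Return (object id, compressed bytes) for a loose object."""
    header = ("%s %d" % (obj_type, len(data))).encode("ascii") + b"\0"
    store = header + data
    return hashlib.sha1(store).hexdigest(), zlib.compress(store)


def decode_object(compressed):
    """Return (type, content) from the compressed bytes of a loose object."""
    store = zlib.decompress(compressed)
    header, _, data = store.partition(b"\0")
    obj_type, size = header.decode("ascii").split(" ")
    if int(size) != len(data):
        raise ValueError("size in header does not match content length")
    return obj_type, data
```

A check in the style of "git fsck" would decode each object and compare the recomputed id with the name under which the object is stored.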

Goal: Implement enough of libgit2 to access a Git repository from a C program and/or a Python/Ruby/Whatever script.
Language: C, doxygen
Possible Mentors: Andreas Ericsson (ae@op5.com)
Suggested By: Scott Chacon

EGit/JGit related projects

An implementation of Git's recursive merge in JGit

The recursive merge algorithm is surprisingly powerful for its simplicity. A few fine points have to be taken care of, though, such as rename detection and file/directory conflicts.

The basic building block, however, is a diff algorithm, and JGit recently got its own implementation of that.

An implementation of the recursive merge will allow easy implementation of cherry-pick, and consequently of rebase -i, too.
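
The per-path decision at the heart of any three-way merge can be sketched in a few lines. The Python code below merges three trees represented as dictionaries from path to blob id; it is a simplified model with hypothetical names that leaves out content-level merging, rename detection, file/directory conflicts and the recursive handling of multiple merge bases.

```python
def merge_trees(base, ours, theirs):
    """Three-way merge of {path: blob_id} mappings.

    Returns (merged, conflicts); a missing path counts as deleted.
    """
    merged, conflicts = {}, []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:
            result = o          # both sides agree
        elif o == b:
            result = t          # only their side changed
        elif t == b:
            result = o          # only our side changed
        else:
            conflicts.append(path)
            continue
        if result is not None:
            merged[path] = result
    return merged, conflicts
```

Cherry-pick fits the same function: the parent of the picked commit is used as the base, the current tree as ours, and the picked commit as theirs.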

Goal: Implement the recursive merge algorithm in JGit.
Language: Java.
Mentor: NN

jrebase -i

One of the parts of Git most liked by power users is the interactive rebase. This project is about having the same functionality and more in the Java implementation of Git.

With a proper GUI, this could even be used in EGit.

This project relies on the recursive merge in JGit project.

Goal: Implement 'rebase -i' in JGit.
Language: Java.
Mentor: NN

Platform-specific projects

Several small projects improving msysGit

The msysGit project provides everything you need to compile Git for Windows (except Windows).

A couple of projects are in early stages, but need the time of a GSoC project to be completed:

  • Add new functions to Git Cheetah, our TortoiseCVS lookalike (which is already cross-platform, thanks to Heiko Voigt, but needs more functions).
  • Provide a script compiling Perl with Subversion bindings for MinGW (rather than MSys). This should make all Perl-related stuff in Git for Windows much, much faster.
  • Provide a script compiling Python within msysGit (we will need Python support once Sverre and Johan manage to finalize their Mercurial and CVS remote helpers, respectively).
  • Provide a recipe how to cross-compile a full Git for Windows installer on Linux (the biggest problem is how to get InnoSetup to run).
  • Describe the common msysGit workflows to increase the bus number of the msysGit project.
  • Provide a recipe to cross-compile Git for 64-bit Windows using the fantastic MinGW-W64 project.

Goal: Provide long-needed improvements to msysGit.
Language: C. Pascal. Wiki markup.
Mentor: NN


Other sources of ideas

  • Ideas from previous Google Summer of Code years. Note that some of these ideas have been implemented since then.
    • SoC2007Ideas (e.g. .gitlink, lazy clone, GitTorrent, blame merge strategy, git-svnserver)
    • SoC2008Ideas (e.g. resumable clone/fetch, pack v4/v5)
    • SoC2009Ideas (e.g. packfile caching for git-daemon, git mirror-sync, libgit2, directory renames)