SmallProjectsIdeas

From Git SCM Wiki

This page lists some projects that could be implemented by someone not familiar with Git's code in a few days or weeks. Projects should be smaller/easier than GSoC projects (e.g. SoC-2014-Ideas). It serves as a list of suggestions for Ensimag students, but others are welcome to add/use projects listed here!

Warning: This page lists ideas, but not all of them are necessarily "good" ideas. If you go for one of the ideas listed here, you still need to convince 1) yourself, 2) the list/the maintainer that the idea is good.

Fix asciidoc formatting in documentation

The asciidoc markup is not very consistent from page to page. For example, the doc for git config (http://git-scm.com/docs/git-config) often uses italics for configuration variables, but also uses monospace. The convention is documented in CodingGuidelines but should be applied more consistently.

Many fixes have been done already, but more could be done easily with a few regex substitutions + manual review:

  • Search for \-- in *.txt
  • Search for unquoted options (--foo or -f) in *.txt, e.g.: git grep -e '[^`]--[a-z]' 'git*.txt'
  • Search for paths (/path/to/file)

A bunch of fixes were done in the range of commits 0b65a8dbdb38962e700ee16776a3042beb489060..661c3e9bc064564a492281364413dc805eaddf95, but 1) new breakages seem to have been introduced in the meantime (for example, option names in git-interpret-trailers.txt) and 2) not everything was fixed (for example, git grep "'.*/.*'" returns a lot of instances of paths that are quoted with forward quotes).

It would actually be nice to have some of these rules enforced by some kind of tests, so that we avoid regressions in new documentation. A few rules are already implemented in Documentation/lint-gitlink.perl.
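
As an illustration of what such a test could check, here is a minimal Python sketch of the "unquoted option" rule from the list above. The function name and the exact regular expression are assumptions made for this example; they are not taken from Documentation/lint-gitlink.perl, and a real check would need manual review for false positives.

```python
import re

# Hypothetical rule: an option such as --foo or -f should sit inside backquotes.
# Same spirit as: git grep -e '[^`]--[a-z]' 'git*.txt'
# The lookbehind rejects options preceded by a backquote, a word character
# (as in "well-known") or another dash.
UNQUOTED_OPTION = re.compile(r"(?<![`\w-])--?[a-z][a-z0-9-]*")


def find_unquoted_options(text):
    """Return (line_number, option) pairs for options not preceded by a backquote."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in UNQUOTED_OPTION.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits
```

Running this over the contents of a git*.txt file would give a list of candidate lines to review by hand.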

Add missing options in documentation synopsis

The documentation for each git command starts with a synopsis. Unfortunately this synopsis often does not show all the available options for the command.

Allow setting an environment variable for every run of Git

My (Matthieu Moy) motivation for this is to allow setting $LANG to en_US.UTF-8 or so, to disable translation. One possibility would be to have a dedicated variable like code.translateOutput=false, but a more general idea would be to allow setting any environment variable with

[env "LANG"]
    set = en_US.UTF-8

Users could possibly use this for other variables like

[env "LESS"]
    set=FRX

or

[env "LESS"]
    unset

This should be relatively easy to implement as we already load the config file on startup.
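
To make the intended semantics concrete, here is a small Python sketch that applies such sections to an environment. It is only a model of the proposal: the [env "NAME"] section with set/unset keys is the syntax suggested above and does not exist in Git, and the hand-written parser below is a simplification of Git's real config parser.

```python
import re

# Matches the proposed section header, e.g. [env "LANG"]
SECTION = re.compile(r'^\[env\s+"([^"]+)"\]$')


def apply_env_config(config_text, environ):
    """Return a copy of environ with the proposed [env "NAME"] sections applied."""
    result = dict(environ)
    current = None  # variable named by the section we are currently in
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # blank line or comment
        if line.startswith("["):
            header = SECTION.match(line)
            current = header.group(1) if header else None
            continue
        if current is None:
            continue  # key belongs to a section this sketch ignores
        key, sep, value = line.partition("=")
        key = key.strip().lower()
        if key == "set" and sep:
            result[current] = value.strip()
        elif key == "unset" and not sep:
            result.pop(current, None)
    return result
```

Git would then export the resulting environment before running the requested command.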

Add configuration options for some commonly used command-line options

This includes:

  • <are there any useful left?>

Some people always run the command with these options, and would prefer to be able to activate them by default in ~/.gitconfig.

Better indications on merge conflict markers

When Git leaves 3-way merge conflict markers (the <<<< ==== >>>> stuff), it also leaves an indication of which version is which (HEAD, and the commit being merged with it after the <<< and >>> markers).

These indications are confusing for many beginners: merge and rebase leave markers in different orders (because internally, rebase merges local commits on top of the remote history, so the commit being merged is the local one), and HEAD or a commit id doesn't say much without further indication (HEAD points to different commits at different steps of the rebase, and a commit id is hard to remember). Showing a branch name and/or a commit message instead of a sha1 could improve the situation, for example. Having the same order for merge and rebase would probably make more sense from the user's point of view, even though there's an accurate technical justification for the current choice.

git pull --set-upstream

It would be nice to be able to say

git pull --set-upstream remote branch

just like we can already do with "git push".

git push --set-pushremote

For central workflows, we have "git push --set-upstream". Probably something could be done to set branch.*.pushremote automatically to set up a triangular workflow with a command-line option too.

(partly, if not completely obsoleted by the "git push --set-publish" option cooking in pu as of April 22 2014)

Add more builtin patterns for userdiff

"git diff" shows the function name corresponding to each hunk after the @@ ... @@ line. For common languages (C, HTML, Ada, Matlab, ...), the way to find the function name is built-in Git's source code as regular expressions (see userdiff.c). A few languages are common enough to deserve a built-in driver, but are not yet recognized. For example, CSS.

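The idea of a function-name pattern can be sketched in Python for CSS. This is only an illustration: the real drivers in userdiff.c are POSIX extended regular expressions, and the heuristic below (a selector line starts in column one and does not end with ";") is an assumption made for this example, not a proposed pattern.

```python
import re

# Assumed heuristic: a selector line starts in column one with an optional
# '.', '#', ':', '[' or '@', then a word character, and does not end with ';'
# (which would make it a declaration).
SELECTOR = re.compile(r"^[.#:\[@]?[_a-zA-Z0-9].*[^;\s]\s*$")


def hunk_header(lines, hunk_start):
    """Return the nearest selector-like line above index hunk_start, or None."""
    for index in range(hunk_start - 1, -1, -1):
        if SELECTOR.match(lines[index]):
            return lines[index].rstrip()
    return None
```

For a hunk that starts inside a ".nav li a { ... }" block, this would show the selector after the @@ ... @@ line, much like a function name for C.
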
git rebase --status for non-interactive rebase

When "git rebase" stops for some reason (failed exec command, conflicts), git status shows some information about what has been done, why it stopped and what's still to be done. It does not work for "git rebase" without "--interactive" yet (some information is shown, but much less than for interactive).

Safety feature for "git rebase -i" to avoid dropping commits by mistake

Currently, dropping a commit from the todo-list of "git rebase -i" is done by deleting the corresponding line. One issue with this is that the user may drop a line by mistake (e.g. trying to move a line with cut-and-paste, and forgetting to paste one line), and then the commit is silently dropped from history.

One solution, implemented by hg histedit, is to force the user to write "drop <sha1>", and abort the rebase if one commit is missing from the todolist. This safety measure could be activated with a config option like rebase.forceDrop or so, so that old-timers can continue relying on the old behavior.

Status: essentially done. Now that the feature has been in Git for a while, we probably want to advertise it a bit more (e.g. consider activating rebase.missingCommitsCheck by default), and the documentation could/should be clearer. See for example https://stackoverflow.com/questions/35846154/git-rebase-interactive-drop-vs-deleting-the-commit-line/35846899 for an example of a confused user. Also, some of the checks are still missing:

 When rebase.missingCommitsCheck is enabled, git will warn the user if a
 commit was dropped with `git rebase -i'.  This check only occurs after
 the initial editing.  But the user can edit the todo list with
 `--edit-todo'.  Here, git won’t warn the user if a commit was dropped.
 The goal is to have `--edit-todo' to warn the user when
 rebase.missingCommitsCheck is enabled, too.

(https://public-inbox.org/git/38df1011-34d3-e34e-9c4c-b70a0c6576df@gmail.com/)
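
The core of the check is a comparison between the commits listed before and after editing. Here is a minimal Python sketch of that comparison, assuming the todo list is available as text both times; the function names are made up for this example and do not correspond to Git's implementation.

```python
# Todo commands that take a commit as their first argument.
COMMIT_COMMANDS = {"pick", "p", "reword", "r", "edit", "e",
                   "squash", "s", "fixup", "f", "drop", "d"}


def listed_commits(todo_text):
    """Return the commit ids mentioned in a rebase todo list, in order."""
    commits = []
    for line in todo_text.splitlines():
        words = line.split()
        if len(words) >= 2 and words[0] in COMMIT_COMMANDS:
            commits.append(words[1])
    return commits


def missing_commits(before, after):
    """Return commits present in the todo list before editing but absent afterwards."""
    kept = set(listed_commits(after))
    return [commit for commit in listed_commits(before) if commit not in kept]
```

A commit marked with "drop" still appears in the edited list, so it is not reported; only a deleted line is.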

Allow "git add -p" to use "git diff --color-words" to show hunks

"git add -p" is a very convenient way to stage patch hunks one by one. But when staging hunks that contain a lot of whitespace difference (typically, a re-wrapped paragraph with a few actual changes in a text/LaTeX file), one would appreciate the goodness of "git diff --color-words" when "git add -p" shows the hunk to apply.

Since "git add -p" is written in Perl (git-add--interactive.perl), it can't directly use the C API. Right now, the diff coloring is re-implemented in Perl, which is fine because diff-coloring is very simple, but we don't want to re-implement the --color-words algorithm in Perl. Two possible implementations would be:

  • modify the plumbing diff machinery to offer a per-hunk view, and then let git-add--interactive.perl use it, or
  • run diff with --word-diff=porcelain, or find some mode where it spews out both the line-diff (for later application) and the word-diff (for display) in some clever format, and parse the result on the Perl side so that the Perl code can easily match them up.

See also the GSoC proposal: https://github.com/peff/git/wiki/SoC-2012-Ideas

Warn before/when rewriting published history

(borrowed from Git's user survey)

Commands like "git rebase", "git rebase -i", "git reset", "git filter-branch", "git commit --amend" can be very powerful to rewrite local history before publishing it. On the other hand, they can be very dangerous if used on an already published history.

Git could relatively easily detect that one is rewriting a commit that is an ancestor of a remote-tracking branch, and warn the user (perhaps giving a way to "git reset --hard" back to the original state).
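
The detection itself is a reachability test. The following Python sketch models it on a toy history, where the commit graph is a dictionary mapping each commit to its parents and remote-tracking branches are a dictionary of tips; both data structures and the function names are assumptions made for this example.

```python
def ancestors(commit, parents):
    """Return commit plus everything reachable from it through parent links."""
    seen = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def published_in(commit, parents, remote_tips):
    """Return the remote-tracking branches whose history already contains commit."""
    return sorted(name for name, tip in remote_tips.items()
                  if commit in ancestors(tip, parents))
```

A non-empty result for a commit about to be rewritten is the situation in which Git could warn.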

Two patch series were proposed in 2012, but were far from the quality and performance expected for a merge into git.git:

warn on "rebase -i": http://thread.gmane.org/gmane.comp.version-control.git/199455/focus=199644

Actually, one variant of this would be to allow the todo-list of "git rebase -i" to look like "git log --oneline --decorate"

warn on "commit --amend": http://thread.gmane.org/gmane.comp.version-control.git/199455/focus=199767

Mercurial's phase feature can be a source of inspiration: http://mercurial.selenic.com/wiki/Phases

git push --create

(borrowed from Git's user survey)

Today, Git provides no way to create a remote repository. The user has to use other ways (web interface for most online service providers, or open a shell in SSH to run "git init" by hand on the remote server).

Git could optionally allow the user to create a repository on first push, to allow a workflow like

git remote add new-remote ssh://server.com/path/to/new/repo
git push --create --set-upstream

and then use "git push"/"git pull" normally. A patch is already available: http://thread.gmane.org/gmane.comp.version-control.git/111799/focus=112003

git bisect: automatically determine which commit is good and which one is bad

Currently, running

git bisect bad <some old commit>
git bisect good <a recent commit>

raises an error. Instead, "git bisect" could automatically notice that the change is from bad to good (i.e. a fix) and not from good to bad (i.e. a regression).

Similarly, when given a good and a bad commit which are not ancestors of each other, git bisect tests the merge base, and currently errors out if it is bad. It could continue the bisection and look for a fix between the merge base and the good commit.
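
Deciding which case applies only needs two ancestry tests. Here is a hedged Python sketch on a toy history (a dictionary mapping each commit to its parents); the function names and the three labels are invented for this example.

```python
def is_ancestor(older, newer, parents):
    """Return True if older is reachable from newer by following parent links."""
    stack, seen = [newer], set()
    while stack:
        current = stack.pop()
        if current == older:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return False


def bisect_direction(good, bad, parents):
    """Classify a good/bad pair by how the two commits relate in history."""
    if is_ancestor(good, bad, parents):
        return "regression"  # usual case: look for the commit that broke things
    if is_ancestor(bad, good, parents):
        return "fix"         # reversed case: look for the commit that fixed things
    return "diverged"        # neither contains the other: test the merge base first
```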

See discussion: http://thread.gmane.org/gmane.comp.version-control.git/272792/focus=272791

Allow finer memory management in the strbuf API

Git internally uses a string manipulation API called strbuf which is more user-friendly than the C standard library str{cat,cpy,...} functions. Instead of requiring the user to pre-allocate the destination strings, the strbuf API manages a buffer internally, which is extended as needed.

This is convenient for the user, but may lead to a few unneeded malloc() calls. A detailed proposal to allow the caller of the API to reuse a preallocated buffer has been sent to the mailing list: http://thread.gmane.org/gmane.comp.version-control.git/245929/focus=246470

Status: implemented, see e.g. https://public-inbox.org/git/20160530103642.7213-1-william.duclot@ensimag.grenoble-inp.fr/, but the conclusion is that the possible gain is probably not worth the effort.

Better integration of contrib/ with the build and test system of Git

Currently, commands provided in contrib/ (e.g. git-subtree, or git-remote-mediawiki) have to re-implement most of the Makefile to implement targets like "make doc", "make install-doc", "make test" or "make install". The integration of git-subtree made a first step towards making Git's testsuite useable in contrib/, and git-remote-mediawiki has a small Makefile that can re-use Git's configuration for "make install" in most cases, but they have a few problems:

  • code duplication: see how t/Makefile and contrib/subtree/t/Makefile are similar (i.e. cut-and-pasted from each other)
  • $PATH management: cd $git/contrib/$project/t/ && make test should use $git/bin-wrappers/ and $git/contrib/$project/git-$project, not Git from the $PATH. git-remote-mediawiki has an ugly hack to symlink the executable git-remote-mediawiki directly in the toplevel directory; this should be removed.
  • "make doc" can't currently re-use all the goodness from $git/Documentation/Makefile

git send-email --reply-to <file>

It would be nice if I could say "git send-email --reply-to <file>", feeding it an existing email message. This would set "To" and "Cc" appropriately but also quote the existing message for me to reply to, either into the cover letter or after the "---" in the patch (it's probably an error if there's no cover letter and more than one patch in this mode).

A funny extension would be to allow "git send-email --reply-to-id <message-id>", that would download http://mid.gmane.org/<message-id> for the user (well, hardcoding mid.gmane.org is a bit weird, so ideally it would be configurable)
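
What the option would have to compute can be sketched with Python's standard email module. This is an illustration only: git send-email is written in Perl, the function name is hypothetical, and the policy chosen here (reply to Reply-To if present, otherwise From; copy To and Cc into Cc) is an assumption.

```python
from email import message_from_string
from email.utils import formataddr, getaddresses


def reply_headers_and_quote(raw_message):
    """Return (headers, quoted_body) for a reply to the given plain-text email."""
    original = message_from_string(raw_message)
    # Assumed policy: answer the Reply-To address if there is one, else the sender.
    to = original.get_all("Reply-To") or original.get_all("From") or []
    cc = (original.get_all("To") or []) + (original.get_all("Cc") or [])
    headers = {
        "To": [formataddr(pair) for pair in getaddresses(to)],
        "Cc": [formataddr(pair) for pair in getaddresses(cc)],
        "In-Reply-To": original.get("Message-ID", ""),
    }
    # Quote each line of the original body; the body is assumed not to be multipart.
    body = original.get_payload()
    quoted = "\n".join("> " + line if line else ">" for line in body.splitlines())
    return headers, quoted
```

The quoted text would then be placed in the cover letter or after the "---" of the patch.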

A way to check merges using textconv filters

The textconv feature allows users to view diffs of, e.g., OpenDocument or MS Word files as text. However, merging these documents is far more difficult, and usually has to be done partly by hand (the merge engines in OpenOffice and MS Word are 2-way merges, hence very limited). It would be nice if Git could verify a merge after the fact using the textconv drivers. An example case:

  • User has a history like this (both A->B and A->C modify a binary file):
A --- B
 \
  C
  • User does a merge and resolves the conflicts in the binary file:
A --- B
 \     \
  C --- D
  • Git applies the textconv filter to A, B and C, getting text blobs A', B' and C'.
  • Git does a merge of B' and C' using A' as common ancestor, and gets a merged text blob Dmerged
  • Git applies the textconv filter to D, and gets a text blob Dactual
  • Git does a textual diff between Dmerged and Dactual
    • Hopefully, the diff will be empty (at least with options to ignore whitespace changes)
    • If not, the user gets a diagnosis of where the merge went wrong

This can all be done by hand, but it would be cool if it could be done with a single command (e.g. "git check-merge D --ignore-all-space")
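
The steps above can be sketched as a small Python function. The textconv filter and the 3-way text merge are passed in as callables because this sketch does not implement either of them; the function name and its parameters are assumptions, and whitespace is ignored here by collapsing runs of whitespace on each line, which is simpler than Git's own whitespace options.

```python
import difflib


def check_merge(base, ours, theirs, merged, textconv, merge3):
    """Diff the text-level merge of ours/theirs against the textconv of the real merge.

    textconv(blob) -> str and merge3(base_text, ours_text, theirs_text) -> str
    are supplied by the caller.
    """
    expected = merge3(textconv(base), textconv(ours), textconv(theirs))  # Dmerged
    actual = textconv(merged)                                            # Dactual

    def normalize(text):
        # Collapse whitespace so that pure whitespace changes are not reported.
        return [" ".join(line.split()) for line in text.splitlines()]

    return list(difflib.unified_diff(normalize(expected), normalize(actual),
                                     "Dmerged", "Dactual", lineterm=""))
```

An empty list means the hand-made merge D agrees with the text-level merge; otherwise the returned diff lines show where they differ.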

Be nicer to the user on tracked/untracked merge conflicts

When merging a commit which has tracked files with the same name as local untracked files, Git refuses to proceed. It could be nice to:

  • Accept the situation without conflict when the tracked file has the exact same content as the local untracked file (which would become tracked). No data is lost, nothing can be committed accidentally.
  • Possibly, for fast-forward merges, if a local file belongs to the index but not to the last commit, attempt a merge between the upstream version and the local one (resulting in the same content as if the file had just been committed, but without introducing an extra commit).

Recent versions of SVN do something similar: on update, it considers added but not committed files like normal tracked files, and attempts a merge of the upstream version with the local one (which always succeeds when the files have identical content). Attempting a merge for non-fast-forward cases would probably not make sense: it would mix changes coming from the merge with other changes that do not come from a commit.

Use-cases:

git fsck --repair

When git fsck finds missing or corrupted objects, it would be nice to allow fetching uncorrupted versions of these objects from another (backup) repository. That could be something like:

git fsck
# reports e.g. a missing blob
git fsck --repair-from=/backup/another-repo
# Git copies missing object files
git gc

One could also imagine various heuristics to try to recover lost objects, like hashing all the files in the worktree as blobs and see whether one sha1 matches the missing one.
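
The worktree heuristic can be sketched in Python. A blob's SHA-1 is computed over the header "blob <size>" followed by a NUL byte and the file content; the function names are made up for this example, and the sketch ignores clean/smudge filters and line-ending conversion, which would change the stored content.

```python
import hashlib
import os


def git_blob_sha1(data):
    """Return the SHA-1 object id Git gives to a blob with this content."""
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def find_blob_candidates(worktree, missing_sha1):
    """Return paths under worktree whose content hashes to missing_sha1."""
    matches = []
    for dirpath, dirnames, filenames in os.walk(worktree):
        if ".git" in dirnames:
            dirnames.remove(".git")  # do not descend into the repository itself
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.isfile(path):
                continue  # skip broken symlinks and special files
            with open(path, "rb") as handle:
                if git_blob_sha1(handle.read()) == missing_sha1:
                    matches.append(path)
    return sorted(matches)
```

Any matching file could then be written back to the object database to restore the missing blob.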

It seems http://git-repair.branchable.com/ does essentially that, but it would be nice to have the feature built into Git instead of having to install a separate program.

Allow using # even at the start of a line in a commit message

See discussion: https://public-inbox.org/git/xmqqtwsodsw0.fsf@gitster.dls.corp.google.com/

interpret-trailers improvements

`git interpret-trailers` is currently meant to be called from a hook. It could be improved:

  • Setting up a hook is less flexible than setting up a configuration (which allows per-repo, per-user, per-host). A configuration-based system would be nice too.

Allow "git rebase --autosquash" to understand "fixup! $commit # $message"

When --autosquash is used locally by a single user, there's usually no need for a real commit message. It's OK to have a commit message like "fixup! <whole subject line>" then.

But when working collaboratively on a branch, a possible use of the --autosquash is to have multiple people suggesting amendments on the existing branch. Then, other developers will want to review the proposed fixups, and they will appreciate a nice commit message.

One option would be to allow a commit message like: "fixup! $commit # $message". "git rebase --autosquash" would ignore the part after the "#" and treat the subject line as if it were just "fixup! $commit". This could work both for a $commit being the sha1 of a commit or the subject line of a commit. Perhaps there is a better syntax: I initially thought of spelling it "fixup! $commit: $message", but this would be ambiguous when $commit is the subject line, because it often contains a ":" (and less often a "#", even though we cannot rule it out).
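
The proposed subject-line syntax can be parsed as follows. This Python sketch is only a model of the idea: the function name is hypothetical, and splitting at the first " # " (a "#" surrounded by spaces) is one possible reading of the proposal, which still misparses a target subject line that itself contains " # ".

```python
import re

# "fixup! <target>" optionally followed by " # <message>".
# The target is matched lazily, so the split happens at the first " # ".
FIXUP = re.compile(r"^fixup! (.*?)(?: # (.*))?$")


def parse_fixup_subject(subject):
    """Return (target, message) for a fixup subject line, or None if it is not one.

    message is None when the subject has no " # " part.
    """
    match = FIXUP.match(subject)
    if match is None:
        return None
    return match.group(1), match.group(2)
```

The autosquash logic would then look up the target exactly as it does today and simply ignore the message.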

See discussion here: http://thread.gmane.org/gmane.comp.version-control.git/288173/focus=288185

Detect inconsistencies between index and object database

Currently, the index contains a "size" field, which may be out-of-sync with the actual size of the object. In other words, git ls-files --debug reports a size different from git cat-file -p $sha1 | wc -c. When this happens, git status reports spurious "modified" entries for the files in question. Currently, git fsck does not report any error in these cases.
