ExampleScripts

From Git SCM Wiki

Example scripts

New users often request a Git feature that turns out to be very easy to implement as a script. In fact, most of Git's user interface originated as a bash or Perl script. As a consequence, the interfaces of core Git programs (e.g. `rev-list` or `diff-tree`) are quite easy to use from scripts.


git log

For example, the command `git log` originally looked like this:

#!/bin/sh
. git-sh-setup-script || die "Not a git archive"
git-rev-list --pretty $(git-rev-parse --default HEAD "$@") | LESS=-S ${PAGER:-less}

(To see that this is true, just say `git show v0.99:git-log-script`.)

The basic concept of these scripts was to parse the options with `git-rev-parse`, possibly filtering just the revision parameters, or just the flags, and then call `git-rev-list` on the revisions.
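
As a hedged illustration of that two-step pattern (not how `git log` is implemented today), the idea can be sketched in Python. The options `--revs-only`, `--no-flags`, `--flags` and `--default` are documented `git rev-parse` options; the function names `rev_parse_command` and `list_commits` are made up for this sketch.

```python
import subprocess


def rev_parse_command(args, want="revs", default="HEAD"):
    """Build a `git rev-parse` command line that keeps one kind of argument.

    want="revs"  -> only revision parameters (falling back to `default`)
    want="flags" -> only flag parameters
    """
    if want == "revs":
        selector = ["--revs-only", "--no-flags", "--default", default]
    elif want == "flags":
        selector = ["--flags"]
    else:
        raise ValueError("want must be 'revs' or 'flags'")
    return ["git", "rev-parse"] + selector + list(args)


def _run(cmd):
    """Run a command and return its standard output split into words."""
    done = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return done.stdout.split()


def list_commits(args):
    """Resolve the revisions with rev-parse, then hand them to rev-list."""
    revs = _run(rev_parse_command(args, want="revs"))
    return _run(["git", "rev-list"] + revs)
```

Calling `list_commits(["master"])` inside a repository should return the object names reachable from `master`; with an empty argument list, the `--default HEAD` fallback is used instead.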

Finding which commits last touched the files

If you _need_ to know which commit gave each file its current form, this script will help you:

(_Please_, if you try to be helpful and edit the script, make _at least_ sure that it _still runs_. That is particularly true if you insist on adding "use strict". Yikes, I thought this goes without saying!)

#!/usr/bin/perl

my %attributions;
my @files;

open IN, "git ls-tree -r --full-name HEAD |" or die;
while (<IN>) {
	if (/^\S+\s+blob \S+\s+(\S+)$/) {
		push(@files, $1);
		$attributions{$1} = -1;
	}
}
close IN;

my $remaining = $#files + 1;

open IN, "git log -r --root --raw --no-abbrev --pretty=format:%h~%an~%ad~ |" or die;
while (<IN>) {
	if (/^([^:~]+)~(.*)~([^~]+)~$/) {
		($commit, $author, $date) = ($1, $2, $3);
	} elsif (/^:\S+\s+1\S+\s+\S+\s+\S+\s+\S\s+(.*)$/) {
		if ($attributions{$1} == -1) {
			$attributions{$1} = "$author, $date ($commit)";
			$remaining--;
			if ($remaining <= 0) {
				# every file is attributed; stop reading the log
				last;
			}
		}
	}
}
close IN;

for $f (@files) {
	print "$f	$attributions{$f}\n";
}
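
The core of the script above is a single pass over `git log --raw` output that remembers the newest commit seen for each file. A minimal Python sketch of that pass follows; it works on lines that have already been read, so it can be tried without a repository. The name `last_touched` and the two regular expressions are assumptions of this sketch, modelled on the `%h~%an~%ad~` format used above.

```python
import re

# Header line produced by --pretty=format:%h~%an~%ad~
HEADER = re.compile(r"^([^:~]+)~(.*)~([^~]+)~$")
# Raw diff line: ":<old mode> <new mode> <old sha> <new sha> <status>\t<path>"
RAW = re.compile(r"^:\d+ \d+ [0-9a-f]+ [0-9a-f]+ \S+\t(.*)$")


def last_touched(log_lines, files):
    """Map each file to the (commit, author, date) of the newest commit touching it.

    log_lines must be ordered newest first, as `git log` prints them.
    """
    pending = set(files)
    result = {}
    current = None
    for line in log_lines:
        line = line.rstrip("\n")
        header = HEADER.match(line)
        if header:
            current = header.groups()
            continue
        raw = RAW.match(line)
        if raw and raw.group(1) in pending:
            result[raw.group(1)] = current
            pending.discard(raw.group(1))
            if not pending:
                break  # every file is attributed, no need to read further
    return result
```

Stopping as soon as `pending` is empty is what keeps the approach cheap on long histories.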

Sorting commits by commit message line count / changed lines ratio

With Git, it is easy to write long, meaningful commit messages. Sometimes, the messages contain even more lines than are touched by the patch! You can illustrate that by listing all commits with the ratio between the commit message line count and the number of lines added or deleted by the commit.

In the following script, three core Git programs are used: `rev-list`, `cat-file` and (not so core) `diff`.

  • `rev-list` shows the object names of a range of commits. In this example, the range is defined by "next", i.e. the tip of the branch called "next" and all its ancestors. To prevent merge commits from being shown, `rev-list` is called with the option "--no-merges".
  • `cat-file` shows the raw content of an object. In this example, it shows the raw commit object. Such an object starts with a header that names the commit's tree and its parents, among other metadata. An empty line separates the header from the actual commit message.
  • `diff` shows the changes between two objects. In this example, it shows the changes between a commit and its parent.

There are two functions to get the line counts, `count_lines_in_message` and `count_changed_lines`, and the main loop. This loop reads one object name from `rev-list` after another, storing the information in a hash, which is pushed onto the array "@commits". After sorting "@commits" by ratio, the elements are shown, one per line.

#!/usr/bin/perl

use strict;

sub count_lines_in_message ($) {
        my $sha1 = $_[0];
        open COMMIT, "git cat-file -p $sha1 |" or die;
        my $header = 1;
        my $count = 0;
        while (<COMMIT>) {
                if (!$header) {
                        $count++;
                } elsif (/^$/) {
                        $header = 0;
                }
        }
        close(COMMIT);
        return $count;
}

sub count_changed_lines ($) {
        my $sha1 = $_[0];
        open DIFF, "git diff $sha1^..$sha1 |" or die;
        my $count = 0;
        while (<DIFF>) {
                if (/^[\+\-]/ && !/^(\+\+\+|---)/) {
                        $count++;
                }
        }
        close(DIFF);
        return $count;
}

my @commits;

open IN, "git rev-list --no-merges next |" or die;
while (<IN>) {
        if (/^([0-9a-f]{40})/) {
                my $commit = $1;
                my $count_message = count_lines_in_message($commit);
                my $count_diff = count_changed_lines($commit);
                my $ratio;
                if ($count_diff == 0) {
                        $ratio = 1e9;
                } else {
                        $ratio = $count_message / $count_diff;
                }
                push(@commits, {
                        ratio => $ratio,
                        sha1 => $commit,
                        message => $count_message,
                        diff => $count_diff
                });
        }
}

# compare numerically, highest ratio first
@commits = sort {$b->{ratio} <=> $a->{ratio} } @commits;

foreach my $c (@commits) {
        print substr($c->{sha1}, 0, 8) . "... " . $c->{message} . " / "
                . $c->{diff} . " = " . $c->{ratio} . "\n";
}
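
The two counting rules used above (everything after the first empty line is message; every `+` or `-` line that is not a `+++`/`---` file header is a change) can be checked in isolation. The following Python sketch operates on text that has already been captured from `git cat-file` and `git diff`; the function names mirror the Perl ones, and `ratio` is a hypothetical helper added for this illustration.

```python
def count_lines_in_message(raw_commit):
    """Count the lines that follow the first empty line of a raw commit object."""
    _header, separator, message = raw_commit.partition("\n\n")
    if not separator:
        return 0  # no empty line, hence no message
    return len(message.splitlines())


def count_changed_lines(diff_text):
    """Count added and removed lines, skipping the +++/--- file headers."""
    return sum(
        1
        for line in diff_text.splitlines()
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    )


def ratio(message_lines, changed_lines):
    """Message lines per changed line; commits that change nothing sort first."""
    if changed_lines == 0:
        return float("inf")
    return message_lines / changed_lines
```

Like the Perl version, this miscounts a removed line whose own content starts with `--`, since such a line looks like a file header in the diff.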

Copying all changed files from the last N commits

Sometimes you need to send all files changed in the last few commits somewhere: maybe email them to a fellow developer, maybe copy them to a network folder, maybe copy them to a folder controlled by some other source code manager that isn't Git, CVS or SVN. This script makes that simple:

#!/bin/bash

# Usage:
# ../copy_git_recent.sh "/destination/path" number_of_revisions
#
# Example:
# ../copy_git_recent.sh "/cygdrive/c/Windows Documents/SourceSafe/Project1" 4

# Read one path per line so that file names containing spaces survive.
git diff-tree -r --name-only "master~$2" master |
while IFS= read -r file; do
  cp --parents "$file" "$1"
done
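
`cp --parents` is a GNU extension. Where it is unavailable, the same "copy and recreate the directory structure" step can be sketched in Python. This is only an illustration: `copy_with_parents` is a hypothetical helper, and the list of paths is assumed to come from `git diff-tree -r --name-only`, one path per entry.

```python
import os
import shutil


def copy_with_parents(paths, dest, root="."):
    """Copy each path (relative to root) below dest, creating parent directories.

    Paths that no longer exist in the working tree (for example deleted
    files) are skipped. Returns the list of paths actually copied.
    """
    copied = []
    for rel in paths:
        src = os.path.join(root, rel)
        if not os.path.isfile(src):
            continue
        target = os.path.join(dest, rel)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        shutil.copy2(src, target)  # copy2 also preserves the modification time
        copied.append(rel)
    return copied
```

Skipping missing files is a design choice of this sketch; the shell script above would instead print an error from `cp` for every file deleted in the range.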

Setting the timestamps of the files to the commit timestamp of the commit which last touched them

Git lets you do things that might be considered less than useful. Nevertheless, in some corner cases they might make sense. For example, setting the timestamps of the files to the time when they were last updated by a commit is usually idiotic. It breaks "make", and it completely breaks down when multiple machines are involved, since they do not necessarily share a common time source.

However, normalperson on IRC found a corner case where it might make sense (although you can think of scenarios where it breaks), and was nice enough to provide the script "git-set-file-times":

#!/usr/bin/perl -w
use strict;

# sets mtime and atime of files to the latest commit time in git
#
# This is useful for serving static content (managed by git)
# from a cluster of identically configured HTTP servers.  HTTP
# clients and content delivery networks can get consistent
# Last-Modified headers no matter which HTTP server in the
# cluster they hit.  This should improve caching behavior.
#
# This does not take into account merges, but if you're updating
# every machine in the cluster from the same commit (A) to the
# same commit (B), the mtimes will be _consistent_ across all
# machines if not necessarily accurate.
#
# THIS IS NOT INTENDED TO OPTIMIZE BUILD SYSTEMS SUCH AS 'make'
# YOU HAVE BEEN WARNED!

my %ls = ();
my $commit_time;

if ($ENV{GIT_DIR}) {
	chdir($ENV{GIT_DIR}) or die $!;
}

$/ = "\0";
open FH, 'git ls-files -z|' or die $!;
while (<FH>) {
	chomp;
	$ls{$_} = $_;
}
close FH;


$/ = "\n";
open FH, "git log -m -r --name-only --no-color --pretty=raw -z @ARGV |" or die $!;
while (<FH>) {
	chomp;
	if (/^committer .*? (\d+) (?:[\-\+]\d+)$/) {
		$commit_time = $1;
	} elsif (s/\0\0commit [a-f0-9]{40}( \(from [a-f0-9]{40}\))?$// or s/\0$//) {
		my @files = delete @ls{split(/\0/, $_)};
		@files = grep { defined $_ } @files;
		next unless @files;
		utime $commit_time, $commit_time, @files;
	}
	last unless %ls;

}
close FH;
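
The two building blocks of the script above are extracting the timestamp from a `committer` line of `--pretty=raw` output and applying it with `utime`. A small Python sketch of just these two steps follows; `commit_time` and `set_file_times` are names invented for this illustration, and the walk over `git log` output is deliberately left out.

```python
import os
import re

# "committer <name> <email> <unix time> <timezone>" as printed by --pretty=raw
COMMITTER = re.compile(r"^committer .*? (\d+) [-+]\d+$")


def commit_time(line):
    """Return the Unix timestamp of a raw committer line, or None for other lines."""
    match = COMMITTER.match(line)
    return int(match.group(1)) if match else None


def set_file_times(paths, timestamp):
    """Set both atime and mtime of every path to the given Unix timestamp."""
    for path in paths:
        os.utime(path, (timestamp, timestamp))
```

Because the timestamp is taken from the commit rather than from the local clock, every machine that checks out the same commit ends up with the same file times.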

Output a commit graph with GraphViz' "dot" tool

The GraphViz suite contains programs to render graphs nicely. Since the revision graph is a DAG (directed acyclic graph), you might want to look at the revision graph in this manner.

Call this script with rev-list parameters to get input for "dot".

#!/bin/sh

set -e

echo "digraph lattice {"

shape="shape=Mrecord, style=filled,"
git rev-list --pretty=format:"%H %h|%an:%s" "$@" |
        sed "s/[\"\{\}()<>]/\\\\&/g" |
        sed -n "s/^\([0-9a-f]\{40\}\) \(.*\)$/n\1 [$shape label=\"{\2}\"]/p"

git rev-list --parents "$@" |
        while read commit parents
        do
                for p in $parents
                do
                        echo "n$commit -> n$p"
                done
        done

echo "}"
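
The second half of the script turns each line of `git rev-list --parents` (a commit followed by its parents) into one edge per parent. A hedged Python sketch of that transformation, working on already captured lines, might look like this; `dot_edges` and `to_dot` are hypothetical names, and node labels are omitted to keep the example short.

```python
def dot_edges(parent_lines):
    """Turn 'commit parent1 parent2 ...' lines into 'ncommit -> nparent' edges."""
    edges = []
    for line in parent_lines:
        fields = line.split()
        if not fields:
            continue  # ignore empty lines
        commit, parents = fields[0], fields[1:]
        for parent in parents:
            # The 'n' prefix keeps node names valid even if a hash starts with a digit.
            edges.append("n%s -> n%s" % (commit, parent))
    return edges


def to_dot(parent_lines):
    """Wrap the edges in a digraph block that can be fed to dot."""
    body = "\n".join("  " + edge for edge in dot_edges(parent_lines))
    return "digraph lattice {\n" + body + "\n}\n"
```

A root commit has no parents and therefore contributes no edge, while a merge commit contributes one edge per parent.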

Make some ASCII art from (part of your) history

Git users often illustrate their history with ASCII art like this:

A - B - C
  \       \
    D - E - F

Maybe you want to have a similar diagram, but are too lazy to draw the graph? Instead just use this script:

#!/usr/bin/perl

if ($#ARGV < 0) {
	print STDERR "Usage: $0 <revision range>\n";
	exit(1);
}

open INPUT, 'git rev-list --parents --topo-order ' . join(' ', @ARGV) . '|' or die;

my %commits;
my @list;

sub add_parents ($$) {
	my $parents = $_[0];
	my $y = $_[1];
	foreach my $parent (split / /,$parents) {
		if (!defined($commits->{$parent})) {
			$commits->{$parent} = {
				y => $y++,
				sha1 => $parent
			};
		} else {
			if ($commits->{$parent}->{y} < $y) {
				$commits->{$parent}->{y} = $y++;
			} else {
				$y = $commits->{$parent}->{y} + 1;
			}
		}
	}
}

# expects output of `rev-list --parents --topo-order`
$i = 0;
while (<INPUT>) {
	if (/^([0-9a-f]{40}) ?(.*)$/) {
		$sha1 = $1;
		$parents = $2;
		if (!defined($commits->{$sha1})) {
			$commits->{$sha1} = {
				y => 0,
				sha1 => $sha1,
			};
		} else {
			$commits->{$sha1}->{index} = $#list;
		}
		$commits->{$sha1}->{parents} = $parents;
		$list[$i] = $commits->{$sha1};
		$commits->{$sha1}->{index} = $i++;
		add_parents($parents, $commits->{$sha1}->{y});
	}
}
close INPUT;

if (@list > 26) {
	print STDERR "Cannot draw more than 26 revs.\n";
	exit(1);
}

# make labels
$height = 0;
foreach my $i (0 .. $#list) {
	$list[$i]->{x} = $#list - $i;
	$list[$i]->{label} = chr(0x41 + $list[$i]->{x});
	if ($height < $list[$i]->{y}) {
		$height = $list[$i]->{y};
	}
}

# make a canvas
$width = $#list * 2 + 1;
$height = $height * 2 + 1;
@canvas = (' ' x $width . "\n") x $height;

sub set_cell ($$$) {
	my $x = $_[0];
	my $y = $_[1];
	my $c = $_[2];
	substr($canvas[$y], $x, 1) = $c;
}

sub get_cell ($$) {
	my $x = $_[0];
	my $y = $_[1];
	return substr($canvas[$y], $x, 1);
}

sub msg($) {
	my $info = $_[0];
	return $info->{label} . ": " . $info->{x} . ", " . $info->{y};
}

sub draw_line ($$) {
	my $commit1 = $_[0];
	my $commit2 = $_[1];
	my $x1 = $commit1->{x};
	my $y1 = $commit1->{y};
	my $x2 = $commit2->{x};
	my $y2 = $commit2->{y};
	if ($y1 == $y2) {
		for (my $i = $x1 * 2 - 1; $i > $x2 * 2; $i--) {
			set_cell($i, $y1 * 2, "-");
		}
	} else {
		my $is_straight = 0;
		my $factor = ($y2 - $y1) / ($x1 - $x2);
		my $i;
		if ($x1 - $x2 == $y2 - $y1) {
			$is_straight = 1;
			for ($i = $x1 * 2 - 1; $i > $x2 * 2; $i -= 2) {
				my $y = $y1 * 2 + ($x1 * 2 - $i) * $factor;
				my $c = get_cell($i, int($y));
				if ($c ne ' ' && $c ne '-') {
					$is_straight = 0;
					last;
				}
			}
		}
		if ($is_straight) {
			for ($i = $x1 * 2 - 1; $i > $x2 * 2; $i--) {
				my $y = $y1 * 2 + ($x1 * 2 - $i) * $factor;
				my $c = (get_cell($i, int($y)) ne ' ') ?
					'+' : '/';
				set_cell($i, int($y), $c);
			}
		} else {
			set_cell($x1 * 2 - 1, $y1 * 2 + 1, '\'');
			for ($i = $x1 * 2 - 2; $i > $x2 * 2 + 1; $i--) {
				set_cell($i, $y1 * 2 + 1, '-');
			}
			set_cell($x2 * 2 + 1, $y1 * 2 + 1, ',');
			for ($i = $y1 * 2 + 2; $i < $y2 * 2; $i++) {
				my $c = (get_cell($x2 * 2, $i) ne ' ') ?
					'+' : '|';
				set_cell($x2 * 2, $i, $c);
			}
		}
	}
}

# draw it
foreach my $info (@list) {
	$x = $info->{x} * 2;
	$y = $info->{y} * 2;
	set_cell ($x, $y, $info->{label});
	foreach my $parent (split / /,$info->{parents}) {
		if (defined($commits->{$parent}->{index})) {
			draw_line($info, $commits->{$parent});
		}
	}
}

print @canvas;

darcs-record style commit

git-darcs-record emulates the "darcs record" interface on top of a Git repository. It is a replacement for `git commit --interactive`.

Reference: "git commit / darcs record" on Raphaël Slinckx's blog

#!/usr/bin/env python
# coding: utf-8

# git-darcs-record, emulate "darcs record" interface on top of a git repository
#
# Usage:
# git-darcs-record first asks for any new file (previously
#    untracked) to be added to the index.
# git-darcs-record then asks for each hunk to be recorded in
#    the next commit. File deletion and binary blobs are supported
# git-darcs-record finally asks for a small commit message and
#    executes the 'git commit' command with the newly created
#    changeset in the index


# Copyright (C) 2007 Raphaël Slinckx <raphael@slinckx.net>
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; either version 2
# of the License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.

import re, pprint, sys, os

BINARY = re.compile("GIT binary patch")
HEADER = re.compile("diff --git a/(.*) b/(.*)")
class Hunk:
	def __init__(self, lines, binary):
		self.diff = None
		self.lines = lines
		self.keep = False
		self.binary = binary
	
	def format(self):
		output = self.diff.header.modified + "\n"
		if self.diff.header.deleted:
			output = "Removed file: " + output
		if self.binary:
			output = "Binary file changed: " + output

		if not self.binary:
			output += "\n".join(self.lines) + "\n"

		return output

class Header:
	def __init__(self, lines):
		self.lines = lines
		self.modified = None
		self.deleted = False

		# Extract useful info from header from git
		for line in lines:
			if HEADER.match(line):
				match = HEADER.match(line)
				self.modified = match.group(1)
			if line.startswith("deleted "):
				self.deleted = True

		# Make sure we know what file we are modifying
		assert self.modified

class Diff:
	def __init__(self, header, hunks):
		self.header = header
		self.hunks = hunks
		# Put a reference to ourselves in the hunks
		for hunk in self.hunks:
			hunk.diff = self
		self.keep = False
	
	def filter(self):
		output = '\n'.join(self.header.lines) + "\n"
		for hunk in self.hunks:
			if not hunk.keep:
				continue
			output += '\n'.join(hunk.lines) + "\n"
		return output

	@classmethod
	def filter_diffs(kls, diffs):
		output = ""
		for diff in diffs:
			if not diff.keep:
				continue
			output += diff.filter()
		return output
	
	@classmethod
	def parse(kls, lines):
		in_header = True
		binary = False
		header = []
		hunks = []
		current_hunk = []
		for line in lines:
			if in_header and (line[0] not in (" ", "@", "\\") or line.startswith("+++ ") or line.startswith("--- ")) and not BINARY.match(line):
				header.append(line)
			elif BINARY.match(line):
				in_header = False
				binary = True
				header.append(line)
			elif line.startswith("@"):
				in_header = False
				if current_hunk:
					hunks.append(Hunk(current_hunk, binary))
				current_hunk = []
				current_hunk.append(line)
			else:
				current_hunk.append(line)
		if current_hunk:
			hunks.append(Hunk(current_hunk, binary))
		return Diff(Header(header), hunks)

	@classmethod
	def split(kls, lines):
		diffs = []
		current_diff = []
		for line in lines:
			if line.startswith("diff --git "):
				if current_diff:
					diffs.append(current_diff)
				current_diff = []
				current_diff.append(line)
			else:
				current_diff.append(line)
		if current_diff:
			diffs.append(current_diff)
		return [Diff.parse(lines) for lines in diffs]

def read_answer(question, allowed_responses=["Y", "n", "d", "a"]):
	#Make sure there is always a default selection
	assert any(r.isupper() for r in allowed_responses)

	while True:
		resp = raw_input("%s [%s] : " % (question, "".join(allowed_responses)))
		if resp in [r.lower() for r in allowed_responses]:
			break
		elif resp == "":
			resp = [r for r in allowed_responses if r.isupper()][0].lower()
			break
		print 'Unexpected answer: %r' % resp
	return resp


def setup_git_dir():
	global GIT_DIR
	GIT_DIR = os.getcwd()
	while not os.path.exists(os.path.join(GIT_DIR, ".git")):
		GIT_DIR = os.path.dirname(GIT_DIR)
		if GIT_DIR == "/":
			return False
	os.chdir(GIT_DIR)
	return True

def git_get_untracked_files():
	return [f.strip() for f in os.popen("git ls-files --others --exclude-from='%s' --exclude-per-directory=.gitignore" % (os.path.join(GIT_DIR, ".git", "info", "exclude")))]

def git_track_file(f):
	os.spawnvp(os.P_WAIT, "git", ["git", "add", f])

def git_diff():
	return os.popen("git diff -u --no-color --binary").readlines()

def git_apply(patch):
	stdin, stdout = os.popen2(["git", "apply", "--cached",  "-"])
	stdin.write(patch)
	stdin.close()
	output = stdout.read()
	stdout.close()
	os.wait()
	return output

def git_status():
	os.spawnvp(os.P_WAIT, "git", ["git", "status"])

def git_commit(msg):
	# use the argument instead of relying on the global patch_name
	os.spawnvp(os.P_WAIT, "git", ["git", "commit", "-m", msg])

# Main loop ------------------------
if not setup_git_dir():
	print "Must be in a git (sub-)directory! Exiting..."
	sys.exit()

# Ask for new files ----------------
git_untracked_files = git_get_untracked_files()
git_track_files = []
all = False
done = False
for i, f in enumerate(git_untracked_files):
	if not all:
		print "Add file: ", f
		resp = read_answer("Shall I add this file? (%d/%d)" % (i+1, len(git_untracked_files)))
	else:
		resp = "y"

	if resp == "y":
		git_track_files.append(f)
	elif resp == "a":
		git_track_files.append(f)
		all = True
	elif resp == "d":
		done = True
		break

# Ask for each hunk of the diff
diffs = Diff.split([line[:-1] for line in git_diff()])
total_hunks = sum(len(diff.hunks) for diff in diffs)

n_hunk = 1
for diff in diffs:
	if done:
		break
	for hunk in diff.hunks:
		# Check if we are in override mode
		if not all:
			print
			print hunk.format()
			resp = read_answer('Shall I record this change? (%d/%d)' % (n_hunk, total_hunks))
		# Otherwise say 'y' to all remaining patches
		else:
			resp = "y"

		if resp == "y":
			diff.keep = True
			hunk.keep = True
		elif resp == "a":
			diff.keep = True
			hunk.keep = True
			all = True
		elif resp == "d":
			done = True
			break

		n_hunk += 1

# Add new files to track
for f in git_track_files:
	git_track_file(f)

# Generate a new patch to be used with git apply
new_patch = Diff.filter_diffs(diffs)
if new_patch:
	print git_apply(new_patch)

if new_patch or git_track_files:
	git_status()
	patch_name = raw_input("What is the patch name? ")
	git_commit(patch_name)
else:
	print "Ok, if you don't want to record anything, that's fine!"
