git was originally developed to manage the sources of the Linux kernel, a monolithic code base. It has since gained many features that make it attractive as a general-purpose SCM, and is now used by all sorts of projects, some of which are developed in a 'modular' style.
The rule of thumb for determining whether a source tree is monolithic or modular is how its subdirectories map to release 'tarballs'. The monolithic Linux kernel is released as a single tarball, whereas the modular Xorg project releases dozens of such packages, each potentially on its own release schedule.
Non-distributed revision control systems such as CVS or SVN do not enforce this distinction; large projects like KDE and GNOME host hundreds of subprojects in a single repository and developers only ever have to check out the module they are working on. Sometimes, release managers, documenters or translators will make single commits that modify files across several modules. Similarly, if a change is made to the public API of a shared library in one module, the developer may update applications in other modules to use the new API as part of the same commit.
Such flexibility is an implicit feature of centralized SCMs, but is much more difficult to implement in a distributed system like git. As a result, git currently lacks built-in subproject support, although gitweb does have a notion of subprojects.
Plans for subproject support
There are several possible directions for implementing subproject support in git, some of which have been discussed on the list. A good start might be to add scripts to git, cogito or a new porcelain that formalize the semantics of gitweb. It has been suggested that git branches embody the pattern of modules, and it might make sense to use this functionality for modular repositories. Others have attempted to take on the challenge of partially cloned repositories, an ambitious task considering that the semantics of such a feature are as yet undefined for distributed source management.
A prototype implementation of submodules was proposed by Martin Waitz. This prototype uses one parent repository to track other git repositories, which act as submodules. This way the submodules retain all the advantages of normal repositories: for example, they can be independently changed, merged, and pulled from or pushed to remote sites. But they are also part of the parent repository, so each version of the parent can specify a consistent tree even when it contains several submodules.
(as of March 26, 2007)
The object-level extensions in Martin Waitz's implementation have been well reviewed and are generally accepted as the way to go at this point. Effectively, they allow tree objects to contain entries with a special file mode (directory + symlink), each of which holds a commit from a subproject. In terms of the filesystem structure, the commit's tree is placed rooted at the path of the entry.
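As a rough illustration of the "directory + symlink" file mode described above, the following sketch combines the two POSIX file-type bits and shows how a subproject entry could be recognized and how its commit's tree would be rooted at the entry's path. The names `SUBPROJECT_MODE`, `is_subproject_entry` and `rooted_paths` are hypothetical and are not taken from the prototype; the sketch only assumes the standard POSIX values for `S_IFDIR` and `S_IFLNK`.

```python
import stat

# Assumption: the special mode is the bitwise OR of the POSIX directory and
# symlink file-type bits (0o040000 | 0o120000).
SUBPROJECT_MODE = stat.S_IFDIR | stat.S_IFLNK


def is_subproject_entry(mode):
    """Return True if the file-type bits of a tree entry are directory + symlink."""
    return stat.S_IFMT(mode) == SUBPROJECT_MODE


def rooted_paths(entry_path, subproject_tree_paths):
    """Place the paths of the subproject commit's tree under the entry's path."""
    return [entry_path + "/" + p for p in subproject_tree_paths]


if __name__ == "__main__":
    print(oct(SUBPROJECT_MODE))
    print(is_subproject_entry(0o160000), is_subproject_entry(0o040000))
    print(rooted_paths("lib/foo", ["Makefile", "src/a.c"]))
```

A plain directory or symlink entry does not match, because the check compares the full file-type field rather than testing individual bits.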
The initial implementation ran into scalability constraints when applied to vast superprojects (the use case where a couple hundred full-sized projects are all made subprojects of a superproject which is an entire operating system distribution, for example). This implementation has not been completed due to these problems.
A newer implementation is in the works which should be able to handle this case efficiently. It does not require changes to the object-level extension proposed previously, so the initial implementation may still be used for projects where scalability on this order is not important. The main difference is in where the objects in a subproject are stored.
- Obsolete: Notes on Subproject Support by Junio C Hamano, in the todo branch of the git repository. This idea was abandoned.
- prototype implementation by Martin Waitz
- A very basic patch to the pre- and post-commit hooks to track sub-repositories. It does not support merge, clone, fetch, or anything else; it only notes the latest revision of the submodule in the supermodule. Read the whole thread to get the fixes to the patch.
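
The hook-based approach in the last item can be sketched roughly as follows. This is not the patch from that thread, only a minimal illustration of what "noting the latest revision of the submodule in the supermodule" might look like. The record layout and the names `head_revision` and `format_head_record` are assumptions made for the example; the only git command used is `git rev-parse HEAD`.

```python
import subprocess


def head_revision(repo_dir):
    """Ask git for the commit id that HEAD points to in repo_dir."""
    result = subprocess.run(
        ["git", "rev-parse", "HEAD"],
        cwd=repo_dir,
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def format_head_record(sub_paths, rev_of=head_revision):
    """Build one '<revision> <path>' line per sub-repository, sorted by path.

    rev_of is injectable so the formatting can be exercised without git.
    """
    lines = ["%s %s" % (rev_of(path), path) for path in sorted(sub_paths)]
    return "\n".join(lines) + "\n"
```

A commit hook in the supermodule could write the returned text to a tracked file (the file name would be the hook author's choice), so that each supermodule commit records which sub-repository revisions it was made against.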