Git Overview

My version control journey started with CVS. After that I looked at SVN but never really used it. The shortcomings of centralized repositories were too obvious, and with my increasing interest in Haskell I jumped on the distributed version control train with Darcs. I really, really liked it, but it had some nasty things too. After a while I was looking for something different and stumbled over Mercurial. Again I was really happy with it, but somehow my journey wasn’t over yet.

Recently I was watching two Git techtalks on YouTube:

The two talks are very different. The first one is more about why there’s room for yet another source code management system, with the rare opportunity to see Linus Torvalds giving a talk. The second one is about Git’s technicalities (commands etc.).

Basic Terms

Git distinguishes between the workspace, the index/staging area, the object store and remote repositories.

The workspace is where you do your actual work. It contains tracked files, untracked files and a special directory “.git”.

The index (or staging area) is used for preparing commits. You can add files to the next commit or even only parts of files for the next commit. All the changes you’d like to commit get into the index first. A commit takes everything in the index and persists it in the object store.
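The role of the index can be seen in a small experiment. This is only a sketch, assuming a throwaway repository created with mktemp; the file name notes.txt and the demo identity are made up for illustration.

```shell
repo=$(mktemp -d) && cd "$repo"            # throwaway directory for the demo
git init -q
git config user.name "Demo User"           # local identity so the commit works anywhere
git config user.email "demo@example.com"

echo "first version" > notes.txt
git add notes.txt                          # the index now holds "first version"
echo "second version" > notes.txt          # the workspace moves on, the index does not
git commit -q -m "Add notes"               # persists the index, not the workspace

committed=$(git show HEAD:notes.txt)       # what ended up in the object store
in_workspace=$(cat notes.txt)              # what is still waiting to be staged
echo "committed: $committed"
echo "workspace: $in_workspace"
```

The commit contains the content that was staged, while the later edit stays in the workspace until it is added as well.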

The object store is not the focus here, and remote repositories are more or less self-explanatory.

Quick Intro

For a more complete introduction see the Git tutorial (the cool stuff is in the next section).

Somewhere on the KDE Project I found a really nice Cheat-Sheet for Git.

To create a new repository just issue

  • git init

To start tracking files (or content) issue

  • git add <files>

To commit the added changes (note that only the state of the files at the time you issued git add will be recorded) issue

  • git commit

To push changes to a remote repository (ssh, filesystem etc.) issue

  • git push <remote url>

To clone a complete repository onto your local disk issue

  • git clone <remote url>

I like to do some general configuration for all my projects, something like this:

git config --global user.name "Your Name"
git config --global user.email <your email address>
git config --global color.ui auto
git config --global color.interactive auto

This tells Git who you are and enables some nice color-features of Git.

To get the most out of Git learn one of these commands:

  • git help <command>
  • man git-<command>

So git help config is equivalent to man git-config. Similarly, git config ... is equivalent to git-config .... The latter form seems to be discouraged (since Git is switching to a library approach or something), but it is often used in tutorials or manuals.

Those may be the most used commands in Git. The more interesting ones, and the ones for which you may want to switch to Git, follow:


Adding/Removing changes to/from the Index

To add all changes in a file, simply use add, some examples:

  • add . adds all files in the current directory and its sub-directories to the index
  • add <file> adds a specific file to the index
  • add -u adds all changed files to the index (that is, it adds all files Git knows about that have changed)
  • add --patch lets you decide, change by change, which parts of a file to include instead of taking all changes in the file(s); see The Thing About Git for a larger example. This is extremely useful if you accidentally made two unrelated changes to a file and you want to keep your commits clean (you do want to keep your commits clean, don’t you?!)
  • add -i is an interactive version of add with a text-based GUI combining all the above capabilities
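The difference between add -u and add . is easiest to see side by side. A minimal sketch, assuming a throwaway repository; the file names are placeholders.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "version one" > tracked.txt
git add tracked.txt
git commit -q -m "Initial commit"

echo "version two, edited" > tracked.txt   # change to a file Git already knows
echo "brand new" > untracked.txt           # file Git has never seen

git add -u                                 # stages changes to tracked files only
staged_after_u=$(git diff --cached --name-only)

git add .                                  # stages everything below the current directory
staged_after_dot=$(git diff --cached --name-only)

echo "after add -u: $staged_after_u"
echo "after add . : $staged_after_dot"
```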

If you accidentally added some changes to the index, simply use reset to clean it, or name the file to remove from the index, as in reset <file>. reset can do more than clean the index: reset --hard restores the state of the latest commit, discarding uncommitted changes to tracked files in both the index and the working directory. reset also takes a parameter telling it which commit to consider: instead of the default HEAD, use HEAD^ for the parent of the current HEAD, or give it a SHA1 hash. Take a look at this post for a better treatment of git reset and its possibilities.
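As a rough illustration of the two reset forms mentioned above (a sketch in a throwaway repository, with an invented file name):

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "committed content" > app.txt
git add app.txt
git commit -q -m "Initial commit"

echo "experimental edit" > app.txt
git add app.txt                            # staged by accident

git reset -q app.txt                       # unstage; the edit stays in the working directory
staged=$(git diff --cached --name-only)    # nothing is staged any more
after_unstage=$(cat app.txt)

git reset -q --hard                        # throw the edit away as well
after_hard=$(cat app.txt)

echo "after reset <file>: $after_unstage"
echo "after reset --hard: $after_hard"
```

Note that reset --hard discards uncommitted work for good, so it is worth checking diff first.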

Checking differences between commits/index/working directory

Sometimes you need to know exactly what the differences between two commits, or between the working directory and the index (or some commit), are.

This is actually fairly trivial with any revision control system, and of course also with git.

  • diff compares your working directory with the index
  • diff --cached compares your index with the latest commit (or any named commit you supply)
  • diff <commit> compares your working directory with the named commit (use HEAD for the latest commit)
  • diff <commit> <commit> compares (surprise) two named commits (for example, diff HEAD HEAD^ shows the differences between the last two commits)
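To see which pair of states each diff form compares, one can put a different value into the commit, the index and the working directory. The following is a sketch with made-up content; the grep only strips the diff headers so the changed lines remain.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "one" > file.txt
git add file.txt
git commit -q -m "Commit one"              # HEAD:              one
echo "two" > file.txt
git add file.txt                           # index:             two
echo "three" > file.txt                    # working directory: three

# Keep only the removed/added lines of each diff.
worktree_vs_index=$(git diff | grep '^[-+][a-z]')
index_vs_head=$(git diff --cached | grep '^[-+][a-z]')
worktree_vs_head=$(git diff HEAD | grep '^[-+][a-z]')

printf 'diff:\n%s\n' "$worktree_vs_index"
printf 'diff --cached:\n%s\n' "$index_vs_head"
printf 'diff HEAD:\n%s\n' "$worktree_vs_head"
```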

To get a high-level view of changed files, Git offers the status command.

  • status shows a summary of changed files, files added to the index and files not tracked.
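  • status -a shows what the index would look like if you issued add -u or commit -a (that is, all changed files get committed or added to the index). This works in older Git releases, where status accepted the same options as commit; in current releases commit -a --dry-run gives the same preview.
  • status -u<mode>, with <mode> one of no, normal or all, changes how Git treats untracked files (don’t show untracked files, show only files and directories, or show all untracked files recursively). The mode is attached directly to the option, as in status -uno.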
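A small sketch of the untracked-files switch, assuming a throwaway repository and invented file names. It uses the --porcelain output format, which is meant for scripts, and writes the mode directly after -u.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "version one" > tracked.txt
git add tracked.txt
git commit -q -m "Initial commit"

echo "version two, edited" > tracked.txt   # modified, not staged
echo "scratch" > scratch.txt               # untracked

with_untracked=$(git status --porcelain)          # lists both files
without_untracked=$(git status --porcelain -uno)  # hides scratch.txt

printf 'default:\n%s\n' "$with_untracked"
printf 'with -uno:\n%s\n' "$without_untracked"
```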

Another way to get an overview of your controlled content is by using ls-files:

  • ls-files shows all tracked files
  • ls-files -t shows files with some additional information (like cached, modified etc.)
  • ls-files --others shows untracked files

Somehow I find this more interesting than the status output, but this may come from my Mercurial experience, where hg status would output a list very similar to that of ls-files -t.

Most projects have some files which should not be tracked, for example binaries built from source or backup files from your editor. Using .gitignore you can tell Git to always ignore these files (unless you explicitly ask it to show ignored files too).

There are several ways to tell Git what to ignore. The most common one is to create a .gitignore file in the root of your project. This file can be added to the index and is then shared with everyone who clones the repository. The nice thing about this file is that you can keep one in each sub-directory, overriding the configuration of the parent directory. See gitignore for more.
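For comparison, here is what the ls-files variants print in a tiny repository. This is a sketch with placeholder file names; the tag H in the -t output marks a cached (tracked) entry.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "content" > tracked.txt
git add tracked.txt
git commit -q -m "Initial commit"
echo "not added yet" > extra.txt

tracked=$(git ls-files)                    # files in the index
tagged=$(git ls-files -t)                  # same list, prefixed with a status tag
untracked=$(git ls-files --others)         # files Git does not know about

echo "tracked:   $tracked"
echo "tagged:    $tagged"
echo "untracked: $untracked"
```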

The second way, local to your repository, is to edit the file .git/info/exclude to your needs. On Mac OS X, Git automatically adds .DS_Store files to that exclude file.

The third way is to specify a core.excludesfile configuration (for example git config --global core.excludesfile "/absolute/path/.gitglobalexclude") and create a global ignore file (I think an absolute path is needed, but a patch enabling relative paths was submitted).

The format of the files is simple:

  • !file never ignore this file (use !asdf.o if you excluded *.o, but you want to track asdf.o)
  • /file ignore the file in the root of the project
  • file ignore the file anywhere in the project
  • *.o ignore all files with an “.o” ending
  • dir/ ignore the directory “dir” but not the file “dir”
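The patterns above can be tried out without committing anything. In this sketch the file and directory names are invented; ls-files --others --exclude-standard lists the untracked files that survive the ignore rules.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
mkdir sub tmp
touch a.o asdf.o build.log sub/build.log tmp/cache

# One pattern per line, as in the list above.
printf '%s\n' '*.o' '!asdf.o' '/build.log' 'tmp/' > .gitignore

# Untracked files that are NOT ignored, sorted for stable output.
visible=$(git ls-files --others --exclude-standard | LC_ALL=C sort)
echo "$visible"
```

Here a.o, the top-level build.log and everything under tmp/ are ignored, while asdf.o (re-included by the ! rule) and sub/build.log remain visible, along with .gitignore itself.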

Committing changes

So, let’s say we have a clean collection of changes in the index. The next step is to commit it.

The basic command is simple, just commit, but you can do more.

  • commit -a commit all changed files (adding every changed file to the index)
  • commit -m "commit message" supply the commit message at the command line
  • commit --amend edit the previous commit (that is, the new commit replaces the old one, combining its changes with whatever is in the index)
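A short sketch of commit --amend fixing a misspelled commit message, assuming a throwaway repository with a made-up file:

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "hello" > greeting.txt
git add greeting.txt
git commit -q -m "Add greting"             # oops, typo in the message
git commit -q --amend -m "Add greeting"    # replaces the previous commit

commit_count=$(git rev-list --count HEAD)  # still a single commit
subject=$(git log -1 --format=%s)
echo "$commit_count commit(s), latest: $subject"
```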

With Git it’s common to make branches for bigger experiments. Use a branch for example if you want to implement a new feature which requires many changes in your code-base, but it takes too long for a single commit.

So, to make a branch from the current HEAD, just issue checkout -b name. This creates a branch named “name” and switches to it. This is equivalent to branch name followed by checkout name.

In this branch you can do whatever is needed to implement the new feature, but if necessary, you can switch back to the old branch (for maintenance or something). After you’ve completed your changes, switch to the original branch and merge the feature branch into it with merge name. After that you can safely delete the “name” branch with branch -d name.
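The whole branch life cycle fits in a few commands. This is a sketch in a throwaway repository; the branch name feature and the file names are placeholders, and the name of the original branch is read from Git because it may be master or main depending on the setup.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "base" > base.txt
git add base.txt
git commit -q -m "Initial commit"
base=$(git rev-parse --abbrev-ref HEAD)    # name of the original branch

git checkout -q -b feature                 # create the branch and switch to it
echo "feature work" > feature.txt
git add feature.txt
git commit -q -m "Implement feature"

git checkout -q "$base"                    # back to the original branch
git merge -q feature                       # bring the feature in (a fast-forward here)
git branch -d feature                      # safe: the branch is fully merged

git log --format=%s
```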

The following picture shows two branches: Branch B is created while the working directory (red line) points to Branch A, we can switch between the branches with checkout, and finally we delete Branch B.

Of course there can be conflicts while merging the changes from the other branch into a branch which has evolved by itself. Either you resolve the conflicts all at once, or you keep your branch with the new feature more or less up to date with rebase. After each change to the master branch, you could issue a rebase master in the new branch to get the new changes of the master branch incorporated into your “name” branch.
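Keeping a feature branch up to date with rebase might look like the following sketch. The repository, branch and file names are invented, and the main branch name is read from Git rather than assumed to be master.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "base" > base.txt
git add base.txt
git commit -q -m "Initial commit"
base=$(git rev-parse --abbrev-ref HEAD)

git checkout -q -b feature
echo "feature work" > feature.txt
git add feature.txt
git commit -q -m "Feature work"

git checkout -q "$base"                    # meanwhile the main branch evolves
echo "mainline change" > mainline.txt
git add mainline.txt
git commit -q -m "Mainline change"

git checkout -q feature
git rebase -q "$base"                      # replay the feature commit on top of the new tip

git log --format=%s
```

After the rebase the feature branch contains the mainline change, and its own commit sits directly on top of the main branch.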

Another very nice feature of rebase is merging many commits into a single commit (or editing some of them). Suppose you made your changes within a new branch; now you’re ready, but you’d like to have a single commit containing all changes of the branch. Just issue rebase -i master, and Git opens an editor listing all commits leading from master to the current HEAD. Leave the first “pick” as it is, replace each following “pick” with “squash”, and Git merges all the commits into a single new one (you get to supply a new commit message). Look at git awsome-ness for a complete example.
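Normally the pick/squash editing is done by hand in the editor. For a scripted demonstration, the sketch below lets sed do the editing through the GIT_SEQUENCE_EDITOR environment variable and accepts the combined commit message unchanged via GIT_EDITOR=true. It assumes GNU sed (for -i without a suffix); branch and file names are made up.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "base" > base.txt
git add base.txt
git commit -q -m "Initial commit"
base=$(git rev-parse --abbrev-ref HEAD)

git checkout -q -b feature
for step in one two three; do              # three small commits on the branch
  echo "$step" >> feature.txt
  git add feature.txt
  git commit -q -m "Step $step"
done

# Keep the first "pick", turn every later "pick" into "squash".
GIT_SEQUENCE_EDITOR="sed -i -e '2,\$s/^pick/squash/'" GIT_EDITOR=true \
  git rebase -q -i "$base"

squashed_count=$(git rev-list --count "$base"..HEAD)
echo "commits on the branch after squashing: $squashed_count"
```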

The following diagram shows how commits are connected when a branch is made.

From the branch we can do a merge (first picture) or a rebase (second picture). Note that the merge automatically issues a commit if no conflicts arise!

Here is a small illustration of what you could do with a rebase command:

For smaller changes there’s also a special holding area for changes. Suppose you’re in the middle of a bigger change and you have to quickly fix something completely unrelated. Just issue stash and Git will put away all your current uncommitted changes and present you with a clean working directory. Now make the quick fix, commit and issue stash pop (equivalent to stash apply followed by stash drop) to continue working on your fancy new feature.

It’s also possible to stack stashes up, use stash list to see a list of stashes. If you come to the conclusion that your stashed changes deserve their own branch, issue stash branch foo to create a new branch “foo” and pop the latest stashed changes.
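The quick-fix scenario could be played through like this. It is a sketch in a throwaway repository, with invented file names.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "stable code" > app.txt
git add app.txt
git commit -q -m "Initial commit"

echo "half-finished feature" >> app.txt    # work in progress
git stash -q                               # put it away
clean_during_fix=$(git status --porcelain) # empty: the working directory is clean

echo "urgent fix" > hotfix.txt             # the unrelated quick fix
git add hotfix.txt
git commit -q -m "Quick fix"

git stash pop -q                           # bring the work in progress back
restored=$(tail -n 1 app.txt)
echo "restored line: $restored"
```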

You should never tamper with commits you have already published to a remote repository (or to other people)! If you find a mistake in a published commit, use revert <commit>, which records a new commit that undoes the changes of the specified commit, so other people receive the correction as ordinary history.
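A sketch of revert in action, assuming a throwaway repository and a made-up configuration file. The history keeps the faulty commit and gains a third commit that undoes it.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "timeout=30" > config.txt
git add config.txt
git commit -q -m "Add config"
echo "timeout=0" > config.txt
git commit -q -a -m "Set timeout to zero"  # the commit that turns out to be wrong

git revert --no-edit HEAD > /dev/null      # new commit that undoes the previous one

commit_count=$(git rev-list --count HEAD)
git log --format=%s
```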

Sharing or “backing up” your work

Sometimes you’re not the one who started the project, or you simply reinstalled or changed the computer. How do you get your repository back?

Nothing simpler than that: clone <url>. Unfortunately, clone cannot clone to a remote repository. I can’t clone my local project to my backup server – I have to either scp it to it or create a project there and pull the changes. After that I can use the standard command to publish or receive changes.

clone sets up all necessary configurations and you can simply issue pull to update your repository with remote changes or push to send your changes to the remote repository.

In a distributed environment it’s not uncommon that you might want to push or pull from different remote repositories. For that reason, Git allows you to configure shortcuts to other repositories. The main remote repository is called “origin”, and you usually push or pull the master branch. pull is a twofold operation, it first fetches the contents (using fetch) and then merges the new contents with your changes. Of course you could also use fetch, inspect the changes and merge afterwards.
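The sketch below plays through this with local directories standing in for the machines: an empty bare repository acts as the server (the usual workaround for not being able to clone "to" a remote), one clone pushes, and a second clone fetches, inspects and then merges. All paths and names are assumptions for the demo.

```shell
work=$(mktemp -d)
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

git init -q --bare "$work/server.git"      # stands in for the backup server
git clone -q "$work/server.git" "$work/desktop" 2>/dev/null   # empty clone warns on stderr
cd "$work/desktop"
echo "first" > notes.txt
git add notes.txt
git commit -q -m "First note"
branch=$(git rev-parse --abbrev-ref HEAD)
git push -q origin "$branch"               # publish to the server

git clone -q "$work/server.git" "$work/laptop"   # second machine

cd "$work/desktop"
echo "second" >> notes.txt
git commit -q -a -m "Second note"
git push -q origin "$branch"

cd "$work/laptop"
git fetch -q origin                        # download only, nothing is merged yet
incoming=$(git log --format=%s "HEAD..origin/$branch")   # inspect what would come in
git merge -q "origin/$branch"              # now integrate it (pull does both steps)
echo "incoming: $incoming"
```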

So, let’s say you want to clone the Git repository:

git clone git://

After the clone, we can look at the remote sites and their respective URLs with remote -v. To look which branches are available at the remote site, issue branch -r.

% git remote -v
origin	git://
% git branch -r

Now let’s add another repository containing the Git project (note the git2 subdomain):

git remote add mirror git://

This tells Git to add the given URL with the shorthand “mirror” and to track all branches. If you’d like to track only some branches, for example only the master branch, add -t master to the command.

% git remote -v
mirror	git://
origin	git://
% git branch -r

You can either switch to a remote branch with checkout origin/todo (this leaves you on a detached HEAD, so it is best suited for looking around) or create a new branch based on some remote branch with checkout -b myownbranch origin/todo. The latter also sets up “tracking”, which means that push and pull automatically use the branch “todo” at the remote repository “origin”.
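What “tracking” means can be read from the repository configuration. In this sketch a local directory plays the role of the remote, and the branch names todo and mytodo are illustrative.

```shell
work=$(mktemp -d)
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

git init -q "$work/upstream"               # stands in for the remote project
cd "$work/upstream"
echo "todo list" > TODO
git add TODO
git commit -q -m "Start todo list"
git branch todo                            # will show up as origin/todo in clones

git clone -q "$work/upstream" "$work/clone"
cd "$work/clone"
remote_branches=$(git branch -r)           # includes origin/todo

git checkout -q -b mytodo origin/todo      # new local branch that tracks origin/todo
tracked_remote=$(git config --get branch.mytodo.remote)
tracked_branch=$(git config --get branch.mytodo.merge)
echo "mytodo tracks $tracked_branch on $tracked_remote"
```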

Another very cool feature of Git (and of course others) is the ability to “cherry-pick”. That is, we could take the patches (commits) interesting to us and incorporate them into our own branch. To accomplish that task, there’s cherry-pick at our disposal. Simply choose the commit you’d like to import from another branch and call cherry-pick <commit> at the branch you’d like to incorporate the patch (commit). Works like a charm.
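A minimal cherry-pick sketch, assuming a throwaway repository with an experimental branch from which only one of two commits is wanted. Branch and file names are placeholders.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "base" > base.txt
git add base.txt
git commit -q -m "Initial commit"
base=$(git rev-parse --abbrev-ref HEAD)

git checkout -q -b experiment
echo "useful fix" > fix.txt
git add fix.txt
git commit -q -m "Useful fix"
fix_commit=$(git rev-parse HEAD)           # the commit we want to import later
echo "risky idea" > risky.txt
git add risky.txt
git commit -q -m "Risky idea"

git checkout -q "$base"
git cherry-pick "$fix_commit" > /dev/null  # copy just that one commit over

git log --format=%s
```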

How I use Git

Note that I just switched to Git, but since Mercurial was very similar I know most of the concepts to some degree.

I use version control for most of my files, be it configuration files in my home directory which get shared across my laptop, desktop and work machine or letters I write. Of course Git isn’t the best way to do everything, but it’s a better way than not syncing my files at all.

I have three “work-machines” which synchronise to a server accessible by all of my machines. For machine-specific files (in my home-dir for example) I made a special directory “.machinedependent” and created sub-folders for each machine. In these directories I put the configuration files I’d like to be version controlled even if they are only interesting for one machine. Then I hard-link files and soft-link directories and I’m done. Git handles this quite nicely.

All of my coding projects are handled by Git and stored (for backup purposes and accessibility) on the mentioned server. I like the fact that my code is distributed across several machines nowhere near each other. Git-home-history has some capabilities to produce encrypted archives of your work too, but I haven’t dug into that yet.

This, of course, didn’t touch more than the bare surface of Git. Look at

for more information.

Converting from Mercurial to Git

I downloaded hg2git and converted all my repositories to Git (you should be in an empty directory and it may be an advantage to use zsh):

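for i in path/to/repos/*; do
  proj=$(basename "$i")   # assumption: name each Git repository after its Mercurial directory
  mkdir "$proj"
  cd "$proj"
  git init
  # Assumption: the hg2git call was lost from this listing; only its "-r <repo>"
  # argument survived. Adjust the script name and path to your hg2git download.
  hg2git.sh -r "path/to/repos/$proj"
  git checkout default
  git checkout -b master
  git branch -d default
  cd ..
done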

Missing some things from Mercurial

There are actually two features I somewhat miss in Git. First I cannot clone my local repository to a remote server without hassles (login to the remote server and create an empty project for example, or just scp the whole project over… something like that).

Second, Mercurial had very nice short-cuts. For example I could issue hg st instead of hg status, Git also has aliases, but you have to configure them on your own.

git config --global alias.ci "commit -a" adds an alias “ci” to commit all updated files (instead of adding each changed file individually to the index).
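To try an alias without touching the global configuration, it can be defined for a single repository first. This is a sketch: the alias name ci follows the text above, while the repository and file are throwaway assumptions.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "draft" > letter.txt
git add letter.txt
git commit -q -m "Initial commit"

git config alias.ci "commit -a"            # repository-local; add --global to keep it everywhere

echo "final wording" >> letter.txt
git ci -q -m "Finish letter"               # expands to: git commit -a -q -m "Finish letter"

git log --format=%s
```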

Which one is the right one?

Well the question of “which version control software should I use” comes up sooner or later (for some people so late, they won’t even notice it). I cannot tell you which one is right, the choice between centralized source code management and a distributed one is the first step and gives different possibilities (distributed is the way to go, IMHO).

Anyway, you’ve got plenty to choose from.

For me only Darcs, Mercurial and (only recently) Git were interesting enough to dig deeper. I dumped Darcs because installing it was not so easy: I’m not root on all the machines I have access to, and compiling a Haskell program into my home-dir was too much work for me. And of course I heard about some nasty bugs in it.

Mercurial was awesome (take a look at the Mercurial TechTalk), but Git won me over for absolutely no reason (maybe because it was envisioned by Linus Torvalds, maybe because I like C way more than Python).

I think most of the time we are not the ones to choose the version control software, anyway.

Sometimes “pi = 3.14” is (a) infinitely faster than the “correct” answer and (b) the difference between the “correct” and the “wrong” answer is meaningless. And this is why I get upset when somebody dismisses performance issues based on “correctness”. The thing is, some specious value of “correctness” is often irrelevant because it doesn’t matter. While performance almost always matters. And I absolutely detest the fact that people so often dismiss performance concerns so readily.
— Linus Torvalds