Git: how my life has improved since last month when I used SVN


I switched from SVN to Git (git-svn, actually) close to a month ago, and it had to be a leap of faith. Rationally convincing someone that a DVCS is better is pretty hard because, overall, life in SVN land is not that bad, or at least does not appear to be. Note that I am using git-svn, so I don't benefit from all the power of DVCSes. While nothing replaces actually trying it, I thought it was worth taking the time to explain what I like about this new tool, to help others make the jump too.

This is not a post about why merging is superior in Git compared to SVN (that is something you need to experience); it's a post about how Git makes my life easier.

This post is split into a few sections:

  • some intro
  • how to import an SVN repo into Git (feel free to skip this tutorial if you are only interested in what I like about Git compared to SVN)
  • use case: multitasking in isolation
  • use case: backporting bug fixes
  • use case: writing better commits and commit histories
  • resources

General

If I had to summarize, Git gives me more freedom than SVN. I am not constrained by the tool in any way:

  • it can follow whatever workflow I want
  • it is fast
  • as a net result my commits are clearer

It's hard for me to say this, but since the move I actually enjoy committing stuff (I know, that's pretty scary).

The bootstrap

Here is a short tutorial for people who want to import a project from SVN into Git and keep a bridge between the two. Because of a bug in git-svn with https imports, I am using Git 1.6.5.1 rather than 1.7.1.

mkdir project; cd project;
git svn init --trunk=my/svn/repo/project/trunk/ \
             --tags=my/svn/repo/project/tags \
             --branches=my/svn/repo/project/branches \
             my/svn/repo/project

You can optionally create a file that maps SVN logins to the committers' names and email addresses:

jdoe = John Doe <jdoe@foo.com>
agaulois = Asterix <asterix@gaule.fr>

In the logs, every time git-svn finds agaulois, it converts it to Asterix <asterix@gaule.fr>. Don't worry about missing a couple of logins: if you do, git-svn will stop and ask you to add the missing ones. To use the file, run this in your Git repository directory:

git config svn.authorsfile ~/dir/myauthors.txt

The next step is to fetch the whole history. This takes a long time, a very long time. The good news is that you can interrupt it and resume later:

git svn fetch

Once that is done, you are good to go. To update your Git repo with the new commits from SVN, run:

git svn rebase

To push your set of local commits to SVN, run:

git svn dcommit

Many people, especially in the open source community, consider DVCSes a bad thing because they supposedly encourage committers to keep their work local instead of sharing it with others. In reality, they do not. People who share frequently will continue to do so; people who don't still won't, and should be fired. Same as usual. In practice, I dcommit every 4 to 6 hours.

I recommend importing one SVN project per Git repository, so you will typically end up with several Git repos per SVN repo. The rule: import the biggest unit that you tag and branch in isolation in SVN. For example, for Hibernate I have several Git repos:

  • Hibernate Core (which contains all the modules)
  • Hibernate Validator
  • Hibernate Search
  • JPA API
  • Bean Validation API
  • Bean Validation TCK

These generally have independent release cycles and version numbers. Apparently it is possible to aggregate Git repositories into a superproject (via submodules), but I have not tried.

One golden rule: you cannot both share a Git repository with someone else (pulling changes back and forth) AND use it to commit to an SVN repository. It will be a mess, because git-svn rewrites the commits' unique identifiers when it dcommits. Forget about sharing repos while you use git-svn, unless you are abandoning SVN and doing a one-time import.

Multitasking in Isolation

The absolute coolest feature is the ability to work on a given topic in total isolation, very cheaply. I am not necessarily talking about working offline on an island (though that's nice). I am talking about working on several subjects in parallel without a complex setup.

Let's take an example. I was working on a new feature for Hibernate Search's query DSL. I branched master into dsl and started to work, committing small chunks of work along the way (more on this later). While working on it, I found a bug in the existing query engine. No problem: I literally stopped working on the new feature, put my stuff aside (git stash), created a new branch off master named bug123 and fixed the bug. When I was done with the bug fix, I applied it to master and to the dsl branch, and resumed my work there. Here is the workflow:

git checkout -b dsl #create the dsl branch and move to it
#work work commit work commit work
git stash #put not yet committed stuff aside
git checkout master

git checkout -b bug123 #create bug fix branch
#work work #fix bug 123
git commit
git rebase master #replay bug123's commits on top of master (not necessary in this case as I did nothing in master)

git checkout master
git merge bug123 #merge bug123 into master (a fast-forward here)
git branch -d bug123 #delete the now useless branch

git checkout dsl
git rebase master #replay dsl on top of master, which now contains the bug123 fix
git stash pop #reapply uncommitted changes
#work

It looks like a lot of operations, but it's very fluid and very fast!
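If you want to rehearse this workflow before trying it on a real project, here is a minimal sketch that replays it in a throwaway repository. It assumes a reasonably recent Git on the PATH; the file names (Query.java, Dsl.java), their contents and the "Demo User" identity are made up for the illustration.

```shell
set -eu

# Throwaway repository, removed when the script exits
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # use the branch name from the post
git config user.name "Demo User"          # hypothetical identity
git config user.email "demo@example.com"

# Starting point on master (hypothetical file)
printf 'query engine\n' > Query.java
git add Query.java
git commit -q -m "Initial query engine"

# Feature branch: one committed step plus some uncommitted work
git checkout -q -b dsl
printf 'dsl step 1\n' > Dsl.java
git add Dsl.java
git commit -q -m "DSL: first step"
printf 'dsl step 2 (work in progress)\n' >> Dsl.java
git stash                                 # put the uncommitted work aside

# Isolated bug fix on its own branch off master
git checkout -q master
git checkout -q -b bug123
printf 'query engine (bug 123 fixed)\n' > Query.java
git commit -q -am "Fix bug 123 in the query engine"
git rebase -q master                      # no-op here: master has not moved

git checkout -q master
git merge -q bug123                       # fast-forwards master to the fix
git branch -q -d bug123

# Back to the feature: replay dsl on top of the fixed master, restore the WIP
git checkout -q dsl
git rebase -q master
git stash pop
```

At the end you are on dsl, the fix is part of its history, and the work in progress is back in the working tree, uncommitted.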

What's the benefit? I fixed the bug in isolation from my new feature, even though the same files were impacted. I committed the bug fix on its own, so I can easily reapply it to maintenance branches (see below). Had I used SVN, I would have fixed the bug and committed "new feature + bug fix 123". Neither I nor my co-workers would have backported the fix to our maintenance branch, because separating the new feature from the bug fix would have been too complex. In Git the process is so smooth that I even use it to fix typos in comments in isolation from my main work.

I should point out that switching branches is super fast and happens in the same directory. Your IDE quickly refreshes and you are ready to work in the same IDE window. For me that's a big plus over checking out a maintenance branch in a separate directory, setting up my IDE and opening a second IDE window to work in a different context of the same project: I work on five different projects on average, and I can't afford a proliferation of IDE windows. With Git, context switching comes with much less friction and saves me a lot of time.

Backporting bug fixes

In SVN land, to backport a bug fix, I either:

  • generate a patch from the SVN commit and apply it to a checkout of the maintenance branch
  • manually read the commit diff and select which changes I want to apply (generally because somebody committed the fix alongside a new feature, or because they committed the feature in 7 separate commits)

The first scenario (the easiest) involves:

  • generating the patch
  • saving it as a file
  • optionally checking out the maintenance branch (i.e. getting a coffee)
  • opening my new IDE window
  • applying the patch
  • committing the change with a log message

In Git, you:

  • check out the maintenance branch (2 seconds)
  • run the cherry-pick command (git cherry-pick <sha1>) on the commit or commits you want to copy from the main branch (log messages are copied automatically, though you can edit them if needed)

So easy that you actually do it :)
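Here is a small sketch of that backport in a throwaway repository, assuming a reasonably recent Git. The branch name "maintenance", the file names and the identity are hypothetical. The -x option of git cherry-pick is optional; it appends a "(cherry picked from commit ...)" line to the copied log message, which makes it easy to trace where a backport came from.

```shell
set -eu

# Throwaway repository, removed when the script exits
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"          # hypothetical identity
git config user.email "demo@example.com"

# A release, and a hypothetical maintenance branch created from it
printf 'v1 engine\n' > Engine.java
git add Engine.java
git commit -q -m "Release 1.0"
git branch maintenance

# On master: a new feature, then a bug fix committed on its own
printf 'feature\n' > Feature.java
git add Feature.java
git commit -q -m "New feature"
printf 'v1 engine, bug 123 fixed\n' > Engine.java
git commit -q -am "Fix bug 123"
fix=$(git rev-parse HEAD)                 # SHA-1 of the fix commit

# Backport only the fix; the feature stays on master
git checkout -q maintenance
git cherry-pick -x "$fix"
```

Because the fix was committed in isolation, the maintenance branch gets the fix and nothing else: Feature.java never shows up there.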

Writing better commits and commit histories

A feature I really like is the ability to uncommit things and rewrite or rearrange them. You only do this on commits you have not yet shared (in my case, not yet pushed to SVN). It looks like a stupid and useless feature, but it turns out I use it all the time:

  • I can commit unstable work, explore a couple of approaches and come back if needed
  • I can fix a typo or bad log message
  • I can simplify the commit history by merging two or more commits (I typically merge commits I used as unstable checkpoints)
  • These operations typically require 5 seconds or less

The net effect of being able to do that is:

  • I write better log messages
  • I commit more often / in smaller pieces, making my changes more readable
  • If the pieces happen to be too small, I merge them before syncing with SVN
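As a rough sketch of these operations, assuming a reasonably recent Git: git commit --amend rewrites the last commit (here, its log message), and git reset --soft moves the branch back while keeping the changes staged, so they can be recommitted as one. Day to day, git rebase -i is the usual interactive way to squash or reorder commits; the reset variant is used here only because it runs without an editor. File names, messages and identity are made up.

```shell
set -eu

# Throwaway repository, removed when the script exits
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git config user.name "Demo User"          # hypothetical identity
git config user.email "demo@example.com"

printf 'base\n' > Search.java
git add Search.java
git commit -q -m "Base"

# Two unstable checkpoints, the second with a typo in its log message
printf 'step 1\n' >> Search.java
git commit -q -am "WIP: checkpoint"
printf 'step 2\n' >> Search.java
git commit -q -am "Add fuzy query support"

# Fix the last log message (only for commits not yet shared)
git commit -q --amend -m "Add fuzzy query support"

# Squash the last two commits: move the branch back two commits while
# keeping their changes staged, then commit once with a clean message
git reset -q --soft HEAD~2
git commit -q -m "Add fuzzy query support to the query DSL"
```

The history ends up with the base commit plus a single, well-described commit, and no change is lost.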

You can also do some finer micro surgery. If you are changing code and realize that your changes are really two or three separate sets that should be committed separately (potentially changes within the same file), you can literally select which files and which lines to commit. The GitX tool lets you do that very easily.

Git can do that because it does not track files as such; it tracks content, through a staging area (the index). You can stage some changes for commit (say, two new files and changes in three others), continue working on the same set of files, and commit the state exactly as it was when you staged it. Your subsequent changes can then be committed later. This is a subtle difference of approach (content management vs file management), but now that I have used it, I like it better. As a consequence, if you change a file, those changes won't be committed automatically the next time you commit: you need to stage them (git add), or use the -a option of git commit, which stages every modified tracked file.
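A minimal sketch of the staging behavior described above, in a throwaway repository and assuming a reasonably recent Git (file name and identity are made up). From the command line, git add -p lets you pick individual hunks interactively, which is roughly what GitX offers graphically; the sketch stages a whole file instead so it can run unattended.

```shell
set -eu

# Throwaway repository, removed when the script exits
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git config user.name "Demo User"          # hypothetical identity
git config user.email "demo@example.com"

printf 'line 1\n' > Notes.txt
git add Notes.txt
git commit -q -m "Initial"

# Stage one change...
printf 'line 2 (staged)\n' >> Notes.txt
git add Notes.txt
# ...then keep editing the same file without staging again
printf 'line 3 (not staged)\n' >> Notes.txt

git --no-pager diff --cached --stat   # what the next commit will contain
git --no-pager diff --stat            # what stays only in the working tree

# Commits the staged snapshot: line 2 goes in, line 3 does not
git commit -q -m "Add line 2"
```

After the commit, line 3 is still sitting in the working tree, ready to be staged and committed separately.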

Resources

I absolutely recommend reading Pro Git:

  • it's a top-quality book
  • it's use-case oriented
  • and it's available for free online at http://progit.org (though go buy it too, it's well deserved)

Aside from that, I use

git help <command>

very often; the built-in documentation is pretty good. Otherwise, Google is your friend: there are many resources out there.

I don't need or miss additional tools to work with Git. The command line is good enough and often less confusing (IntelliJ's integration confused me, so I don't use it unless I need to compare files). I do like to use GitX, a graphical tool, for two purposes:

  • it displays branches and commits graphically
  • it lets you easily stage specific lines of a file for later commit (the micro surgery tool)

That's it, folks!

I hope you enjoyed the read and that I've encouraged you to give it a try. git-svn really made trying it a no-brainer. I probably spent 16 hours learning, trying and understanding Git. I'm confident I will get them back within the next three to four months (my ROI is covered :) ). You can also try it on any directory: I now use Git to keep a revision history of all my presentations. Remember, there is no need to set up a server or anything complex. Run git init and you are good to go.
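For the record, putting an existing directory under Git boils down to a few commands. This sketch uses a temporary directory standing in for something like a presentations folder (talk.txt is a made-up file); the local identity lines are only needed if you have no global user.name and user.email configured.

```shell
set -eu

# Temporary directory standing in for, e.g., a presentations folder
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir"
printf 'slide 1\n' > talk.txt

git init -q                                # no server needed
git config user.name "Demo User"           # hypothetical identity
git config user.email "demo@example.com"
git add .
git commit -q -m "First draft of the talk"
git --no-pager log --oneline
```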

Disclaimer: this is not a thorough comparison, just the feedback of a one month old user.
