Wednesday 9 December 2009

The #1 thing to learn if you're new to git

Now I'm not 'new new' to git anymore: I've been using it off and on for at least a couple of years, but I've been using cvs/svn for many more years than that. git has a steep learning curve[1], and I have found the git-svn crash course invaluable. But there was one thing missing:

The #1 thing I wish I had learned when I was new to git:

...is that you must commit to git *far* more frequently than you do to svn.

From my thrashing about, I have discovered there are many more 'dangerous' commands in git than in svn, and it's really easy to get yourself into a 'stuck'[2] state.

git actually provides a whole set of tools that will help you get back out of whatever hole you've dug yourself into... but you're likely to end up having lost your working-copy changes[3] along the way. So the best practice now is to commit often - so that everything is in the repository: there is never a big working copy to lose.

I had a lot of trouble with that as it considerably changes the workflow I'd built up over the last decade or so. From cvs to svn and then adding agile on top, my workflow is now roughly:

  1. Checkout a fresh, clean copy of the repository (or svn up to achieve same effect)
  2. add your tests and make some changes
  3. check the rest of the tests still run - make changes until they do
  4. do an svn st to see if there are any files I've forgotten to svn add - add them
  5. do an svn diff and see all my changes - eyeball them to make sure there's nothing obviously wrong
  6. do an svn up to pull in latest changes by others and make sure the tests *still* pass
  7. svn commit

So at all times, everything would be *not* checked in - all of it just sitting in the local working copy until I was sure it was ready.

The problem is that, on the surface, git appears to support this workflow perfectly. All the svn commands described above have git equivalents (they're even called the same thing) and so you can (supposedly) transition smoothly over to git with only minimal effort. Even adding a branch, rather than hacking on master, is not too far a departure from the svn-style workflow, as branching is familiar in svn, and git just gives you a breezier, easier interface.

So where does it break down?

Well, in my case, usually at the second-last point. A git pull can completely mess you up if you get to a merge-conflicted state. You can't commit your working copy because of the merged state, and you often can't even properly diff because you've got a mush of the git pull's changes plus your own changes and no way to tell which is which. And there seems to be no obvious way to 'just commit the merge-conflict changes', or to update the files that are conflicted and just *tell* git that they're not conflicted anymore... the way you can in svn. So at this point you're screwed.

What makes it worse is that at this point you often don't know exactly what commands you ran to get here - if you're anything like me, you've probably tried a whole bunch of stuff only partly understanding exactly what it does. Each command simply tells you in its own way that you can't do that. You can look up what you're supposed to do to fix it - but generally find that's just another command that tells you that you can't do it either. So you feel like a truck that's stuck sideways in a narrow alley and can't even understand how it got here, let alone how to get itself back out.

Frustrating!

Underlying that is the, quite reasonable, fear that you may lose all your work[3] since your last commit...

and of course that's because we're used to the underlying 'don't commit until' mentality that we may not even be aware we are sporting.

don't commit until (perfect)

The workflow I described above is a perfect example of this mentality. It makes sense to hold back on committing anything until it all works. After all, you know that the moment you commit, the CI server will pull all your changes and let everybody on the team know that you just broke the build (again). So eventually you adopt a "don't commit until the tests pass" workflow, and keep everything in your working copy until everything's green before committing to the svn repository. Fostering this "don't commit until it's right" mentality is a natural consequence of not wanting to look like an idiot to your colleagues, and it fits in perfectly well with the svn-based workflow.

but git doesn't work that way!

Or should I say that git doesn't *need* to work that way. After all, you still need to make sure that your tests pass and don't break on the CI server... but what I've found is that you need to get over the whole git commit thing. It may be named the same thing as svn commit - but it doesn't mean the same consequences (eg that your colleagues will all see that the feature's only half-complete and the tests are all spazzing out).

Instead, change the way you think: the command that *matters* now is actually git push. You can commit whatever you like to your local repository, even if it breaks the build; it's only when you push up to the remote repo that it must be working perfectly.
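To make the commit-versus-push distinction concrete, here is a minimal sketch in a throwaway sandbox. Everything named in it (the bare repository remote.git standing in for the shared remote, the work directory, notes.txt, the demo identity) is made up for illustration; it assumes a reasonably recent git and a POSIX shell.

```shell
#!/bin/sh
# Sketch: local commits stay private until `git push`.
# remote.git, work and notes.txt are hypothetical names for this demo only.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"

# A bare repository plays the part of the shared remote.
git init -q --bare remote.git
git --git-dir=remote.git symbolic-ref HEAD refs/heads/master

# A working repository with one commit already pushed.
git init -q work
cd work
git symbolic-ref HEAD refs/heads/master
git remote add origin ../remote.git
echo "first draft" > notes.txt
git add notes.txt
git commit -q -m "initial commit"
git push -q -u origin master

# Two work-in-progress commits: nothing leaves this machine yet.
echo "half-finished idea" >> notes.txt
git commit -q -am "wip: half-finished idea"
echo "finished idea" >> notes.txt
git commit -q -am "finish the idea"

unpushed_before=$(git rev-list --count origin/master..HEAD)
remote_before=$(git --git-dir=../remote.git rev-list --count master)

# Only now would colleagues (and the CI server) see anything.
git push -q origin master

unpushed_after=$(git rev-list --count origin/master..HEAD)
remote_after=$(git --git-dir=../remote.git rev-list --count master)

echo "unpushed commits: $unpushed_before -> $unpushed_after"
echo "commits on remote: $remote_before -> $remote_after"
```

The remote only gains commits at the push; until then the half-finished commit is visible to nobody else.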

Any other problems with this?

Unfortunately, there are some other consequences to this small change in workflow. One of them being the fact that you can't do a 'git diff' that covers all your changes since last push. git status and git diff are *just* like svn status and svn diff - they check against the latest commit, not the latest 'push to remote', which means it's hard to do a complete check of all your changes before going 'live'... you have to just trust that all your smaller commits all add up to the Right Thing.

That makes me feel uncomfortable as I like to be sure. I know about human error - and I know that I'm as prone to it as the next guy...

Having to make a patch against master and then read through *that* (which is far less clear to read than a diff) is not a good substitute, IMO. If anybody knows a good way to adapt the workflow to accommodate this, I'd love to know.

a new workflow?

I'm still working on this but so far I've got:

  1. Clone a fresh, clean copy of the repository (if I don't already have one)
  2. git checkout -b my_new_branch
  3. add tests and/or make some changes
  4. do a git diff and check this change - eyeball it to make sure there's nothing obviously wrong
  5. git commit (then repeatedly return to step 3 until 'done')
  6. check the rest of the tests run - make changes (and git commit) until 'done done'
  7. do a git pull origin master to pull in latest changes by others and make sure the tests *still* pass
  8. fix any merge-conflicts and commit the merge
  9. git push

This is still a work-in-progress, and I would appreciate informed opinions, advice or your own war stories.

Notes:
[1] IMO the git developers could learn a thing or two from Kathy Sierra... but that's another topic.[4]
[2] If you've ever got into a state where you can't run git commit because you're in a 'failed merge', and you can't git pull because you get 'fatal: merging of trees' or 'Automatic merge failed; fix conflicts and then commit the result.', you'll know what I mean. You edit the files to un-conflict them and try to reapply your stash, and you suddenly get 'CONFLICT (content): Merge conflict in ...' again... After thrashing around for a while between git stash, git pull, updating merged files then trying to re-apply your stash before git committing... I can tell you where I wanted to stick git.
[3] If you're anything like me, you look on the words git reset --hard HEAD with some trepidation. You just can't quite believe that blowing everything away in your working copy is the only way out of a simple merge-conflict.
[4]...and please don't just tell me that git is open-source and I should just go hack on git myself if I hate it so much. In theory I absolutely agree with you, but in practice I can only work on one thing at a time - and right now I'm still working on Active Resource, some projects of my own, a novel...

10 comments:

Mike Harris said...

Does "git diff origin/master" not do what you want?

Björn Steinbrink said...

What's really wrong with the old workflow is that you pulled with uncommitted changes. "git pull" is _NOT_ "svn up", so don't use it like that.

"git pull" will allow itself to be used with uncommitted changes if the changes are to files that won't be affected by the merge, and that's how you got into the "messed up" state. You had some uncommitted changes to "foo", got a clean merge for "bar", and a conflict for "goo". Now "git commit -a" or "git add -u" will obviously mess up the merge commit, because "foo" will be part of it. The solution is to simply use "git add goo", adding "goo" to the index, clearing the "unmerged" state. Do that for all unmerged files, then just "git commit". As you never added the modified version of "foo" to the index, the file will remain with uncommitted changes.

Of course, the safe way is to never merge with uncommitted changes (there are some cases in which it's useful to merge although there are uncommitted changes though).
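The foo/bar/goo scenario above can be reproduced in a sandbox. This is only a sketch under assumptions: the repositories (remote.git, mine, theirs), the file contents and the demo identity are invented, and "--no-rebase" is passed to git pull so that newer git versions merge rather than ask how to reconcile the branches.

```shell
#!/bin/sh
# Sketch: resolve a conflicted pull while an unrelated file has uncommitted edits.
# remote.git, mine, theirs and the file contents are hypothetical.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q --bare remote.git
git --git-dir=remote.git symbolic-ref HEAD refs/heads/master

# My repository: three files, committed and pushed.
git init -q mine
cd mine
git symbolic-ref HEAD refs/heads/master
git remote add origin ../remote.git
echo "foo v1" > foo
echo "bar v1" > bar
echo "goo v1" > goo
git add foo bar goo
git commit -q -m "initial"
git push -q -u origin master
cd ..

# A colleague changes bar and goo, then pushes.
git clone -q remote.git theirs
cd theirs
echo "bar v2 (theirs)" > bar
echo "goo v2 (theirs)" > goo
git commit -q -am "their changes"
git push -q origin master
cd ../mine

# I commit a competing change to goo and leave foo edited but uncommitted.
echo "goo v2 (mine)" > goo
git commit -q -am "my change to goo"
echo "foo work in progress" > foo

# The pull merges bar cleanly and stops on goo (non-zero exit, hence || true).
git pull -q --no-rebase origin master || true
conflicted=$(git diff --name-only --diff-filter=U)

# Fix goo by hand, mark only goo as resolved, then commit without -a.
echo "goo v3 (merged by hand)" > goo
git add goo
git commit -q -m "Merge origin/master, resolving goo by hand"

parent_count=$(git cat-file -p HEAD | grep -c '^parent ')
still_dirty=$(git diff --name-only)
committed_foo=$(git show HEAD:foo)

echo "conflicted: $conflicted, parents: $parent_count, still uncommitted: $still_dirty"
```

Because the commit is made without -a, the work-in-progress edit to foo stays out of the merge commit and remains in the working copy.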


For the diff, "git diff origin/master..." (incl. the three dots) is probably what you're looking for.
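A small sketch of what the three dots change, assuming an invented setup (a bare remote.git, a topic branch that touches a.txt, and a colleague who changes b.txt on master). Without the dots the diff compares the working copy against origin/master directly, so the colleague's change shows up too; with them it starts from the point where the branch diverged.

```shell
#!/bin/sh
# Sketch: `git diff origin/master` versus `git diff origin/master...`.
# Repository, branch and file names are hypothetical.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q --bare remote.git
git --git-dir=remote.git symbolic-ref HEAD refs/heads/master

git init -q work
cd work
git symbolic-ref HEAD refs/heads/master
git remote add origin ../remote.git
echo "a1" > a.txt
echo "b1" > b.txt
git add a.txt b.txt
git commit -q -m "initial"
git push -q -u origin master

# My work on a topic branch touches only a.txt.
git checkout -q -b topic
echo "a2" >> a.txt
git commit -q -am "topic: change a"
cd ..

# Meanwhile a colleague changes b.txt on master.
git clone -q remote.git theirs
cd theirs
echo "b2" >> b.txt
git commit -q -am "colleague: change b"
git push -q origin master
cd ../work
git fetch -q origin

# No dots: working copy versus origin/master, both files differ.
plain_count=$(git diff --name-only origin/master | grep -c .)
# Three dots: only what changed on this branch since it left master.
three_dot=$(git diff --name-only origin/master...)

echo "plain diff touches $plain_count files; three-dot diff touches: $three_dot"
```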


And for sane workflows, read e.g. gitworkflows(7). I'd adjust your "new" one a bit:


1. git checkout -b topic master
2. work, commit, work, commit, ...
3. maybe use rebase -i to clean up the history
4. git checkout master
5. "git pull" to update master (should fast-forward)
6. git merge topic
7. git push
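Those seven steps, minus the optional rebase in step 3, might look like this in a sandbox. It is a sketch with invented names (remote.git, work, theirs, feature.txt, other.txt); a colleague's push is simulated so that the merge is a real merge rather than a fast-forward, and "--ff-only" and "--no-edit" are added only to keep the demo non-interactive.

```shell
#!/bin/sh
# Sketch of the topic-branch workflow: branch, commit, update master, merge, push.
# All repository, branch and file names are hypothetical.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q --bare remote.git
git --git-dir=remote.git symbolic-ref HEAD refs/heads/master

git init -q work
cd work
git symbolic-ref HEAD refs/heads/master
git remote add origin ../remote.git
echo "base" > base.txt
git add base.txt
git commit -q -m "initial"
git push -q -u origin master

# Steps 1-2: topic branch, then work, commit, work, commit.
git checkout -q -b topic master
echo "step one" > feature.txt
git add feature.txt
git commit -q -m "feature: step one"
echo "step two" >> feature.txt
git commit -q -am "feature: step two"
cd ..

# Simulated colleague pushes an unrelated change to master.
git clone -q remote.git theirs
cd theirs
echo "unrelated" > other.txt
git add other.txt
git commit -q -m "colleague: unrelated change"
git push -q origin master
cd ../work

# Steps 4-7: update master (a fast-forward), merge the topic, push.
git checkout -q master
git pull -q --ff-only origin master
git merge -q --no-edit topic
git push -q origin master

merge_subject=$(git log -1 --format=%s)
first_parent_count=$(git rev-list --count --first-parent HEAD)
local_tip=$(git rev-parse HEAD)
remote_tip=$(git --git-dir=../remote.git rev-parse master)

echo "$merge_subject"
git log --first-parent --format='  %s'
```

The first-parent log then reads as a short list of master-level events, with the two feature commits tucked behind the merge.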


That way, instead of rather non-descriptive "Merge master from $url" merge commits, you get "Merge branch $topic" merges, and the first-parent ancestry is a lot nicer.

jawher said...

Hi,

In the new workflow you propose, did I miss something or is there a missing checkout master; merge master my_new_branch ?

Brian Cole said...

We need to switch from CVS internally and trying to decide whether git or svn is a better option. So thank you for this post, it helps.

From reading the comments and playing with git it appears "git rebase master" is the equivalent to "svn up" since whatever is in your local repo will be repositioned later in time than what is currently in the master repo.

Two questions:
- Why "git checkout -b topic master"? I thought the local repo was already a branch, why make another?
- Once I make several commits to the local repo, how can I push to the master and mark that they are all part of the same bug/feature? I was hoping git rebase -i would let me combine some commits together to "clean up history", but I don't see how to do that.

Taryn East said...

@Djo
You haven't missed it. It may be one more thing in my continuing education. I've been doing a 'git pull origin master' on my branch. From what I gather this seems to be able to stand in for the 'git checkout master/git pull/git checkout mybranch' waltz.

From what I can tell it does a 'git pull' then a 'git merge' into your current branch...but leaves your master branch alone.

I could be wrong, of course.

What I've noticed when I do that is that if I do go back to the master branch, I can just do a 'git merge' without requiring the 'git fetch' - because all the objects are already there - they just haven't been merged in yet.

As to whether one is better than the other... well when I was following my previous (svn) workflow, I didn't want to checkout the master branch - because all my changes were working-copy (uncommitted), so I wanted a way that changed the local branch... now that my workflow has changed it doesn't matter as much.

I guess YMMV as to whether this is a Good Thing or otherwise... :)

Also happy to hear comments on whether this is all just off-the-wall blathering or if it actually makes any sense.

Cheers,
Taryn

Taryn East said...

@Bjorn.

Thank you thank you.

That was an extremely helpful explanation that I really appreciate!

Now I at least understand how I got into the sideways-truck state, and some great ideas on how to get out of it if it ever fouls up for me again! :)

As to the diff - certainly looks great so far. Thanks so much! I'm curious about the '...' - looks funny :) I'll have to go look it up to see what it means exactly.

The new workflow also looks interesting - though I'm told that "We Shouldn't Rebase" at our work. Apparently there was a nasty accident involving rebasing a local copy of a remote branch... so we just don't do it. I don't think we're too fussed about history. Most of our features are on separate branches so a branch will only have the history of the feature... plus any master-merges since the branch was created.

I like your 'git merge $branchname' idea though - good point that it is clear exactly what happens when.

So again, thank you - an extremely helpful comment!

Cheers,
Taryn

Myron Marston said...

I'm very comfortable with git at this point, having used it daily for about a year now. When I first started using it, I had a personal rule that I would make a copy of the entire project directory (including the .git subdirectory) before doing anything with git that was new to me or that I wasn't 100% sure about. That way, when things went wrong, it was easy to start over with the copy.

I haven't done this for a good 8 months or so now, but it saved my bacon more than a few times. And it gave me the comfort of mind knowing I couldn't possibly permanently lose any work.
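As a rough illustration of that safety net, here is a sketch with made-up names (project, project.backup, draft.txt). It copies the whole directory, .git included, before running a command that throws away uncommitted work.

```shell
#!/bin/sh
# Sketch: copy the whole project (including .git) before a risky git command.
# project, project.backup and draft.txt are hypothetical names.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q project
cd project
echo "precious work" > draft.txt
git add draft.txt
git commit -q -m "initial"
echo "uncommitted edit" >> draft.txt
cd ..

# The backup: a plain recursive copy, hidden .git directory and all.
cp -R project project.backup

# The risky command: this discards the uncommitted edit in the original.
cd project
git reset -q --hard HEAD
cd ..

lost=$(grep -c "uncommitted edit" project/draft.txt || true)
kept=$(grep -c "uncommitted edit" project.backup/draft.txt)
echo "edit present in original: $lost, in backup: $kept"
```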

hedgehog said...

I'd 2nd Bjorn's suggested frequent commit/rebase then checkout master, etc. I've seen that recommended elsewhere too, I call it the dirty-dozen-commitment (TM), 12 (smallish) commits then a 'return to base', aka rebase :)

If you always work in a temp branch you are _only_ rebasing _your_ 'hack-session' before you merge _or_ share with others in the \ branch, which you rebased --onto - you can also reorganize (squash) your commits in the rebase phase - a second bite of the cherry - so your colleagues will gasp at your 'coherent' work.

I never rebase a branch others have been working on, or pulled from. They should be rebasing their private wip branches and merging that result to our common branch.

`git rebase --interactive`, ie you can abort after a couple of iterations if it seems things are going pear-shaped.
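For the squashing part, this is a sketch of turning three small commits on a private branch into one. Normally you would edit the todo list by hand in your editor, changing "pick" to "squash" on every line but the first; here the GIT_SEQUENCE_EDITOR and GIT_EDITOR environment variables are set only so the demo runs unattended, and the sed invocation assumes a sed that supports in-place editing with -i. The repository, branch and file names are invented.

```shell
#!/bin/sh
# Sketch: squash a private topic branch into a single commit with rebase -i.
# repo, topic and file.txt are hypothetical names.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/master
echo "base" > file.txt
git add file.txt
git commit -q -m "initial"

# Three small work-in-progress commits on a local-only branch.
git checkout -q -b topic
for step in one two three; do
  echo "$step" >> file.txt
  git commit -q -am "wip $step"
done
before=$(git rev-list --count master..topic)

# Scripted stand-in for hand-editing the todo list: keep the first "pick",
# turn the rest into "squash", and accept the combined commit message as-is.
GIT_SEQUENCE_EDITOR="sed -i -e '2,\$s/^pick/squash/'" GIT_EDITOR=true \
  git rebase -q -i master

after=$(git rev-list --count master..topic)
last_line=$(tail -n 1 file.txt)
echo "commits on topic: $before -> $after (file still ends with: $last_line)"
```

The content of the branch is unchanged; only the history is rewritten, which is why this is best kept to branches nobody else has pulled.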

Two other points. a) Always start working in a new (local) branch, b) `git tag wip`, in case rebase goes awry (I haven't yet worked out what insurance the tag provides - `git rebase --abort` has always saved me part way through a rebase that is getting out of hand. I've also used the reset (below) when I'm really 'playing-around' with git.).

If you find yourself at the wheel of sideways-truck in an alley:
git reset ORIG_HEAD

will ease your heart-rate.
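A sketch of what that reset does after a merge you regret, using invented names (repo, topic, feature.txt). git records the previous branch tip in ORIG_HEAD when it merges, so resetting to it moves the branch back. Run as written above (without --hard) it leaves the merged files sitting in the working copy rather than deleting them.

```shell
#!/bin/sh
# Sketch: undo a merge by resetting the branch to ORIG_HEAD.
# repo, topic and the file names are hypothetical.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/master
echo "base" > base.txt
git add base.txt
git commit -q -m "initial"

git checkout -q -b topic
echo "experimental" > feature.txt
git add feature.txt
git commit -q -m "experimental feature"

git checkout -q master
before=$(git rev-parse HEAD)

# The merge I will regret; --no-ff forces a real merge commit for the demo.
git merge -q --no-edit --no-ff topic
merged=$(git rev-parse HEAD)

# Back out: the branch returns to where it was before the merge.
git reset -q ORIG_HEAD
after=$(git rev-parse HEAD)

# The merged-in file is still on disk, now untracked.
leftover=$(git status --porcelain)
echo "branch restored: $([ "$before" = "$after" ] && echo yes || echo no); leftover: $leftover"
```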

HTH

Taryn East said...

@hedge - great comments too! I am learning *so* much from having written this post!

Thanks for adding to my arsenal of Commands To Fix It when Things Go Wrong :)

re: "a) Always start working in a new (local) branch"
Definitely - I'd class this as the #2 thing to learn if you're new to git... :) especially coming from an svn background where the common thing to do is work on master because of the higher cost of branching. Definitely something that git Did Right.

With rebasing - at work, most of our branches aren't just local - they're remote with others working on it - which is why the "thou shalt not rebase" rule came into place... apparently it can sometimes mangle nastily if two people rebase onto a remote branch. Not entirely sure how that occurred, which is why we figured it's just an easier rule-of-thumb to Not Do That.

When working on local-only branches, though, sure thing; and you're right, I love the squash thing. Especially when some of your commits are roll-backs of stupid stuff that broke things :)

Taryn East said...

Just to keep the comments fresh... "git diff origin/master..." is (I think) almost-but-not-exactly what I want.

From what I can tell it gives all commits from the beginning of the branch... not from my last push.

But it's a start and now I can keep looking. :)
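One possible way to get closer to "since my last push", sketched under assumptions: the feature branch itself has been pushed (here with "git push -u origin my_new_branch"), so the remote-tracking branch origin/my_new_branch marks the last pushed state, as long as nobody else has pushed to that branch and been fetched since. The repository and file names are invented.

```shell
#!/bin/sh
# Sketch: diff against the remote-tracking copy of the feature branch
# to see only what has changed since it was last pushed.
# remote.git, work and the file names are hypothetical.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q --bare remote.git
git --git-dir=remote.git symbolic-ref HEAD refs/heads/master

git init -q work
cd work
git symbolic-ref HEAD refs/heads/master
git remote add origin ../remote.git
echo "base" > base.txt
git add base.txt
git commit -q -m "initial"
git push -q -u origin master

# One commit on the feature branch, pushed.
git checkout -q -b my_new_branch
echo "one" > one.txt
git add one.txt
git commit -q -m "first feature commit"
git push -q -u origin my_new_branch

# Two more commits that have not been pushed yet.
echo "two" > two.txt
git add two.txt
git commit -q -m "second feature commit"
echo "three" > three.txt
git add three.txt
git commit -q -m "third feature commit"

# Everything since the branch left master: three files.
since_branch_start=$(git diff --name-only origin/master... | grep -c .)
# Only what is new since the last push of this branch: two files.
since_last_push=$(git diff --name-only origin/my_new_branch | grep -c .)
unpushed_commits=$(git rev-list --count origin/my_new_branch..HEAD)

echo "files changed since branch start: $since_branch_start"
echo "files changed since last push: $since_last_push ($unpushed_commits unpushed commits)"
```

Note that the second diff compares against the working copy, so any uncommitted edits would be included in it as well.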