Tuesday, 22 December 2009

Login to rubyCAS from a form

Now, as with most places, the root url for our site points at the signup-form. It's there to tempt new users to join us and start creating a new dating site immediately.
But our existing users will usually hit this page first too. Again like most other sites, we don't want to force them to click 'login' and go to some other page to fill in their credentials; so we have a login box on the signup form so they are one click away from getting to the real site.

Now, when you look at the rubyCAS implementation - all you're given is a login_url method that you can put into a link, which sends the user to the rubyCAS login page.

At first glance, this looks a bit like you're forced to do the login waltz:
click login... which takes you to the rubyCAS-server... you type your credentials into the box... you get redirected back to the site you really want... and one-two-three, and one-two-three...

Luckily there's a way to get your site to send credentials to the CAS-server from a controller-action, and redirect them straight on to the page they really want - so you can look like you're seamlessly logging them in without even touching the rubyCAS-server... and no, you don't even need to set your app up as a proxy server[1]. :)

The following shows how to use the existing configured CAS-server details to create a login action that will:

  1. Send the given login details to the configured rubyCAS-server for the current environment
  2. If it fails, will re-show the 'new' page (assuming that the login-form is on the same page) with a 'login failed' notice
  3. If it worked, will redirect to a url that you provide... with the new CAS-ticket appended, thus properly logging the user in.

  # assuming that the signup-actions are in the same controller, 
  # you'll probably have a filter like this
  before_filter CASClient::Frameworks::Rails::Filter, :except => [:new, :create, :cas_login]
  def cas_login
    credentials = { :username => params[:login], :password => params[:password]}

    # this will let you reuse existing config
    client = CASClient::Frameworks::Rails::Filter

    # pass in a URL to return-to on success
    @resp = client.login_to_service(self, credentials, dashboard_url)
    if @resp.is_failure?
      # if login failed, redisplay the page that has your login-form on it
      flash.now[:error] = "That username or password was not recognised. Please try again."
      @user = User.new
      render :action => 'new'
    else
      # login_to_service has appended the new ticket onto the given URL 
      # so we redirect the user there to complete the login procedure
      return redirect_to(@resp.service_redirect_url)
    end
  end

At present, the two patches required to make this work are only available on my fork of rubycas-client, but I'm told they'll be pulled into master pretty soon.

Note on URLs

You have to pass in a url that the user can get to... and that is also protected by the CAS Filter. The login_to_service method really just generates a one-time ticket that can then be used on any URL that will accept that ticket (ie anything protected by the Rails::Filter) which is what does the 'real' logging-into the system. So make sure you redirect the user there... but only if it was successful (or they'll just end up at the rubyCAS-server page - which might not be a bad fall-back).

When considering what URL to start your user off with - remember that at the time we have to construct the url, we not only don't have a 'current_user' set up yet... we also don't know if the person asking for user Bob's page is actually user Bob. So if we want to redirect to the user's account page after login (eg user_path(@some_user)) we have to a) guess the user's id from the login, and also b) check that they could access the page *before* they successfully log in.

If you don't do this, you can potentially compromise security. For example, a malicious user might try a whole bunch of user logins. If the url is set to, say, user_path(User.find_by_login(params[:login])) - that line will explode (with a routing error for trying :id = nil) for users that don't exist - conveniently telling our malicious user which logins exist (or not) on our system. This is generally considered a Bad Thing To Do.

A better idea is to have a generic url that doesn't rely on a specific id in the route. The action can pick up the current_user's id once the user is successfully logged-in. We use something like 'my_dashboard_url' or 'my_account_url' for this purpose.
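
To make the difference concrete, here's a minimal plain-Ruby sketch (no Rails involved). KNOWN_USER_IDS, per_user_return_url and generic_return_url are made-up names for illustration only; they stand in for the user lookup and the route helpers, and the '/my/dashboard' path is just an assumed example.

```ruby
# Hypothetical stand-in for the users table: login => id
KNOWN_USER_IDS = { "bob" => 42 }.freeze

# Per-user return URL: it needs an id, so it blows up for unknown logins
# *before* any credentials have been checked.
def per_user_return_url(login)
  id = KNOWN_USER_IDS[login]
  raise ArgumentError, "cannot build a user URL without an id" if id.nil?
  "/users/#{id}"
end

# Generic return URL: the same for everybody, so it reveals nothing.
# The action behind it looks up the current user once login has succeeded.
def generic_return_url(_login)
  "/my/dashboard"
end
```

The generic version behaves identically whether or not the login exists, which is the property we're after.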


Notes

[1] Setting up an app as a rubyCAS proxy server
This is something still in the eventual plans, but not yet. Mainly because it's a bit annoying that you have to have two apps running on separate servers simply so that one could be your proxy... that cross-dependency makes it a bit awkward to me. YMMV - the instructions are all in the rubycas-client rdoc


This is one article in a series on Rails single-sign-on with rubyCAS

Rails single-sign-on with rubyCAS

As our rails apps multiplied, we began looking for a single-sign-on solution - to give some sense of a seamless 'application' to our users. We've picked rubyCAS for this as it's pretty much the only single sign-on solution that's already out there and running for ruby (and comes with both a server and client gem already written, which is handy).

We'd been running restful_authentication on our main app, and so our new CAS setup had to be able to handle all the existing stories before it was acceptable as an alternative. Over the next few articles, I'll share some of the more useful little snippets I came up with.

In a nutshell?

CAS is a protocol for Authentication. It is not exclusive to ruby, but the ruby-implementation consists of two halves: the server and client.

The server is an independent 'camping' application (ie rack-style ruby). The client is a gem/plugin that fits inside each of the applications that we are writing. The client has the code that allows us to use the rubyCAS-server as an authentication method. Most of the rest of the articles will deal with using the client.

How the two talk to each other is pretty complex, and most of it is beyond the scope of this article... and mostly you don't need to know the details to get it all working. Again, in a nutshell:

Users that try to hit one of our applications will be passed over to the rubycas-server for authentication, which has a login form. They type in their login details. If these match what's in our system, then they will be given a one-time ticket and sent back to the application they started at. The application will verify this ticket with the server, and then start its own session, which keeps them authenticated with that application from then on. The rubycas-server, meanwhile, leaves a cookie of its own in the user's browser to remember that they have logged in.

The single-sign-on part comes in when the user then tries to access one of the other applications running rubycas-client. The new app doesn't know the user yet, so it sends them over to the rubycas-server... which sees the cookie it set earlier and redirects them straight back to the new app, with a new "it's ok, we know this guy" ticket, before they are ever shown the login page. The new app sees this ticket and the user is automatically logged into the other application as well.

If the user clicks logout, the app destroys its own session, and sends the user to the rubycas-server logout page... which also destroys the ticket in its own database.
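
If it helps to see that flow as code, here is a toy, in-memory imitation of the server's side of the conversation. It is NOT the rubyCAS-server API: ToyCasServer and all of its method names are invented for this sketch, and things like plain-text passwords are there only to keep it short.

```ruby
require "securerandom"

# Toy, in-memory stand-in for a CAS server (hypothetical, illustration only).
class ToyCasServer
  def initialize(users)
    @users = users            # login => password (plain text: toy only!)
    @service_tickets = {}     # one-time ticket => [login, service_url]
    @sso_sessions = {}        # server-side cookie value => login
  end

  # Login form submitted: returns [cookie, one_time_ticket], or nil on failure.
  def login(login, password, service_url)
    return nil unless @users[login] == password
    cookie = SecureRandom.hex(16)
    @sso_sessions[cookie] = login
    [cookie, issue_service_ticket(login, service_url)]
  end

  # A second app sends over a user who already holds the cookie:
  # no login form, just a fresh one-time ticket for that app.
  def ticket_for(cookie, service_url)
    login = @sso_sessions[cookie]
    login && issue_service_ticket(login, service_url)
  end

  # The app checks a ticket with the server. Each ticket works only once,
  # and only for the service it was issued for.
  def validate(ticket, service_url)
    login, issued_for = @service_tickets.delete(ticket)
    issued_for == service_url ? login : nil
  end

  # Logout destroys the server-side record too.
  def logout(cookie)
    @sso_sessions.delete(cookie)
  end

  private

  def issue_service_ticket(login, service_url)
    ticket = "ST-#{SecureRandom.hex(16)}"
    @service_tickets[ticket] = [login, service_url]
    ticket
  end
end
```

The real protocol has more moving parts (see the workflows in the server documentation), but the one-time ticket plus a remembered login on the server is the heart of it.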

That's how it all hangs together.

If you really have a burning need to know more details, go check out the rubyCAS-server documentation which has full workflows.

So what do we want?

As I said, we're re-implementing the authentication that we had as restful_authentication... so we need to implement all the features we had there... but also single-sign-on to our new applications. What follows is a minimal wishlist for replacing existing functionality:

I probably won't tackle them in order... but should get to all of them eventually.

Friday, 18 December 2009

[Link] Bootstrapping passenger and Ruby Enterprise on Ubuntu

Michael Lang has provided a whole bunch of extremely useful scripts to help you install passenger and ruby enterprise on ubuntu. He's blogged about why he's provided the scripts - basically because he wants everybody to bootstrap for his blogposts so they can start with the same environment set up before he goes into the details of a particular explanation.

I for one am simply grateful - but his comments are closed on the articles, so I guess I'll just have to point at him instead. :)

Thanks Michael!

Thursday, 17 December 2009

Pretty error messages using locales

Default error messages are a set of concatenated strings all lumped together. This is great as a bare-bones set of errors, but does lead to clunky, unhelpful phrases such as 'Email is invalid'.

While technically correct, this can be a bit confronting, and doesn't explain to a user what they can do to fix it. After all, we already know that it's in our best interests to be nice to our users and get them up to speed with a minimum of fuss. Our error messages would be much nicer if we could use plain English and actually explain what a user can do to fix the situation. eg 'Please enter a value for Email that is longer than 5 characters. It should have an @ in it and make sure that you haven't accidentally used a comma instead of a dot'. That's not only nice and friendly - but means a user knows exactly what to do to check if they've got it right, instead of just getting frustrated.

So, how do we go about being nice to our users?

Adam Hooper has written a great article on using full sentences in validation messages. His reasoning is based on I18n requirements which aren't necessarily everybody's cup of tea... but it's not just useful for that. It also gives us an easy way to make our error messages pretty.

The quickie version (for people that don't care about I18n) is outlined below.

Create the locale file

All your error messages will go into the file: config/locales/en.yml.

Type in your alternative error messages in exactly the format you'd like them to appear to the user. There are special patterns that you can use to match the attributes of the error - the two main ones being the attribute name (which is substituted wherever you put {{attribute}}), and the required length of a field, eg when used in validates_length_of (which is substituted for {{count}}). Here's an example of some of the more common Active Record errors rebuilt in this way:

en:
  activerecord:
    errors:
      messages:
        # Default messages: so we have complete sentences
        confirmation: "{{attribute}} doesn't match the confirmation. Please retype them and make sure they match."
        accepted: "Please read and accept {{attribute}} before continuing"
        empty: "Please enter a value for {{attribute}}"
        blank: "Please enter a value for {{attribute}}"
        too_short: "{{attribute}} is too short (the minimum is {{count}} characters). Please enter a longer one."
        taken: "Your chosen {{attribute}} has already been taken. Please choose another"
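
To see what the substitution does to one of these templates, here's a small plain-Ruby imitation. Rails does the real work for you; expand_message is a made-up helper for this sketch, and the "Password"/6 values are assumed examples.

```ruby
require "yaml"

# A cut-down copy of the locale file above.
LOCALE_YAML = <<~YAML
  en:
    activerecord:
      errors:
        messages:
          blank: "Please enter a value for {{attribute}}"
          too_short: "{{attribute}} is too short (the minimum is {{count}} characters). Please enter a longer one."
YAML

MESSAGES = YAML.load(LOCALE_YAML)["en"]["activerecord"]["errors"]["messages"]

# Hypothetical helper: swaps each {{name}} for the matching value.
# This only imitates the effect of the substitution Rails performs.
def expand_message(key, values)
  MESSAGES.fetch(key.to_s).gsub(/\{\{(\w+)\}\}/) do
    values.fetch(Regexp.last_match(1).to_sym).to_s
  end
end

puts expand_message(:too_short, { :attribute => "Password", :count => 6 })
# prints: Password is too short (the minimum is 6 characters). Please enter a longer one.
```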

Remove any partial-errors in your models

You'll probably find that you've unwittingly entered partial error-strings in your code. eg errors.add :attr_name, 'is bad because blah blah blah'. So your next step is to search for these and turn them into real sentences too, or your error messages will be a bit out of synch. eg:

# instead of
errors.add :region, "should be from your chosen country. Please check it and try again"
# use:
errors.add :region, "The chosen region should be from your chosen country. Please check it and try again"

update the error-box to use only the error message - and not the fieldname

The last step to get your application nice is to stop using the auto-generated error messages, which automatically jam the fieldname up against the now-full-sentence errors (producing such lovelies as "Region The chosen region should be from your chosen country. Please check it and try again"). :P

This just consists of writing a helper method (or partial) that will take any kind of object and spit out the errors on it however you like. eg

  <ul class="errors">
    <% @obj.errors.each do |f,msg| -%>
      <li> <%= msg %> </li>
    <% end -%>
  </ul>
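
If you'd rather have a helper method than a partial, a rough sketch might look like this. error_list_html is a made-up name, and it assumes you hand it a list of [field, message] pairs (the same shape the errors.each loop above yields) rather than a real model.

```ruby
require "cgi"

# Hypothetical helper: takes a list of [field, message] pairs and
# returns only the messages, as an HTML list. The field name is ignored
# on purpose, because the messages are already full sentences.
def error_list_html(errors)
  return "" if errors.empty?
  items = errors.map { |_field, msg| "  <li>#{CGI.escapeHTML(msg)}</li>" }
  (["<ul class=\"errors\">"] + items + ["</ul>"]).join("\n")
end
```

Escaping the message is a deliberate choice here: error strings sometimes include user input, and you don't want that rendered as markup.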

And technically you're done...

except for shoulda...

As of my writing, Shoulda was not playing nice with locale-based error messages. My fork on git has the fix and I'm sure it'll be pulled into shoulda sometime soon (let me know if it has!).

To use my branch, you can either vendor a copy of my branch, or add it as a :git => line in your Gemfile.

If you don't use the fix, shoulda will break because should_validate expects all of the old error messages :(

Friday, 11 December 2009

Getting webistrano to deploy under passenger

What's your problem?

I'd been playing with webistrano on my own machine and using it to make deployments to staging - and that was all ticking along nicely. But then it was time to put webistrano up on our staging server - so as to let our sysadmins use it for deployments.

Downloading and installing it is very easy. Configuring it is a little more annoying, but once it was full of all the hosts/systems/stages/roles and config_options from my existing setup it shouldn't need any more updating.

Then I tried to deploy. At which point it promptly all fell over.

The same thing happened over and over. I'd click deploy and it'd start. The "running" spinner would spin... forever, the little ajax refresh constantly refreshing the Log... with nothing.
The deploy didn't actually deploy, and what's worse - didn't ever *stop*[1].

I did a deploy and watched top and a ruby process did appear briefly and then disappear - but no ruby process ever showed up in ps...
Not happy Jan :(

I couldn't even cancel the deploy because the cancel button only ever appears once the deployment is fully underway[2]. To deploy again, I had to break locks and do other unsavoury things and that felt really wrong.

So, what was happening and how did we eventually fix it?

Well, the suspicious thing to me was that nothing showed up in the log-section at all. Not even the "loading stage recipe X" that is the very first line of all the successful deploys on my own system.

Thus I figured on a permissions problem. I checked that the runner-user had access to svn, and to the staging server. This was interesting: as staging was deploying onto itself, we checked that it could ssh into itself happily... and sure enough it asked me to check the 'authenticity of the host' I was connecting to. Now webistrano is notorious for hanging on an unexpected question, so I figured I'd just have to add this to known_hosts and all would be well.

It probably was something that helped... but the deploys were still failing to spin up.

So I dug into the log and found it was chock full of the AJAX Deployment#show refreshes (they're so frequent!). But I eventually got back to the initial Deployment#create, which is what should kick off the real deployment. The log for this shows that it goes along fine until, right at the bottom, almost completely hidden amongst the noise, there is one line that rang alarm bells:
sh: ruby: command not found

So I checked permissions again, checked to be sure that ruby was installed, that we could see it in the $PATH as the deployment user, all those things.
I even did a capfile export and ran it with pure capistrano to make sure - and that worked just fine! So now I was really confused.

Finally digging into the webistrano application code, I discovered that the important LOC is in app/models/deployment.rb under def deploy_in_background. It's of the form: system("sh -c \"cd #{RAILS_ROOT} && ruby script/runner -e... etc etc. So I tried this on the command line. ruby -v worked for the deployment user.

I spun up script/console and tried system("sh -c \"ruby -v\"")
and that spat out the correct version and 'true'... so obviously rails could find ruby ok, but running it during deployment was still not happy.

Then I copied the above code into the application layout... and it returned false instead of true. Something was happening when inside the running app that wasn't running from the command-line.

Then I discovered this blogpost claiming they also had the log message: sh: ruby: command not found

So it seems that when the app is running - by default you're actually not inside a shell - so it's not loading your settings (such as $PATH) and thus not going to find important things (like ruby).

The solution?

Instead of sh -c we need it to run under bash --login -c

This will force the process to load up your bash environment settings. The bad news is that you have to monkey-patch webistrano to get it working[3].

Given webistrano is a rails app, this isn't difficult - just annoying. There's only one spot that you need to change. That's the aforementioned deploy_in_background method. Change it there and touch tmp/restart.txt and your deployments should at least spin up now.
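
I won't reproduce webistrano's own line here, but the shape of the change is roughly the following. This is only a sketch: login_shell_argv and run_in_login_shell are names I've made up for illustration, not anything that exists in webistrano.

```ruby
# Hypothetical helper: wraps a command line so that it runs inside a
# bash login shell, and so picks up the user's environment ($PATH etc).
def login_shell_argv(command_line)
  ["bash", "--login", "-c", command_line]
end

# Passing several arguments to Kernel#system runs bash directly,
# without going through an intermediate `sh -c`.
def run_in_login_shell(command_line)
  system(*login_shell_argv(command_line))
end
```

With this in place, run_in_login_shell("ruby -v") should find ruby wherever the deployment user's login environment puts it.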

anything more?

There is still a problem if your recipes also require some $PATH goodness. For example if you are running 'gem bundle' your shell will need to find gem... which means that the recipes need to run under bash too. Now it's a little easier to convince webistrano to do that.

You can override the shell used here by supplying a configuration option: default_shell to bash --login


Note: it's the --login that gets it to do what you want!

Also - don't forget that if you call sh -c explicitly in your recipes you'll need to change it there too.

Notes for webistrano devs:

[1] You should probably surround the deploy process in a timeout.
[2] The cancel button should appear immediately.
[3] It'd be nice if we could configure the shell-command under which webistrano runs

Wednesday, 9 December 2009

The #1 thing to learn if you're new to git

Now I'm not 'new new' to git anymore, I've been using it off and on for at least a couple of years, but I've been using cvs/svn for many more years than that. git has a steep learning curve[1], and I have found the git-svn crash course invaluable. But there was one thing missing:

The #1 thing I wish I had learned when I was new to git:

...is that you must commit to git *far* more frequently than you do to svn.

From my thrashing about, I have discovered there are many more 'dangerous' commands in git than in svn, and it's really easy to get yourself into a 'stuck'[2] state.

git actually provides a whole set of tools that will help you get back out of whatever hole you've dug yourself into... but you're likely to end up having lost your working-copy changes[3] along the way. So the best practice now is to commit often - so that everything is in the repository: there is never a big working copy to lose.

I had a lot of trouble with that as it considerably changes the workflow I'd built up over the last decade or so. From cvs to svn and then adding agile on top, my workflow is now roughly:

  1. Checkout a fresh, clean copy of the repository (or svn up to achieve same effect)
  2. add your tests and make some changes
  3. check the rest of the tests still run - make changes until they do
  4. do an svn st to see if there are any files I've forgotten to svn add - add them
  5. do an svn diff and see all my changes - eyeball them to make sure there's nothing obviously wrong
  6. do an svn up to pull in latest changes by others and make sure the tests *still* pass
  7. svn commit

So at all times, everything would be *not* checked in - all of it just sitting in the local working copy until I was sure it was ready.

The problem is that, on the surface, git appears to support this workflow perfectly. All the svn-commands described above have git-equivalents (they're even called the same thing) and so you can (supposedly) transition smoothly over to git with only minimal effort. Even adding a branch, rather than hacking on master, is not too far a departure from svn-style workflow, as branching is familiar in svn, and git just gives you a breezier, easier interface.

So where does it break down?

Well, in my case, usually at the second-last point. A git-pull can completely mess you up if you get to a merge-conflicted state. You can't commit your working-copy because of the merged state, and you often can't even properly diff because you've got a mush of the git pull's changes plus your own changes and no way to tell which is which. And there seems to be no obvious way to 'just commit the merge-conflict changes' or update the files that are conflicted and just *tell* git that they're not conflicted anymore... the way you can in svn. So at this point you're screwed.

What makes it worse is that at this point you often don't know exactly what commands you did to get you here - if you're anything like me, you've probably tried a whole bunch of stuff only partly understanding exactly what it does. Each command simply tells you in its own way that you can't do that. You can look up what you're supposed to do to fix it - but generally find that's just another command that tells you that you can't do it either. So you feel like a truck that's stuck sideways in a narrow alley and can't even understand how it got here, let alone how to get itself back out.

Frustrating!

Underlying that is the, quite reasonable, fear that you may lose all your work[3] since your last commit...

and of course that's because we're used to the underlying 'don't commit until' mentality that we may not even be aware we are sporting.

don't commit until (perfect)

The workflow I described above is a perfect example of this mentality. It makes sense to hold back on committing anything until it all works. After all, you know that the moment you commit, the CI server will pull all your changes and let everybody on the team know that you just broke the build (again). So eventually you adopt a "don't commit until the tests pass" workflow, and keep everything in your working copy until everything's green before committing to the svn repository. Fostering this "don't commit until it's right" mentality is a natural consequence of not wanting to look like an idiot to your colleagues, and fits in perfectly well with the svn-based workflow.

but git doesn't work that way!

Or should I say that git doesn't *need* to work that way. After all, you still need to make sure that your tests pass and don't break on the CI server... but what I've found is that you need to get over the whole git commit thing. It may be named the same thing as svn commit - but it doesn't mean the same consequences (eg that your colleagues will all see that the feature's only half-complete and the tests are all spazzing out).

Instead, change the way you think: the command that *matters* now is actually git push. You can commit whatever you like to your local repository, even if it breaks the build; it's only when you push up to the remote repo that it must be working perfectly.

Any other problems with this?

Unfortunately, there are some other consequences to this small change in workflow. One of them is that a plain 'git diff' doesn't cover all your changes since the last push. git status and git diff are *just* like svn status and svn diff - they check against the latest commit, not the latest 'push to remote', which means it's hard to do a complete check of all your changes before going 'live'... you have to just trust that all your smaller commits add up to the Right Thing.

That makes me feel uncomfortable as I like to be sure. I know about human error - and I know that I'm as prone to it as the next guy...

Having to make a patch-against master and then read through *that* (which is far less clear to read than a diff) is not a good substitute, IMO. If anybody has a good way on how to mutate the workflow to accommodate this I'd love to know.
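
One candidate I can sketch (in Ruby, wrapping the git commands) is to compare against the remote-tracking branch directly. This assumes the remote is called origin, the branch is master, and that you've fetched recently; the helper names are made up for illustration.

```ruby
# Hypothetical helpers: build the git commands for reviewing unpushed work.
# Assumes a remote-tracking branch such as origin/master exists locally.

# One combined diff of everything on the current branch since it
# diverged from the remote branch (three dots = diff from the merge base).
def unpushed_diff_argv(remote_branch = "origin/master")
  ["git", "diff", "#{remote_branch}...HEAD"]
end

# The list of commits that are on HEAD but not yet on the remote branch.
def unpushed_commits_argv(remote_branch = "origin/master")
  ["git", "log", "--oneline", "#{remote_branch}..HEAD"]
end

# Run from inside a working copy, eg: system(*unpushed_diff_argv)
```

Whether that reads well enough to replace the old eyeball-the-svn-diff step is something I'd want to try on a real branch before trusting it.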

a new workflow?

I'm still working on this but so far I've got:

  1. Clone a fresh, clean copy of the repository (if I don't already have one)
  2. git checkout -b my_new_branch
  3. add tests and/or make some changes
  4. do a git diff and check this change - eyeball it to make sure there's nothing obviously wrong
  5. git commit (then repeatedly return to step 3 until 'done')
  6. check the rest of the tests run - make changes (and git commit) until 'done done'
  7. do a git pull origin master to pull in latest changes by others and make sure the tests *still* pass
  8. fix any merge-conflicts and commit the merge
  9. git push

This is still a work-in-progress, and I would appreciate informed opinions, advice or your own war stories.

Notes:
[1] IMO the git developers could learn a thing or two from Kathy Sierra... but that's another topic.[4]
[2] That is: a state where you can't run git commit because you're in a 'failed merge', and you can't git pull because you get 'fatal: merging of trees' or 'Automatic merge failed; fix conflicts and then commit the result.'. You edit the files to un-conflict them and try to reapply your stash, and you suddenly get 'CONFLICT (content): Merge conflict in ...' again... After thrashing around for a while between git stash, git pull, updating merged files then trying to re-apply your stash before git committing... I can tell you where I wanted to stick git.
[3] If you're anything like me, you look on the words git reset --hard HEAD with some trepidation. You just can't quite believe that blowing everything away in your working copy is the only way out of a simple merge-conflict.
[4]...and please don't just tell me that git is open-source and I should just go hack on git myself if I hate it so much. In theory I absolutely agree with you, but in practice I can only work on one thing at a time - and right now I'm still working on Active Resource, some projects of my own, a novel...

Tuesday, 8 December 2009

How to monkey patch a gem

Is a gem you're using missing something small - something easy to fix?
You've made a patch and submitted it, but they just aren't responding?

I've found two ways to monkey patch it in place until they get around to pulling your patch in. They both have their pros and cons.

1) vendor the gem in place and hack

Do this if you give up on the gem people ever caring about your change (maybe the gem's been abandoned), if you're only using the gem in a single application; or the patch is only relevant to your one specific application; or if you want to put your changes into the repository for your application.

How:

  1. rake gems:unpack into vendor/gems
  2. sudo gem uninstall the gem from system if not in use for other apps
  3. add the gem code into the repository
  4. make your patch in place
  5. update the version (eg use jeweler and bump version or hand-hack the gemspec and rename the directory)
  6. add a config.gem declaration and depend on your version of the gem OR add a line to your Gemfile - and use vendored_at to point at it

Pros:

  1. you can keep track of what changes you've made to the gem in the same place as your application directory - thus it's good for changes that are very specific to your own application-code (ie aren't really relevant or shareable with the wider community or your other apps)
  2. it's pretty easy for a quick patch that you know is going to be pulled into the main gem shortly. It's easy to blow away the vendored version once the 'real' version is ready.

Cons:

  1. if you're not using gem bundle yet, it's annoying to get your application to use your custom gem
  2. it's not easily shareable between your applications if it's hidden in the vendor directory of only one - you may need some complicated extra-directory + symlinking to work...
  3. if the gem is ever updated upstream, you have to do nasty things to get the new version (hint: before upgrading, make a patch via your source control... then blow away the gem directory... then download the new gem version... then reapply your patch). :P

2) fork the github repo

If the gem is on github, you can fork the gem there - this is especially good if you're going to reuse your patched gem in multiple applications, and/or make your patches available.

How:

  1. Fork the gem OR create a github repo for the gem and push the code up there OR clone locally and create your own repo for it
  2. Make your changes, and commit them to the repository as usual
  3. In config.gem use :source => 'git::github.org/me/gemname' or gem bundle's Gemfile use :git => 'github.org/me/gemname' (or appropriate location)
  4. optionally: be nice and make a pull-request to the original repo

Pros:

  1. can easily pull changes from the upstream repository and integrate with your own patches
  2. good for sharing amongst multiple of your own applications
  3. makes your changes available to other developers that may be facing the same problem
  4. good for when the main gem is no longer under development (or only sporadically updated... or has developers that don't agree with your changes)

Cons:

  1. more work than a quick hack in vendor
  2. must be tracked separately to your own code
  3. you might not want to host outside of your own system (of course, nothing wrong with cloning then pushing to your own local repo, rather than github)

Conclusions?

We had a couple of the former and began to run into the issues stated. We discovered, of course, that quick hacks tend to turn into longer, more lasting changes, so we found that we might as well have just done it 'properly' the first time. We are now moving towards the latter solution - even setting up our own git-server for the gems we don't want to release back into the wild. YMMV :)