Tuesday 22 December 2009

Login to rubyCAS from a form

Now, as with most places, the root url for our site points at the signup-form. It's there to tempt new users to join us and start creating a new dating site immediately.
But our existing users will usually hit this page first too. Again like most other sites, we don't want to force them to click 'login' and go to some other page to fill in their credentials; so we have a login box on the signup form so they are one click away from getting to the real site.

Now, when you look at the rubyCAS implementation - all you're given is a login_url method that you can put into a link and sends you to the rubyCAS login page.

At first glance, this looks a bit like you're forced to do the login waltz:
click login... which takes you to the rubyCAS-server... you type your credentials into the box... you get redirected back to the site you really want... and one-two-three, and one-two-three...

Luckily there's a way to get your site to send credentials to the CAS-server from a controller-action, and redirect them straight on to the page they really want - so you can look like you're seamlessly logging them in without even touching the rubyCAS-server... and no, you don't even need to set your app up as a proxy server[1]. :)

The following shows how to use the existing configured CAS-server details to create a login action that will:

  1. Send the given login details to the configured rubyCAS-server for the current environment
  2. If it fails, will re-show the 'new' page (assuming that the login-form is on the same page) with a 'login failed' notice
  3. If it worked, will redirect to a url that you provide... with the new CAS-ticket appended, thus properly logging the user in.
  # assuming that the signup-actions are in the same controller, 
  # you'll probably have a filter like this
  before_filter CASClient::Frameworks::Rails::Filter, :except => [:new, :create, :cas_login]
  def cas_login
    credentials = { :username => params[:login], :password => params[:password]}

    # this will let you reuse existing config
    client = CASClient::Frameworks::Rails::Filter

    # pass in a URL to return-to on success
    @resp = client.login_to_service(self, credentials, dashboard_url)
    if @resp.is_failure?
      # if login failed, redisplay the page that has your login-form on it
      flash.now[:error] = "That username or password was not recognised. Please try again."
      @user = User.new
      render :action => 'new'
    else
      # login_to_service has appended the new ticket onto the given URL 
      # so we redirect the user there to complete the login procedure
      return redirect_to(@resp.service_redirect_url)
    end
  end

At present, the two patches required to make this work are only available on my fork of rubycas-client, but I'm told they'll be pulled into master pretty soon.

Note on URLs

You have to pass in a url that the user can get to... and that is also protected by the CAS Filter. The login_to_service method really just generates a one-time ticket that can then be used on any URL that will accept that ticket (ie anything protected by the Rails::Filter) which is what does the 'real' logging-into the system. So make sure you redirect the user there... but only if it was successful (or they'll just end up at the rubyCAS-server page - which might not be a bad fall-back).

When considering what URL to start your user off with - remember that at the time we have to construct the url, we not only don't have a 'current_user' set up yet... we also don't know if the person asking for user Bob's page is actually user Bob. So if we want to redirect to the user's account page after login (eg users_path(@some_user) we have to a) guess the user's id from login, but also b) check that they could access the page *before* they successfully log in.

If you don't do this, you can potentially compromise security. For example, a malicious user might try a whole bunch of user logins. If the url is set for, say users_path(User.find_by_login(params[:login])) - that line will explode (with a routing error for trying :id = nil) for users that don't exist - conveniently telling our malicious user which logins exist (or not) on our system. This is generally considered a Bad Thing To Do.

A better idea is to have a generic url that doesn't rely on a specific id in the route. The action can pick up the current_user's id once the user is successfully logged-in. We use something like 'my_dashboard_url' or 'my_account_url' for this purpose.


Notes

[1] Setting up an app as a rubyCAS proxy server
This is something still in the eventual plans, but not yet. Mainly because it's a bit annoying that you have to have two apps running on separate servers simply so that one could be your proxy... that cross-dependency makes it a bit awkward to me. YMMV - the instructions are all in the rubycas-client rdoc


This is one article in a series on Rails single-sign-on with rubyCAS

Rails single-sign-on with rubyCAS

As our rails apps multiply, we began looking for a single-sign-on solution - to give some sense of a seamless 'application' to our users. We've picked rubyCAS for this as it's pretty much the only single sign-on solution that's already out there and running for ruby (and comes with both a server and client gem already written, which is handy).

We'd been running restful_authentication on our main app, and so our new CAS setup had to be able to handle all the existing stories before it was acceptable as an alternative. Over the next few articles, I'll share some of the more useful little snippets I came up with.

In a nutshell?

CAS is a protocol for Authentication. It is not exclusive to ruby, but the ruby-implementation consists of two halves: the server and client.

The server is an independent 'camping' application (ie rack-style ruby). The client is a gem/plugin that fits inside each of the applications that we are writing. The client has the code that allows us to use the rubyCAS-server as an authentication method. Most of the rest of the articles will deal with using the client.

How the two talk to each other is pretty complex, and most of it is beyond the scope of this article... and mostly you don't need to know the details to get it all working. Again, in a nutshell:

Users that try to hit one of our applications will be passed over to the rubycas-server for authentication which has a login form. They type in their login details. If this matches what's in our system, then they will be given a one-time ticket and sent back to the application they started at. The application will verify this ticket with the server, and get a proper ticket and put it into a cookie that will then allow them to be continually authenticated with that application from then on.

The single-sign-on part comes from when the user tries to then access one of the other applications running rubycas-client. The new app will see the user has the cookie, and send it back to the rubycas-server.. but gets redirected back to the new app before they get actually shown the login page with a new "it's ok, we know this guy" ticket. The new app sees this ticket and now user will be automatically logged into the other application as well.

If the user clicks logout, the app destroys it's own session, and sends the user to the rubycas-server logout page... which also destroys the ticket in its own database.

That's how it all hangs together.

If you really have a burning need to know more details, go check out the rubyCAS-server documentation which has full workflows.

So what do we want?

As I said, we're re-implementing the authentication that we had as restful_authentication... so we need to implement all the features we had there... but also single-sign-on to our new applications. What follows is a minimal wishlist for replacing existing functionality:

I probably won't tackle them in order... but should get to all of them eventually.

Friday 18 December 2009

[Link] Bootstrapping passenger and Ruby Enterprise on Ubuntu

Michael Lang has provided a whole bunch of extremely useful scripts to help you install passenger and ruby enterprise on ubuntu. He's blogged about why he's provided the scripts - basically because he want everybody to bootstrap for his blogposts so they can start with the same environment set up before he goes into the details of a particular explanation.

I for one am simply grateful - but his comments are closed on the articles, so I guess I'll just have to point at him instead. :)

Thanks Michael!

Thursday 17 December 2009

Pretty error messages using locales

Default error messages are a set of concatenated strings all lumped together. This is great as a bare-bones set of errors, but does lead to clunky, unhelpful phrases such as 'Email is invalid'.

While technically correct, this can be a bit confronting, and doesn't explain to a user what they can do to fix it. After all, we already know that it's in our best interests to be nice to our users and get them up to speed with a minimum of fuss. Our error messages would be much nicer if we could use plain English and actually explain what a user can do to fix the situation. eg 'Please enter a value for Email that is longer than 5 characters. It should have an @ in it and make sure that you haven't accidentally used a comma instead of a dot'. That's not only nice and friendly - but means a user knows exactly what to do to check if they've got it right, instead of just getting frustrated.

So, how do we go about being nice to our users?

Adam Hooper has written a great article on using full sentences in validation messages. His reasoning is based on Il8n requirements which aren't necessarily everybody's cup of tea... but it's not just useful for that. It also gives us an easy way to make our error messages pretty.

The quickie version (for people that don't care about Il8n) is outlined below.

Create the locale file

All your error messages will go into the file: config/locales/en.yml.

Type in your alternative error messages in exactly the format you'd like them to appear to the user. There are special patterns that you can use to match the attributes of the error - the two main ones being the attribute name (which is substituted wherever you put {{attribute}}, or the required length of a field eg when used in validates_length of (which is substituted for {{count}}) . Here's an example of some of the more common Active Record errors rebuilt in this way:

en:
  activerecord:
    errors:
      messages:
        # Default messages: so we have complete sentences
        confirmation: "{{attribute}} doesn't match the confirmation. Please retype them and make sure they match."
        accepted: "Please read and accept {{attribute}} before continuing"
        empty: "Please enter a value for {{attribute}}"
        blank: "Please enter a value for {{attribute}}"
        too_short: "{{attribute}} is too short (the minimum is {{count}} characters). Please enter a longer one."
        taken: "Your chosen {{attribute}} has already been taken. Please choose another"

Remove any partial-errors in your models

You'll probably find that you've unwittingly entered partial error-strings in your code. eg errors.add :attr_name, 'is bad because blah blah blah'. So your next step is to search for these and turn them into real sentences too, or your error messages will be a bit out of synch. eg:

# instead of
errors.add :region, "should be from your chosen country. Please check it an try again"
# use:
errors.add :region, "The chosen region should be from your chosen country. Please check it an try again"

update the error-box to use only the error message - and not the fieldname

The last step to get your application nice is to stop using the auto-generated error messages, which automatically jam the fieldname up against the now-full-sentence errors. (producing such lovelies as Region The chosen region should be from your chosen country. Please check it an try again :P

This just consists of writing a helper method (or partial) that will take any kind of object and spit out the errors on it however you like. eg

  <div class="errors">
    <% @obj.errors.each do |f,msg| -%> 
      <li> <%= msg %> </li>
    <% end -%>
  </div>

And technically you're done...

except for shoulda...

As of my writing, Shoulda was not playing nice with locale-based error messages. My fork on git has the fix and I'm sure it'll be pulled into shoulda sometimes soon (let me know if it has!).

To use my branch, you an either vendor a copy of my branch, or add it as a :git => line in your Gemfile.

If you don't use the fix, shoulda will break because should_validate expects all of the old error messages :(

Friday 11 December 2009

Getting webistrano to deploy under passenger

What's your problem?

I'd been playing with webistrano on my own machine and using it to make deployments to staging - and that was all ticking along nicely. But then it was time to put webistrano up on our staging server - so as to let our sysadmins use it for deployments.

Downloading and installing it is very easy. Configuring it is a little more annoying, but once it was full of all the hosts/systems/rstages/roles and config_options from my existing setup iit shouldn't need any more updating.

Then I tried to deploy. At which point it promptly all fell over.

The same thing happened over and over. I'd click deploy and it'd start. The "running" spinner would spin... forever, the little ajax refresh constantly refreshing the Log... with nothing.
The deploy didn't actually deploy, and what's worse - didn't ever *stop*[1].

I did a deploy and watched top and a ruby process did appear briefly and then disappear - but no ruby process ever showed up in ps...
Not happy Jan :(

I couldn't even cancel the deploy because the cancel button only ever appears once the deployment is fully underway[2]. To deploy again, I had to break locks and do other unsavoury things and that felt really wrong.

So, what was happening and how did we eventually fix it?

Well, the suspicious thing to me was that nothing showed up in the log-section at all. Not even the "loading stage recipe X" that is the very first line of all the successful deploys on my own system.

Thus I figured on a permissions problem. I checked that the runner-user had access to svn, and to the staging server. This was interesting as staging was deploying on itself, we checked that it could ssh into itself happily... and sure enough it asked me to check the 'authenticity of the host' I was connecting to. Now webistrano is notorious for hanging on an un-expected question, so I figured I'd just have to add this to known_hosts and all would be well.

It probably was something that helped... but the deploys were still failing to spin up.

So I dug into the log and found it was chock full of the AJAX Deployment#show refreshes (they're so frequent!) But I eventually got back to the initial Deployment#create which is what should kick off the real deployment. The log for this shows that it goes along fine until right at the bottom, almost completely hidden amongst the noise is one line that rang alarm bells:
sh: ruby: command not found

So I checked permissions again, checked to be sure that ruby was installed, that we could see it in the $PATH as the deployment user, all those things.
I even did a capfile export and ran it with pure capistrano to make sure - and that worked just fine! So now I was really confused.

Finally digging into the webistrano application code, I discovered that the important LOC is in app/models/deployment.rb under def deploy_in_background. It's of the form: system("sh -c \"cd #{RAILS_ROOT} && ruby script/runner -e... etc etc. So I tried this on the command line. ruby -v worked for the deployment user.

I spun up script/console and tried system("sh -c \"ruby -v\"")
and that spat out the correct version and 'true'... so obviously rails could find ruby ok, but running in during deployment was still not happy

Then I copied the above code into the application layout... and it returned false instead of true. Something was happening when inside the running app that wasn't running from the command-line.

Then I discovered this blogpost claiming they also had the log message: sh: ruby command not found

So it seems that when the app is running - by default you're actually not inside a shell - so it's not loading your settings (such as $PATH) and thus not going to find important things (like ruby).

The solution?

Instead of sh -c we need it run under bash --login -c

This will force the process to load up your bash environment settings. The bad news is that you have to monkey-patch webistrano to get it working[3].

Given webistrano is a rails app, this isn't difficult - just annoying. There's only one spot that you need to change. That's the aforementioned deploy_in_background method. Change it there and touch tmp/restart.txt and your deployments should at least spin up now.

anything more?

There is still problem if your recipes also require some $PATH goodness. For example if you are running 'gem bundle' your shell will need to find gem... which means that the recipes need to run under bash too. Now it's a little easier to convince webistrano to do that.

You can override the shell used here by supplying a configuration option: default_shell to bash --login


Note: it's the --login that gets it to do what you want!

Also - don't forget that if you call sh -c explicitly in your recipes you'll need to change it there too.

Notes for webistrano devs:

[1]You should probably surround the deploy process in a timeout.
[2] The cancel button should appear immediately.
[3] It'd be nice if we could configure the shell-command under which webistrano runs

Wednesday 9 December 2009

The #1 thing to learn if you're new to git

Now I'm not 'new new' to git anymore, I've been using it off and on for at least a couple of years, but I've been using cvs/svn for many more years than that. git has a steep learning curve[1], and have found the git-svn crash course invaluable. But there was one thing missing:

The #1 thing I wish I had learned when I was new to git:

...is that you must commit to git *far* more frequently than you do to svn.

From my thrashing about, I have discovered there are many more 'dangerous' commands in git than in svn, and it's really easy to get yourself into a 'stuck'[2] state.

git actually provides a whole set of tools that will help you get back out of whatever hole you've dug yourself into... but you're likely to end up having lost your working-copy changes[3] along the way. So the best practice now is to commit often - so that everything is in the repository: there is never a big working copy to lose.

I had a lot of trouble with that as it considerably changes the workflow I'd built up over the last decade or so. From cvs to svn and then adding agile on top, my workflow is now roughly:

  1. Checkout a fresh, clean copy of the repository (or svn up to achieve same effect)
  2. add your tests and make some changes
  3. check the rest of the tests still run - make changes until they do
  4. do an svn st to see if there are any files I've forgotten to svn add - add them
  5. do an svn diff and see all my changes - eyeball them to make sure there's nothing obviously wrong
  6. do an svn up to pull in latest changes by others and make sure the tests *still* pass
  7. svn commit

So at all times, everything would be *not* checked in - all of it just sitting in the local working copy until I was sure it was ready.

The problem is that, on the surface, git appears to support this workflow perfectly. All the svn-commands described above have git-equivalents (they're even called the same thing) and so you can (supposedly) transition smoothly over to git with only minimal effort. Even adding a branch, rather than hacking on master is not too far a departure from svn-style workflow, as branching is familiar in svn, and git just gives you a beezier easier interface.

So where does it break down?

Well, in my case, usually at the second-last point. A git-pull can completely mess you up if you get to a merge-conflicted state. You can't commit your working-copy because of the merged state, you often can't even properly diff because you've got a mush of the git pull's changes plus your own changes and no way to tell which is which. and there seems to be no obvious way to 'just commit the merge-conflict changes' or update the files that are conflicted and just *tell* git that they're not conflicted anymore... the way you can in svn. So at this point you're screwed.

What makes it worse is that at this point you often don't know exactly what commands you did to get you here - if you're anything like me, you've probably tried a whole bunch of stuff only partly understanding exactly what it does. Each command simply tells you in it's own way that you can't do that. You can look up what you're supposed to do to fix it - but generally find that's just another command that tells you that you can't do it either. So you feel like a truck that's stuck sideways in a narrow alley and can't even understand how it got here, let alone how to get itself back out.

Frustrating!

Underlying that is the, quite reasonable, fear that you may lose all your work[3] since your last commit...

and of course that's because we're used to the underlying 'don't commit until' mentality that we may not even be aware we are sporting.

don't commit until (perfect)

The workflow I described above is a perfect example of this mentality. It makes sense to hold back on committing anything until it all works. After all, you know that the moment you commit, the CI server will pull all your changes and let everybody on the team know that you just broke the build (again). So eventually you adopt a "don't commit until the tests pass" workflow, and keep everything in your working copy until everything's green before committing to the svn repository. Fostering this "don't commit until it's right" mentality is a natural consequence of not wanting to look like an idiot to your colleagues, and works in perfectly fine with the svn-based workflow.

but git doesn't work that way!

Or should I say that git doesn't *need* to work that way. After all, you still need to make sure that your tests pass and don't break on the CI server... but what I've found is that you need to get over the whole git commit thing. It may be named the same thing as svn commit - but it doesn't mean the same consequences (eg that your colleagues will all see that the feature's only half-complete and the tests are all spazzing out).

Instead, change the way you think: the command that *matters* now is actually git push. You can commit whatever you like to your local repository, even if it breaks the build; it's only when you push up to the remote repo that it must be working perfectly.

Any other problems with this?

Unfortunately, there are some other consequences to this small change in workflow. One of them being the fact that you can't do a 'git diff' that covers all your changes since last push. git status and git diff are *just* like svn status and svn diff - they check against the latest commit, not the latest 'push to remote', which means it's hard to do a complete check of all your changes before going 'live'... you have to just trust that all your smaller commits all add up to the Right Thing.

That makes me feel uncomfortable as I like to be sure. I know about human error - and I know that I'm as prone to it as the next guy...

Having to make a patch-against master and then read through *that* (which is far less clear to read than a diff) is not a good substitute, IMO. If anybody has a good way on how to mutate the workflow to accommodate this I'd love to know.

a new workflow?

I'm still working on this but so far I've got:

  1. Clone a fresh, clean copy of the repository (if I don't already have one)
  2. git checkout -b my_new_branch
  3. add tests and/or make some changes
  4. do a git diff and check this change - eyeball it to make sure there's nothing obviously wrong
  5. git commit (then repeatedly return to step 3 until 'done')
  6. check the rest of the tests run - make changes (and git commit) until 'done done'
  7. do a git pull origin master to pull in latest changes by others and make sure the tests *still* pass
  8. fix any merge-conflicts and commit the merge
  9. git push

This is still a work-in-progress, and I would appreciate informed opinions, advice or your own war stories.

Notes:
[1] IMO the git developers could learn a thing or two from Kathy Sierra... but that's another topic.[4]
[2] If you've ever got into a state where you can't run git commit because you're in a 'failed merge', you can't git pull because you get 'fatal: merging of trees' or 'Automatic merge failed; fix conflicts and then commit the result.'. You edit the files to un-conflict them and try to reapply your stash you suddenly get 'CONFLICT (content): Merge conflict in ...' again... After thrashing around for a while between git stash, git pull, updating merged files then trying to re-apply your stash before git committing... I can tell you where I wanted to stick git.
[3] If you're anything like me, you look on the words git reset --hard HEAD with some trepidation. You just can't quite believe that blowing everything away in your working copy is the only way out of a simple merge-conflict.
[4]...and please don't just tell me that git is open-source and I should just go hack on git myself if I hate it so much. In theory I absolutely agree with you, but in practice I can only work on one thing at a time - and right now I'm still working on Active Resource, some projects of my own, a novel...

Tuesday 8 December 2009

How to monkey patch a gem

Is a gem you're using missing something small - something easy to fix?
You've made a patch and submitted it, but they just aren't responding?

I've found two ways to monkey patch it in place until they get around to pulling your patch in. They both have their pros and cons.

1) vendor the gem in place and hack

Do this if you give up on the gem people ever caring about your change (maybe the gem's been abandoned), if you're only using the gem in a single application; or the patch is only relevant to your one specific application; or if you want to put your changes into the repository for your application.

How:

  1. rake gem:unpack into vendor/gems
  2. sudo gem uninstall the gem from system if not in use for other apps
  3. add the gem code into the repository
  4. make your patch in place
  5. update the version (eg use jeweller and bump version or hand-hack the gempsec and rename the directory)
  6. add a config.gem declaration and depend on your version of the gem OR add a line to your Gemfile - and use vendored_at to point at it

Pros:

  1. you can keep track of what changes you've made to the gem in the same place as your application directory - thus it's good for changes that are very specific to your own application-code (ie aren't really relevant or shareable with the wider community or your other apps)
  2. it's pretty easy for a quick patch that you know is going to be pulled into the main gem shortly. It's easy to blow away the vendored version once the 'real' version is ready.

Cons:

  1. if you're not using gem bundle yet, it's annoying to get your application to use your custom gem
  2. it's not easily shareable between your applications if it's hidden in the vendor directory of only one - you may need some complicated extra-directory + symlinking to work...
  3. if the gem is ever updated upstream, you have to do nasty things to get the new version (hint: before upgrading, make a patch via your source control... then blow away the gem directory... then download the new gem version... then reapply your patch). :P

2) fork the github repo

If the gem is on github, you can fork the gem there - this is especially good if you're going to reuse your patched gem in multiple applications, and/or make your patches available.

How:

  1. Fork the gem OR create a github repo for the gem and push the code up there OR clone locally and create your own repo for it
  2. Make your changes, and commit them to the repository as usual
  3. In config.gem use :source => 'git::github.org/me/gemname' or gem bundle's Gemfile use :git => 'github.org/me/gemname' (or appropriate location)
  4. optionally: be nice and make a pull-request to the original repo

Pros:

  1. can easily pull changes from the upstream repository and integrate with your own patches
  2. good for sharing amongst multiple of your own applications
  3. makes your changes available to other developers that may be facing the same problem
  4. good for when the main gem is not longer under development (or only sporadically updated... or has developers that don't agree with your changes)

Cons:

  1. more work than a quick hack in vendor
  2. must be tracked separately to your own code
  3. you might not want to host outside of your own system (of course, nothing wrong with cloning then pushing to your own local repo, rather than github)

Conclusions?

We had a couple of the former and began to run into the issues stated. We discovered, of course, that quick hacks tend to turn into longer, more lasting changes so found that might as well have just done it 'properly' the first time and are now moving towards the latter solution - even setting up our own git-server for the gems we don't want to release back into the wild. YMMV :)

Thursday 26 November 2009

gem bundle with capistrano

[Update: There's a capistrano recipe in bundler now, if you just require "bundler/capistrano" apparently it automatically runs bundler install and installs the gems in shared/vendor_bundle]

Ok, so now you've converted your existing rails project to using gem bundle, and it runs fine on your development box, how do you update your capistrano recipes to do it automatically?

Current wisdom is varied. You can:

  1. check *everything* into your source control (very heavy!)
  2. check in only your Gemfile and the vendor/bundler_gems/cache (which holds all the *.gem files) into source control (still pretty heavy, but makes sure you don't need internet access every time you deploy - use 'gem bundle --cached' in your recipes
  3. symlink your vendor/bundler_gems/cache directory to a shared location, and then all you need to checkin is the Gemfile (quicker, still needs to unpack/bundle gems each time)
  4. symlink all your vendor/bundler_gems/ directories to a shared location, and then all you need to checkin is the Gemfile (quick and not too heavy)
  5. symlink the entire vendor/bundler_gems directory to a shared location (much quicker).

5: Symlink the lot!

This would be my preference. For a single project deployed on production, there should be no reason why we can't just symlink the whole bundled-gem directory, and let the Gemfile keep that directory up-to-date. This feels no different to using a shared file-attachments directory.

Sadly doesn't work due to this:
No such file or directory - MyProjectRailsRoot/vendor/bundler_gems/../../Gemfile
which will resolve to two directories above the *shared symlinked* directory :P

This is because the bundled_gems generates an 'environment.rb' that points back up the directory at the Gemfile that created it... by using a relative path that unfortunately hits the symlink as described above. It'd be nice to be able to tell gem_bundler to make that link absolute...

If anyone knows a way around this, please do let me know!

So we fall back on 3 or 4. My first attempt was to use 3:

3: ci the Gemfile, symlink the cache directory

This seems to be a reasonable compromise. Our deployments are getting pretty damn bloated as it is - with all those plugins and vendored rails etc etc... we don't want to add anything more if we can avoid it. Even gems. We have had no problem with downloading gemfiles, so there's no need to check them in, as long as the deployed directory has access to them when needed. Thus, checking the Gemfile, symlink the cache directory... maybe even symlink, the 'gems' directory sometime too.

So, how to do it?

First - prepare the bundler_gem directory for the first time login to your deployment server, and create the shared directory in, eg:

mkdir ../shared/bundler_gems 
mkdir ../shared/bundler_gems/cache

Next add the symlink and gem bundle commands to your capistrano deployment recipe:

# symlink to the shared gem-cache path then bundle our gems
namespace :gems do
  task :bundle, :roles => :app do
    run "cd #{release_path} && ln -nfs #{shared_path}/bundler_gems/cache #{release_path}/vendor/bundler_gems/cache && gem bundle"
  end
end

after 'deploy:update_code', 'gems:bundle'

And you *should* be able to deploy fine from there. My experience wasn't completely smooth, and I had to set up on the server manually the first time - but from then on the capistrano-task worked fine. Would love to hear your own experiences...

Hudson

If you're using hudson for Continuous integration, you can achieve the same effect by updating the Hudson build script adding:

# prepare gems and shared gem-cache path
ln -nfs <hudson config path>/bundler_gems/cache  vendor/bundler_gems/cache
gem bundle

Also - if you have test/development-environment-only gems... make sure you add your integration-server's environment to the Gemfile environment list, or Hudson won't be able to find, say, rcov or rspec - and then the build will break.

4: ci the Gemfile, symlink the sub-directories

I started with the solution described above, but it still takes a while unpacking all the gems. I'd much prefer to use the solution #5, but that fails due to the relative links in environment.rb. So the ugly compromise is to symlink everything *except* the environment.rb

It works and we can deploy ok with capistrano... but I'm looking for a better solution.

...and after a few deploys...

Well, it was a nice try, but after a few deploys suddenly we started getting breakages... the application couldn't find shoulda for some reason. Now we use a hacked version of shoulda and have vendored it as that's a quickie way to monkey patch a gem.

We told gem bundle where we'd vendored it, and it all seemed to work fine. Unfortunately it broke, and a quick look at the symlinked 'gems' directory tells us why:

rails-2.3.4 -> /var/www/apps/my_app/releases/20091201120443/vendor/rails/railties/
shoulda-2.10.2 -> /var/www/apps/my_app/releases/20091201120443/vendor/bundler_gems/dirs/shoulda

What you can't see are these lines blinking red. What you can see are that these gems are pointing at a specific revision's vendor directory... and not the latest one! Coupled with a :keep_releases = > 4... that release's directory is quite likely to disappear very quickly - and in this case, it already has. This makes these symlinks (set up by gem bundle during release 20091201120443) point to a vendor directory that no longer exists. So it's really not as much of a surprise that our app can't find shoulda anymore.

I think the problem comes along because of gem bundle's rather useful feature of not reloading a gem if it's already installed. It looks to see if the specification exists - if so, it assumes that it's been installed correctly - it doesn't check that the vendored location still exists. That unfortunately spells our doom when capistrano deploys: because gem bundle runs from the newly-created release-directory, that's where the symlink is initially set up. gem bundle doesn't then check it later - even though later its been swept away in a release-cleanup.

So - we're currently working on a fix. Two options present themselves:
1) find a way for gem bundle to symlink from 'releases/current'. This means it has to exist before we do a gem bundle... and that's dangerous because it lets users through into a not-yet-working release. Or
2) we could not vendor any gems - but set up our own gem-server. This will work, but a bit more of a hassle than we prefer for vendored gems.

Troubleshooting:

Git repository 'git://github.com/taryneast/shoulda.git' has not been cloned yet

The recipes online all tell you to use gem bundle --cached and that's a great idea if network connectivity is scarce. As it uses gems that are already in a cache (without going to fetch them remotely)... but it will fail if you don't already have the gems in the cache. SO it relies on you applying solution number 2 above.

There are two solutions:
The first is to just use gem bundle always in your recipes.
The other is to use gem bundle on your development directory - then check the cache into your source control instead of symlinking it to a shared directory. (ie use solution 2 above). This will work just fine, and if you add a new gem, you'll have to make sure you run gem bundle on your working directory before checking in.

I'm not sure of a nice way to get around this if you're using solution 3. It might be worth setting up a rake task that checks if the Gemfile changed and opportunistically runs gem bundle instead of the --cached version (in fact, it'd be nice if gem bundle --cached had a --fetch-if-not-present option!). If you have a solution that worked for you, let me know.

rake aborted! no such file to load -- spec/rake/spectask

I got this when running rcov. It just means you need to add a few extras gems to your Gemfile. We needed all of these. YMMV

  only [:development, :test, :integration] do
    gem 'rcov'  # for test coverage reports
    gem 'rspec' # rcov needs this...
    gem 'ci_reporter' # used somehow by rake+rcov
    gem 'ruby-prof' # used by Hudson for profiling
  end

:bundle => false with disable_system_gems

Looks like these two don't play nice. ie if you choose for a single gem to be unbundled - but have disable_system_gems set - it isn't smart enough to realise that this one gem is meant to be an exception to the 'disable' rule. If you have any unbundled gems, make sure you remove disable_system_gems - or your application simply won't find it.

Wednesday 25 November 2009

Convert from config.gem to gem bundler

Why gem bundler?

Our sysadmin hates rake gems:install

It seems to work for me, but all sorts of mayhem ensues when he tries to run it on production... of course I have a sneaking suspicion it's really our fault. After all - we tend to forget that we already happened to globally-install some gem while we were just playing around with it... which means we didn't bother putting it into the config.gem section of environment.rb... oops!

However, there's a new option on the horizon that looks pretty interesting, and is built to sort out some of the uglier issues involved in gem-herding: gem bundler

Yehuda has written a very thorough a tutorial on how to set up gem bundler. But I find it kinda mixes rails and non-rails instructions and it's not so clear on where some things go. I found it a little fiddly to figure out. So here's the step-by step instructions that I used to convert an existing Rails 2.3 project.

Step-by-step instructions

1: Install bundler

sudo gem install bundler

2: create the preinitializer.rb file

Yehuda gave some code in his tutorial that will load everything in the right order. You only need to copy/paste it once and it will then Just Work.

Go grab the code from his tutorial (about halfway down the page) and save it in a file: config/preinitializer.rb

Don't forget to add that file to your source control!

Update: If you're using Passenger, update the code to use:

module Rails
  class Boot
  #...
  end
end

Instead of:

class Rails::Boot
  #...
end

3. create the new gems directory

mkdir vendor/bundler_gems

Add this directory to your source control now - while it's still empty!

While you're at it, open up your source-control's ignore file (eg .gitignore) and add the following:

vendor/bundler_gems/gems
vendor/bundler_gems/specifications
vendor/bundler_gems/doc
vendor/bundler_gems/environment.rb

4. Create the Gemfile

Edit a file called Gemfile in the rails-root of your application (ie right next to your Rakefile)

At the top, add these lines (comments optional):

# because rails is precious about vendor/gems
bundle_path "vendor/bundler_gems"
# this line forces us to use only the bundled gems - making it safer to
# deploy knowing that we won't accidentally assume a gem in existence
# somewhere in the wider world.
disable_system_gems

Again: don't forget to add the Gemfile to your source control!

5. Add your gems to the Gemfile

Now comes the big step - you must copy all your config.gem settings from environment.rb to Gemfile. You can do this almost completely wholesale. For each line, remove the config. from the beginning and then, if they have a version number, remove the :version => and just put the number as the second param. I think an example is in order, so the following line:
config.gem 'rubyist-aasm', :version => '2.1.1', :lib => 'aasm'
becomes:
gem 'rubyist-aasm', '2.1.1', :require => 'aasm'

For most simple gem config lines, this should be enough so that they Just Work. For more complicated config.gem dependencies, refer to the Gemfile documentation in Yehuda's tutorial.

If you already have gems in vendor/gems You can specify that bundler uses them - but you have to be specific about the directory. eg:
gem 'my_cool_gem', '2.1.1', :path => 'vendor/gems/my_cool_gem-2.1.1'

Extra bonus: if you have gems that are *only* important for, say, your test environments, you can add special 'only' and 'except' instructions (or whole sections!) that are environment-specific and keep your staging/production environments gem-free eg:

gem "sqlite3-ruby", :require => "sqlite3" , :only => :development
only :cucumber do
  gem 'cucumber'
  gem 'selenium-client', :require => 'selenium/client'
end
except [:staging, :production] do
  gem 'mocha'          # http://mocha.rubyforge.org/
  gem "redgreen"       # makes your test output pretty!
  gem "rcov"
  gem 'rspec'
end

5a: Add rails

Now... at the top of the Gemfile, add:
gem 'rails', '2.3.4'
(or whatever version you currently use)... otherwise your rails app won't do much! :)

Obviously - if you've vendored rails you will need to specify that in the Gemfile way eg:
gem 'rails', '2.3.4', :path => 'vendor/rails/railties'

If you've opted *not* to disable_system_gems, you won't need this line at all. Alternatively, you could tell the Gemfile to use the system-version anyway thus:
gem 'rails', '2.3.4', :bundle => false

Also, I'd recommend adding the following too:

 gem 'ruby-debug', :except => 'production'  # used by active-support!

6. Let Rails/Rake find your gems:

Edit 'config/environment.rb' and at the bottom (just immediately after the Rails::Initializer.run block, add:

# This is for gem-bundler to find all our gems
require 'vendor/bundler_gems/environment.rb' # add dependenceies to load paths
Bundler.require_env :optional_environment    # actually require the files

7. Give it a whirl

From the rails root directory run gem bundle

The bundler should tell you that it is resolving dependencies, then downloading and installing the gems. You can watch them appearing in the bundler_gems/cache directory :)

and you're done!

...unless something fails and it can't find one - which means you probably forgot to add it to config.gems in the first place! ;)

PS: ok... so I've also noticed you sometimes have to specify gems that your plugins use too - so it may not be entirely your fault... ;)

PPS: if Rake can't find your bundled gems - check that config/preinitializer.rb is set up correctly!

Wednesday 21 October 2009

shoulda should play nice with Il8n errors

...but it doesn't.

It took me a while to figure out what was going wrong with it, but I discovered that ti was checking against the literal locale-based string (eg "Please enter a value for {{attribute}}") rather than the string that the error would really show up as (eg "Please enter a value for Name").

I've put a fix into my shoulda branch on github along with a new macro: should_ensure_length_at_most (the opposite of the existing should_ensure_length_at_least). If you wanna just patch an existing system, I've put up a pastie of my patch here.

Friday 4 September 2009

UUID over the wire

Need to use Active Resource on a remote object that has a UUID?

Last time I checked, Active Resource still overwrites the id with a to_i version of the uuid... this makes "123ABCDE456" turn into 123... not what we want.

But Hyperactive Resource (HyRes) works just fine.

UUID created by the remote system

Does the remote API create the UUID for you? If so - you're laughing. Just add: self.primary_key = :uuid to your HyRes object and you're set. HyRes doesn't do a to_i conversion, so the UUID will still be stored as a string.

Need to set UUID locally?

Install the 'uuidtools' gem then add a before_create callback (in your resource) thus:

# for uuid-generation
require 'uuidtools'
class Widget < HyperactiveResource
  self.site = 'http://localhost:3001' # etc...
  self.columns = [:uuid, :widget_type, :name, :description] # etc...

  self.primary_key = :uuid
  before_create :generate_uuid

  # helper to generate a uuid for each asset
  def generate_uuid
    self.uuid = UUIDTools::UUID.timestamp_create.to_s
  end
end

Friday 7 August 2009

Faking startup and shutdown with ActiveSupport::TestCase

Time-was we didn't just have setup and teardown in our Tests (which are run for every single test case)... we could also have a startup/shutdown pair that ran once per proper Test-case class (ie once at the beginning of all tests in a set).

This was a perfect place to put run-once-only things - especially useful if, say, you wanted to run ruby-prof over your tests. You could put the RubyProf.start in startup and have it print out a callgraph in the shutdown...

ActiveSupport::TestCase doesn't let us have our startup/shutdown anymore... so what to do?

Fake it with at_exit

  # quick-and-dirty class var to tell us if we've already registered the
  # at_exit RubyProf function.
  @@register_done = false
  # register the stop-ruby-prof stuff
  # We want it to spit out a RubyProf callgraph after it's run this test
  # case.
  # You can either: call this function directly in a single test case or
  # test-suite eg: "register_rubyprof"
  # If you pass a value to "RUN_RUBY_PROF" on the command-line - it will
  # also run this register function for you. eg:
  # RUN_RUBY_PROF=true rake test:units TEST=test/unit/widget_test.rb
  def register_rubyprof
    require 'ruby-prof'
    return if @@register_done
    p "starting RubyProf"
    RubyProf.start

    path_base = "#{RAILS_ROOT}/tmp/performance/ruby-prof"

    p "started! Now registering exit-function"
    at_exit do
      p "All done - stopping RubyProf" 
      result = RubyProf.stop

      p "Stopped! now printing call graph"
      timestamp = Time.now.strftime('%Y-%m-%d-%H-%M-%S')
      # try and make the callgraph filename meaningful to the test-run
      filename_base = "rubyprof_#{self.class.name}_#{timestamp}"

      # Print a call tree profile to text for kcachegrind to use
      printer = RubyProf::CallTreePrinter.new(result)
      printer.print(File.open("#{path_base}/#{filename_base}_calltree.txt", 'w'), 0)
      p "All Done. Good luck profiling!"
      end
    @@register_done = true
  end
  # will setup the reguster-rubyprof function if we have set the
  # RUN_RUBYPROF constant to true
  setup :register_rubyprof if RUN_RUBY_PROF

Thursday 6 August 2009

HyRes : ActiveResource that actually works!

It's three months on from my original post whinging about the lack of Validations in Active Resource.

At that time I put my money where my mouth was and forked a copy of the HyperactiveResource plugin, which provided a very crude, basic improvement over vanilla Active Resource.

So What have I achieved?

I've actually done a lot since then. I have implemented a lot of the TODOs that I wrote down as essential for us to get a good, solid basis for a workable system (see the done list below).

I'd class this plugin as working and functionally useful. Right now it's almost as useful as Active Record was a few years back... which is a vast improvement on the basic Active Resource interface.

I think there's still a lot of room for improvement, but the basics are there, and it *feels* like Active Record now - which before it definitely did not. Before it felt like you could play around with it for toy systems, or implement minor bits of functionality - but now you can really implement a fully-functional system on top of it. I know this - because we have done[1].

What's missing?

HyRes still has some niggling issues, and some stuff that would make it a much smoother migration for Active Record. I'm working on these and there is a perpetually-renewed TODO list on the HyRes README file

The funkier aspects of Active Record

I haven't yet re-implemented proper AR reflections - so the associations need some work... you can't do: widget.wodgets.find(...) and you can't use named_scope - which means that many basic plugins (eg restful_authentication) won't work out-of-the-box. But there's enough functionality there that pulling together a system that is totally non-Active Record is feasible.

API assumptions

Currently HyRes makes some assumptions about your API - it assumes a RESTful interface as the optimal configuration. If your API does not match the assumptions, there are some workaround available, but it might not be as useful.

An example is the validates_uniqueness_of method. This currently assumes that your API takes an array of conditions on the query-string, and that this will filter your returned set of findable objects.

If your API doesn't do this... the method currently defaults to fetching *all* objects and filtering on the rails-end... which is likely to be extremely slow (and may lead to timeouts). But it's there in case it's necessary. You may have to simply re-write that method with your own API-requirements in mind. I welcome alternative solutions to the problem...

testing...

Probably one of the nastier downsides atm is that HyRes doesn't have its *own* set of tests... We currently test it through the full suite of tests on our own system. This is mainly due to the fact that it's really quite hard to set up tests that don't rely on an existing API that's up and running. In our system we use a combination of mocks in our functional tests, backed by a fully-functional, running mock API for our integration tests... this is near-impossible for the HyRes plugin itself... so I'm still thinking about how I'll test it independently.

So what *does* it do?

The current list of implemented features (along with how to set up your API to use them) is also available on the HyRes README file but a quick snapshot of the list as of today is below. Note - in some cases, I still find it hard to believe that Active Resource didn't already implement these...

Columns

Because Active Resource isn't backed by a db... you can't use the table columns to determine the known set of attributes of a resource. ARes currently works by accepting any incoming data as an attribute, and using MethodMissing as an accessor for any known attribtues. This is fine for situations where you don't know what attributes will be returned by the remote system.

The downside is that if no value is returned for an attribute and you try to access it... it throws a MethodMissing exception (uuugly!).

If, however, you know what attributes to expect (because it's an agreed API-interface ), it'd be nice to be able to tell Active Resource which attributes we expect - and have it return any missing attributes an nil (rather than explosion).

Thus was born the columns= method.

Currently this method also works a little like attr_accessible - any attributes listed like this will be passed through on create/update... but nothing else will. This allows you to set temporary accessors (eg password and password_confirmation without them being passed over the wire.

Validations

directly pulls them in from ActiveRecord::Validations. For Rails 3.0 - we can use the ActiveModel::Validations component.

Note: validates_uniqueness_of is still experimental - as it requires a network-hit to actually determine whether you have any existing objects with the given field-value. You can't rely on uniqueness-of as there is no way of locking the remote-system's db - and thus somebody could have added a resource with that value in the meantime.

Callbacks

There are now callback hooks for before_validate, before_save etc - all the standard Active Record callbacks we've come to know and love. You can use them for callback chaining a la Active Record (still experimental - may have missed some, but you can definitely use validate :my_validation_method)

Finders

conditional finders

eg Widget.find(:all, :conditions => {:colour => 'blue'}, :limit => 5). This functionality working depends on your API accepting the above filter fields and returning something sensible. There's a lot of doco on how to set up your API (if you have that luxury) in the HyRes README

Dynamic finders and instantiators

eg find_by_X, find_all_by_X, find_by_X, find_last_by_X These rely on your API accepting filter fields as they are really just a bit of a convenient alias for find(:conditions => X). Dynamic finders take any number of arguments using _and_: find_by_X_and_Y_and_Z

We also have dynamic instantiators: find_or_create_by_X OR find_and_instantiate_by_X - both of which also take any number of args, just like the dynamic finders.

Dynamic finders/instantiators also take ! eg: find_or_create_by_X! will throw a ResourceNotValid exception if create fails

Other ActiveRecord::Base-like functions

  • update_attribute / update_attributes - actually exist now!
  • save! / update_attributes! / update_attribute! / create! - actually exist, now that we can do validation. They raise HyperactiveResource::ResourceNotSaved on failure to save/validate
  • ModelName.count (still experimental) - with optional finder-args. This works by first trying a /count request on your model object - and if the route fails it just pulls out all the objects and returns the length (ie - a nasty - but functional fallback).
  • updated collection_path that allows suffix_options as well as prefix_options = allows us to generate Rails-style named routes for collection-actions
  • no explosion for find/delete_all/destroy_all when none are to be found. (Active Record just returns nil - so should we)
  • ActiveRecord-like attributes= (updates rather than replaces)
  • ActiveRecord-like load that doesn't dup attributes (stores direct reference)
  • reload that does a full clear/fetch to reload (also clears associations cache, now that we have one)

Associations

A lot of the basic associations were in HyRes before I arrived - I've polished a few up (eg adding the belongs_to/has_many functions) and bugfixed

  • Resources can be associated with records
  • Records can be associated with records
  • Awareness of associations between resources: belongs_to, has_many, has_one (note - no HABTM yet)
    • Patient.new.name returns nil instead of MethodMissing
    • Patient.new.races returns [] instead of MethodMissing
    • pat = Patient.new; pat.address_id = 1; pat.address # returns the address object
  • Can fetch associations even with a nested route by using the ":nested" option on the nested resource's class. This command automatically adds a prefix-path, and will pre-populate the parent's id when you do an association collection_fetch.
  • Supports saving resources that :include other resources via:
    • Nested resource saving (creating a patient will create their associated addresses)
    • Mapping associations ([:address].id will serialize as :address_id)

What's next?

Well, one big thing that's next is that I'm planning on starting to work this stuff back into Rails itself. The brilliant component-architecture and upcoming Active Model changes for Rails 3 are just made for this. What I've been doing on HyRes can feed into that process to make Active Resource what we've always wanted - a real replacement-candidate for Active Model. For this purpose I have a fork of rails - but work will progress slowly (as I'm working on it in my spare time).

Until then, I'll keep on upgrading HyRes as needed for our own corporate requirements. Below, I've listed some features that Active Record provides and that I'd *personally* like to see implemented next... but this is certainly not an exhaustive list. (and I'm very open to other ideas...)

One thing I'd love to see is other people using HyRes for their own projects. I'd love to hear feedback on how it works (or not) as that will feed the project and feed into the work that eventually goes back into Rails. Better-still, if you're willing to work on implementing some of the still-missing features... you would be most welcome. Feel free to check out the Hyperactiveresource project on github and have a go.

Still TODO

  1. Testing - inside the plugin! (currently we test via our actual web-app's set of exhaustive tests)
  2. MyModel.with_scope/named_scope/default_scope
  3. MyModel.find(:include => ...)
  4. attr_protected/attr_accessible
  5. MyModel.calculate/average/minimum/maximum etc
  6. Reflections. There should be no reason why we can't re-use ActiveRecord-style reflections for our associations. They are not SQL-specific. This will also allow a lot more code to automatically Just Work (eg an Active Record could use has_many :through a HyRes)
  7. has_and_belongs_to_many
  8. Split HyRes into Base and other grouped functions as per AR
  9. validates_associated
  10. write_attribute that actually hits the remote API ???
  11. a default format for when it doesn't understand how to deal with given mime-formats? One which will just pass back the raw data and let you play with it?
  12. cache the raw (un-decoded) data onto the object so we don't have to do a second fetch? Or at least allow a universal attribute to be set that turns on caching

Notes

[1] The subject of an upcoming-blogpost will be on the work we've done: "Rewriting monolithic legacy systems in Rails"

Wednesday 29 July 2009

CSV views with FasterCSV and csv_builder

We were asked to add a "Download as CSV" link to the top of each of our reporting actions. We already had a "revenue_report" controller-action with it's own html view template... and just wanted to add something similar in CSV.

FasterCSV seemed to provide some great CSV-generation functionality... but it builds the csv and spits it out right there in the controller.

uuuuugly!

We shouldn't put view-generation code into the controllers - that splits up our nicely-decoupled MVC stack!

I've also seen some ideas about putting a to_csv method into an individual model. Now, if I'm leery about putting view-generation in the controller... I'm even less impressed by putting it into the model! But I can see the benefits of simplicity.

However - I'm still not sure it fits with our requirements. A @widget.to_csv call works just fine for the WidgetsController#show action... and probably even the WidgetsController#index action (where you can run @widgets.map &:to_csv or similar)... but most of our reports span multiple associations and the data is grouped, filtered, and summarised. Each of these sets needs headings explaining what it is and laying it out nicely in a way that makes sense to the user reading the CSV file. Putting that kind of visual-only functionality in your model is where it really starts to get ugly again. I am left with one big thought:

Why can't I just put my CSV-template into the views directory?

enter: csv_builder plugin

csv_builder comes with extremely minimal documentation... but it's not that hard to use. Here's a step-by-step with examples to follow:

  1. install fastercsv and csv_builder
  2. add a .csv option to the respond_to part of your action
  3. add your <actionname>.csv.csvbuilder template into your views directory
  4. Add a "Download as CSV" link for your user

Step 1: Install FasterCSV & csv_builder

sudo gem install fastercsv
./script/plugin install git://github.com/econsultancy/csv_builder.git

Don't forget to restart your server!

csv_builder has already set up a format-extension handler for csv so you don't need to add it to the mime-types yourself, just start using it in your actions.

Step 2: In your controller action:

csv_builder will work out of the box even if you just add the simplest possible respond_to statement:

  respond_to |format| do
    format.html # revenue_report.html.erb
    format.xml  { render :xml => @payments }
    format.csv  # revenue_report.csv.csvbuilder
  end

With nothing more than the above, csv_builder will grab out your template and render it in csv-format to the user.

You can tweak the FasterCSV options by setting some @-params in your action. The csv_builder gem README explains how to use those, but the most useful would be to set @filename to give your users a meaningful filename for their csv-file. You can also specify things like the file-encoding, but go look at the gem README to find out the details

Here's an example of a full controller action:

# GET /sites/1/revenue_report
# GET /sites/1/revenue_report.xml
# GET /sites/1/revenue_report.csv
def revenue_report
  @site = Site.find(params[:id])
  @payments = @site.revenue_report_payments
  respond_to do |format|
    format.html # revenue_report.html.erb
    format.xml  { render :xml => @payments }
    format.csv do
      # timestamping your files is a nice idea if the user runs this action more than once...
      timestamp = Time.now.strftime('%Y-%m-%d_%H:%M:%S')
      # passing a meaningful filename is a nice touch
      @filename = "revenue_report_for_site_#{@site.to_param}_#{timestamp}.csv"
    end # revenue_report.csv.csvbuilder
  end
end

Step 3: add a template into your views directory

The final step is to add a .csv.csvbuilder template to your views directory. This is just like adding an html.erb template for your action, except that the file will have the extension .csv.csvbuilder. AFAIK csv_builder can only support template that have the same name as your action, so in my example the template would be called revenue_report.csv.csvbuilder

Your view template is where you can stash away all the FasterCSV guff that will build your csv file. This works the same way as any other FasterCSV-generated content - but just make sure you generate it into a magic array that is named 'csv'. Here's an example:

  # header row
  csv << ["Revenue report for site: #{@site.name}"]
  csv << [] # gap between header and data for visibility

  # header-row for the data
  csv << ["Date", "", "Amt (ex tax)", "Amt (inc tax)"]

  row_data = [] # for scoping
  @payments.each do |payment|
    row_data = [payment.created_at.to_s(:short)]     
    row_data << "" # column gap for visibility
    # note you can use the usual view-helpers
    row_data << number_to_currency(payment.amount_ex_tax)
    row_data << number_to_currency(payment.amount_inc_tax)

    # and don't forget to add the row to the csv
    csv << row_data.dup # it breaks if you don't dup
  end # each date in date-range
  csv << [] # gap for visbility

  # and the totals-row at the very bottom
  totals_data = ["Totals", ""] # don't forget the column-gap
  totals_data << @payments.sum &:amount_ex_tax
  totals_data << @payments.sum &:amount_inc_tax
  csv << totals_data

Step 4: add a CSV link

Something like the above code will generate a nice plain csv file and spit it out at the user in the correct encoding... but now we need something to actually let the user know that they can do this.

This is pretty simple - it just requires adding the format to your usual path-links. eg:

<%= link_to 'Download Report as CSV', site_revenue_report_path(:id => @site.to_param, :format => :csv) -%>

Testing

Testing for non-html formats still seems pretty crude. You have to manually insert the "accepts" header in the request before firing off the actual request - then manually check the response content_type. It'd be nice if we could just add :format => :csv to the request... anyway, here's a sample shoulda-style test case:

  context "CSV revenue_report" do
    setup do
      @request.accept = 'text/csv'
      get :revenue_report, :id => @site.to_param
    end

    # variables common to all formats of revenue-report
    should_assign_to :site
    should_assign_to :payments

    should_respond_with :success
    should "respond with csv" do
      assert_equal 'text/csv', @response.content_type
    end
  end # context - CSV for revenue report

Gotchas:

Don't forget to add config.gem!"

I fond I needed config.gem 'fastercsv' and a require 'csv_builder' in my environment.rb. You may need a gem-listing for both (esp if you're using bundler)

Rails 3 compatitbility?

The original csv_builder plugin is not rails 3 compatible and is in fact no longer being supported. But it has officially changed hands, to the newer csv_builder gem. This is Rails-3 compatible and not backwards compatible - though he maintains a 2.3.X branch for people to submit bugs.

csvbuilder or csv_builder

While the plugin is named csv_builder, be careful to name your files with the extension: csv.csvbuilder or you'll spend hours pulling your hair out about a Missing template error while it fruitlessly searches for your csv.csvbuilder file!

All your report data must be generated in the controller!

This makes you more MVC-compliant anyway, but if you had any collections being fetched or formatted in the view... now's the time to move them into the controller action as your CSV-view will need them too.

duplicate inserted arrays

You'll notice that the example template has a set of data being added to the csv array... you'll also notice that each row is generated on the fly - then I save a *duplicated* version into the csv array. If you don't duplicate it... you may do funky things with overwriting the row each time. In my case I also needed to declare the temporary array outside the loop in order to preserve scope. YMMV - I'd appreciate any Ruby-guru answer as to why this doesn't work without that.

don't reset csv

Don't do this: csv = [] it breaks everything!

Rendering as html

I've noticed there's a pathalogical condition where everything is going well - and it's even getting o the template... but it's *still8 rendering html.

I eventually figured out that it's actually rendering the layout around the csv... and it occurs when you've explicitly stated a layout (eg layout 'admin') somewhere at the top of your controller. To fix it, you just need an empty render without layout false eg

  respond_to do |format|
    format.csv do
      timestamp = Time.now.strftime('%Y-%m-%d_%H:%M:%S')
      @filename = "revenue_report_for_site_#{@site.to_param}_#{timestamp}.csv"
      render :layout => false  # add this to override a specifically-stated layout
    end # revenue_report.csv.csvbuilder
  end

Tuesday 28 July 2009

rails gotchas: undefined method `expects'

If you've just installed rails edge and run the tests, they may fail with: NoMethodError: undefined method `expects' for ... etc etc over and over again (along with a few other errors).

Install the mocha gem via rubygems to get rid of a lot of these errors.

It was the line NoMethodError: undefined method `stub' for ... that clued me in.

It seems that rails requires a short list of gems/packages (beyond rails itself) simply to run its own tests... yet there is no rake gem:install task available to help you figure out which ones... and they aren't in the "contributing to rails" guide. I'll be submitting a patch shortly...

Following on from this I got:

MissingSourceFile: no such file to load -- openssl

and farther down the list:

LoadError: no such file to load -- net/https

Unfortunately, these errors occur when Ruby hasn't been installed with openssl-support. If you're running ubuntu, you can just run apt-get install libopenssl-ruby.

Monday 27 July 2009

40% speedup using Nokogiri!

Cut to the chase...

To cut your XML-processing time dramatically, sudo gem install nokogiri then add the following to config/environment.rb inside the Rails initializer.

config.gem "nokogiri"
ActiveSupport::XmlMini.backend='Nokogiri'

Back-story and pretty pics

The problem

So, our site makes heavy use of ActiveResource [1], meaning that most of our data is located remotely.

Not surprisingly, some of our pages are *really* slow, so I landed the task of speeding them up. Apart from page-caching (not possible), fragment caching (only helps on the *second* hit), or some complicated messy idea of data-caching locally (tedious and likely to be evil); my first thought was to reduce the number of network hits. Clearly that's a high pain point, especially on our heavy pages that have many resource fetches.

Before I dove into performance hacks and updating the business logic into twisty little data reuse-patterns for network-hit reduction... I decided to actually try profiling.

I've been setting up a ruby-prof and kcachegrind recently[2]... and figured I should at least give that a look-at to see if my assumptions are correct.

I'm really glad I did, because when I ran it over our heaviest action, I saw that all the highest-weight method-calls led back to some form of ReXML parsing.

Searching on the ReXML components showed that the heaviest ReXML method took up a whopping 1 million process cycles. When our total process-cycles came to 5.8 million - that's a significant chunk of time spent in that one library.

As I mentioned - our site makes heavy use of ActiveResource, and one *big* problem with ActiveResource is that all your objects are parsed and re-parsed as xml for every fetch of data... so, in hindsight, it's fairly obvious that our site would spend a *lot* of time in the XML-parsing library. Any speedup in that department would help us immensely.

The solution?

We've recently been to Rails underground, and one of the lectures[3] had a slide comparing the speed of ReXML to several other ruby XML-parsing libraries[4]. Nokogiri came out as a clear winner in the speed department. The loser was equally clear... that being the Rails-default: ReXML

So, switching out the library would be an obvious speed win.

As it turns out - it's really easy to do this. Just install the gem, and require it in your Rails initializer using the instructions at the top of this post

But did it really help?

It seemed faster... but can we prove it?

From ReXML to Nokogiri - 40% speedup

Yup.


Notes

[1] through the HyperactiveResource plugin
[2]I'll be giving a talk at LRUG on 12/08/2009 on how to use and interpret ruby-prof and kcachegrind
[3]During the talk by Maik Schmidt on Sneaking Ruby & Rails Into Big Companies
[4] I'm not sure, but it's possibly the one from this page comparing Ruby-XML performance benchmarks

Sunday 26 July 2009

Rails underground

Just a quick post to say I attended the Rails underground conference and it rocked!

There were some fantastic talks by a wide range of speakers. My favourite being Jim Weirich's talk on the search for the Grand Unified Theory of Software Development. But I also enjoyed Yehuda's keynote talk on the future of Rails 3, Obie Fernandez's talk on setting up a rails business, and the panel discussion with all of the above, and a video-feed from DHH.

I also met some great people, including Desi McAdam, who introduced me to dev-chix, Geoffrey Grosenbach who chatted to me about my plans for my HyRes plugin and Yehuda Katz who I also chatted with about my plugin - and how to go about bringing it into Rails 3.

Friday 17 July 2009

Adding prompt to select_tag

Rails's select method is pretty advanced. You can pass in all sorts of funky options to format your selector nicely, including :include_blank => true and the related :prompt => 'Pick one or else!'

select_tag, however, seems to have fallen behind in the usefulness-stakes and implements neither of the above... even though it does an extremely similar thing.

So here's a quick-hack function you can drop into ApplicationHelper to implement that functionality for you.

  # override select_tag to allow the ":include_blank => true" and ":prompt => 'whatever'" options
  include ActionView::Helpers::FormTagHelper
  alias_method :orig_select_tag, :select_tag
  def select_tag(name, select_options, options = {}, html_options = {})
    # remove the options that select_tag doesn't currently recognise
    include_blank = options.has_key?(:include_blank) && options.delete(:include_blank)
    prompt = options.has_key?(:prompt) && options.delete(:prompt)
    # if we didn't pass either - continue on as before
    return orig_select_tag(name, select_options, options.merge(html_options)) unless include_blank || prompt

    # otherwise, add them in ourselves
    prompt_option  = "<option value=\"\">" # to make sure it shows up as nil
    prompt_option += (prompt ? prompt.to_s : "") + "</option>"
    new_select_options = prompt_option + select_options
    orig_select_tag(name, new_select_options, options.merge(html_options))
  end

Wednesday 15 July 2009

Gotcha: UTC vs Local TimeZones with ActiveResource

So... your database is filled with datetime data and it's all configured to localtime, not UTC... We also have this you-beat nifty ability to set all our datetime-handling functionality to a given timezone by setting, say: config.time_zone = 'London' in config/environments.rb... or do we?

If you also use ActiveResource (or the new, actually-working HyperactiveResource), you'll find that suddenly you're getting a UTC-local timezone issue once more.

The problem is that the xml that comes back from a remote API is converted into a Date/DateTime using the core-extension to the Hash.from_xml method... which has the following LOC:

"datetime"     => Proc.new  { |time|    ::Time.parse(time).utc rescue ::DateTime.parse(time).utc }

The fix

You need to do two things. Firstly. Hack that line[1] and replace it with:

"datetime"     => Proc.new  { |time|    ::Time.zone.parse(time) rescue ::DateTime.zone.parse(time) },

Secondly... somehow it doesn't pick up the timezone even though it's been helpfully added in via the config... so you need to open up config/environments.rb (or create a rails initializer) and put:
Time.zone = 'London'[2]
in there (outside the rails initialization block).

Notes:
[1]To hack rails, you can either
a) hack on your own rails gem = risky... will be overwritten the next time you sudo gem update or
b) rake rails:freeze:edge - which means you have your rails in your own vendor/rails directory... but means you have to rake rails:update manually... up to you which you hate more.

[2]Obviously substituting your own timeZone as appropriate here. See the TimeZone doc for what you can pass in.

Thursday 11 June 2009

On the premature death of Array.to_a

hmm... after finally discovering that Array.to_a doesn't double-array now I'm getting warnings that Array.to_a is being obsoleted (with the unhelpful message: warning: default `to_a' will be obsolete). Aunty google tells me I should now use Array(thing)!

Can I just say (from a purely aesthetic POV):
yuk!

When you use to_a you can just read your code left-to-right eg:

Thing.do_stuff(args).to_a.do_something_else

You get a clear idea of what order your messages are passed in.
but having to use a stonking great Array() around one section means you have to jump back to the left again!

Array(Thing.do_stuff(args)).do_something_else

You have to read left-right-left-right :P
It's not as pretty.

So why exactly are they obsoleting Array.to_a ???

Tuesday 9 June 2009

acts_as_good_style

Originally published in 2007, this article is updated on a semi-regular basis and republished when I add something new...

Rails Idiom

There are a number of small pieces of rails idiom that contribute to better rails code. I've been collecting them at work as they come to me (often by reading other people's code). I know that a lot of this can be gleaned simply by reading the "Agile rails" book... but apparently this would actually take effort, and thus doesn't occur as frequently as I'd like. A lot more is just basic good common sense... but we all know how common that is. In any case, I'll probably put this up on my website soon and keep adding to it.

Note - I don't always have a justification for these things... as with everything - it's up to you to pick out the bits that work for you.

Basic good sense

Clarify your code

This is something that I learned from Joel Spolsky from his article on making wrong code look wrong. He suggests that code smells should stand out a mile off. Doing so will make it much less likely that we'll miss one and leave a hole for a bug to form in.

He further suggests that we should change the way we code to systematically remove all habits (even benign ones) that muddy the waters. If we make it really obvious what is right and what is potentially wrong - then it'll be easier for us to discriminate between the right and wrong bits of the code. This follows from basic psychology. By analogy: it's easier to notice black when all you can see is black or white - but adding many shades of grey makes it harder to detect. If we make a habit of writing only white (or pale-grey)... it's harder for us to miss a black spot.

Joel goes on to discuss a technique for using Hungarian Notation (in the way it was orignally intended) to increase visbility of safe/unsafe strings - which is meant to be just an example of one technique. (So don't take it as the main essence of his message).

Many of the things I bring up in this Idiom-Guide are techniques that draw from this philosophy.

Don't comment-out dead code

Don't comment out code that's no longer needed - just delete it. It'll still be there in the SVN repository if we ever need it again.
Dead code just clutters up the project and slowly makes it less readable/maintainable. :(

Avoid Magic global variables where at all possible

The "don't use magic globals" argument has been around for a while, but here's the rails take on it.

The Rails equivalent of a magic global variable is the "@variable" (though I've also seen some nasty uses of the params hash). I know that instance variables are essential for passing data from the controller to the view. But they should not be relied upon for anything else! eg do not set them and assume they exist when calling methods on a model - or in a library.

From a given piece of code that uses said variable, you cannot tell where the variable originates. If it were simply passed as a parameter to the method, there would be an obvious line of command. Instead, you have to trawl through all the earlier functionality to figure it out or resort to a recursive grep on the rails directory.

This becomes a problem if you have to maintain the code. Given a magic global variable: Can I delete the value? Can I change it without affecting later code (that relies magically on its value)? If I run across a bug whereby it comes in with a value I didn't expect - how do I find which method set the value incorrectly?

On balance - it's much better to simply pass an explicit parameter through the usual method-call interface, so at least you have a chance of tracing the line of command in future.

Don't add ticket/bug comments into the code

I don't wanna have to wade through hideous piles of comments referencing some obscure (fixed) ticket/bug such as:

# DRN: ticket ARC256 -- remove currency symbol
# end ticket ARC256

It wastes time and eyeball-space. This stuff can conveniently go into the SVN log message - where it'll be picked up by the appropriate ticket in the "subversion comments" section. This way it'll only get read by people who actually care about the ticket/bug in question.

Alternatively - what we get is a buildup of comments referencing random tickets that are probably no longer relevant. Especially because the ticket numbers mean nothing when taken outside the context of our ticketing system.
Think about it this way: how long would we keep that comment in the code? when does it get removed? whose job is it to remove the ticket comment or do they all just build up until we have more ticket-comments than code? How unmaintainable!

Please just don't do it.


Ruby idiom

Don't use .nil? unless you need to

There are very few times when you are specifically checking for actual nil results. Mostly you are just checking to see if an object has been populated, in which case use: if foo don't bother with if !foo.nil?
This also brings me to:

Use unless where appropriate

Don't check if !foo instead use: unless foo

This is just the ruby-ish way to do things, and it reads better in many circumstances... given that code is meant to be read by humans, this is a Good Thing.

Do not use and/or - instead use &&/||

The words "and" and "or" mean something quite different to the symbols: &&/||
Unless you understand *exactly* what you are doing - you should use the latter (&& / ||) instead.

Any expression that contains and/or is almost certainly inelegent and should probably be rewritten.

do-end vs {} blocks

{} should be used for single-line blocks and do-end for multi-line blocks... this is a "taste" thing - but it seems to be fairly common practice now.

Use single-line conditionals where possible

if you're only doing one thing based on a conditional, use: foo_something if x
rather than:

if x
   foo_something
end

Obviously this doesn't work if you're doing more than one thing or when you have an else-clause, but it's useful for reducing space eg:
return render(:action => :edit) unless @successful
which also brings me to:

Nested function calls need parentheses

In a nested function call eg :
return render :action => :edit
Rails can make a good guess at whether :action => :edit is the second parameter of the "return" function, or the first parameter of the "render" function... but it's not wise to rely on this and it will generate warnings. It's much better to parenthesise parameters to any nested function eg:
return render(:action => :edit)

However:

Don't parenthesise where unnecessary

When parentheses are not required for the sake of clarity, leave them out.
do_something foo, :bar => :baz


Basic Rails security

h( x-site-scripting )

In your view templates, surround all user-input data with the "h" function. This html-encodes anything within it - which will prevent cross-site scripting attack.

Do this even if the source is trusted!

If you get into the habit of making it part of the <%=h opening, it will just become natural. An example:

  <p> Name: <%=h @object.name -%>
  <br />Description: <%=h @object.descr -%>
  <br /><%= image_tag image_url_for(h @object.image.file_name) -%> 
  <br /><%= link_to "Edit object", edit_object_path(@object) -%>
  </p>

:conditions => ['protect_from = ?', sql_injection]

Most SQL queries are built up by Rails automatically. But adding conditions to your finds or your paginators brings in a slight, but real possibility of introducing sql-injection attacks.
To combat this, always pass everything in as variables. eg:

:conditions => ['name like ? AND manager_id = ? OR is_admin = TRUE', 
@user.name, @user.id]

Generators, Plugins and Gems... oh my!

Generators

use scaffold_resource

When you create a new model, generate a scaffold_resource, even if you aren't going to use the view pages.
It generates wonderful tests as well as your model and migration for free.

It's worth it!

It's also worth it to go through the hassle of entering all of the attributes on the command-line. Again because it generates tests, fixtures, migrations, views etc for you.
For usage: script/generate scaffold_resource --help

-x and -c

If you pass the "-c" flag to a generator (script/generate whatever), it will automatically svn add the generated files. This isn't necessary, but it's a really useful thing so you don't have to hand-add every file individually.

There is a similar flag (-x) for when you install plugins. This will automatically add the plugin to the project via svn:externals.

Plugins

Don't use a plugin unless it does exactly what you want

It's so easy to write functionality in Rails that there's no point in using somebody else's plugin unless it does *exactly* what you want it to do.
Especially don't bother using a plugin if you have to hack it to pieces and pull out half the useful functionality. You are just creating a maintenance nightmare in your future.

That being said, if a plugin has half the functionality you need... it's a great place to learn the bits you need to use.


DB and migrations

Use migrations

But you weren't going to hand-hack the db anyway were you?
If you don't use migrations, then capistrano can't deploy the changes automatically. That only means more work for you - and means we can't auto roll-back if something horrible goes wrong.

Don't go back and edit old migrations!

I now have a policy that states "migrations only go forwards". This comes from several bad experiences where a developer (sometimes me) has gone back to edit an old migration and totally locked up the system for everybody else.

You may know that a deployment requires you to "migrate down then back up again". But sometimes that either gets forgotten, or is actually harder to do in practice than in theory eg if you're using capistrano to auto-deploy your site, it's harder to get it to do twisty migrations than just to "move forwards one".

Besides, which, it's a hassle for you to tell everyone on the dev team "before you do an svn up, migrate down to VERSION X, then svn up, then migrate back up again"... what if somebody doesn't want to lose their existing data? what if they're an svn-junkie and have already done an svn up before you get to them? what if somebody is not in today and you have to catch them before they come in tomorrow?

It's way too easy to get your bodgied-up back-hacked migrations into a wedged state where a new person cannot back up or go forward, and often cannot even start from scratch (eg from a fresh copy of a database). This becomes a serious problem for you if you ever have to change servers (or set up a new testing server or whatever) and have to run all the migrations from 0 to get the db ready.

Even if it all runs as you expect, it's three extra steps to remember than just "rake db:migrate", which raises the spectre of Human Error.

It's annoying to make it work, prone to error and bloody awful to fix when it breaks. All in all much easier just to write a new migration to change the existing stuff.

Extra migrations don't actually hurt you - they really don't take up that much more space in your SVN repository - they're worth having them around for the knowledge that anyone, anywhere will have a completely up-to-date system even if they start from complete scratch.

Add messages to data migrations

All of the basic ActiveRecord migrations tend to spit out their own status messages. Data migrations don't, and it's often useful to know what your migration is doing - especially if you've got a long data migration that would otherwise sit there in silence, looking like it's hung your machine.
So drop in a few status messages eg say "migrated #{orders.size} orders, moved widgets over to new wodget style"

Don't add excessive status messages to migrations

Obviously the above messaging can go too far. Don't write out a line for every piece of data you migrate or you'll get heaps of scrolling text rather than an idea of which migration we're up to - that useful data can get lost in the noise. Just a status message showing how much has been done or what stage we're up to in the migration will do.

Use "say" in migrations

Instead of using "p" use "say" - it gets formatted nicely.


MVC-specific idiom

If you are unsure about what MVC is all about - I suggest you do a bit of basic research on the subject. It's really important to keeping the code clean in a Rails site. Rails is completely founded onthe principle of keeping the MVC layers separate - if you begin tying them together, your run the risk of causing the code to become brittle - losing the flexibility that is the reason why you're using Rails in the first place.

Basic MVC principles

Keep view code out of the model

Your model should be independant of your views. A widget doesn't need to know whether it's being displayed as html, XML or via a screen-reader. It is an independantly-addressable object that contains "data and some knowledge of how to maniupulate that data".

Despite this I consistently see things like methods named "show_edit_link" which apparently tell the html whether this model's edit link should be visible on the html list page. That sort of logic should be in the view only. The model can have methods that, say, determine if they should be "editable" by the current user... and the code that determines if the edit link is displayed could check that method, but should remain inside the view.

If there is common code in your views - then it can be pulled into the helper file for that model. If it's used across multiple models, then it should go in application_helper.rb. It should never appear in the model.

The other common booboo I see are "display" functions that include html (eg "format field X to display nicely"). This is also very wrong! It means that, later, when you want to change html styling across the entire site you have to dig around inside every file in the site to change it... rather than simply changing the views - which is How It Should Be. From having to do just such a nightmareish site-upgrade, I can tell you that the fewer files you need to change - the easier your life will be in the long run.

All view code should remain in the views - with commonality pulled out into helpers.

Data manipulation code should go in the models - not views or controllers

Ok, so you have a report that you want to display that pulls data out of your widget... and your report view pulls it out bit by bit and does some calculations on it - then displays it.

So what happens when your client asks you to now do that report in PDF format... or CSV as well?

If all your data-manipulation methods are on the model itself, then you only need to pass the model object into the view and the view can then pull the data out in the same way for each view style . The alternative is to have a messy set of views/controllers where the same code is duplicated, violating the DRY principle and adding to your future maintenance headaches.

All data-manipulation code should remain on the model that owns the data.

Model idiom

Group relationship commands at top of model

One of the most important and useful things to know about a model is what it is related to - eg the "belongs_to"/"has_many" (etc) relationships.
If these calls get scattered throughout the model object, they will get lost amongst the noise and it will be hard to tell whather a given relationship has been declared...
Thus it's Good Practice to group them all together - and to put them right at the "top" of the class. eg:

class Widget < ActiveRecord::Base
  belongs_to :widget_class
  has_many :widget_parts
  has_one :pricing_plan

  #... rest of the model goes here
end

IMO this goes also for composition objects, "acts_as_<whatever>" declarations and auto_scope declarations - all of which declare the "shape" of the model and its relationships with other objects.

Group validations

The second-most important group of information about a model is what consists of valid data for the model. This will determine whether or not the data is accepted or rejected by ActiveRecord - so it's useful to know where to find it. So again, do not scatter it all about the model - put it near the top... preferably just below the "relationship" data. EG:

class Widget < ActiveRecord::Base
  belongs_to :widget_class
  has_many :widget_parts
  has_one :pricing_plan

  validates_presence_of :widget_class, :pricing_plan
  validates_numericality_of :catalogue_number, :integer_only => true
  def validate
     errors.add_to_base "should have at least one part" if widget_parts.blank?
  end

  #... rest of the model goes here
end

Validations should obey the DRY principle...

Please save my eyeballs from having to read the same thing over and over, and obey the DRY principle. You can write:

# please use:
validates_presence_of :field_one, :field_two, :field_three, :field_four
# instead of:
validates_presence_of :field_one
validates_presence_of :field_two
validates_presence_of :field_three
validates_presence_of :field_four

You can also use "validates_each" if you have several things that perform ostensibly the same validation... but it isn't an included one. eg:

# please use:
validates_each [:field_one, :field_two, 
                :field_three, :field_four]  do |model, attr, val|
  model.errors.add(attr, "should be a positive number") unless val.nil? || 0 <= val
end
# instead of:
def validate
  errors.add(:field_one, " should be a positive number")  unless field_one.nil? || 0 <= field_one
  errors.add(:field_two, " should be a positive number")  unless field_two? || 0 <= field_two
  errors.add(:field_three, " should be a positive number")  unless field_three? || 0 <= field_three
  errors.add(:field_four, " should be a positive number")  unless field_four? || 0 <= field_four
end

Read the section on validations in the book!!!

There are lots of really useful standard validations available in Rails... PLEASE check for whether one exists before writing your own hack into "def validate"... you might be surprised at how many are provided absolutely free of charge!
One of the most useful being "validates_each"... so you don't have to repeat the same thing over and over and over... remember that DRY principle?[1]

Another is "validates_format_of" which allows you to perform several validations at once - and give a consistant error message eg checking if a percentage falls within the range (0-100) with an optional percentage sign:[2]

  validates_inclusion_of [:my_percentage, :your_percentage], :in => 0..100, 
                       :message => "should be a percentage (0-100)"
  validates_format_of [:my_percentage, :your_percentage], :with => /^(\d{1,3})%?/, 
                       :message => "should be a percentage (0-100)"

If a validation doesn't refer to a single field... put it on base.

When it refers to more than one field, feel free to use

errors.add_to_base <whatever>
don't do something silly like: errors.add(nil, ...)
That being said - if you can squeeze it onto a field, please do - as it really helps the user to quickly find where the problem is in their form.

Give the poor users some useful feedback when validating

Don't tell them "Your weights should add up to 100"... tell them "your weights add up to 82%, your weights need to add up to 100% to be valid". It actually helps them see exactly what's missing.

When you update your validations, update your fixture data

This one has got me a couple of times. The problem is that fixtures are loaded into the database without going through validations - so your test data could be wrong - even when rake test passes flawlessly :(

Name your callbacks and declare them as handlers

If you have a number of callbacks (eg before_save/after_create hooks), it's better if you name them as a method, and just refer to them near the top of the model. eg:

# instead of:
class Widget < ActiveRecord::Base
  belongs_to :widget_class
  has_many :widget_parts
  has_one :pricing_plan

  validates_presence_of :widget_class, :pricing_plan
  def before_save
     # adds default pricing if none was specified
     pricing_plan ||= PricingPlan.find_default
  end
  #rest of model here
end
# try using:
class Widget < ActiveRecord::Base
  belongs_to :widget_class
  has_many :widget_parts
  has_one :pricing_plan

  validates_presence_of :widget_class, :pricing_plan
  before_save :add_default_pricing

  # handler to add default pricing if none specified
  def add_default_pricing
     pricing_plan ||= PricingPlan.find_default
  end
  #rest of model here
end

This example is too simple to easily show it, but there are a number of reasons why this is a good idea.

  • Firstly, you've named the method, which gives you an idea of what it's for = easier to maintain.
  • Secondly, if you need to add multiple of the same hook, you don't need to mash them all into the same callback method - which also aids readability.
  • Thirdly, by keeping each call as a single line, you can list them all near the top of the model class without taking up heaps of room (especially annoying in the case that the methods start getting really long). This helps you to quickly see what is and is not being done, without having to read through the details of the methods... or, even worse, seeing the first "after_create" hook and thinking it is the only one, when another is hidden later in the class.
  • Fourthly, if the method ever gets more complicated, or has to change, your method is encapsulated - the call to the method is independant of any other callback handlers of the same type.

Use Rails helpers for simple find calls

Rails comes with lots of helpful convenience functions. Read the book and learn about them! The find helpers are a good example. Something like the following is an excellent candidate.

  # instead of
  thing = Widget.find(:first, :conditions => "name='wodget'")
  # use the find_by_<whatever> method
  thing = Widget.find_by_name('wodget')

It shortens the code and makes it clear exactly what you're trying to find.

View idiom

Use -%>

Make sure you put a minus sign before closing your %>. This reduces whitespace in the resulting html - which makes it easier for us to read when we need to diagnose a display problem. eg use

<%= render :partial => :foo -%>
<% if some_condition -%>
   <div id="bar">
       <%= bar_fields -%>
  </div>
<% end -%>

h( x-site-scripting )

See this topic in the RailsSecurity section under XSiteScripting

Apply the DRY principle

Templates can have common code too - even across resource-boundaries.
Eg pagination links, or a common header/footer. Common view code should be factored out into a partial template and can be shared by a single resource-type, or across multiple resources.

Shared partials go in /app/views/shared

If a partial is to be used in a common layout, or across multiple resources (eg a common header for all resources reached after login), then the partial should be stored under the "/app/views/shared" directory.

Suppose you have a partial named "_common_footer.rhtml" in the shared directory. You can access it by referring to the partial thus:

<%= render :partial => 'shared/common_footer' -%>

Don't write a string inside a code snippet

This is silly: <%= " | " -%>
In a template, you can just type the pipe (or whatever) without the extra FrouFrou around it.

Controller idiom

Never, never, never use @params or @flash

These were deprecated some time ago. Instead use the similarly named methods: params or flash
Incidentally - you should be reading and taking heed of the warnings that the script/server is giving you about these things...

redirect when moving on

When you are moving on to another stage in the process, you should use redirect. eg you have successfully updated a page and are now moving to the index page.
When you are coming back to the same page, you can use render (eg the user entered an error and you are redisplaying the 'edit' page).

The reason we do this is that a redirect will change the URL in the address bar to the new page. If you just render a different page-action, the user will still have the URL of the previous page, even though they are now looking at a different page. This is Bad because if the previous page was, say, a form-POST... and the user clicks "refresh"... Bad Things will happen. Whereas if you redirect, then the user will have the URL of the new page and it's perfectly fine for them to hit refresh to get the page they're on.

Make private methods obvious!

Private controller methods aren't visible as actions. It can be a problem if somebody doesn't notice your little word "private" and starts to add methods below it without realising - and then wonders why Rails can't find their shiny new action.

So make sure that it's obvious and difficult to make the mistake.

Firstly, put private methods only at the bottom of the file!
Why bother going through spaghetti code with continual calls to private then public then private again... etc and having to figure out exactly what level of visibility we are right now? sort them and put the private ones all together at the bottom.

Secondly, use a line like this:

    private ###################################################################################

To make it painfully obvious to all who follow that below this line thar be dragons...


Testing idiom

All testing

Learn your basic assertions

Another "please read the book" request - there are lots of useful assertions already written for you. Using the appropriate ones makes your life easier as the error messages are more appropriate/meaningful for your test case. I especially find it irritating to see things like assert foo == bar when assert_equal not only has a better error message, but also means you're unlikely ever to accidentally use only one '=' - causing the test to silently pass when it should have failed.

Put the expected value first in assert_equals

The error message reads better that way.

Don't cram multiple tests into a single test case

Do not write generic tests with a dozen things to test all in a single test case.
Example shown below. Note - the real code example I took this from actually had another 6 tests crammed into it...

There are (at least) two reasons for not doing this:

Firstly - if the first test here breaks... it doesn't keep running the other tests. If it breaks, you can only tell that the first test isn't working... if you split them out into individual tests, rake will keep running the other tests - this is extremely helpful as you will then know whether all the validations are failing - or just that one.

Secondly - naming tests to mean what they're testing actually helps you find out what's going on when it breaks. A generic "test all sorts of stuff" name doesn't help you much, but a "test that widgets can't be created if the size is negative" does. You won't be thinking about this when you're writing that test case (and are fully immersed in the code you're testing), but six months later when something breaks your test... you will want to know what it's actually breaking and every bit of help (no matter how small) counts.

# don't use:
  def test_create_should_not_accept_invalid_parameters
    assert_no_difference Widget, :count do
        wid = create_widget(:vat_percentage => -1)
        assert wid.errors.on(:vat_percentage)
    end

    # invalid price range
    assert_no_difference Widget, :count do
        wid = create_widget(:price_range_min => -1)
        assert wid.errors.on(:price_range_min)
    end
    assert_no_difference Widget, :count do
        wid = create_widget(:price_range_max => -1)
        assert wid.errors.on(:price_range_max)
    end
    assert_no_difference Widget, :count do
        wid = create_widget(:price_range_max => 100, :price_range_min=>200)
        assert wid.errors.on(:price_range_max)
    end

    # invalid size range
    assert_no_difference Widget, :count do
        wid = create_widget(:size_min => -1)
        assert wid.errors.on(:size_min)
    end
    assert_no_difference Widget, :count do
        wid = create_widget(:size_max => -1)
        assert wid.errors.on(:size_max)
    end
    assert_no_difference Widget, :count do
        wid = create_widget(:size_max => 100, :size_min=>200)
        assert wid.errors.on(:size_max)
    end
  end
# instead try:
  def test_create_should_fail_on_negative_percentage
    assert_no_difference Widget, :count do
        wid = create_widget(:vat_percentage => -1)
        assert wid.errors.on(:vat_percentage)
    end
  end
  def test_create_should_fail_on_negative_min_price
    assert_no_difference Widget, :count do
        wid = create_widget(:price_range_min => -1)
        assert wid.errors.on(:price_range_min)
    end
  end
  def test_create_should_fail_on_negative_max_price
    assert_no_difference Widget, :count do
        wid = create_widget(:price_range_max => -1)
        assert wid.errors.on(:price_range_max)
    end
  end
  def test_create_should_fail_on_backwards_price_range
    assert_no_difference Widget, :count do
        wid = create_widget(:price_range_max => 100, :price_range_min=>200)
        assert wid.errors.on(:price_range_max)
    end
  end
  def test_create_should_fail_on_negative_min_size
    assert_no_difference Widget, :count do
        wid = create_widget(:size_min => -1)
        assert wid.errors.on(:size_min)
    end
  end
  def test_create_should_fail_on_negative_max_size
    assert_no_difference Widget, :count do
        wid = create_widget(:size_max => -1)
        assert wid.errors.on(:size_max)
    end
  end
  def test_create_should_fail_on_backwards_size_range
    assert_no_difference Widget, :count do
        wid = create_widget(:size_max => 100, :size_min=>200)
        assert wid.errors.on(:size_max)
    end
  end

Group tests in a meaningful way

Test files start out with only a handful of tests... but this rapidly changes. Very soon we get vast numbers of tests covering all sorts of actions.
To reduce the potential maintenance nightmare, group related tests together and put in a comment to indicate why they are grouped together. Any new test should be either put with related groupings or a new grouping created for it.

As an example: the most common set of tests revolves around resource CRUD - in which case it makes sense to group tests with related CRUD actions (eg "edit" and "update") near one another.
Another example is for the "user" object. users go through stages (process of creating a new user, the activation process, CRUD for an ordinary/active user, then destroying/reviving a user), so tests in these stages are grouped together.

If you're using rspec or shoulda, you are provided with a natural grouping mechanism (describe and context respectively). Use that to your advantage and make your future life easier.

Use named fixture records

Don't use bare fixture ids in your tests (eg get :edit :id => 1), instead use named references (eg @quentin.id).

This is a bit of a gotcha as the automatically-generated tests often use bare ids. However, these should be avoided where at all possible.

There are three main benefits:
Firstly, it makes the code more readable, with fewer magic numbers and more meaningful variable names.
Secondly, if we must reorder the fixture records (and thus change the ids) - at least we know which record we really wanted and we don't need to scour the tests looking for the ids to change.
Thirdly - if you're using auto-generated ids (rather than specifically passing them in) you won't know the id - and assuming that it's 1 sets you up to fail.

Similarly, don't use magic strings - especially for things like testing login/authentication. This will eventually bite you. It's much better to specify @quentin.login, @quentin.email (or whatever) so you don't have to go through your tests updating at a later time.

Don't forget to reload

The number of times I've been caught out by this... when you change something and are testing that it got saved into the database correctly, make sure you use @foo.reload. It's especially important if you're checking that you get the right validation errors on your ActiveRecord... you then want to test that it *didn't* change in the database too.

Common functions can be made into methods

If you find yourself performing the same action over and over, don't forget that you can DRY it up by putting it into a common function at the bottom of the test file.

  def setup
    @default_widget_hash = {:name => 'My widget', :size => 42, :wodget => wodgets(:my_wodget)}
  end
  def test_should_create_widget
    assert_difference Widget, :count do
      w = create_widget
      assert !w.new_record?, "should not have errors: #{w.errors.full_messages.to_sentence}"
    end
  end
  def test_widget_should_require_name_on_create
    assert_no_difference Widget, :count do
      w = create_widget(:name => nil)
      assert w.errors.on(:name), "Should have required name"
    end
  end
  def create_widget(options = {})
    Widget.create(@default_widget_hash.merge(options))
  end

Really common tests can be made into assertions

If you find yourself testing the same thing in multiple unit/functional tests, you can DRY that up too it up by putting it into the test helper as a new assertion. Don't forget to name it "assert_X" (where X is descriptive).

Make sure you name it starting with "assert_". This convention allows you to see which methods will have assertions in them (and thus possibly cause your test to fail), as opposed to general methods that simply set-up/tear-down information.

  # Assert that the handed object (thing) has no errors on it.
  def assert_no_errors(thing, msg = "")
    fail("should have passed an object") unless thing
    e = thing.errors.full_messages
    assert e.empty?, "Should not have error: #{e.to_sentence} #{msg}"
  end

Unit tests

Test your CRUD actions

It may seem like overkill to test CRUD here *and* in functional tests, but this is the place to test validations and any things that will affect basic CRUD-actions (eg if you have a method that stops your Widget from being deleted after it's been submitted for approval... then you need to try deleting a submitted one and make sure it fails!

If you only leave it up to your controller tests to check this stuff, then you are getting your data back filtered through one more layer of abstraction and will have to sort out whether it's a model-bug or a controller-bug. It's better to be sure of your model before adding an extra layer on top. It really makes your life easier in the long run!

Sadly scaffold_resource no longer writes these tests for you. :P

Test your validations

Each validation should be tested by trying to create your model object with that one set to nil. Trust me, it's worth it in the long run!

Functional tests

Test your validations

Create a functional test for each validation - again, try to create an object, nulling out the attribute-under-test. I know this seems to double-up on the unit tests - but what you're testing here is that the model is correctly getting back an error on the required attribute. This will be what is displayed on the user's screen - so they can fix the error before moving on. You also need to test that the template is still on the "edit" page. If it's not, then all the ActiveRecord Errors will be in vain as your user will never get to see them.

Test authentication-levels

If you have different levels of authentication (eg administrators get to see everyone's stuff, but you can only see your own stuff) then you must test both ways... test that you can see what you should see, but also test that you can't see stuff that you shouldn't see! You don't want your users peeking into your admin accounts just because you missed the "unhappy path" tests.
...and don't forget to test what happens when you're not logged on too.

Tests need more than just "assert_response :success"

Unfortunately, "assert_response :success" only checks that *something* happened - not that it was the right something.

To be more precise, "assert_response :success" asserts that rails successfully returned an "HTTP 200" response (as opposed to a 404 or a redirect). Unfortunately it does not check that you received the response you were expecting... (though I wish it were that simple) - sadly you do have to actually check that it not only returned a response - but that it was the correct response with the correct variables etc.

In the case of a "get" request - you should check that we are rendering the correct template, and that the "assigns" variables have been correctly populated.
In the case of a request that does something - check all the variables afterwards to make sure they changed in the way that you expected (eg that the variables actually were updated).


Notes

[1] Why is it that the DRY principle is the one I seem to have to harp on the most? Repeated endlessly reaching a pinnacle of irony.

[2] check my earlier post on "validates_percentage_of" for this example in context.