Wednesday, 29 July 2009

CSV views with FasterCSV and csv_builder

We were asked to add a "Download as CSV" link to the top of each of our reporting actions. We already had a "revenue_report" controller-action with it's own html view template... and just wanted to add something similar in CSV.

FasterCSV seemed to provide some great CSV-generation functionality... but it builds the csv and spits it out right there in the controller.

uuuuugly!

We shouldn't put view-generation code into the controllers - that splits up our nicely-decoupled MVC stack!

I've also seen some ideas about putting a to_csv method into an individual model. Now, if I'm leery about putting view-generation in the controller... I'm even less impressed by putting it into the model! But I can see the benefits of simplicity.

However - I'm still not sure it fits with our requirements. A @widget.to_csv call works just fine for the WidgetsController#show action... and probably even the WidgetsController#index action (where you can run @widgets.map &:to_csv or similar)... but most of our reports span multiple associations and the data is grouped, filtered, and summarised. Each of these sets needs headings explaining what it is and laying it out nicely in a way that makes sense to the user reading the CSV file. Putting that kind of visual-only functionality in your model is where it really starts to get ugly again. I am left with one big thought:

Why can't I just put my CSV-template into the views directory?

enter: csv_builder plugin

csv_builder comes with extremely minimal documentation... but it's not that hard to use. Here's a step-by-step with examples to follow:

  1. install fastercsv and csv_builder
  2. add a .csv option to the respond_to part of your action
  3. add your <actionname>.csv.csvbuilder template into your views directory
  4. Add a "Download as CSV" link for your user

Step 1: Install FasterCSV & csv_builder

sudo gem install fastercsv
./script/plugin install git://github.com/econsultancy/csv_builder.git

Don't forget to restart your server!

csv_builder has already set up a format-extension handler for csv so you don't need to add it to the mime-types yourself, just start using it in your actions.

Step 2: In your controller action:

csv_builder will work out of the box even if you just add the simplest possible respond_to statement:

  respond_to |format| do
    format.html # revenue_report.html.erb
    format.xml  { render :xml => @payments }
    format.csv  # revenue_report.csv.csvbuilder
  end

With nothing more than the above, csv_builder will grab out your template and render it in csv-format to the user.

You can tweak the FasterCSV options by setting some @-params in your action. The csv_builder gem README explains how to use those, but the most useful would be to set @filename to give your users a meaningful filename for their csv-file. You can also specify things like the file-encoding, but go look at the gem README to find out the details

Here's an example of a full controller action:

# GET /sites/1/revenue_report
# GET /sites/1/revenue_report.xml
# GET /sites/1/revenue_report.csv
def revenue_report
  @site = Site.find(params[:id])
  @payments = @site.revenue_report_payments
  respond_to do |format|
    format.html # revenue_report.html.erb
    format.xml  { render :xml => @payments }
    format.csv do
      # timestamping your files is a nice idea if the user runs this action more than once...
      timestamp = Time.now.strftime('%Y-%m-%d_%H:%M:%S')
      # passing a meaningful filename is a nice touch
      @filename = "revenue_report_for_site_#{@site.to_param}_#{timestamp}.csv"
    end # revenue_report.csv.csvbuilder
  end
end

Step 3: add a template into your views directory

The final step is to add a .csv.csvbuilder template to your views directory. This is just like adding an html.erb template for your action, except that the file will have the extension .csv.csvbuilder. AFAIK csv_builder can only support template that have the same name as your action, so in my example the template would be called revenue_report.csv.csvbuilder

Your view template is where you can stash away all the FasterCSV guff that will build your csv file. This works the same way as any other FasterCSV-generated content - but just make sure you generate it into a magic array that is named 'csv'. Here's an example:

  # header row
  csv << ["Revenue report for site: #{@site.name}"]
  csv << [] # gap between header and data for visibility

  # header-row for the data
  csv << ["Date", "", "Amt (ex tax)", "Amt (inc tax)"]

  row_data = [] # for scoping
  @payments.each do |payment|
    row_data = [payment.created_at.to_s(:short)]     
    row_data << "" # column gap for visibility
    # note you can use the usual view-helpers
    row_data << number_to_currency(payment.amount_ex_tax)
    row_data << number_to_currency(payment.amount_inc_tax)

    # and don't forget to add the row to the csv
    csv << row_data.dup # it breaks if you don't dup
  end # each date in date-range
  csv << [] # gap for visbility

  # and the totals-row at the very bottom
  totals_data = ["Totals", ""] # don't forget the column-gap
  totals_data << @payments.sum &:amount_ex_tax
  totals_data << @payments.sum &:amount_inc_tax
  csv << totals_data

Step 4: add a CSV link

Something like the above code will generate a nice plain csv file and spit it out at the user in the correct encoding... but now we need something to actually let the user know that they can do this.

This is pretty simple - it just requires adding the format to your usual path-links. eg:

<%= link_to 'Download Report as CSV', site_revenue_report_path(:id => @site.to_param, :format => :csv) -%>

Testing

Testing for non-html formats still seems pretty crude. You have to manually insert the "accepts" header in the request before firing off the actual request - then manually check the response content_type. It'd be nice if we could just add :format => :csv to the request... anyway, here's a sample shoulda-style test case:

  context "CSV revenue_report" do
    setup do
      @request.accept = 'text/csv'
      get :revenue_report, :id => @site.to_param
    end

    # variables common to all formats of revenue-report
    should_assign_to :site
    should_assign_to :payments

    should_respond_with :success
    should "respond with csv" do
      assert_equal 'text/csv', @response.content_type
    end
  end # context - CSV for revenue report

Gotchas:

Don't forget to add config.gem!"

I fond I needed config.gem 'fastercsv' and a require 'csv_builder' in my environment.rb. You may need a gem-listing for both (esp if you're using bundler)

Rails 3 compatitbility?

The original csv_builder plugin is not rails 3 compatible and is in fact no longer being supported. But it has officially changed hands, to the newer csv_builder gem. This is Rails-3 compatible and not backwards compatible - though he maintains a 2.3.X branch for people to submit bugs.

csvbuilder or csv_builder

While the plugin is named csv_builder, be careful to name your files with the extension: csv.csvbuilder or you'll spend hours pulling your hair out about a Missing template error while it fruitlessly searches for your csv.csvbuilder file!

All your report data must be generated in the controller!

This makes you more MVC-compliant anyway, but if you had any collections being fetched or formatted in the view... now's the time to move them into the controller action as your CSV-view will need them too.

duplicate inserted arrays

You'll notice that the example template has a set of data being added to the csv array... you'll also notice that each row is generated on the fly - then I save a *duplicated* version into the csv array. If you don't duplicate it... you may do funky things with overwriting the row each time. In my case I also needed to declare the temporary array outside the loop in order to preserve scope. YMMV - I'd appreciate any Ruby-guru answer as to why this doesn't work without that.

don't reset csv

Don't do this: csv = [] it breaks everything!

Rendering as html

I've noticed there's a pathalogical condition where everything is going well - and it's even getting o the template... but it's *still8 rendering html.

I eventually figured out that it's actually rendering the layout around the csv... and it occurs when you've explicitly stated a layout (eg layout 'admin') somewhere at the top of your controller. To fix it, you just need an empty render without layout false eg

  respond_to do |format|
    format.csv do
      timestamp = Time.now.strftime('%Y-%m-%d_%H:%M:%S')
      @filename = "revenue_report_for_site_#{@site.to_param}_#{timestamp}.csv"
      render :layout => false  # add this to override a specifically-stated layout
    end # revenue_report.csv.csvbuilder
  end

Tuesday, 28 July 2009

rails gotchas: undefined method `expects'

If you've just installed rails edge and run the tests, they may fail with: NoMethodError: undefined method `expects' for ... etc etc over and over again (along with a few other errors).

Install the mocha gem via rubygems to get rid of a lot of these errors.

It was the line NoMethodError: undefined method `stub' for ... that clued me in.

It seems that rails requires a short list of gems/packages (beyond rails itself) simply to run its own tests... yet there is no rake gem:install task available to help you figure out which ones... and they aren't in the "contributing to rails" guide. I'll be submitting a patch shortly...

Following on from this I got:

MissingSourceFile: no such file to load -- openssl

and farther down the list:

LoadError: no such file to load -- net/https

Unfortunately, these errors occur when Ruby hasn't been installed with openssl-support. If you're running ubuntu, you can just run apt-get install libopenssl-ruby.

Monday, 27 July 2009

40% speedup using Nokogiri!

Cut to the chase...

To cut your XML-processing time dramatically, sudo gem install nokogiri then add the following to config/environment.rb inside the Rails initializer.

config.gem "nokogiri"
ActiveSupport::XmlMini.backend='Nokogiri'

Back-story and pretty pics

The problem

So, our site makes heavy use of ActiveResource [1], meaning that most of our data is located remotely.

Not surprisingly, some of our pages are *really* slow, so I landed the task of speeding them up. Apart from page-caching (not possible), fragment caching (only helps on the *second* hit), or some complicated messy idea of data-caching locally (tedious and likely to be evil); my first thought was to reduce the number of network hits. Clearly that's a high pain point, especially on our heavy pages that have many resource fetches.

Before I dove into performance hacks and updating the business logic into twisty little data reuse-patterns for network-hit reduction... I decided to actually try profiling.

I've been setting up a ruby-prof and kcachegrind recently[2]... and figured I should at least give that a look-at to see if my assumptions are correct.

I'm really glad I did, because when I ran it over our heaviest action, I saw that all the highest-weight method-calls led back to some form of ReXML parsing.

Searching on the ReXML components showed that the heaviest ReXML method took up a whopping 1 million process cycles. When our total process-cycles came to 5.8 million - that's a significant chunk of time spent in that one library.

As I mentioned - our site makes heavy use of ActiveResource, and one *big* problem with ActiveResource is that all your objects are parsed and re-parsed as xml for every fetch of data... so, in hindsight, it's fairly obvious that our site would spend a *lot* of time in the XML-parsing library. Any speedup in that department would help us immensely.

The solution?

We've recently been to Rails underground, and one of the lectures[3] had a slide comparing the speed of ReXML to several other ruby XML-parsing libraries[4]. Nokogiri came out as a clear winner in the speed department. The loser was equally clear... that being the Rails-default: ReXML

So, switching out the library would be an obvious speed win.

As it turns out - it's really easy to do this. Just install the gem, and require it in your Rails initializer using the instructions at the top of this post

But did it really help?

It seemed faster... but can we prove it?

From ReXML to Nokogiri - 40% speedup

Yup.


Notes

[1] through the HyperactiveResource plugin
[2]I'll be giving a talk at LRUG on 12/08/2009 on how to use and interpret ruby-prof and kcachegrind
[3]During the talk by Maik Schmidt on Sneaking Ruby & Rails Into Big Companies
[4] I'm not sure, but it's possibly the one from this page comparing Ruby-XML performance benchmarks

Sunday, 26 July 2009

Rails underground

Just a quick post to say I attended the Rails underground conference and it rocked!

There were some fantastic talks by a wide range of speakers. My favourite being Jim Weirich's talk on the search for the Grand Unified Theory of Software Development. But I also enjoyed Yehuda's keynote talk on the future of Rails 3, Obie Fernandez's talk on setting up a rails business, and the panel discussion with all of the above, and a video-feed from DHH.

I also met some great people, including Desi McAdam, who introduced me to dev-chix, Geoffrey Grosenbach who chatted to me about my plans for my HyRes plugin and Yehuda Katz who I also chatted with about my plugin - and how to go about bringing it into Rails 3.

Friday, 17 July 2009

Adding prompt to select_tag

Rails's select method is pretty advanced. You can pass in all sorts of funky options to format your selector nicely, including :include_blank => true and the related :prompt => 'Pick one or else!'

select_tag, however, seems to have fallen behind in the usefulness-stakes and implements neither of the above... even though it does an extremely similar thing.

So here's a quick-hack function you can drop into ApplicationHelper to implement that functionality for you.

  # override select_tag to allow the ":include_blank => true" and ":prompt => 'whatever'" options
  include ActionView::Helpers::FormTagHelper
  alias_method :orig_select_tag, :select_tag
  def select_tag(name, select_options, options = {}, html_options = {})
    # remove the options that select_tag doesn't currently recognise
    include_blank = options.has_key?(:include_blank) && options.delete(:include_blank)
    prompt = options.has_key?(:prompt) && options.delete(:prompt)
    # if we didn't pass either - continue on as before
    return orig_select_tag(name, select_options, options.merge(html_options)) unless include_blank || prompt

    # otherwise, add them in ourselves
    prompt_option  = "<option value=\"\">" # to make sure it shows up as nil
    prompt_option += (prompt ? prompt.to_s : "") + "</option>"
    new_select_options = prompt_option + select_options
    orig_select_tag(name, new_select_options, options.merge(html_options))
  end

Wednesday, 15 July 2009

Gotcha: UTC vs Local TimeZones with ActiveResource

So... your database is filled with datetime data and it's all configured to localtime, not UTC... We also have this you-beat nifty ability to set all our datetime-handling functionality to a given timezone by setting, say: config.time_zone = 'London' in config/environments.rb... or do we?

If you also use ActiveResource (or the new, actually-working HyperactiveResource), you'll find that suddenly you're getting a UTC-local timezone issue once more.

The problem is that the xml that comes back from a remote API is converted into a Date/DateTime using the core-extension to the Hash.from_xml method... which has the following LOC:

"datetime"     => Proc.new  { |time|    ::Time.parse(time).utc rescue ::DateTime.parse(time).utc }

The fix

You need to do two things. Firstly. Hack that line[1] and replace it with:

"datetime"     => Proc.new  { |time|    ::Time.zone.parse(time) rescue ::DateTime.zone.parse(time) },

Secondly... somehow it doesn't pick up the timezone even though it's been helpfully added in via the config... so you need to open up config/environments.rb (or create a rails initializer) and put:
Time.zone = 'London'[2]
in there (outside the rails initialization block).

Notes:
[1]To hack rails, you can either
a) hack on your own rails gem = risky... will be overwritten the next time you sudo gem update or
b) rake rails:freeze:edge - which means you have your rails in your own vendor/rails directory... but means you have to rake rails:update manually... up to you which you hate more.

[2]Obviously substituting your own timeZone as appropriate here. See the TimeZone doc for what you can pass in.