Showing posts with label xml. Show all posts

Monday, 27 July 2009

40% speedup using Nokogiri!

Cut to the chase...

To cut your XML-processing time dramatically, run sudo gem install nokogiri, then add the following to config/environment.rb inside the Rails initializer block.

config.gem "nokogiri"
ActiveSupport::XmlMini.backend='Nokogiri'

Back-story and pretty pics

The problem

So, our site makes heavy use of ActiveResource [1], meaning that most of our data is located remotely.

Not surprisingly, some of our pages are *really* slow, so I landed the task of speeding them up. Setting aside page-caching (not possible), fragment caching (only helps on the *second* hit), and some complicated messy idea of caching data locally (tedious and likely to be evil), my first thought was to reduce the number of network hits. Clearly that's a high pain point, especially on our heavy pages that have many resource fetches.

Before I dove into performance hacks and updating the business logic into twisty little data reuse-patterns for network-hit reduction... I decided to actually try profiling.

I've been setting up ruby-prof and kcachegrind recently[2]... and figured I should at least take a look to see if my assumptions were correct.

I'm really glad I did, because when I ran it over our heaviest action, I saw that all the highest-weight method-calls led back to some form of REXML parsing.

Searching on the REXML components showed that the heaviest REXML method took up a whopping 1 million process cycles. When our total process-cycles came to 5.8 million, that one method alone accounts for roughly 17% of the total - a significant chunk of time spent in one library.

As I mentioned - our site makes heavy use of ActiveResource, and one *big* problem with ActiveResource is that all your objects are parsed and re-parsed as XML for every fetch of data... so, in hindsight, it's fairly obvious that our site would spend a *lot* of time in the XML-parsing library. Any speedup in that department would help us immensely.

The solution?

We've recently been to Rails Underground, and one of the lectures[3] had a slide comparing the speed of REXML to several other Ruby XML-parsing libraries[4]. Nokogiri came out as a clear winner in the speed department. The loser was equally clear... that being the Rails default: REXML.

So, switching out the library would be an obvious speed win.

As it turns out - it's really easy to do this. Just install the gem, and require it in your Rails initializer using the instructions at the top of this post.

But did it really help?

It seemed faster... but can we prove it?

From REXML to Nokogiri - 40% speedup

Yup.


Notes

[1] Through the HyperactiveResource plugin
[2] I'll be giving a talk at LRUG on 12/08/2009 on how to use and interpret ruby-prof and kcachegrind
[3] During the talk by Maik Schmidt on Sneaking Ruby & Rails Into Big Companies
[4] I'm not sure, but it's possibly the one from this page comparing Ruby-XML performance benchmarks

Wednesday, 15 July 2009

Gotcha: UTC vs Local TimeZones with ActiveResource

So... your database is filled with datetime data and it's all configured to localtime, not UTC... We also have this you-beaut nifty ability to set all our datetime-handling functionality to a given timezone by setting, say: config.time_zone = 'London' in config/environment.rb... or do we?

If you also use ActiveResource (or the new, actually-working HyperactiveResource), you'll find that suddenly you're getting a UTC-local timezone issue once more.

The problem is that the XML that comes back from a remote API is converted into a Date/DateTime using the core-extension to the Hash.from_xml method... which has the following line of code:

"datetime"     => Proc.new  { |time|    ::Time.parse(time).utc rescue ::DateTime.parse(time).utc }

The fix

You need to do two things. Firstly, hack that line[1] and replace it with:

"datetime"     => Proc.new  { |time|    ::Time.zone.parse(time) rescue ::DateTime.parse(time).in_time_zone },

Secondly... somehow it doesn't pick up the timezone even though it's been helpfully added in via the config... so you need to open up config/environment.rb (or create a Rails initializer) and put:
Time.zone = 'London'[2]
in there (outside the rails initialization block).
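
To see what the .utc call in the original line does to a timestamp, here is a minimal sketch in plain Ruby (no Rails needed). The sample timestamp is made up for illustration; it stands in for a value arriving in an XML payload with a +01:00 offset.

```ruby
require 'time'

# Made-up sample value: 10:30 at UTC+01:00 (e.g. London in summer).
raw = '2009-07-15T10:30:00+01:00'

# Parsed as-is, the Time keeps the offset found in the string.
parsed = Time.parse(raw)

# Parsed and forced to UTC, as the original "datetime" proc does.
# Note: Time#utc converts its receiver in place, hence the second parse.
as_utc = Time.parse(raw).utc

puts parsed.hour        # 10
puts as_utc.hour        # 9
puts parsed == as_utc   # true
```

Both values are the same instant; only the wall-clock reading differs, which is why times can look an hour out once everything has been forced to UTC.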

Notes:
[1] To hack Rails, you can either:
a) hack on your own Rails gem = risky... it will be overwritten the next time you sudo gem update, or
b) rake rails:freeze:edge - which means you have your Rails in your own vendor/rails directory... but means you have to rake rails:update manually... up to you which you hate more.

[2] Obviously substituting your own time zone as appropriate here. See the TimeZone doc for what you can pass in.

Tuesday, 28 April 2009

Playing nice with XML and HTTP

So we've got a setup that has a remote API that we're accessing using HyperactiveResource (an extended version of ActiveResource). Now, I'm using Rails to simulate the remote API (for the purposes of testing), and I've come across some annoying behaviour.

One issue is that standard rails routing for a RESTful interface will direct a badly-constructed (or non-existent) URL to a real action... let me demonstrate thus:

Real member path: /users/1.xml Routed to: :controller => 'users', :action => 'show', :id => '1'
Real named collection path: /users/count.xml Routed to: :controller => 'users', :action => 'count'
Non-existent path: /users_party_on.xml Routed to: "Bad Request" handler
Non-existent path2: /users/party_on.xml Routed to: :controller => 'users', :action => 'show', :id => 'party_on'

If I called the last URL with curl, I'd expect to be routed to the "Bad request" handler and receive some sort of error-like http-status and an XML message explaining that no route exists or something similar... what I get instead is a horrible big *html* page telling me it couldn't find the user with an id of "party_on" (unsurprisingly).

So what do I want to have happen? I'd rather this stuff was caught in the router. It'd be nice if there were a way to tell the router that your :controller/:action/:id is only valid for a certain formatting of the :id field. If anybody out there on teh Intarwebs knows how to do that, please tell me now!

Unfortunately, it doesn't seem to do this... and in any case, the router/dispatcher also doesn't seem to return XML to an XML-request... it only seems to know how to handle HTML-based errors (by spitting back the public error pages[1]).

So instead, what I need is to return a "URL not found"-style xml error at the appropriate time.

Most of my controllers have a "find_thing"-style function for the member actions (ie just @thing = Thing.find(params[:id])). Now, since the bad URLs tend to converge on the "show" action - this seems as good a place to put a bad-request filter as any. I'll also incorporate it with the 404-code that also seems to be missing when a thing doesn't exist (or is not accessible by this person).

So this calls for a helper-method as below, as a hack to fix this lack of proper routing.

  # convenience method for extracting the expected model name from the
  # controller name
  # Note: expects the model to be rails-standard eg "ThingsController"
  # should map to the Thing model
  def model_name
    self.controller_name.singularize.camelize
  end

  # use this to skip out early and return better http status codes for XML
  # requests.
  #
  def find_thing
    the_id = params[:id]
    # ids should be numeric. If they're not - we accidentally got through
    # the router with an unrecognised action - because Railsy named-routes
    # that *don't* exist, look like the "show" action with a bad id.
    if !the_id.blank? && the_id.to_s !~ /\A\d+\z/
      # skip out with a 400 early...
      respond_to do |format|
        format.xml do
          return render :xml => 'Error: URL not recognised', :status => :bad_request 
        end
      end
    end
    begin
      thing = model_name.constantize.find(the_id.to_i)
    rescue ActiveRecord::RecordNotFound
      # skip out with a 404 early...
      respond_to do |format|
        format.xml do
          return render :xml => 'Error: resource not found', :status => :not_found 
        end
      end
    end
    thing
  end
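
A note on checking whether an id is numeric: String#to_i never fails, it just returns 0 for text it can't read, so the result of to_i is always a number. A stricter test, such as matching against a digits-only pattern, is needed to tell "party_on" apart from "1". A minimal standalone sketch (the helper name numeric_id? is hypothetical, not part of Rails):

```ruby
# Hypothetical helper: true only when the value consists entirely of digits.
def numeric_id?(value)
  !!(value.to_s =~ /\A\d+\z/)
end

puts 'party_on'.to_i                  # 0, not an error
puts 'party_on'.to_i.is_a?(Numeric)   # true, so this test rejects nothing
puts numeric_id?('party_on')          # false
puts numeric_id?('1')                 # true
```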

Due to how controller before_filters work[2], you should use this code thus:

class UsersController < ApplicationController
  before_filter :find_user, :except => [:new, :create, :index, :count]
  # actions all go here
  # ...

  protected

  def find_user
    @user = find_thing
  end
end

Caveat: if you have non-standard naming of models/controllers - the model_name.constantize will not work... so you may want to modify this to pass in an optional klass param.
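
One way the optional klass param might look, sketched in plain Ruby so it runs outside Rails. The helper name model_class_for is hypothetical, and the string munging is only a naive stand-in for ActiveSupport's singularize.camelize.constantize:

```ruby
# Stand-in model classes so the sketch runs on its own.
class User; end
class Person; end

# Hypothetical helper: an explicitly passed class wins; otherwise the class
# is guessed from the controller name, roughly as model_name.constantize does.
def model_class_for(controller_name, klass = nil)
  return klass if klass
  # Naive singularize + camelize: drop a trailing "s", capitalise each word.
  guessed = controller_name.sub(/s\z/, '').split('_').map(&:capitalize).join
  Object.const_get(guessed)
end

puts model_class_for('users')            # User
puts model_class_for('people', Person)   # Person
```

Without the explicit class, 'people' would be guessed as a People constant and raise a NameError, which is the failure the caveat describes.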

Notes:
[1] and when exactly are they going to make these into templates so we can use the standard layout rather than hand-coding it for each one?
[2] ie, I'm too slack to figure out exactly how to do: @thingy = in a block passed to the before_filter command. Again - if you know how, let me know.