Monday, 27 July 2009

40% speedup using Nokogiri!

Cut to the chase...

To cut your XML-processing time dramatically, sudo gem install nokogiri then add the following to config/environment.rb inside the Rails initializer.

config.gem "nokogiri"

Back-story and pretty pics

The problem

So, our site makes heavy use of ActiveResource [1], meaning that most of our data is located remotely.

Not surprisingly, some of our pages are *really* slow, so I landed the task of speeding them up. Apart from page-caching (not possible), fragment caching (only helps on the *second* hit), or some complicated messy idea of data-caching locally (tedious and likely to be evil); my first thought was to reduce the number of network hits. Clearly that's a high pain point, especially on our heavy pages that have many resource fetches.

Before I dove into performance hacks and updating the business logic into twisty little data reuse-patterns for network-hit reduction... I decided to actually try profiling.

I've been setting up a ruby-prof and kcachegrind recently[2]... and figured I should at least give that a look-at to see if my assumptions are correct.

I'm really glad I did, because when I ran it over our heaviest action, I saw that all the highest-weight method-calls led back to some form of ReXML parsing.

Searching on the ReXML components showed that the heaviest ReXML method took up a whopping 1 million process cycles. When our total process-cycles came to 5.8 million - that's a significant chunk of time spent in that one library.

As I mentioned - our site makes heavy use of ActiveResource, and one *big* problem with ActiveResource is that all your objects are parsed and re-parsed as xml for every fetch of data... so, in hindsight, it's fairly obvious that our site would spend a *lot* of time in the XML-parsing library. Any speedup in that department would help us immensely.

The solution?

We've recently been to Rails underground, and one of the lectures[3] had a slide comparing the speed of ReXML to several other ruby XML-parsing libraries[4]. Nokogiri came out as a clear winner in the speed department. The loser was equally clear... that being the Rails-default: ReXML

So, switching out the library would be an obvious speed win.

As it turns out - it's really easy to do this. Just install the gem, and require it in your Rails initializer using the instructions at the top of this post

But did it really help?

It seemed faster... but can we prove it?

From ReXML to Nokogiri - 40% speedup



[1] through the HyperactiveResource plugin
[2]I'll be giving a talk at LRUG on 12/08/2009 on how to use and interpret ruby-prof and kcachegrind
[3]During the talk by Maik Schmidt on Sneaking Ruby & Rails Into Big Companies
[4] I'm not sure, but it's possibly the one from this page comparing Ruby-XML performance benchmarks


Anonymous said...

If performance or memory usage is important for XML tasks, check out vtd-xml

Taryn said...

Looks like it's making a lot of claims, but I have to say the following throws me:
"available in C, C# and Java"

Do you have any examples of use in a ruby/rails context?