Tuesday, 11 March 2008

Performance tips > indexing matters

As I've mentioned in a previous post, I've been doing a lot of performance tweaking recently. Far and away the biggest gains have come from tuning the SQL and the database itself. One of the biggest things I learned was to use indexes appropriately. This means two things:

  1. Put an index on anything important
  2. Get the field order right

So what's important?

From the perspective of an index, important means anything that you will need to fetch repeatedly - especially if that fetch is in a tight loop. This is one place where RubyProf is your friend. You can also watch the SQL queries scrolling past in the console output to get a rough idea of which queries matter most.
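Once you've spotted one, adding the index is a one-line migration. A minimal sketch, assuming (purely for illustration) an orders table that keeps getting looked up by user_id inside a loop:

# hypothetical migration: index a column that gets hit repeatedly
class AddIndexToOrdersUserId < ActiveRecord::Migration
  def self.up
    add_index :orders, :user_id, :name => :idx_orders_by_user
  end

  def self.down
    remove_index :orders, :name => :idx_orders_by_user
  end
end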

What about the field order?

Indexes work better if you put the "most discerning" (i.e. most selective) fields before the less-discerning ones. As an example, suppose you have a (completely spurious) table as follows:

owner_name | pet_name | toy_name
-----------+----------+----------------
Bob        | Fido     | Bone
Bob        | Fido     | Rubber chicken
Bob        | Whiskers | Jingly ball
Bob        | Whiskers | Paper on string
Jane       | Woofy    | Chew toy
Jane       | Woofy    | Bouncy ball
Jane       | Kitty    | Paper bag
Jane       | Kitty    | Laser pointer

Assume for some reason it's important to have an index on all three fields above. The most discerning field is toy_name - every value is unique, so a lookup on it narrows the rows down fastest and gets you the best bang for your buck. It should come first in your index, followed by the next-most discerning field. In the example data above that's pet_name, but maybe you know your intended audience a bit better: if a lot of people are going to call their dogs 'Fido' and their cats 'Kitty', you might prefer owner_name as the second field instead. The most likely circumstance is that the owners actually live in a users table and the pet_toys table holds an owner_id - in which case you'd definitely make that the second key of the index.

Thus you'd end up with:
add_index :pet_toys, [:toy_name, :owner_id, :pet_name], :name => :idx_pettoys_by_name_oid_pet
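
One thing to keep in mind: a compound index like that only helps queries that constrain its leading field(s). A rough sketch of what that means (the PetToy model name is assumed purely for illustration):

# constrains the leading field (and the second), so the index above can be used
PetToy.find(:all, :conditions => ["toy_name = ? AND owner_id = ?", "Bone", 1])

# skips the leading field entirely, so the index above generally won't help here
PetToy.find(:all, :conditions => ["pet_name = ?", "Fido"])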

Is it really that important?

The contract I'm currently working on requires us to display reports with tens of thousands of orders and trades in them, cross-referenced against multiple calculated benchmark values. It's imperative that the indexes are set up to pull that data out of the db in the most efficient way possible.
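
One way to sanity-check that a query is really hitting the index you expect is to ask the database directly from script/console. A quick-and-dirty sketch (it assumes MySQL, and the query itself is made up - look at the "key" column in the output):

# EXPLAIN reports which index (if any) the query will use
ActiveRecord::Base.connection.select_all(
  "EXPLAIN SELECT * FROM pet_toys WHERE toy_name = 'Bone' AND owner_id = 1"
).each { |row| puts row.inspect }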

I achieved an order-of-magnitude speedup just by changing the field order of the indexes! It was well worth it for us - YMMV.

Tuesday, 4 March 2008

assert_event_logged > Getting stuff back out of the Ruby Logger

So we wanted to test whether our code actually puts things into the log when it should - e.g. important events such as "User X just got archived by User Y" or "Someone tried to log in as User X but failed!"

Unfortunately, the Ruby Logger seems to be write-only. There's no way to actually get back the messages that you carefully put into it. So how to test it?

Asking around, I got a few ideas for how to go forward. The best option would be to have a separate "audit log" for special events that need to be independently auditable. However, this would take Time(tm) (of which we seem to be in short supply these days). The next best option was to use mocha... but that, to my mind, doesn't really test the actual logger, which I want to be certain works the way I expect.

So I began to think... hmmm - monkey-patch.

What follows is quick and dirty... I had intended to go for elegance, but I'm satisfied that it at least tests the actual logger object, rather than doing something dodgy like creating a temporary second log and reading it back in as a file :P

The monkey-patches

Put these into test_helper.rb somewhere up the top (not inside class Test::Unit::TestCase):

# monkey patch ruby's "Logger" to make it accessible to tests
class Logger
  alias old_add add

  # stash every message (keyed by log level) before handing it on to the
  # real add. Note: debug/info/warn etc. pass the message through as the
  # second argument to add, which is why we record args[1].
  def add(lvl, *args, &block)
    @last_logged ||= {}
    @last_logged[lvl] ||= []
    @last_logged[lvl] << args[1]
    old_add(lvl, *args, &block)
  end

  # the messages logged at the given level (or at all levels if none given)
  def last_logged(lvl = nil)
    @last_logged ||= {}
    return @last_logged.values.flatten unless lvl
    @last_logged[lvl]
  end

  # wipe the stash - call before the code you want to make assertions about
  def clear_log_messages
    @last_logged = {}
  end
end
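
To see what the patch gives you, here's a quick irb-style sanity check (purely illustrative - it assumes the patch above has already been loaded):

logger = Logger.new(STDOUT)
logger.info "hello"               # prints the message as usual...
logger.last_logged(Logger::INFO)  # => ["hello"]   ...but now we can also read it back
logger.last_logged                # => ["hello"]   (all levels when no level is given)
logger.clear_log_messages
logger.last_logged                # => []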

Put this stuff inside class Test::Unit::TestCase:

  # a way of actually getting the usual rails logger object
  def logger
    RAILS_DEFAULT_LOGGER
  end
  # assert that the log has received the given message at the given log
  # level. The matching message can be passed in as a string or a regular expression.
  def assert_event_logged(regex = nil, lvl = nil, match = true, &the_block)
    logger.clear_log_messages
    regex = /#{regex}/ if regex.is_a? String

    yield

    last_logged = logger.last_logged(lvl)

    if match
      assert !last_logged.blank?, "should have logged some message at the given level: #{lvl.inspect}"
      assert last_logged.detect {|msg| msg =~ regex }, "should have matched #{regex.inspect} at the given level: #{lvl.inspect}. Instead got: #{last_logged.join("\n")}"
    else
      return if last_logged.blank? # trivial case
      assert last_logged.all? {|msg| msg !~ regex }, "should not have matched #{regex.inspect} at the given level: #{lvl.inspect}. Instead got: #{last_logged.join("\n")}"
    end
  end

  # assert that this specific event was not logged
  # i.e. just the opposite of assert_event_logged
  def assert_event_not_logged(regex = nil, lvl = nil, &the_block)
    assert_event_logged(regex, lvl, false, &the_block)
  end

  # check that no events were logged at the given level
  def assert_no_event_logged(lvl = nil, &the_block)
    logger.clear_log_messages

    yield

    last_logged = logger.last_logged(lvl)
    assert last_logged.blank?, "should not have logged a message at the given level: #{lvl.inspect}"
  end

Here are some basic tests that check the above assertions are working correctly... and also serve as basic examples of use:

  def test_should_show_msg_logged
    msg = "forcing the logger to log"
    assert_event_logged(msg) do 
      logger.debug msg
    end
  end
  def test_should_show_message_logged_at_a_given_log_level
    msg = "forcing the logger to log"
    assert_event_logged(msg, Logger::DEBUG) do 
      logger.debug msg
    end
    assert_event_logged(msg, Logger::INFO) do 
      logger.info msg
    end
  end
  def test_should_show_msg_logged_matches_given_regex
    msg = "forcing the logger to log"
    assert_event_logged(/forcing/, Logger::INFO) do 
      logger.info msg
    end
    assert_event_logged(/oogley/, Logger::INFO, false) do 
      logger.info msg
    end
  end
  def test_should_show_no_event_logged
    assert_no_event_logged() do 
      # do something that doesn't log anything
      three = 1 + 2
    end
  end
  def test_should_not_match_msg_that_doesnt_match
    assert_event_not_logged("yada yada", Logger::INFO) do 
      logger.info "blah blah"
    end
  end
  def test_should_not_match_msg_logged_at_different_log_level
    assert_no_event_logged(Logger::INFO) do 
      # do something that doesn't log anything
      three = 1 + 2
      # log something at a log level different to that specified
      logger.debug "should not get this one"
    end
  end
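
In real tests you'd obviously point these assertions at your application code rather than at the logger itself. Something along these lines (completely hypothetical - it assumes a User model whose archive! method logs an audit message, and a :bob fixture):

  def test_archiving_a_user_should_log_an_audit_event
    assert_event_logged(/archived/, Logger::INFO) do
      users(:bob).archive!
    end
  end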

PS - if anyone wondered where I've been for the past months, I've been pretty busy performance-tuning the site in preparation for some sort of slightly-more-official-launch-than-just-to-special-customers-only...

So I haven't had much chance to do anything interesting, but hopefully we're *that* much closer to actually launching the site to Real People (tm).