Tuesday, 11 March 2008

Performance tips > indexing matters

As I've mentioned in a previous post, I've been doing a lot of performance tweaking recently. By far and away the biggest gains I've garnered have been by tweaking the SQL/DB in various ways. One of the biggest things I learned was to use indexes appropriately. This means two things:

  1. Put an index on anything important
  2. Get the field order right

So what's important?

From the perspective of an index, immportant means anything that you will need to fetch repeatedly - especially if that fetch is in a tight loop. This is one place where RubyProf is your friend. You can also take a look at the SQL queries scrolling up the screen while watching the console output to get a rough idea of the most important queries.

What about the field order?

Indexes work better if you put the "most discerning" fields before the less-discerning ones. As an example, suppose you have a (completely spurious) table as follows:

owner_name pet_name toy_name
Bob Fido Bone
Bob Fido Rubber chicken
Bob Whiskers Jingly ball
Bob Whiskers paper on string
Jane Woofy chew toy
Jane Woofy bouncy ball
Jane Kitty paper bag
Jane Kitty laser pointer

Assume for some reason it's important to have an index on all three fields above. The field that is most unique is the "toy_name" - doing a lookup on that field will get the best bang for your buck, so it should come first in your index, followed by the next-most discerning field. In the example data above it's the pet_name, but maybe you know your intended audience a bit better. If you know that a lot of people are calling their dogs 'fido' and their cats 'kitty', you might index on the owner_name. The most likely circumstance is that you'll have the owners listed in a users list and there's actually an owner_id - in which case you'd definitely set that as the second key for the index.

Thus you'd end up with:
add_index :pet_toys, [:toy_name, :owner_id, :pet_name], :name => :idx_pettoys_by_name_oid_pet

Is it really that important?

The contract I'm currently working on requires us to display reports with tens of thousands of orders and trades in them, cross-referenced against multiple calculated benchmark values. It's imperative that the indexes are set up to pull that data out of the db in the most efficient way possible.

I acheived an order of Magnitude speedup just by changing around the ordering of the indexes! This was worth it for us - YMMV

4 comments:

Anonymous said...

Warning: composite indexes can give you a false sense of security. With specific reference to your example, the query "SELECT * FROM pets WHERE pet_name = 'Fido'" will not use your index at all.

If you want to use a composite index, you have to include all its fields in the conditions, as in "SELECT * FROM pets WHERE toy_name = 'x' AND owner_name = 'y' AND pet_name = 'Fido'." If you want to use the fields individually, you'll need to declare indexes specifically for them, too

Anonymous said...

An index can indeed be used without searching for all of its components. One example is a "covering" index, where the data you need is in the index, so you don't have to go back to the table to read it.

This -might- depend on the dbms, but I hope not. It's pretty standard.

///ark

Anonymous said...

Sorry, I should have taken the time to go a bit deeper into this. You can search on some of the fields individually or in partial combination, but it depends on the order they appear in the index. In the original example, "toy_name='x'" would use the index, but "pet_name='y'" wouldn't (I verified this in MySQL 5.0.22 on Mac OS X)

Taryn East said...

Hi Ben - yes you're absolutely right. I guess I assumed that - which is a bad idea ;)
We have multiple indexes on tables used extremely frequently - with all the different ways of getting the data out.


Mark - definitely! It can be really useful if you can do that. It only covers queries belonging to a single object, but that can be really useful in a lot of places.