Monday, 19 November 2007

Rails gotchas: Kamikaze data migrations

This is a pernicious little gotcha I've encountered a couple of times, generally by using somebody else's plugins. The problem with it is that it usually doesn't actually affect the developer - just anybody else that uses the system. This can lead to the developer refusing to believe that there is a problem.[1]

So what am I talking about here?

I'm talking about a certain type of data migration that explodes as you run it, sometimes even causing the database to get stuck in a nasty halfway-inbetween state where you can neither continue migrating up... or roll back to a previous stable state.

"But migrations are transactional" I hear you cry. Sure they are. That doesn't stop this particular error from occurring... and *still* leaving the db in a nasty stuck state - honest, just gimme a chance to explain why.

Picture this: You've been happily developing away building the next generation of your category-killer Web 2.0 Widget application when you suddenly realise that your widgets need to refer to pieces instead of parts. So, like a good agile developer you:

  • Create a new Piece resource with all the bells and whistles.
  • Create and run the migration that a) creates the pieces table b) data migrates widgets so they point at pieces instead of parts c) drops the parts table.
  • Update all references in your code from pieces to parts.
  • Remove all the part code from controllers/views etc.

All goes smoothly. rake test runs flawlessly and you check in your code. You continue on your merry way, adding more functionality and reflecting on your brilliance...

...until a user of your code emails you to say that your migration breaks and has left their db in some wierd state that doesn't let them migrate up or down.

The problem is that when you (the developer) ran the migration, you still had the Part class in app/models and the Widget class still had references like "has_one :part" in them. When your user downloads your system, however, it may be several changes later... and the system has no recollection of a Part class or how they relate to Widgets. So when you have a data migration such as:

   Widget.find_all.each { |w| make_piece_from_part(w.part) }

It won't know what a part is - or how you get one off your widget class. The nasty part is that often you will find this kind of data migration stuffed in the middle of the schema migrations thus:

def self.up
  create_table :pieces do |t|
     #... creates a table here
  add_column :widgets, :piece_id, :integer, :default => nil
  Widget.find_all.each { |w| make_piece_from_part(w.part) }
  remove_column :widgets, :part_id
def self.up
  add_column :widgets, :part_id, :integer, :default => nil
  Widget.find_all.each { |w| make_part_from_piece(w.piece) }
  remove_column :widgets, :piece_id
  drop_table :pieces

I'm told that migrations are transactional - ie if it fails, then everything rolls back to the state before the migration... this doesn't seem to work when data migrations are involved. If the data part of the migration falls over, it stops halfway through the migration with a nasty backtrace, but doesn't perform any rollback. So at that point it's already added the pieces table and the piece_id column to the widgets table, but it hasn't removed the part_id column and the schema is still pointing at the migration number before this one. So if you try to run the migration again, it will fail on the first line, saying something like:

== MovePartsToPieces: migrating ============================================
-- create_table(:pieces)
rake aborted!
Mysql::Error: Table 'pieces' already exists: CREATE TABLE pieces (<piece-table insert statement here>) ENGINE=InnoDB

(See full trace by running task with --trace)

Unfortunately it also won't migrate you back. The bad data migration failed before it had a chance to update the current migration level - so migrating back gives you entirely unhelpful "finished in 0.000183" seconds message (or similar), without doing anything at all.

Recovering from the stuck state

Recovering is possible, but it takes some small amount of hacking in the migration. Basically you need to force Rails to run an empty migration so that it does nothing except to put the version number up. Then you can force it to run the bits of the down migration that cancel the bits that it did the first time you tried running the up migration. In detail:

  1. Comment out every line of code in self.up
  2. Comment out the appropriate lines in self.down that correspond with stuff that never got run when the migration fell over the first time - ie leave anything that cancels out actions that were *successfully* done when you did the up migration. in the example above, you'd comment out the downward data migration and the addition of the piece_id column - leaving only the removal of the part_id column and the drop table - which would leave you back in the state you started in.
  3. Save the file
  4. Now you can run rake db:migrate. This should should leave you with a database that at least has the right version number for you to be able to migrate back down again.
  5. Now run the down migration with rake db:migrate VERSION=<one before this one>.
  6. Now go fix your migration so this doesn't happen to anybody else.

Fixing this kind of data-migration nightmare is the only time that I recommend you go back and edit old migrations rather than creating a new one.

  • [1]Ok, so the real reason I'm writing this post is partly so that I can point said unbelievers at it the next time they swear that nothing's wrong and that I should just quit fussing about.


Jeremy said...

Unfortunately, Rails migrations are not transactional by default, presumably because MySQL (which used to be the default database) doesn't support transaction DDL. Postgres does (not sure about any others), so if you're using it you can use this plugin.

Taryn said...

Neat plugin!

Sadly we're using MySQL. We've tried to convince the client to go postgres, but MySQL is entrenched due to the legacy apps. But I'll have a look into the plugin and see if we can use something there anyway :)