Ruby-coloured glasses: November 2014

Sunday, 30 November 2014

Link: run in with turbolinks

A quick link today to an article on turbolinks. It includes a couple of different levels of work-around snippets when adding javascripty functionality to your own links so it's pretty useful.

Rails 4: My First Run-in with Turbolinks

Monday, 24 November 2014

Generate a subset of pseudo-random permutations in non mega-huge time

One of the neat things about sending packages via sendle is that you don't have to print out a label or even know the address of your recipient (this is done so the recipient can keep their address private if they want). Instead, we give the sender a pair of unique code-names to put onto the package. So instead of J Random Shop-assistant at EvilCorp getting your full address, instead you write something like "From: Golden Lion To: Red Tiger".

It's fun, as well as effective - like secret-agent names.

but it does present us with an interesting problem - how do we generate unique codenames for potentially many, many packages.

Codenames are assigned for every package, not just for every person - that way your humble shop assistant at EvilCorp can't get any info on your shopping habits based on your codename cropping up time and again either. So that means we need lots and lots of codenames, and they must be pretty randomised too.

As mentioned, our codenames are basically of the form "<adjective> <noun>", and we have a medium-sized stock of adjectives and nouns to build from.

With 100 adjectives and 50 nouns, that's a total of 5000 possible codephrases... but the codephrase generated could be used as either the "from" or "to" (as long as it's not reused for a while), so choosing a pair of such codephrases from a set of 5000 (without repetition) gives us 5000*4999 (or 24995000) possible permutations.

That's a lot.

It takes up a fair bit of db space, and it would take quite some time to generate all the possibilities and randomise them too.

So we decided not to do that... but instead, to generate a much smaller sub-set, on the fly (ie when we got close to running out).

Which leaves us with a different problem... how to do that effectively.

The solution that was in place before I arrived, was done in SQL (because it's always faster to let the db do it, right?), and involved doing a four-way cross-join on the adjectives+nouns... and then ordering them randomly... and using a limit to only actually keep a set amount... and then only inserting into our db those that weren't already there (done using an insert where select = 0)

Sadly, when performance testing, this was discovered to be exceedingly slow... ie when it was triggered, it would take 15min or so to run, causing all other "create a package" processes to block waiting for a new code to be generated... which was unacceptable. Sure, we could run this overnight instead, but hey, what if a Good Problem occurred and we had so many orders one day that we ran out in between overnight-runs?

So instead I came up with a solution that would allow us to quickly generate a small subset of new, randomised permutations without killing throughput as badly. The solution takes about 4 seconds to generate 2000 new permutations, which is enough to get by until it's needed again.

In order to explain what it is, I first need to explain a few non-solutions we hit upon to begin with and why they didn't work for us

Solution 1: random indexing into the arrays

It seems like such a simple problem - we have a small set of adjectives and a small set of nouns, just randomly pick one of each and see if it's not already one that we have... create it if not. so the first solution was just to use "rand" on the size of each array to pick one of each by index.

This works reasonably well when the data are sparse... but when you start filling in those gaps, a lot of time is spent thrashing about randomly trying to find a pair that hasn't already been chosen. And there was no way of simply saying "you've already picked that pair" so it would pick the same pairs over and over and over and keep trying them against the set that had already been created.

This became obviously ridiculous in the specs where I chose two nouns and two adjectives... and it spent roughly 16min thrashing back and forth between "blue dog" and "red cat" trying to randomly guess the last remaining pair ("red dog").

I didn't want that to happen while angry users were waiting to post their parcel in the middle of the xmas rush... we needed a solution that would guarantee that it would not endlessly repeat useless, already-tried pairs.

Solution 2: random ordering of the arrays

The next obvious idea was: shuffle the two arrays so they are (pseudo)randomly sorted.... then just loop through them grabbing the next one in line to try and find_or_create.

eg, using a simple nested loop like:

   adjectives.shuffle.each do |adj|
      nouns.shuffle.each do |noun|
         Code.find_or_create(:code => "#{adj} #{noun}")
      end
   end

This solution works. It guarantees that you only try each option once, and it's very quick. The downside is that the randomness isn't random enough.

Now lets say that after shuffling, your first adjective is "grey"... you now loop through every noun and attach the adjective "grey" to it... and end up with a run of very similar code-phrases like: "grey dog", "grey cat", "grey bird", "grey elephant", "grey tiger", "grey wolf", "grey hippo"... not really very random. Plus, don't forget I'm using pairs of code-phrases. So we'd be getting pairs of code-phrases such as "from: grey dog to: grey cat"

Now right now, we don't have anything set up where the user could potentially abuse this obviously guessable sequence to get ahold of somebody else's package... however someday we may...

but what we were more concerned about was that if all the packages have code words that are very similar... there is grand scope for confusion and mis-labelling - which has happened to some of our customers before, and causes everyone no end of heartache and pain.

What we want is not only for individual code-phrases to be random, but for the sequence to also be kind of random too...

Solution 3: rolling loops

So I came up with a solution I've called "rolling loops".[*]

Let i and j be the loop counters. Normally, you'd match i against every value of j, before then incrementing i again.... but we want both i and j to change every single iteration of j... but if we increment both i and j for each step of the nested loop, how do we roll on to the next loop?

Well, what we *really* want is to loop through every possible permutation of the two arrays... so how about I expand out some of those and show you how to get to the solution.

Lets suppose you have two arrays that are just the numbers 0..2 The normal way to loop gives you a complete set of permutations:

[[0,0] [0,1] [0,2]] [[1,0] [1,1] [1,2]] [[2,0] [2,1] [2,2]]

If we used these pairs of numbers as indexes into the two arrays of words.. we'd get every single pairing of words, but as you can see, in the first set, every word-pair would begin with the first adjective, and in the second set, every code-pair would begin with the section adjective and so on - just as described as a problem above.

But there's another way of grouping the permutations:

[[0,0] [1,1] [2,2]] [[0,1] [1,2] [2,0]] [[0,2] [1,0] [2,1]]

In the first subset, we have numbers that are the same as each other, in the second set, we have numbers that have a difference of 1 (mod the size of the array), and the third set are of diff 2 (mod size)

This guarantees to roll through every permutation, and for every step in the loop it changes both i and j - it's only the *difference between i and j* that changes over the longer timeframe.

That may seem complex when I describe it, but it's really not. If we draw the permutations as a square:

   0,0  1,1  2,2
   0,1  1,2  2,0
   0,2  1,0  2,1

The usual simple nested loop starts at the top left of the square and moves down the column before hopping onto the next column... but the rolling loops simply goes across the rows first instead.

Now at first glance, this may not seem very much more random... but in practice (with the nouns and adjectives all randomised before beginning) it is much better. It will eventually loop around again and the adjectives will start to come in the same order - but we have around 100 adjectives - and it's extremely rare for us to have one customer send 100 parcels all at once.

Benefits of this solution:

it's quick (ie it doesn't first enumerate all permutations before randomising
it touches every permutation eventually (rather than random-search and maybe it never finds the final one)
you can stop the loop at any time (eg when you have populated your next 1000 code-pairs) and restart by reshuffling the arrays into a new random arrangement
each code-pair is only checked *once* for whether or not it already exists - rather than randomly finding the same codepair over and over forever)
it produces a different i and j for every single iteration even of the inner loop

There's only one more tweak that my colleague suggested he said that to create a product of each of the adj/noun arrays is actually relatively quick - so in *our* project the final rolling-loop is only applied to two copies of the full product of all nouns and adjectives (randomised separately)

Here's my implementation which will keep generating pairs of code-phrases until we have the specified number.

   def populate_codes_until(final_count)
    current_count = self.count

    # generate every permutation of nouns and adjs 
    # then shuffle and duplicate
    froms = adjs.product(nouns).shuffle
    tos = froms.shuffle
    
    max_val = froms.size - 1
    
    begin
      # here we are iterating the *step size* 
      0.upto(max_val) do |step_size|
        # here we simply step through both arrays
        0.upto(max_val) do |i|

          # figure out j based on step-size and mod
          j = (i + step_size) % (max_val + 1)

          self.find_or_create_by(:from => froms[i].join(" "), :to => tos[j].join(" ")) do
            current_count += 1 # actually created a new one
          end

          return if current_count == final_count
        end
      end
    end
  end

[*]

yes, it's quite possible that this solution already exists somewhere and has a snazzier, and more widely-accepted, name... but I couldn't find it when I went googling, therefore it didn't exist :)

If you do happen to find it, let me know!

Thursday, 20 November 2014

Link: Fundamental Guidelines Of E-Commerce Checkout Design

"Here is the harsh reality of e-commerce websites: according to recent e-commerce studies, at least 59.8% of potential customers abandon their shopping cart (MarketingSherpa puts it at 59.8%, SeeWhy at 83% and MarketLive at 62.14%). The main question is why do customers abandon their shopping cart so often?"

Fundamental Guidelines Of E-Commerce Checkout Design condenses a large study into usability into 11 really useful guidelines to make your checkout experience much more likely to convert.

Here's a condensed, tl;dr version:

But I totally recommend reading the whole thing - it's not that long and chock full of useful insights

1. Your Checkout Process Should Be Completely Linear: Non-linear "steps within steps" confuse people
2. Add Descriptions To Form Field Labels: Just because it's obvious to you what it means, doesn't mean it is to somebody else. be unambiguous
3. Avoid Contextual Words Like “Continue”: continue... shopping? or continue... on to the next step. Also don't use "back" or "proceed"
4. Visually Reinforce All Sensitive Fields On The Payment Page: People are leery of putting their CC details into a page that looks thrown together
5. Don’t Use An “Apply” Button In Your Form: it's often not pressed, even when necessary, or mistaken for the main form-submit. use AJAX instead
6. Format Field For Expiration Date Exactly As It Appears On Credit Card: needs no explanation
7. Use Only One Column For Form Fields: multiple, competing interpretations are present for multi-column pages -> all necessary or alternatives?
8. Use Shipping Address As Billing Address By Default: dammit, make people's lives easier, not more difficult. most people ship and bill to the same address
9. Use Clear Error Indications: you'll just have to see the article to get a plethora of ideas of what not to to here
10. Registration Should Be Optional: yes, a thousand times yes! I don't need to register with your company if I buy something from a bricks+mortar store, why force me to online?
11. Don’t Require Seemingly Unnecessary Information: I don't want to be spammed by you, don't force me to give you fake data it'll only hurt both of us

The study found that all of these things led to people abandoning their shopping carts before converting... so they're worth looking into even if you don't care about people and just want to improve your bottom line. but hey, you could also improve people's lives and make their experiences less confusing, less frustrating and smoother. and isn't that what technology is meant to be for?

Friday, 14 November 2014

Link: Advisory locks in postgres

Advisory locks in postgres

"PostgreSQL provides various lock modes to control concurrent access to data in tables. Advisory locks provide a convenient way to obtain a lock from PostgreSQL that is completely application enforced, and will not block writes to the table."

This is not like row-locking your tables, this is to help you ensure concurrency for certain parts of your application. eg that you only enter a complex, resource-intensive part of your codebase once across all your servers. Given that ruby basically isn't very threadsafe - this provides a mechanism for concurrency-locking that is available across your plausibly widely-distributed servers by basically storing the semaphore in the mutually-accessible database.

Saturday, 8 November 2014

gotcha: redirecting through root-url fails to preserve anchor tag

Interesting (and annoying).

If you redirect to your root-path and pass it an anchor tag like "show-help" (in some controller) eg with

redirect_to root_path(:anchor => 'show-help')

It first redirects correctly to /#show-help but it then further redirects to your actual root url... and removes the anchor tag eg /my_dashboard

My expectation would be that it would preserve the anchor tag... eg the final URL would be: /my_dashboard#show-help

But of course, the browser doesn't actually send anchor tags to the server at all... which means that after the first request (to '/'), Rails never gets the anchor tag back again to pass onto the redirected url, which is why it gets dropped.

This is an issue if you really want to open the url with an anchor tag.

The main option you have is to not use root_url, but directly use the actual url you need eg: my_dashboard_url(:anchor => 'show-help'). It's not the best, but at least it works.

Sunday, 2 November 2014

gotcha: running `rake db:migrate:redo` and testing

So... rails *tells* me that rake db:test:prepare is deprecated, and that just running the specs will keep the db up-to-date... but it isn't always so.

If you jigger with a brand new migration and run a rake db:migate:redo - the test db can't deal with that by itself. You either have to run rake db:migate:redo RAILS_ENV=test or rake db:test:prepare to propagate those changes.