Creating associations on your Rails models prevents N+1 queries

31 Aug 2021

It's fairly common when working on existing or legacy codebases to find models that have instance methods defined on them that return ActiveRecord::Relation collections. This post explains how this can be avoided to improve the performance of your application.

Much like with other posts I'm going to be referring to the simple Rails app cars which, at the time of writing consists of two models Car and Garage. A Garage can have many cars parked in it and each car has a color. We're going to create an association on the Garage model that can be eager loaded and return all green cars.

The query

So we've been tasked with being able to determine all green cars in a specific garage. The query for that will look something like this:

garage = Garage.first

garage.cars.where(color: :green) #=> A collection of green cars

But we don't want to have to write that #where query every time we want to find some green cars so let's write a method!

A quick solution

The most obvious implementation of this, and one you are likely to see in many Rails applications that you work on is to create an instance method on theGarage model class like so:

class Garage
  has_many :cars

  def green_cars
    cars.where(color: :green)
  end
end

This will work, allowing us to do the following:

garage = Garage.first

garage.green_cars #=> A collection of green cars

It will also allow us to iterate over all of our garages and get all of the green cars that are currently parked in each:

Garage.all.map { |garage| garage.green_cars }

However, this is going to result in N+1 queries! Executing the command above results in a database hit for every single garage:

SELECT "garages".* FROM "garages"
SELECT "cars".* FROM "cars" WHERE "cars"."garage_id" = $1 AND "cars"."color" = $2 /* loading for inspect */ LIMIT $3  [["garage_id", 1], ["color", "green"], ["LIMIT", 11]]
SELECT "cars".* FROM "cars" WHERE "cars"."garage_id" = $1 AND "cars"."color" = $2 /* loading for inspect */ LIMIT $3  [["garage_id", 2], ["color", "green"], ["LIMIT", 11]]
SELECT "cars".* FROM "cars" WHERE "cars"."garage_id" = $1 AND "cars"."color" = $2 /* loading for inspect */ LIMIT $3  [["garage_id", 3], ["color", "green"], ["LIMIT", 11]]
SELECT "cars".* FROM "cars" WHERE "cars"."garage_id" = $1 AND "cars"."color" = $2 /* loading for inspect */ LIMIT $3  [["garage_id", 4], ["color", "green"], ["LIMIT", 11]]
SELECT "cars".* FROM "cars" WHERE "cars"."garage_id" = $1 AND "cars"."color" = $2 /* loading for inspect */ LIMIT $3  [["garage_id", 5], ["color", "green"], ["LIMIT", 11]]

This is what we want to avoid. Let's try eager loading our new method in an effort to prevent this!

Garage.includes(:green_cars).map { |g| g.green_cars }

Unfortunately Rails complains here with the following error telling us we haven't defined an association called green_cars on the Garage model:

Traceback (most recent call last):
        2: from (irb):6:in `<main>'
        1: from (irb):6:in `map'
ActiveRecord::AssociationNotFoundError (Association named 'green_cars' was not found on Garage; perhaps you misspelled it?)
Did you mean?  cars

Rails is trying really hard to point us in the right direction here, telling us we don't have an association called green_cars and even pointing us to associations that we do have defined on the model. In this case it's just one; cars.

Creating the green_cars association

Let's do what we're being told and create an association rather than an instance method:

class Garage
  has_many :cars
  has_many :green_cars, -> { where(color: :green) }, class_name: 'Car'
end

Now we'll be able to eager load our association with just one call to the cars table!

Garage.includes(:green_cars).map { |g| g.green_cars }

Which executes the following SQL when eager loaded, which is exactly what we want:

SELECT "garages".* FROM "garages"
SELECT "cars".* FROM "cars" WHERE "cars"."color" = $1 AND "cars"."garage_id" IN ($2, $3, $4, $5, $6)

Conclusion

Creating an instance method to grab specific associated records might seem like the easiest solution off the bat but I promise you that creating many methods in this way will result in performance degradation as your application grows in complexity and usage. Taking the time to declare relationships that can be eager loaded is marginally more time consuming but it's considerably more performant. I also find it more declarative in that any relationship defined at the top of the page will be returning an association that can be eagerly loaded, rather that needing to dig through the instance methods of a model.