31 Aug 2021
It's fairly common when working on existing or legacy codebases to find models that have instance methods defined on them that return ActiveRecord::Relation collections. This post explains how this can be avoided to improve the performance of your application.
Much like with other posts I'm going to be referring to the simple Rails app cars which, at the time of writing consists of two models Car
and Garage
. A Garage
can have many cars parked in it and each car has a color. We're going to create an association on the Garage
model that can be eager loaded and return all green cars.
So we've been tasked with being able to determine all green cars in a specific garage. The query for that will look something like this:
garage = Garage.first
garage.cars.where(color: :green) #=> A collection of green cars
But we don't want to have to write that #where
query every time we want to find some green cars so let's write a method!
The most obvious implementation of this, and one you are likely to see in many Rails applications that you work on is to create an instance method on theGarage
model class like so:
class Garage
has_many :cars
def green_cars
cars.where(color: :green)
end
end
This will work, allowing us to do the following:
garage = Garage.first
garage.green_cars #=> A collection of green cars
It will also allow us to iterate over all of our garages and get all of the green cars that are currently parked in each:
Garage.all.map { |garage| garage.green_cars }
However, this is going to result in N+1 queries! Executing the command above results in a database hit for every single garage:
SELECT "garages".* FROM "garages"
SELECT "cars".* FROM "cars" WHERE "cars"."garage_id" = $1 AND "cars"."color" = $2 /* loading for inspect */ LIMIT $3 [["garage_id", 1], ["color", "green"], ["LIMIT", 11]]
SELECT "cars".* FROM "cars" WHERE "cars"."garage_id" = $1 AND "cars"."color" = $2 /* loading for inspect */ LIMIT $3 [["garage_id", 2], ["color", "green"], ["LIMIT", 11]]
SELECT "cars".* FROM "cars" WHERE "cars"."garage_id" = $1 AND "cars"."color" = $2 /* loading for inspect */ LIMIT $3 [["garage_id", 3], ["color", "green"], ["LIMIT", 11]]
SELECT "cars".* FROM "cars" WHERE "cars"."garage_id" = $1 AND "cars"."color" = $2 /* loading for inspect */ LIMIT $3 [["garage_id", 4], ["color", "green"], ["LIMIT", 11]]
SELECT "cars".* FROM "cars" WHERE "cars"."garage_id" = $1 AND "cars"."color" = $2 /* loading for inspect */ LIMIT $3 [["garage_id", 5], ["color", "green"], ["LIMIT", 11]]
This is what we want to avoid. Let's try eager loading our new method in an effort to prevent this!
Garage.includes(:green_cars).map { |g| g.green_cars }
Unfortunately Rails complains here with the following error telling us we haven't defined an association called green_cars
on the Garage
model:
Traceback (most recent call last):
2: from (irb):6:in `<main>'
1: from (irb):6:in `map'
ActiveRecord::AssociationNotFoundError (Association named 'green_cars' was not found on Garage; perhaps you misspelled it?)
Did you mean? cars
Rails is trying really hard to point us in the right direction here, telling us we don't have an association called green_cars
and even pointing us to associations that we do have defined on the model. In this case it's just one; cars
.
Let's do what we're being told and create an association rather than an instance method:
class Garage
has_many :cars
has_many :green_cars, -> { where(color: :green) }, class_name: 'Car'
end
Now we'll be able to eager load our association with just one call to the cars table!
Garage.includes(:green_cars).map { |g| g.green_cars }
Which executes the following SQL when eager loaded, which is exactly what we want:
SELECT "garages".* FROM "garages"
SELECT "cars".* FROM "cars" WHERE "cars"."color" = $1 AND "cars"."garage_id" IN ($2, $3, $4, $5, $6)
Creating an instance method to grab specific associated records might seem like the easiest solution off the bat but I promise you that creating many methods in this way will result in performance degradation as your application grows in complexity and usage. Taking the time to declare relationships that can be eager loaded is marginally more time consuming but it's considerably more performant. I also find it more declarative in that any relationship defined at the top of the page will be returning an association that can be eagerly loaded, rather that needing to dig through the instance methods of a model.