Every week, I send out an automation tutorial that will save you time and make you more productive. I also write about being a solopreneur, and building helpful things with Ruby. Join 2853 others who value their time.
Hi Reader, how's your day so far?
This week I wanted to share a story about the consequences of choosing the wrong tool for a job. It's a common mistake, but it's never too late to learn!
I also wrote another quick post, but it's specific to the latest Macs with the Apple Silicon M1 chip, so instead of including the whole post in this email, I'll just include a link for those of you that have an M1 Mac:
It all started with a ping from one of the team leads who noticed a recurring exception in Sentry. At the same time, DataDog showed the Postgres CPU utilization was spiking.
Based on the Sentry logs, the team lead thought this exception was caused by a specific deploy, but they couldn’t find anything in the most recent commits. Then they asked me to look into it. I was new to this project, so I had the fresh eyes benefit.
The first thing I did was look at the exception in Sentry, which pointed to the problematic line of code, which had been introduced about a year earlier. I immediately knew what the problem was when I saw the infamous 6 characters:
.count
In Rails, calling .count
on an ActiveRecord::Relation
issues a Postgres COUNT
query, which is slow, and gets slooooower the bigger the table.
I confirmed this by noticing statement timeouts for those COUNT
queries in New Relic. This was with a statement_timeout
set to 30 seconds!
I also double checked by looking at the id
of the last record in the table, and it was over 34 million!
The query included a .where
and a .where.not
, so it wasn't trying to count the entire table, but I presumed even the filtered list was so big that Postgres couldn't count it all in 30 seconds.
Now that I knew this was indeed the line of code causing exceptions, timeouts, and high DB CPU usage, I looked at it more closely. Can you spot the issue?
previous_documents = Document.where(series_id: document.series_id).
order(:id).
where.not(vbms_document_id: ids)
if previous_documents.count > 0
document.copy_metadata_from_document(previous_documents.last)
end
The only reason the previous_documents
were being counted was to see if there was at least one of them. We were not at all interested in the actual number of documents that matched the first query, just whether or not the result was empty.
There are more efficient (and more idiomatic) ways to check if a query returned any results. The problem is there are several of them, such as any?
, exists?
and present?
, but they don't behave the same way. On top of that, the behavior depends on the Rails version! It's hard to remember which one to use in which situation, and there aren't any automated tools that I know of to guide us.
Luckily, Nate Berkopec wrote a detailed article explaining it all, and I knew to replace .count > 0
with .any?
which uses a SELECT 1 AS one FROM ... LIMIT 1
.
As soon as this change was deployed, the CPU utilization was reduced by about 5-10x! 🎉
Here's my PR, where I also fixed another related instance of .count > 0
.
Once this production issue was resolved, I found other instances of .count > 0
where we only needed to check if the results were not empty, and I fixed those as well.
One question that came up was why did the timeouts happen all of a sudden on that particular day? My best guess was that the documents
table was slowly growing over time, and finally reached the size at which Postgres could no longer make the COUNT
query within 30 seconds. Perhaps a larger-than-usual amount of documents were created on that day. Unfortunately, that table didn't have timestamps (it does now!), so we couldn't easily tell how many documents were created at any given time. 😥
Here are actions you can take today to stand out and improve your project:
.count > 0
, .count.positive?
, or just .count
in general, and replace them with one of the faster methods.While the example in this story is based on a Rails application, the concept applies to any app (regardless of language or framework) that uses the Postgres COUNT
query where a SELECT 1 AS one FROM ... LIMIT 1
would be more appropriate.
As always, if there's any question you have, or topic you want to talk about, just hit reply. I read everything you send.
Have a great week!
Moncef
Every week, I send out an automation tutorial that will save you time and make you more productive. I also write about being a solopreneur, and building helpful things with Ruby. Join 2853 others who value their time.
Hi Reader 👋🏼 Happy Sunday! I hope you and your loved ones are doing well. Earlier this week, I found a great use case for the 1Password CLI that hadn't occurred to me before. I'm gonna use it a lot more often whenever I can! If this email doesn't look right, or if you prefer reading on my site, you can click the title link below. Automate GitHub API Calls With Ruby, Keyboard Maestro, and 1Password CLI One of the perks of the “Ultimate” version of Ruby on Mac is access to the private GitHub...
Hi Reader! This week's automation guide is about a free but powerful Mac app called Bunch. I had heard of it years ago but never took the time to explore it in detail. Until now, and it has proven very useful so far. I'm not sure how the code samples will look like in your email, so you might prefer to read this on my site by clicking the title below. Automate Context Switching With Bunch You sit down to work on a feature, and wake up your Mac. Oh hey, Slack is open. You decide to check it...
Hi Reader! This week's automation guide is about a little-known app called PopClip. PopClip was originally released in 2011, but I didn’t hear about it until four years ago, and I’m sure there are still a lot of people who don’t know about it. It’s one of the many useful apps you can discover and quickly install with the “Ultimate” version of Ruby on Mac. You can pick and choose from hundreds of Mac apps, fonts, and dev tools in the included Brewfile, and Ruby on Mac will install them all at...