profile

Hi! I'm Moncef Belyamani

The 6 Characters That Could Bring Down Your Rails App

Published almost 3 years ago • 3 min read

Hi Reader, how's your day so far?

This week I wanted to share a story about the consequences of choosing the wrong tool for a job. It's a common mistake, but it's never too late to learn!

I also wrote another quick post, but it's specific to the latest Macs with the Apple Silicon M1 chip, so instead of including the whole post in this email, I'll just include a link for those of you that have an M1 Mac:

https://www.moncefbelyamani.com/how-to-install-homebrew-and-ruby-on-a-mac-with-the-m1-apple-silicon-chip/

The 6 Characters That Could Bring Down Your Rails App

It all started with a ping from one of the team leads who noticed a recurring exception in Sentry. At the same time, DataDog showed the Postgres CPU utilization was spiking.

Based on the Sentry logs, the team lead thought this exception was caused by a specific deploy, but they couldn’t find anything in the most recent commits. Then they asked me to look into it. I was new to this project, so I had the fresh eyes benefit.

The first thing I did was look at the exception in Sentry, which pointed to the problematic line of code, which had been introduced about a year earlier. I immediately knew what the problem was when I saw the infamous 6 characters:

.count

In Rails, calling .count on an ActiveRecord::Relation issues a Postgres COUNT query, which is slow, and gets slooooower the bigger the table.

I confirmed this by noticing statement timeouts for those COUNT queries in New Relic. This was with a statement_timeout set to 30 seconds!

I also double checked by looking at the id of the last record in the table, and it was over 34 million!

The query included a .where and a .where.not, so it wasn't trying to count the entire table, but I presumed even the filtered list was so big that Postgres couldn't count it all in 30 seconds.

Now that I knew this was indeed the line of code causing exceptions, timeouts, and high DB CPU usage, I looked at it more closely. Can you spot the issue?

previous_documents = Document.where(series_id: document.series_id).
  order(:id).
  where.not(vbms_document_id: ids)

if previous_documents.count > 0
  document.copy_metadata_from_document(previous_documents.last)
end

The only reason the previous_documents were being counted was to see if there was at least one of them. We were not at all interested in the actual number of documents that matched the first query, just whether or not the result was empty.

There are more efficient (and more idiomatic) ways to check if a query returned any results. The problem is there are several of them, such as any?, exists? and present?, but they don't behave the same way. On top of that, the behavior depends on the Rails version! It's hard to remember which one to use in which situation, and there aren't any automated tools that I know of to guide us.

Luckily, Nate Berkopec wrote a detailed article explaining it all, and I knew to replace .count > 0 with .any? which uses a SELECT 1 AS one FROM ... LIMIT 1.

As soon as this change was deployed, the CPU utilization was reduced by about 5-10x! 🎉

DataDog screenshot showing CPU usage lowered by 10x

Here's my PR, where I also fixed another related instance of .count > 0.

Once this production issue was resolved, I found other instances of .count > 0 where we only needed to check if the results were not empty, and I fixed those as well.

One question that came up was why did the timeouts happen all of a sudden on that particular day? My best guess was that the documents table was slowly growing over time, and finally reached the size at which Postgres could no longer make the COUNT query within 30 seconds. Perhaps a larger-than-usual amount of documents were created on that day. Unfortunately, that table didn't have timestamps (it does now!), so we couldn't easily tell how many documents were created at any given time. 😥

Lessons Learned

Here are actions you can take today to stand out and improve your project:

  • Check your app for .count > 0, .count.positive?, or just .count in general, and replace them with one of the faster methods.
  • Add timestamps to any tables that don't have them, even if you think you'll never need them.
  • Add monitoring to track and measure things that matter.
  • Use a lower value for global DB statement timeouts, and set a different value on a case-by-case basis, such as for long-running migrations.

P.S.

While the example in this story is based on a Rails application, the concept applies to any app (regardless of language or framework) that uses the Postgres COUNT query where a SELECT 1 AS one FROM ... LIMIT 1 would be more appropriate.

As always, if there's any question you have, or topic you want to talk about, just hit reply. I read everything you send.

Have a great week!

Moncef

Hi! I'm Moncef Belyamani

Every week, I send out an automation tutorial that will save you time and make you more productive. I also write about being a solopreneur, and building helpful things with Ruby. Join 2853 others who value their time.

Read more from Hi! I'm Moncef Belyamani

Hi Reader 👋🏼 Happy Sunday! I hope you and your loved ones are doing well. Earlier this week, I found a great use case for the 1Password CLI that hadn't occurred to me before. I'm gonna use it a lot more often whenever I can! If this email doesn't look right, or if you prefer reading on my site, you can click the title link below. Automate GitHub API Calls With Ruby, Keyboard Maestro, and 1Password CLI One of the perks of the “Ultimate” version of Ruby on Mac is access to the private GitHub...

over 1 year ago • 5 min read

Hi Reader! This week's automation guide is about a free but powerful Mac app called Bunch. I had heard of it years ago but never took the time to explore it in detail. Until now, and it has proven very useful so far. I'm not sure how the code samples will look like in your email, so you might prefer to read this on my site by clicking the title below. Automate Context Switching With Bunch You sit down to work on a feature, and wake up your Mac. Oh hey, Slack is open. You decide to check it...

over 1 year ago • 12 min read
PopClip default extensions

Hi Reader! This week's automation guide is about a little-known app called PopClip. PopClip was originally released in 2011, but I didn’t hear about it until four years ago, and I’m sure there are still a lot of people who don’t know about it. It’s one of the many useful apps you can discover and quickly install with the “Ultimate” version of Ruby on Mac. You can pick and choose from hundreds of Mac apps, fonts, and dev tools in the included Brewfile, and Ruby on Mac will install them all at...

over 1 year ago • 2 min read
Share this post