How a console.log brought my EC2 to its knees

It’s always a bittersweet moment when I discover something that makes me sigh to myself and realize I’m gonna have to write a blog post about this. And today turned out to be one of those days.

accurate depiction of me after finally figuring it out

Alright then, let’s set the scene. Back in 2023, I had switched our hosting from Heroku to AWS EC2 after Heroku discontinued its free plans. Along with that, I had also written a rather impressive download delivery system that I was proud of, full of queues and workers and the whole shebang. Everything was working absolutely fine: no users complaining, no erroneous logs, life was good.

life when code do good

Now, here comes the fun part. Early this year, I finished redesigning THE ENTIRE WEBSITE. That’s right, weeks of sleepless nights ideating and planning and developing and testing and rewriting and refactoring and redeveloping and retesting, and good lord did it completely take over me. But hard work never loses, and the new website design was vastly superior: more intuitive for users, better fault tolerance, and altogether more usable. Well, maybe TOO usable.

You see, I realized that in our earlier design, I had restricted users to download one file at a time. This was to prevent them accidentally overloading our old, feeble download function and taking down our server. BUT, after my glorious download revamp, I could safely remove this. I need no safety wheels. I need no limits. LET THE USERS FEAST. LET THEM DOWNLOAD AS MUCH AS THEIR HEARTS DESIRE. AFTER ALL, WHAT COULD GO WRONG? RIGHT? RIGHT?

rule #1 of software development: never challenge a feature to break.

Nothing, actually. Users were able to download multiple files now. Payments were fine. No issues whatsoever. Except that, you know, the EC2 instance would randomly crash once in a blue moon. It had happened just once between 2023 and 2024, and once in early February. I checked the logs, only to find no sign of a DDoS or anything along those lines. Resolving it took hardly a minute; all I had to do was restart the instance and we were good to go. These factors combine to create what I’d coin a ‘quiet bug’. It’s the sort of bug that’s a symptom of a much larger issue, but it’s hard to reproduce, easy to resolve, has no discernible markers, and rarely occurs. One really ought to catch these early on, before they turn into violent, application-thrashing bugs. Of course, this is all wisdom I’ve learnt after my battle with this beast of a problem.

The four horsemen of quiet bugs

iCANthink went down for a whole day. I wasn’t available to check on it or resolve it. We were expecting a massive influx of users thanks to a podcast’s release. And MY application failed to deliver. I was furious. I decided I’d had enough of the quiet bug; it was time to squash it for good. And so off I went to diagnose it.

Here’s what I knew going in: I’d randomly get a spike in CPU usage, causing the server to crash. I checked the logs just before each crash and performed the same actions on a local server. No problemo. Worked like a charm. A bit of a CPU spike, but nothing much. Memory was fine as well. I was completely stumped. It had to be a DDoS, right? Yet the spikes in CPU usage did not coincide with spikes in network requests. WHAT COULD IT POSSIBLY BE?

The worst part about this was that Stack Overflow was not much help. Neither was ChatGPT. The nature of the problem was so vague and ambiguous that neither could offer anything beyond tips on zeroing in on the cause. Out of options and ideas, I figured, ah screw it, I’m just gonna try to crash the server myself. I saw no likelier cause than a DDoS, so let’s simulate one. I wrote a simple bit of code that fired 500 simultaneous curl requests to download a file (a rough Node equivalent is sketched below), in an attempt to overwhelm my CPU. But alas, my beautiful, elegant, well-written code handled it easily. CPU spiked to 50% but no higher.
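For the curious, the blast looked something like this in spirit. My original was a curl one-liner; this is a rough Node equivalent (requires Node 18+ for the global fetch), and the endpoint and request count are placeholders rather than my actual setup.

```js
// load-test.js — a rough sketch of the self-inflicted stress test.
// TARGET is a hypothetical download route, not my real endpoint.
const TARGET = "http://localhost:3000/download/sample-file";
const REQUESTS = 500;

async function blast() {
  const hits = Array.from({ length: REQUESTS }, (_, i) =>
    fetch(TARGET)
      .then((res) => res.arrayBuffer()) // drain the body so the download actually completes
      .then(() => console.log(`request ${i} ok`))
      .catch((err) => console.error(`request ${i} failed: ${err.message}`))
  );
  await Promise.all(hits);
}

blast();
```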

a victim of my own success

But something was off. Each download request caused the pm2 app’s memory usage to go up by 10 MB, and it wasn’t clearing up. COULD IT BE A MEMORY LEAK? I was excited, not because I had inadvertently caused one, but because I might finally have a lead to investigate. And the culprit I found would SHOCK YOU. Or not, I guess; it’s in the title.

console.log

That’s right, the culprit was a console.log statement. It seemed so bizarre, the kind of thing I’d only ever thought possible in memes. How could a console.log, a lowly, dungeon-dwelling, innocent-faced, gullible console.log, cause a behemoth of a memory leak?! Here’s how.

let me explain

For debugging purposes, I had added a console.log statement to verify the contents of the file being downloaded. It printed the entire buffer to the console. When you preview it locally, the terminal truncates the output, so it doesn’t seem like much. And when you measure Node’s CPU and memory usage locally, the leak doesn’t show up, since the terminal’s memory isn’t accounted for. But that’s not the case on an EC2 instance: pm2 captures those logs, and they count against the instance’s running memory. This meant that every time a user downloaded any file, the entire buffer got logged, permanently taking another 10 MB away from the instance’s memory. It would take roughly 50-60 download requests without a restart for the crash to occur. And since most users download one file a week, paired with the previously rapid development cycle that reset the memory every other day, the bug rarely surfaced. Its conditions read like a boss blind from Balatro (an absolutely phenomenal game by the way, hats off to its dev LocalThunk).

david vs goliath
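To make it concrete, here’s a minimal sketch of what the download route roughly looked like. The route, the file layout, and even the use of Express here are assumptions for illustration; the one load-bearing detail is the stray console.log of the entire file buffer.

```js
const express = require("express");
const fs = require("fs/promises");

const app = express();

app.get("/download/:id", async (req, res) => {
  // hypothetical storage layout, purely for illustration
  const filePath = `./files/${req.params.id}.pdf`;
  const buffer = await fs.readFile(filePath);

  // The culprit: dumping the whole buffer to stdout "just to check".
  // Under pm2, stdout gets captured and logged, so every single download
  // bloats the logs (and, on my instance, the memory) by the size of the file.
  console.log("serving file:", buffer);

  res.setHeader(
    "Content-Disposition",
    `attachment; filename="${req.params.id}.pdf"`
  );
  res.send(buffer);
});

app.listen(3000);
```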

And so that was the fix: -1 line. Removed a single console.log statement. Memory leak fixed. Quiet bug fixed. No randomly crashing instances. No upset users. No downtime. The old me would have left it there, but what sort of developer would I be if I didn’t learn from my own mistakes? And so here I am, finally writing this blog post after spending an additional six hours cleaning up all the console.logs, setting up AWS alarms, Sentry error tracking, Vercel logging and the like, so that I can stay on top of things. Never make the same mistake twice. Or in my case, four times.
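As an example of the “stay on top of things” part, here’s a sketch of the kind of CloudWatch alarm I now keep on the instance, using the AWS SDK for JavaScript (v3). The instance ID, thresholds, and SNS topic below are placeholders, not my actual configuration.

```js
const {
  CloudWatchClient,
  PutMetricAlarmCommand,
} = require("@aws-sdk/client-cloudwatch");

const client = new CloudWatchClient({ region: "us-east-1" });

async function createCpuAlarm() {
  await client.send(
    new PutMetricAlarmCommand({
      AlarmName: "ec2-cpu-spike",
      Namespace: "AWS/EC2",
      MetricName: "CPUUtilization",
      Dimensions: [{ Name: "InstanceId", Value: "i-0123456789abcdef0" }], // placeholder
      Statistic: "Average",
      Period: 300, // 5-minute windows
      EvaluationPeriods: 2, // two consecutive breaches before it fires
      Threshold: 80, // percent
      ComparisonOperator: "GreaterThanThreshold",
      AlarmActions: ["arn:aws:sns:us-east-1:123456789012:alert-me"], // placeholder SNS topic
    })
  );
}

createCpuAlarm();
```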

And that’s the lesson from this adventure. Don’t dismiss quiet bugs. Resolving a bug is not the same as fixing it. Even once you’ve fixed it, put checks in place to catch the same bug, or a similar one, if it ever comes back. And check your console.logs for memory leaks; the bastard’s always hungry for more.

“you gotta trust me bro, we’re one stray console.log away from absolute anarchy and chaos in the world”
