
Decreasing RAM Usage by 40% Using jemalloc with Python & Celery

By Bryan Helmig · December 14, 2017

At Zapier, we're running hundreds of instances of Python and Celery, a distributed task queue that helps us perform thousands of tasks per second. Under that kind of load, RAM usage has become a point of contention for us.

To process all the activity within our service, we run about 150 Celery workers with the setting -w 92 on our m3.2xlarge instances, which means each worker process gets roughly 320 MB of RAM. That should have been plenty, yet it wasn't uncommon to see 8-12 GB of swap usage on a box after a couple of hours of normal operation. Not only was this taxing our RAM, it was also preventing us from upgrading to newer Amazon instance types, which offer only EBS rather than the faster local instance store.
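As a back-of-the-envelope check (assuming the m3.2xlarge's 30 GiB of RAM, a figure from AWS's specs rather than from this post), dividing total RAM by the worker count lands in the same ballpark as the ~320 MB cited:

```shell
# Rough per-worker memory budget: instance RAM / number of worker processes.
# 30 GiB on an m3.2xlarge, 92 worker processes (-w 92).
awk 'BEGIN { printf "%.0f MB per worker\n", (30 * 1024) / 92 }'
# → 334 MB per worker
```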

Further, we noticed that most of this memory usage smelled like fragmentation: tons of memory pages were being moved to swap, yet it didn't seem to hurt performance at all. Clearly, Celery and our Python code weren't touching those fragmented pages very often.
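On Linux, one quick way to quantify how much of the workers' memory has actually been pushed to swap is to sum the VmSwap field from /proc across the worker processes. A sketch (the pgrep pattern is an assumption; adjust it for your own process names):

```shell
# Sum swapped-out memory (VmSwap, in kB) across all processes matching "celery".
pgrep -f celery | while read -r pid; do
  grep VmSwap "/proc/$pid/status" 2>/dev/null
done | awk '{ sum += $2 } END { printf "total swap for celery: %d kB\n", sum }'
```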

Since we were familiar with the jemalloc memory allocator from Redis, and with how well it handles fragmentation, we decided to experiment with using jemalloc under Python.

The Setup

First we need to install jemalloc. We got our bits from the jemalloc GitHub repo and followed the classic instructions:

# installing jemalloc
wget https://github.com/jemalloc/jemalloc/releases/download/5.1.0/jemalloc-5.1.0.tar.bz2
tar xvjf jemalloc-5.1.0.tar.bz2
cd jemalloc-5.1.0
./configure
make
sudo make install

Then we set up our Python app:

# running your app
LD_PRELOAD=/usr/local/lib/libjemalloc.so python your_app.py
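To confirm the preload actually took effect, one hypothetical check is to look for the jemalloc shared object in the process's memory maps (Linux-specific; the .so path assumes the default install prefix used above):

```shell
# If the preload worked, libjemalloc appears in the process's memory maps.
LD_PRELOAD=/usr/local/lib/libjemalloc.so \
  python -c "print(open('/proc/self/maps').read())" | grep jemalloc \
  && echo "jemalloc is loaded"
```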

We chose a few servers for a canary, deployed the change, and watched the graphs.

[Graph: RAM usage with the vanilla Python malloc]

[Graph: RAM usage with jemalloc 3.5.1]

[Graph: RAM usage with jemalloc 5.0.1]

Final numbers

Compared to vanilla malloc, we saw a 30% decrease in RAM usage with jemalloc 3.5.1 and a 40% decrease with 5.0.1! Average peak usage was 38.8 GB with vanilla malloc, versus 26.6 GB for 3.5.1 and 22.7 GB for 5.0.1. Note: the dips in the graphs were from routine deployments or restarts.

Why was this so effective?

We're not 100% sure, but the working theory about memory fragmentation seems plausible: jemalloc is known to reclaim and reuse memory effectively (which is why Redis uses it to great effect).
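If you want to probe the theory yourself, jemalloc can dump its allocator statistics to stderr when the process exits; comparing the "allocated" total against "resident" and "mapped" in that dump gives a rough view of fragmentation. A sketch, with a tiny throwaway allocation workload standing in for a real app:

```shell
# Ask jemalloc to print allocator stats (to stderr) at process exit.
# A large gap between "allocated" and "resident"/"mapped" in the dump
# suggests fragmentation or pages jemalloc hasn't returned to the OS.
LD_PRELOAD=/usr/local/lib/libjemalloc.so \
MALLOC_CONF=stats_print:true \
python -c "x = [bytearray(4096) for _ in range(10000)]"
```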

Do you have more context on why this may be the case? Please leave a comment and let us know, we'd love to learn more!
