Async Celery by Example: Why and How

By Bryan Helmig · January 31, 2012

Let's start with our example. You want your web app's users to be able to do something. Excellent: so you write some code that does it for them. However, that something involves fetching the top 50 images of cats from Google, Facebook and Twitter image searches. This is gonna take a few seconds to do, at best.

Note: the code samples are for example purposes only and won't work out of the box. You'll need to customize them for your own needs.

So, first you might try:

1) Do it in the request.

This is business as usual. You already do lots of stuff in the request-response cycle like query the DB and render HTML, so it is a natural place to fetch more things that you want to return to the user. In Djangoland that looks something like this:

def main_view(request):
    results = download_lots_of_cat_images()
    return render_to_response('main_template.html', {'results': results})

But this has the side effect of making your user wait while you do stuff. Which isn't the worst thing in the world, but it certainly isn't good. They sit and stare at a blank screen while your server feverishly works away. Put yourself in their shoes: wouldn't that suck? They'll wonder why it isn't loading. Did something go wrong? Did I click the wrong button? Is it broken?

Not good. We can improve on this.

2) Do it in a different request.

This is almost business as usual, except you split the work between two requests: one for the basic page wrapper and maybe a "loading" spinning gif and another, second request that does the compiling of results from your top 50 mashup:

def main_view(request):
    return render_to_response('main_template.html', {})

def ajax_view(request):
    results = download_lots_of_cat_images()
    return render_to_response('ajax_fragment.html', {'results': results})

The user would hit the main_view just like normal and get a rather dull page with your choice of spinning loader gif along with some JavaScript that fires off an AJAX request to the ajax_view. Once ajax_view returns, you just dump that template fragment containing all of your cat HTML into the user's DOM.

This is better! It gives the user feedback, and while they are thinking "Oh, what is this? It's still loading. Gotcha." your server is churning away in the background collecting cat pictures "out of process" (sort of!). Any second they'll show up.

However, you are still tying up valuable resources associated with Apache or Nginx because those live connections aren't really doing anything but waiting for the cat pictures to download. They are a limited commodity.

So, this is better (and may work for the majority of situations), but we can go further.

3) Do it with an async library, like Celery!

This takes the "out of process" idea even further by not tying up a request for very long at all. The basic idea is the same, except we need to introduce a few more tools outside of the HTTP server, chiefly a message queue and a worker process that reads from it.

The concept is simple: intensive processes are moved outside of the request process entirely. That means instead of a request taking many seconds (or worse, minutes!), you get a bunch of split second responses like: nope, cat pictures aren't ready, nope, still waiting... and finally, here are your cat pictures!
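To make that concrete without any Celery at all, here is a toy version using only the Python standard library. It is a sketch of the idea, not how Celery is implemented: the names `submit`, `check`, `worker`, `jobs` and `results` are made up for illustration, an in-memory queue stands in for the message queue, and a dict stands in for the result store.

```python
import queue
import threading
import uuid

jobs = queue.Queue()   # stands in for the message queue (Redis/RabbitMQ)
results = {}           # stands in for the result store


def submit(func, *args):
    """Queue the work and hand back an id right away."""
    task_id = str(uuid.uuid4())
    jobs.put((task_id, func, args))
    return task_id


def check(task_id):
    """The split-second answer: not ready yet, or here are your results."""
    if task_id in results:
        return {"ready": True, "result": results[task_id]}
    return {"ready": False, "result": None}


def worker():
    """Runs outside the request: pull a job, do it, store the result."""
    while True:
        task_id, func, args = jobs.get()
        results[task_id] = func(*args)
        jobs.task_done()


# One background worker; a real setup runs workers as separate processes.
threading.Thread(target=worker, daemon=True).start()
```

The request handler only ever calls `submit` or `check`, both of which return immediately. All of the slow work happens in `worker`.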

So, here is the code if it were using Celery (more on how later):

def main_view(request):
    task = download_lots_of_cat_images.delay()
    return render_to_response('main_template.html', {'task': task})

def ajax_view(request, task_id):
    results = download_lots_of_cat_images.AsyncResult(task_id)
    if results.ready():
        return render_to_response('ajax_fragment.html', {'results': results.get()})
    return render_to_response('not_ready.html', {})

The downside is this is definitely more complicated, as your AJAX requests have to include the original task id to check for, plus continue retrying if the not_ready.html content is returned (or perhaps a 4xx HTTP code).

Plus you'll need to be running a backend like Redis or RabbitMQ. Not for the faint of heart (but a lot easier than you think).

But, if you have lots and lots of volume and really long lived user initiated tasks, Celery or some sort of async library is a must-have if you ever expect it to grow. Otherwise you'll be battling timeouts, bottlenecks and other very difficult to solve problems.
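The "continue retrying" part is the bit people tend to skip, so here is a minimal sketch of it. In a real page this loop would live in your JavaScript; it is written in Python here only to keep the examples in one language. The function name `poll_until_ready` and its parameters are hypothetical, and `check` is assumed to be any callable that returns a `(ready, value)` pair.

```python
import time


def poll_until_ready(check, interval=0.5, max_attempts=20):
    """Call check() until it reports ready, or give up.

    check is any callable returning (ready, value). In a real page it
    would be the AJAX request to ajax_view carrying the task id.
    """
    for _ in range(max_attempts):
        ready, value = check()
        if ready:
            return value
        # Not ready yet: wait a moment before asking again.
        time.sleep(interval)
    raise TimeoutError("task was not ready after %d attempts" % max_attempts)
```

Capping the number of attempts matters: without it, a task that died on the worker would leave the page spinning forever.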

How hard is it to write for Celery?

I'm glad you asked. Below is my sample cat images function:

import requests

def download_lots_of_cat_images():
    results = []
    for url in big_old_list:
        # collect one response per URL
        results.append(requests.get(url))
    return results

Now we're going to do all the hard work of converting that over to Celery:

# `task` is Celery's task decorator; the import path depends on your Celery version
@task
def download_lots_of_cat_images():
    results = []
    for url in big_old_list:
        # collect one response per URL
        results.append(requests.get(url))
    return results

Did you catch that? I added a task decorator to the function. Now you call the function like download_lots_of_cat_images.delay() instead of download_lots_of_cat_images(). The only difference is that delay() returns an AsyncResult and not the list of results you'd expect. However, it does it instantly and you can use that AsyncResult's id to look up the results when they're ready. Isn't that neat?

Show me another trick!

Alright, let's imagine for some insane reason you want to combine waiting for requests with the power of async Celery! You mad scientist you!

Let's say we have a task like so for retrieving URLs from slow servers. For the sake of argument they take about 10 seconds each. It's simple:

@task
def get_slow_url(url):
    return requests.get(url)

Now, let's grab ten of them in a single request! That would take 100 seconds (10*10 for you non-maths) end to end, but what if we could do all of them at once?

def main_view(request):
    asyncs = [get_slow_url.delay(url) for url in SOME_URL_LIST]
    # this isn't really optimal, but oh well!
    results = [async_item.wait() for async_item in asyncs]
    return render_to_response('main_template.html', {'results': results})

And don't forget, you can call these tasks anywhere you like, with or without positional or keyword arguments.
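If you want to see the arithmetic for yourself without standing up a broker, here is a rough stand-in using a thread pool in place of Celery workers. Everything in it is an assumption for illustration: `fake_slow_fetch` just sleeps instead of hitting a real server, and `DELAY` is scaled down from 10 seconds to a tenth of a second so it finishes quickly.

```python
import time
from concurrent.futures import ThreadPoolExecutor

DELAY = 0.1  # stands in for the ~10 second server


def fake_slow_fetch(url):
    """Hypothetical stand-in for get_slow_url: sleeps, then returns a string."""
    time.sleep(DELAY)
    return "body of " + url


def fetch_all(urls, workers=10):
    """Start every fetch at once, then collect the results in order."""
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Like calling .delay() on each URL: nothing blocks here.
        futures = [pool.submit(fake_slow_fetch, url) for url in urls]
        # Like waiting on each AsyncResult: blocks until that one is done.
        results = [future.result() for future in futures]
    return results, time.monotonic() - start
```

With ten URLs and ten workers, the elapsed time should land near one `DELAY` rather than ten of them. With fewer workers than URLs the jobs queue up behind each other, which is the same thing that happens when you don't have enough Celery workers running.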

How we use it at Zapier.

Zapier has many many thousands of tasks running at all hours of the night and day, so having a distributed task queue is very important. We use RabbitMQ and the celerybeat functionality to run these periodic tasks every couple of minutes or so (depending on what the user requires).

We make sure to pass only IDs into our tasks. That way, if our workers start backing up (i.e., if they're slowed down for some reason), the data is always pulled live from the DB when the task actually runs. Some tasks fire off other tasks if needed, so it is easy to string them along to create the desired effect.
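Here is a small sketch of why passing IDs matters. It is not Zapier's actual code: a plain dict stands in for the database, the two functions stand in for tasks, and all of the names are hypothetical.

```python
# A dict stands in for the database; the keys are record ids.
FAKE_DB = {1: {"email": "old@example.com"}}


def notify_with_snapshot(record):
    """Risky: the task carries a copy of the data from queue time."""
    return "emailing " + record["email"]


def notify_with_id(record_id):
    """Safer: the task carries only the id and reads the row when it runs."""
    record = FAKE_DB[record_id]
    return "emailing " + record["email"]


# Queue time: capture the arguments each style would put on the queue.
snapshot_arg = dict(FAKE_DB[1])
id_arg = 1

# The workers are backed up, and the user changes their email meanwhile.
FAKE_DB[1]["email"] = "new@example.com"

# Run time: only the id-based task sees the change.
stale = notify_with_snapshot(snapshot_arg)
fresh = notify_with_id(id_arg)
```

The snapshot version acts on whatever was true when the task was queued. The ID version acts on whatever is true when the task finally runs, which is usually what you want when the queue is running behind.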

Wrapping up and getting started.

I highly recommend spending 20-40 minutes setting up Celery with Redis (unless you need something more "scalable" like RabbitMQ which does redundancy, replication and more). You'll need to make sure to launch a few things:

  1. Your backend (Redis or RabbitMQ recommended).

  2. celeryd (the worker that runs your tasks)

  3. celerybeat (if you want periodic tasks)

  4. celerycam (if you want to dump those tasks into the Django ORM)

Enjoy!
