Async Celery by Example: Why and How

Bryan Helmig / January 30, 2012

Let's start with our example. You want your web app's users to be able to do something. Excellent: so you write some code that does it for them. However, that something involves fetching the top 50 images of cats from Google, Facebook, and Twitter image searches. This is gonna take a few seconds to do, at best.

Note: the code samples are for example purposes only and won't work out of the box; you'll need to customize them for your own needs.

So, first you might try:

1) Do it in the request.

This is business as usual. You already do lots of stuff in the request-response cycle like query the DB and render HTML, so it is a natural place to fetch more things that you want to return to the user. In Djangoland that looks something like this:

def main_view(request):
    results = download_lots_of_cat_images()
    return render_to_response('main_template.html',
                              {'results': results})

But this has the side effect of making your user wait while you do stuff. Which isn't the worst thing in the world, but it certainly isn't good. They sit and stare at a blank screen while your server feverishly works away. Put yourself in their shoes: wouldn't that suck? They'll be wondering why it isn't loading. Did something go wrong? Did I click the wrong button? Is it broken?

Not good, we can improve on this.

2) Do it in a different request.

This is almost business as usual, except you split the work between two requests: one for the basic page wrapper (and maybe a "loading" spinner gif), and a second request that compiles the results for your top-50 mashup:

def main_view(request):
    return render_to_response('main_template.html', {})

def ajax_view(request):
    results = download_lots_of_cat_images()
    return render_to_response('ajax_fragment.html',
                              {'results': results})

The user would hit main_view just like normal and get a rather dull page with your choice of spinning loader gif, along with some JavaScript that fires off an AJAX request to ajax_view. Once ajax_view returns, you just dump that template fragment containing all of your cat HTML into the user's DOM.

This is better! It gives the user feedback, and while they are thinking "Oh, what is this? It's still loading. Gotcha." your server is churning away in the background collecting cat pictures "out of process" (sort of!). Any second they'll show up.

However, you are still tying up valuable resources associated with Apache or Nginx, because those live connections aren't really doing anything but waiting for the cat pictures to download. They are a limited commodity.

So, this is better (and may work for the majority of situations), but we can go further.

3) Do it with an async library, like Celery!

This takes the "out of process" idea even further by not tying up a request for very long at all. The basic idea is the same, except we need to introduce one more tool outside of the HTTP server: a message queue.

The concept is simple: intensive processes are moved outside of the request process entirely. That means instead of a request taking many seconds (or worse, minutes!), you get a bunch of split second responses like: nope, cat pictures aren't ready, nope, still waiting… and finally, here are your cat pictures!

So, here is the code if it were using Celery (more on how later):

def main_view(request):
    task = download_lots_of_cat_images.delay()
    return render_to_response('main_template.html',
                              {'task': task})

def ajax_view(request, task_id):
    results = download_lots_of_cat_images.AsyncResult(task_id) 
    if results.ready():
        return render_to_response('ajax_fragment.html',
                                  {'results': results.get()})
    return render_to_response('not_ready.html', {})

The downside is that this is definitely more complicated: your AJAX requests have to include the original task id to check for, plus keep retrying if the not_ready.html content is returned (or perhaps a 4xx HTTP code).
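To wire that up, the AJAX endpoint needs the task id in its URL. A minimal URLconf sketch (using today's Django routing; the route names are illustrative):

```python
# urls.py sketch -- ajax_view receives task_id captured from the URL
from django.urls import path
from . import views

urlpatterns = [
    path('', views.main_view, name='main'),
    path('ready/<str:task_id>/', views.ajax_view, name='ajax'),
]
```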

Plus you'll need to be running a backend like Redis or RabbitMQ. Not for the faint of heart (but a lot easier than you think).

But, if you have lots and lots of volume and really long-lived, user-initiated tasks, Celery or some sort of async library is a must-have if you ever expect it to grow. Otherwise you'll be battling timeouts, bottlenecks, and other very difficult-to-solve problems.

How hard is it to write for Celery?

I'm glad you asked, below is my sample cat images function:

def download_lots_of_cat_images():
    results = []

    for url in big_old_list:
        results.append(requests.get(url).content)

    return results

Now we're going to do all the hard work of converting that over to Celery:

@task
def download_lots_of_cat_images():
    results = []

    for url in big_old_list:
        results.append(requests.get(url).content)

    return results

Did you catch that? I added a task decorator to the function. Now you call download_lots_of_cat_images.delay() instead of download_lots_of_cat_images(). The only difference is that delay returns an AsyncResult instead of the list of results you'd expect. However, it returns instantly, and you can use that AsyncResult's id to look up the results when they're ready. Isn't that neat?

Show me another trick!

Alright, let's imagine for some insane reason you want to combine waiting for requests with the power of async Celery! You mad scientist you!

Let's say we have a task like so for retrieving URLs from slow servers. For the sake of argument they take about 10 seconds each. It's simple:

@task
def get_slow_url(url):
    return requests.get(url)

Now, let's grab ten of them in a single request! That would take 100 seconds (10*10 for you non-maths) end to end. But what if we could do all of them at once?

def main_view(request):
    asyncs = [get_slow_url.delay(url) for url in SOME_URL_LIST]
    # this isn't really optimal, but oh well!
    results = [async_result.get() for async_result in asyncs]
    return render_to_response('main_template.html',
                              {'results': results})

And don't forget, you can call these tasks anywhere you like. With or without positional or keyword arguments.

How we use it at Zapier.

Zapier has many, many thousands of tasks running at all hours of the day and night, so having a distributed task queue is very important. We use RabbitMQ and the celerybeat functionality to run these periodic tasks every couple of minutes or so (depending on what the user requires).
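A periodic task like that boils down to a schedule entry in your settings that celerybeat reads; a sketch with made-up names (the task path and interval are illustrative, not our real code):

```python
# settings.py sketch -- celerybeat enqueues this task every 5 minutes
from datetime import timedelta

CELERYBEAT_SCHEDULE = {
    'run-pending-zaps': {
        'task': 'tasks.run_pending_zaps',  # hypothetical task name
        'schedule': timedelta(minutes=5),
    },
}
```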

We make sure to only pass in the IDs of the objects for tasks to work on; that way, if our workers start backing up (i.e., if they're slowed down for some reason), the data is always pulled live from the DB. Some tasks fire off other tasks if needed, so it is easy to string them along to create the desired effect.

Wrapping up and getting started.

I highly recommend spending 20-40 minutes setting up Celery with Redis (unless you need something more "scalable" like RabbitMQ, which does redundancy, replication, and more). You'll need to make sure to launch a few things:

  1. Your backend (Redis or RabbitMQ recommended).
  2. celeryd (the worker that runs your tasks)
  3. celerybeat (if you want periodic tasks)
  4. celerycam (if you want to dump those tasks into the Django ORM)
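Pointing Celery at your backend is just a couple of settings; a sketch using the default local Redis URL (adjust host, port, and database for your setup):

```python
# settings.py sketch -- broker and result store both on local Redis
BROKER_URL = 'redis://localhost:6379/0'
CELERY_RESULT_BACKEND = 'redis://localhost:6379/0'
```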

Enjoy!
