A blog about productivity, workflow best practices, company building and how to get things done with less work.


Archives for January 2012

If you are technically inclined and already have a Zapier account, you may have noticed that our current frontend is rendered using the standard HTTP request/response cycle. That is, when you visit https://zapier.com, your browser makes a single page request to our server and we send back a fully rendered HTML page.

Clicking any link repeats the process. Django (our backend framework) handles this out of the box, which let us build a frontend quickly to get initial feedback and momentum.

Bryan had his work cut out for him on the backend systems, which left me with a little time to find a more elegant solution for our frontend. Backbone.js was recommended to me, and since I had never used it before, I built a few simple models, collections, and views to wrap my head around it all.

Backbone.js is a very lightweight framework with few imposed conventions. Understanding early on that there are several ways to structure a Backbone.js application will save you a headache or two. The new Backbone.js frontend is still in development, but most of the architecture is complete.

Frontend Tech Stack

We are utilizing several open-source technologies for the new frontend in addition to Backbone.js.

  • Backbone.js – lightweight framework for structuring "page" views, their content, and their logic
  • RequireJS – Code organization, dependency control, optimization
  • Mustache – HTML templating for views
  • CoffeeScript – Sanity while writing complex Backbone.js code
  • Sass – Sanity while writing complex style sheets

Getting these technologies to play nicely together took some long nights, but the payoff was worth it. The new frontend is fast. With Backbone.js, it no longer makes sense to think of webpages as "pages" but rather as "views", since any given webpage might be built from several views. Each view is rendered entirely from a few lightweight JSON API calls. APIs are what we do, after all, so it made sense to treat the browser like any other API client.

This is a perfect separation of concerns. Bryan can make changes, additions, or removals to the API (using API versioning) without breaking anything, and the frontend can be iterated on completely independently of server-side logic. Even on a team of three the benefit is already apparent, and I imagine it matters even more as a team scales.
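To make the versioning idea concrete, here is a minimal sketch in plain Python. The `ZAP` record, the `serialize_v1`/`serialize_v2` functions and the `/api/<version>/zaps/<id>` path are all hypothetical and do not come from Zapier's code. Each API version keeps its own response shape, so the backend can add fields in v2 while a frontend written against v1 keeps receiving exactly what it expects.

```python
import json

# Hypothetical record; in a Django app this would come from the ORM.
ZAP = {"id": 42, "name": "Gmail to Hipchat", "enabled": True, "runs_today": 17}

def serialize_v1(zap):
    # The original response shape: frozen once a client depends on it.
    return {"id": zap["id"], "name": zap["name"]}

def serialize_v2(zap):
    # A newer version adds fields without changing what v1 returns.
    data = serialize_v1(zap)
    data.update({"enabled": zap["enabled"], "runs_today": zap["runs_today"]})
    return data

SERIALIZERS = {"v1": serialize_v1, "v2": serialize_v2}

def api_response(version, zap):
    """Return the JSON body a client would get from a (hypothetical) /api/<version>/zaps/<id>."""
    try:
        serializer = SERIALIZERS[version]
    except KeyError:
        raise ValueError(f"unsupported API version: {version}") from None
    return json.dumps(serializer(zap))
```

The Backbone client only ever sees JSON, so as long as old versions stay frozen, the server side is free to evolve underneath it.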

In future posts, I will be diving into each technology to highlight exactly how we leverage it and how we overcame integration challenges.

About the Author

Mike Knoop is a Co-founder at Zapier. He helps run product and loves the color orange.

Let's start with our example. You want your web app's users to be able to do something. Excellent: so you write some code that does it for them. However, that something involves fetching the top 50 cat images from Google, Facebook and Twitter image searches. That's gonna take a few seconds, at best.

Note: the code samples are for example purposes only and won't work out of the box; you'll need to customize them for your own needs.

So, first you might try:

1) Do it in the request.

This is business as usual. You already do lots of stuff in the request-response cycle like query the DB and render HTML, so it is a natural place to fetch more things that you want to return to the user. In Djangoland that looks something like this:

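from django.shortcuts import render_to_response  # removed in Django 3.0; use render() there

def main_view(request):
    # Blocks this request until every image has been downloaded.
    results = download_lots_of_cat_images()
    return render_to_response('main_template.html', {'results': results})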

But this has the side effect of making your user wait while you do stuff. That isn't the worst thing in the world, but it certainly isn't good. They sit and stare at a blank screen while your server feverishly works away. Put yourself in their shoes: wouldn't that suck? They'll wonder why it isn't loading. Did something go wrong? Did I click the wrong button? Is it broken?

Not good, we can improve on this.

2) Do it in a different request.

This is almost business as usual, except you split the work between two requests: one for the basic page wrapper (and maybe a spinning "loading" GIF) and a second request that compiles the results of your top-50 mashup:

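from django.shortcuts import render_to_response

def main_view(request):
    # Fast: just the page shell plus a loading spinner.
    return render_to_response('main_template.html', {})

def ajax_view(request):
    # Slow: requested via AJAX once the shell has loaded.
    results = download_lots_of_cat_images()
    return render_to_response('ajax_fragment.html', {'results': results})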

The user hits main_view just like normal and gets a rather dull page with your choice of spinning loader GIF, along with some JavaScript that fires off an AJAX request to ajax_view. Once ajax_view returns, you just dump that template fragment containing all of your cat HTML into the user's DOM.

This is better! It gives the user feedback, and while they are thinking "Oh, what is this? It's still loading. Gotcha." your server is churning away in the background collecting cat pictures "out of process" (sort of!). Any second they'll show up.

However, you are still tying up valuable resources associated with Apache or Nginx, because those live connections aren't really doing anything but waiting for the cat pictures to download. They are a limited commodity.

So, this is better (and may work for the majority of situations), but we can go further.

3) Do it with an async library, like Celery!

This takes the "out of process" idea even further by not tying up a request for very long at all. The basic idea is the same, except we need to introduce a few more tools outside of the HTTP server, chief among them a message queue.

The concept is simple: intensive processes are moved outside of the request process entirely. That means instead of a request taking many seconds (or worse, minutes!), you get a bunch of split second responses like: nope, cat pictures aren't ready, nope, still waiting... and finally, here are your cat pictures!
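The pattern itself can be sketched without Celery at all. What follows is a minimal in-process stand-in, assuming a single worker thread, a `queue.Queue` in place of the broker and a plain dict in place of the result backend (a real deployment would use Redis or RabbitMQ and separate worker processes). The names `enqueue`, `check` and `slow_fetch` are hypothetical. The "request" enqueues a job and gets an id back immediately, and later "requests" check whether the result is ready.

```python
import queue
import threading
import time
import uuid

jobs = queue.Queue()  # stands in for the message broker
results = {}          # stands in for the result backend
results_lock = threading.Lock()

def worker():
    # Pull jobs off the queue forever; each job is (task_id, func, args).
    while True:
        task_id, func, args = jobs.get()
        value = func(*args)
        with results_lock:
            results[task_id] = value
        jobs.task_done()

def enqueue(func, *args):
    """Like Celery's .delay(): hand back an id right away, do the work later."""
    task_id = str(uuid.uuid4())
    jobs.put((task_id, func, args))
    return task_id

def check(task_id):
    """Like the polling AJAX view: returns (ready, value)."""
    with results_lock:
        if task_id in results:
            return True, results[task_id]
    return False, None

def slow_fetch(n):
    time.sleep(0.2)  # pretend this is downloading cat pictures
    return [f"cat-{i}.jpg" for i in range(n)]

threading.Thread(target=worker, daemon=True).start()
```

Calling `enqueue(slow_fetch, 3)` returns immediately; `check()` on that id reports "not ready" until the worker has stored the list.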

So, here is the code if it were using Celery (more on how later):

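from django.shortcuts import render_to_response

def main_view(request):
    # Queue the work and return immediately; task.id goes into the page for polling.
    task = download_lots_of_cat_images.delay()
    return render_to_response('main_template.html', {'task': task})

def ajax_view(request, task_id):
    # Look up the task by id and only render results once a worker has finished.
    results = download_lots_of_cat_images.AsyncResult(task_id)
    if results.ready():
        return render_to_response('ajax_fragment.html', {'results': results.get()})
    return render_to_response('not_ready.html', {})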

The downside is that this is definitely more complicated: your AJAX requests have to include the original task id to check for, and they have to keep retrying if the not_ready.html content is returned (or perhaps a 4xx HTTP code).
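On the client, that retry logic is usually a few lines of JavaScript, but the idea is language-neutral. Here is a Python sketch, assuming a hypothetical `check_status(task_id)` callable that returns `None` while the task is pending (the equivalent of receiving not_ready.html) and the result once it is done. The wait between polls doubles up to a cap, so a slow task doesn't hammer the server.

```python
import time

def poll_until_ready(check_status, task_id, first_delay=0.5, max_delay=5.0, timeout=60.0):
    """Call check_status(task_id) until it returns something other than None.

    Waits first_delay seconds between attempts, doubling each time up to
    max_delay, and raises TimeoutError once the next wait would pass timeout.
    """
    deadline = time.monotonic() + timeout
    delay = first_delay
    while True:
        result = check_status(task_id)
        if result is not None:
            return result
        if time.monotonic() + delay > deadline:
            raise TimeoutError(f"task {task_id} not ready after {timeout} seconds")
        time.sleep(delay)
        delay = min(delay * 2, max_delay)
```

Backing off like this keeps each poll a split-second request while limiting how many of them a long-running task generates.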

Plus you'll need to be running a message broker like Redis or RabbitMQ. Not for the faint of heart (but a lot easier than you think).

But if you have lots and lots of volume and really long-lived, user-initiated tasks, Celery or some other async library is a must-have if you ever expect to grow. Otherwise you'll be battling timeouts, bottlenecks and other very difficult-to-solve problems.

How hard is it to write for Celery?

I'm glad you asked! Below is my sample cat images function:

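import requests

def download_lots_of_cat_images():
    results = []

    for url in big_old_list:  # big_old_list: your list of image URLs
        # append, not +=: "+=" would splice the response body's byte chunks into the list
        results.append(requests.get(url).content)

    return results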

Now we're going to do all the hard work of converting that over to Celery:

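import requests
from celery.task import task  # pre-5.0 import; modern Celery uses "from celery import shared_task"

@task
def download_lots_of_cat_images():
    results = []

    for url in big_old_list:
        results.append(requests.get(url).content)

    # The return value travels through the result backend, so it must be serializable.
    return results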

Did you catch that? I added a task decorator to the function. Now you call the function as download_lots_of_cat_images.delay() instead of download_lots_of_cat_images(). The only difference is that delay() returns an AsyncResult rather than the list of results you'd expect. However, it returns instantly, and you can use that AsyncResult's id to look up the results when they're ready. Isn't that neat?

Show me another trick!

Alright, let's imagine for some insane reason you want to combine waiting for requests with the power of async Celery! You mad scientist you!

Let's say we have a task like this for retrieving URLs from slow servers. For the sake of argument, each takes about 10 seconds. It's simple:

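import requests
from celery.task import task  # pre-5.0 import; modern Celery uses shared_task

@task
def get_slow_url(url):
    return requests.get(url).content  # the body, since a Response object may not serialize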

Now, let's grab ten of them in a single request! That would take 100 seconds (10*10 for you non-maths folks) end to end, but what if we could do all of them at once?

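from django.shortcuts import render_to_response

def main_view(request):
    pending = [get_slow_url.delay(url) for url in SOME_URL_LIST]
    # "async" is a reserved word in Python 3.7+; .get() blocks until each task finishes
    results = [result.get() for result in pending]  # not really optimal, but oh well!
    return render_to_response('main_template.html', {'results': results})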
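Why the fan-out helps: ten fetches run back to back cost the sum of their durations, while the same fetches running concurrently cost roughly as long as the slowest one. The sketch below demonstrates that timing with Python's standard `concurrent.futures` thread pool instead of Celery (a swapped-in technique, used only to show the effect) and a hypothetical `get_slow_url` that sleeps for 0.2 seconds instead of making a real HTTP request.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def get_slow_url(url, delay=0.2):
    # Hypothetical stand-in for requests.get() against a slow server.
    time.sleep(delay)
    return f"contents of {url}"

def fetch_sequentially(urls):
    return [get_slow_url(url) for url in urls]

def fetch_concurrently(urls):
    # submit() returns a Future right away, much like .delay() returns an
    # AsyncResult; result() then blocks until that particular fetch is done.
    with ThreadPoolExecutor(max_workers=max(1, len(urls))) as pool:
        futures = [pool.submit(get_slow_url, url) for url in urls]
        return [future.result() for future in futures]

def timed(func, urls):
    start = time.perf_counter()
    results = func(urls)
    return results, time.perf_counter() - start
```

With ten URLs, the sequential version takes at least 2 seconds (10 × 0.2) while the concurrent one finishes in a fraction of that, and both return results in the same order.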

And don't forget, you can call these tasks anywhere you like, with or without positional or keyword arguments.

How we use it at Zapier.

Zapier has many thousands of tasks running at all hours of the day and night, so having a distributed task queue is very important. We use RabbitMQ and celerybeat to run these periodic tasks every couple of minutes or so (depending on what the user requires).
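For reference, a celerybeat periodic task is declared in settings rather than in the task code. Here is a minimal sketch, assuming Celery 3.x-style uppercase setting names in a Django settings.py and a hypothetical task path `zaps.tasks.poll_active_zaps`. Celery 4 and later spell this setting `beat_schedule`, or `CELERY_BEAT_SCHEDULE` when settings are loaded with the `CELERY` namespace.

```python
from datetime import timedelta

# settings.py (Celery 3.x naming); the task path below is hypothetical.
CELERYBEAT_SCHEDULE = {
    "poll-active-zaps": {
        "task": "zaps.tasks.poll_active_zaps",
        "schedule": timedelta(minutes=2),  # "every couple of minutes"
        "args": (),
    },
}
```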

We make sure to pass only IDs into tasks, so that if our workers start backing up (i.e., if they slow down for some reason), the data is always pulled live from the DB when the task actually runs. Some tasks fire off other tasks if needed, so it is easy to chain them together to create the desired effect.
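The reason for passing IDs rather than objects is timing: a task may sit in the queue for a while, and anything serialized into its message is a snapshot from the moment it was queued. The sketch below uses a plain dict as a stand-in for the database and hypothetical task functions, and simulates an edit that lands before the workers catch up.

```python
# A plain dict stands in for the database table; all names are hypothetical.
DB = {7: {"id": 7, "target_email": "old@example.com"}}

def run_zap_with_object(zap):
    # Receives a copy made at enqueue time, which may be stale by now.
    return zap["target_email"]

def run_zap_by_id(zap_id):
    # Loads the row only when the worker actually executes the task.
    zap = DB[zap_id]
    return zap["target_email"]

# Queue both styles, as a broker message would hold them...
queued = [
    (run_zap_with_object, dict(DB[7])),  # a snapshot of the row
    (run_zap_by_id, 7),                  # just the ID
]
# ...then the user edits the zap before the workers get to it.
DB[7]["target_email"] = "new@example.com"

outcomes = [task(arg) for task, arg in queued]
```

Only the ID-based task sees the edit; the object-based one runs with the old address.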

Wrapping up and getting started.

I highly recommend spending 20-40 minutes setting up Celery with Redis (unless you need something more "scalable" like RabbitMQ, which offers redundancy, replication and more). You'll need to make sure to launch a few things:

  1. Your message broker (Redis or RabbitMQ recommended).
  2. celeryd (the worker that runs your tasks)
  3. celerybeat (if you want periodic tasks)
  4. celerycam (if you want to dump those tasks into the Django ORM)
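Pointing Celery at Redis is mostly configuration. Here is a minimal sketch of the relevant Django settings, assuming Redis runs locally on its default port 6379 and using the Celery 3.x-era setting names (Celery 4 and later use lowercase `broker_url` and `result_backend`, or `CELERY_BROKER_URL` and `CELERY_RESULT_BACKEND` via the `CELERY` namespace). The database numbers are arbitrary choices.

```python
# settings.py: assumes a local Redis on the default port.
BROKER_URL = "redis://localhost:6379/0"             # where queued tasks wait
CELERY_RESULT_BACKEND = "redis://localhost:6379/1"  # where AsyncResult looks up results

# To use RabbitMQ as the broker instead (default guest account, vhost "/"):
# BROKER_URL = "amqp://guest:guest@localhost:5672//"
```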

Enjoy!

About the Author

Bryan Helmig is a co-founder and developer at Zapier, self-taught hacker, jazz/blues musician and fine beer and whiskey lover.

We at Zapier are all hackers. You can even see Wade, our resident customer dev guru, regularly committing to our GitHub repo. So it's pretty important that we use tools we're all fairly familiar with, but also ones that are best suited to the task at hand. That's a tricky tradeoff, but one we've handled fairly well.

Language & framework.

Python/Django on the backend and JS/Backbone on the front.

Since we got started at Startup Weekend, we had to pick something we were familiar with so we could iterate fast. The key word here is familiarity, and Django was an obvious choice. Rails or Node.js would have been great choices too, but hardly any of us had built anything significant with those stacks, so we stuck with something we knew well.

On the front, we started with straight JS, hacked together by Mike for a quick live demo on stage. We knew this would need something more sophisticated, but it served its purpose. Right now we're working on a much, much more robust frontend powered by Backbone.js, which will supplant our current MVP.

Server stack.

Linode, nginx, Gunicorn, MySQL, Redis and RabbitMQ.

Right now we're hosting everything on Linode, which makes it easy for us to spin up and monitor servers. We considered EC2, but again, our familiarity with Linode let us move a little faster.

We currently have three boxes on Linode:

  • Web: running nginx and gunicorn.
  • DB: running MySQL & Redis.
  • Messaging: running RabbitMQ and Celery.

nginx was a great call: it's a lot easier to set up and maintain than Apache (especially for non-PHP apps), and Gunicorn works brilliantly in conjunction with it.
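A common way to wire the two together is to have nginx proxy requests to Gunicorn listening on a local port. Here is a minimal gunicorn.conf.py sketch, assuming port 8000 and the worker-count rule of thumb from Gunicorn's documentation, (2 × CPU cores) + 1. These values are illustrative assumptions, not Zapier's actual configuration.

```python
# gunicorn.conf.py: values are illustrative assumptions.
import multiprocessing

bind = "127.0.0.1:8000"  # only nginx on the same box talks to Gunicorn
workers = multiprocessing.cpu_count() * 2 + 1
timeout = 30  # seconds before a silent worker is killed and restarted
```

nginx then handles slow clients and static files, while Gunicorn's workers only ever see fast local connections.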

We debated using PostgreSQL, but decided to go with MySQL, again for familiarity's sake: we needed to just get up and going. Redis is another no-brainer. Right now we're using it as a glorified memcached, but the opportunity to use it for other things made us choose it over memcached.
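Using Redis as a "glorified memcached" from Django comes down to a settings change. Here is a minimal sketch, assuming Django 4.0 or later, which ships a built-in Redis cache backend (in 2012 this required a third-party package) and needs the redis-py client installed. The hostname `db.internal` for the DB box is hypothetical.

```python
# settings.py: requires Django 4.0+ and redis-py; the host below is hypothetical.
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.redis.RedisCache",
        "LOCATION": "redis://db.internal:6379",
    }
}
```

Application code then goes through Django's usual cache API (for example `cache.set(key, value, timeout=300)` and `cache.get(key)`), so swapping memcached for Redis doesn't touch the views.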

Finally, Celery and RabbitMQ handle every single scheduled background task (and there are looooots of them). Celery truly is a best-in-class piece of software, as is RabbitMQ. We took a bit of a gamble, as neither was a product we had much experience with, but we knew there were dozens of people out there using Rabbit and Celery in scaled environments, and task scheduling was our biggest "unknown". We're glad we made the call.

Honorable mentions.

Coffeescript, SASS, and many, many Python packages.

*Insert snarky comment about reading and writing Javascript all day long.* What can we say, Coffeescript is brilliant. We owe Mike's continued sanity to it.

SASS is another wonderful invention. The easy nesting alone is worth the effort of installing and using it, and reusable mixins make life wonderful. We cannot recommend it enough.

Since we work with dozens of APIs, few things have made my life better than python-requests. Its sane API made it easy for us to iterate quickly and add new APIs in a jiffy (the current record is 13 minutes to integrate Hipchat).
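As an illustration of that sane API, here is how a typical authenticated GET is put together with python-requests. The endpoint and the `auth_token` parameter are hypothetical (this is not Hipchat's actual API). The request is only prepared, not sent, so the final URL and headers can be inspected without a network call.

```python
import requests

session = requests.Session()
session.headers.update({"Accept": "application/json"})  # sent with every request

request = requests.Request(
    "GET",
    "https://api.example.com/v1/rooms",  # hypothetical endpoint
    params={"auth_token": "abc123", "format": "json"},
)
prepared = session.prepare_request(request)  # merges session headers, encodes params

# To actually call the API:
# response = session.send(prepared, timeout=10)
# response.raise_for_status()
# rooms = response.json()
```

In everyday code the same call is usually a one-liner, `session.get(url, params=..., timeout=10)`.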

Wrapping it up.

To think that all these first-class tools are available to us for free is just... well, mind-blowing. We hope to eventually give back to the community through patches or other fixes we've made to various projects, and maybe even by releasing some projects of our own.

Now, back to work...

About the Author

Bryan Helmig is a co-founder and developer at Zapier, self-taught hacker, jazz/blues musician and fine beer and whiskey lover.

As a fast-growing startup with limited personnel and resources, it's absolutely essential that we make the best possible use of our time. And since we build integrations for a number of web apps, we've had the opportunity to test many different tools.

Here's our list of the apps we have found essential for running a small startup and team.

Office Tools - Google Apps

The Google suite of products is about the best bang for the buck you can get. We are constantly using Gmail for email, Google Docs for documents, spreadsheets, and ad hoc surveys and forms, and Google Calendar for keeping track of events and calls.

Email Marketing - Postmark & MailChimp

The lifeblood of any early-stage startup is its launch list, so making sure emails get delivered reliably is a big deal.

Postmark has been great for sending automated notification emails from our app, and it does so for just a buck fifty per thousand emails.

For bulk marketing messages it's hard to beat MailChimp. They have a great free plan that supports 2,000 subscribers and 12,000 sends a month. Why use MailChimp over Postmark for marketing messages? MailChimp comes with a pretty solid analytics interface that lets you track and A/B test your campaigns.

Special Mention: since building your email launch list is so important, it makes sense to have a landing page to collect emails as soon as possible. We found Kickoff Labs to be an excellent tool for doing that. Plus Josh and Scott provide amazing customer support.

Hosting - Linode

Our application is built on Django, which narrows our hosting choices. We chose Linode even though it is a little more expensive than something like WebFaction. But for root access to our machines, the extra flexibility, and the security of knowing other users won't take us down, it's worth it. For a startup, this is definitely not an area to skimp on.

File Sharing - Dropbox

Dropbox is the no-brainer here. It doesn't get much easier than sharing documents with Dropbox. We use Dropbox as a repository to store, share, review and edit the company's documents.

Payment Processing - Stripe

For any startup, being able to collect money from your users is important. If you are a developer, the Stripe API couldn't be easier to work with.

It makes it really simple to collect money from your users and send it to your bank account. Plus, you can easily set up subscription plans, one-time payments, and just about any payment plan you can imagine.

Customer Interaction & Support - Olark

Most of the tools in this category aren't cheap, but for us it was worth having Olark on the site even before we had a product. Simply having it there has led to more interaction with customers and early sales.

Not only has Olark made it easy to chat with prospects and customers, it has also created opportunities to talk with investors randomly visiting our site, vendors wanting integrations, and potential partners looking to expand their business. Bottom line: Olark makes us money.

Social - Twitter

As a primarily B2B company, we've found Twitter to be our users' social network of choice. A significant portion of our customers hang out on Twitter to talk shop, so it makes sense for us to have an active presence there.

For most startups it makes sense to go where your users are. So if that's Facebook, then go there.

Analytics - Google Analytics

One of the most important things a startup needs is users, and that means getting people to your site. Keeping track of that is important, and the free, feature-packed Google Analytics makes it easy to track your traffic and where it's coming from, and gives you ideas for how to grow your site.

After all, if you aren't measuring you can't know for sure how to improve.

Integrations - Zapier

At Zapier we take our own medicine and use Zapier to integrate a lot of these apps, improve all our processes, and have a little fun too. Some of our favorite integrations are:

Those are the tools we find essential for running our startup. What tools are absolute must-haves for you?

About the Author

Wade Foster is a Co-founder and CEO at Zapier. He likes to write about process, productivity, startups and how to do awesome work.
