Introduction to Graph APIs

Brian Cooksey / July 27, 2017

If you’ve consumed a web API in the last decade, there is a good chance it was a REST API. The data was likely organized around resources, responses included IDs of related objects, and HTTP verbs were used to communicate reading, writing, and updating (yes, we know this is a loose definition, not Roy Fielding’s strict REST). RESTful API design has been the dominant standard in the industry for a while.

REST, however, has its problems. A client might get stuck in a pattern of data over-fetching, requesting an entire resource just to get one or two pieces of information. Or the client may regularly need several objects at once but be unable to fetch them all in one request, a problem known as data under-fetching. In terms of maintenance, changes to a REST API can force clients to update their integrations to follow the new API structure or response schemas.

To address these issues, a new design has gained ground over the last few years: Graph APIs.

What is a Graph API?

A simplistic definition of a Graph API is an API that models the data in terms of nodes and edges (objects and relationships) and allows the client to interact with multiple nodes in a single request. For example, imagine a server holds data on authors, blog posts, and comments. In a REST API, to get the author and comments for a particular blog post, the client might make three HTTP requests, such as /posts/123, /authors/455, and /posts/123/comments.

In a graph API, the client formulates the call so data from all three resources is pulled in at once. The client is also able to specify the fields it cares about, giving more control over the response schema.
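To make the idea concrete, here is a minimal sketch of what a graph-shaped server might do internally: nodes keyed by ID, named edges between them, and a resolve function that walks a nested field selection in one pass. The data, the NODES/EDGES layout, and resolve are all hypothetical; real graph APIs layer types, permissions, and pagination on top of something like this.

```python
from typing import Any, Dict, List, Tuple, Union

# Hypothetical in-memory data, for illustration only.
# Nodes: every object is addressable by its ID and holds scalar fields.
NODES: Dict[str, Dict[str, Any]] = {
    "123": {"title": "Hello, graphs"},  # a blog post
    "455": {"name": "Ada"},             # an author
    "c1": {"message": "Nice post"},     # a comment
}

# Edges: (node ID, edge name) -> one related ID or a list of related IDs.
EDGES: Dict[Tuple[str, str], Union[str, List[str]]] = {
    ("123", "author"): "455",
    ("123", "comments"): ["c1"],
    ("c1", "author"): "455",
}


def resolve(node_id: str, selection: Dict[str, Any]) -> Dict[str, Any]:
    """Return only the selected fields, following edges for nested selections.

    A selection maps a field name to None (a scalar field) or to a nested
    selection (an edge to follow).
    """
    result: Dict[str, Any] = {}
    for name, sub in selection.items():
        if sub is None:
            result[name] = NODES[node_id][name]
        else:
            target = EDGES[(node_id, name)]
            if isinstance(target, list):
                result[name] = [resolve(t, sub) for t in target]
            else:
                result[name] = resolve(target, sub)
    return result


# One call gathers the post, its author, and its comments (with their authors).
print(resolve("123", {
    "title": None,
    "author": {"name": None},
    "comments": {"message": None, "author": {"name": None}},
}))
```

The client describes the shape it wants once, and the server follows edges on its behalf, which is what replaces the three separate REST round trips.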

To explore how this works in detail, we’re going to look at a couple of case studies from APIs in the wild.

Case Study 1: Facebook Graph API

Since releasing version 1.0 of its API in 2010, Facebook has used a design inspired by graph databases. There are nodes, such as posts and comments, and edges that connect them, such as comments “belonging to” a post. This approach gives the API the discoverability of a typical REST API while still providing clients a way to optimize data retrieval. Let’s use the example of a post and look at some basic operations.

To start off, a client fetches a post by sending a GET request for the post’s ID directly off the root of the API.

GET /<post-id>

By default, this returns most of the top-level fields on the post. If the client only wants access to part of the post, say the caption and created time, it can use a query parameter to request only those fields:

GET /<post-id>?fields=caption,created_time

To fetch related data, the client queries an edge, such as the comments for the post:

GET /<post-id>/comments
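The three requests so far differ only in their path and query string. A minimal sketch of building them in Python follows; BASE_URL and node_url are hypothetical helpers, and the sketch leaves out the access_token parameter and version prefix that real Graph API calls typically carry.

```python
from typing import Optional, Sequence
from urllib.parse import urlencode

# Assumption: the public Graph API host. Real calls also need an access_token
# parameter (and usually a version prefix), both omitted here.
BASE_URL = "https://graph.facebook.com"


def node_url(node_id: str, edge: Optional[str] = None,
             fields: Optional[Sequence[str]] = None) -> str:
    """Build GET /<id> or GET /<id>/<edge>, optionally with ?fields=..."""
    url = f"{BASE_URL}/{node_id}"
    if edge:
        url += f"/{edge}"
    if fields:
        # Keep commas readable; other reserved characters are percent-encoded.
        url += "?" + urlencode({"fields": ",".join(fields)}, safe=",")
    return url


print(node_url("123"))                                     # GET /<post-id>
print(node_url("123", fields=["caption", "created_time"]))  # field subset
print(node_url("123", edge="comments"))                     # an edge
```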

So far, this looks a lot like how a REST API functions. Perhaps the ability to specify a subset of fields is new, but the organization of the data feels a lot like resources. Where things get interesting is when the client builds a nested query. Here’s another way a client could fetch the comments for the post:

GET /<post-id>?fields=caption,created_time,comments{id,message}

The above request returns a response that has the post’s caption, created time, and a list of comments (selecting only the id and message from each comment). A typical REST API can’t do this in one call: the client would have to fetch the post first, then fetch the comments.

What if the client wants to nest further?

GET /<post-id>?fields=caption,created_time,comments{id,message,from{id,name}}

This request fetches the post’s comments, including the id and name of the author for each comment. Consider the REST counterpart. The client would need to make a request for the post, a request for the comments, and then individual requests to fetch the author of each comment. That quickly becomes a lot of HTTP calls! With the graph design, however, all of that information is condensed into one call that fetches only the data the client needs.
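Because the fields parameter nests with braces, a client can generate it from a nested structure instead of hand-writing the string. This is a minimal sketch; fields_param is a hypothetical helper, and it covers only plain field lists (not modifiers the API may accept on edges).

```python
from typing import Dict, Optional


def fields_param(selection: Dict[str, Optional[dict]]) -> str:
    """Serialize a nested selection into the braces-based fields syntax.

    None marks a plain field; a dict marks an edge whose own fields go in braces.
    """
    parts = []
    for name, sub in selection.items():
        if sub:
            parts.append(f"{name}{{{fields_param(sub)}}}")
        else:
            parts.append(name)
    return ",".join(parts)


selection = {
    "caption": None,
    "created_time": None,
    "comments": {"id": None, "message": None, "from": {"id": None, "name": None}},
}
print(fields_param(selection))
# caption,created_time,comments{id,message,from{id,name}}
```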

A final aspect of graph design worth noting is that any object fetched from an edge is itself a root node, and can be queried directly. So, for example, to fetch more info about a particular comment:

GET /<comment-id>

Notice that the client did not have to build up a URL like /posts/<post-id>/comments/<comment-id> as a REST API might require. This can be helpful in situations where the client does not have immediate access to the parent object’s id.

This also comes into play when making changes to data. To update or delete an object, say a comment, the client sends a POST (for updates) or DELETE request directly to the /<id> endpoint. To create an object, the client POSTs to the appropriate edge of a node. To add a comment to a post, for example, the client makes a POST request to the comments edge of the post:

POST /<post-id>/comments

message=This+is+a+comment
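Here is a minimal sketch of preparing the create and delete requests with Python’s standard library, without sending them. The helper names are hypothetical, BASE_URL is assumed to be the public Graph API host, and the access token a real call needs is omitted.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: the public Graph API host; the access_token is omitted for brevity.
BASE_URL = "https://graph.facebook.com"


def create_comment_request(post_id: str, message: str) -> Request:
    """POST to the post's comments edge with a form-encoded message."""
    body = urlencode({"message": message}).encode("ascii")
    return Request(f"{BASE_URL}/{post_id}/comments", data=body, method="POST")


def delete_request(object_id: str) -> Request:
    """DELETE goes straight to /<id>; no parent path is needed."""
    return Request(f"{BASE_URL}/{object_id}", method="DELETE")


req = create_comment_request("123", "This is a comment")
print(req.get_method(), req.full_url, req.data)
# urllib.request.urlopen(req) would send it (it needs a valid token to succeed).
```

Note that the delete helper needs only the comment’s own ID, which is the point made above about every object being a root node.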

Case Study 2: GitHub V4 GraphQL API

Another contender in the graph API world is a specification known as GraphQL. This design is a big departure from REST, providing only one endpoint that accepts GET and POST requests. All interactions with the API involve sending queries following the GraphQL syntax.

In May of 2017, GitHub released version 4 of its API, which follows this spec. Let’s take a look at how to do some operations on repos to get a feel for GraphQL.

To fetch a repo, a client defines a GraphQL query:

POST /graphql

{
  "query": "{
    repository(owner:\"zapier\", name:\"transformer\") {
      id
      description
    }
  }"
}

This request fetches the ID and description of the “transformer” repo from the Zapier org. There are a bunch of things to note. First, we use POST to read data from the API, since we are sending a body in the request. Second, the query itself is written in GraphQL syntax and sent as a string in the "query" field of a JSON payload, which is the usual convention for serving GraphQL over HTTP. Third, the response will have the exact structure that our query specified, {"data": {"repository": {"id": "MDEwOlJlcG9zaXRvcnk1MDEzODA0MQ==", "description": "..."}}} (the root key data is another requirement of responses in GraphQL).
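To see the “GraphQL string inside a JSON body” point in practice, here is a minimal sketch that builds (but does not send) the request with Python’s standard library. GitHub’s endpoint requires an access token; "YOUR_TOKEN" below is a placeholder, and graphql_request is a hypothetical helper.

```python
import json
from urllib.request import Request

# GitHub's GraphQL endpoint; a personal access token is required (placeholder below).
GITHUB_GRAPHQL_URL = "https://api.github.com/graphql"

QUERY = """
{
  repository(owner: "zapier", name: "transformer") {
    id
    description
  }
}
"""


def graphql_request(query: str, token: str) -> Request:
    """Wrap a GraphQL document in the JSON body that the endpoint expects."""
    # json.dumps escapes the quotes and newlines inside the query string.
    body = json.dumps({"query": query}).encode("utf-8")
    return Request(
        GITHUB_GRAPHQL_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"bearer {token}",
            "Content-Type": "application/json",
        },
    )


req = graphql_request(QUERY, token="YOUR_TOKEN")  # hypothetical placeholder
print(req.data.decode("utf-8"))
```

Letting a JSON library do the escaping avoids hand-writing backslashes around the repo owner and name.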

To fetch data related to the repo, say issues and their authors, a client uses a nested query:

POST /graphql

{
  "query": "{
    repository(owner: \"zapier\", name: \"transformer\") {
      id
      description
      issues(first: 20, orderBy: {field: CREATED_AT, direction: DESC}) {
        nodes {
          title
          body
          author {
            login
          }
        }
      }
    }
  }"
}

This request snags the repo ID and description, the title and body of the last 20 issues created on the repo, and the GitHub login (username) of the author of each issue. That’s a lot of info to grab in a single request. Imagine what the REST equivalent would look like and you can see where GraphQL offers a lot of power and flexibility for clients.
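Because the response mirrors the query, reading it is just walking the same path. A minimal sketch follows; issue_summaries is a hypothetical helper, the sample is made-up data shaped like the query (not real GitHub output), and treating a null author as possible is an assumption the helper guards against.

```python
from typing import Any, Dict, List, Optional, Tuple


def issue_summaries(response: Dict[str, Any]) -> List[Tuple[str, Optional[str]]]:
    """Pull (title, author login) pairs out of the nested issues response."""
    # GraphQL reports problems in a top-level "errors" list.
    if response.get("errors"):
        raise RuntimeError(response["errors"][0].get("message", "GraphQL error"))
    nodes = response["data"]["repository"]["issues"]["nodes"]
    # Assumption: author may be null (for example, a deleted account).
    return [(n["title"], n["author"]["login"] if n["author"] else None) for n in nodes]


# Made-up sample shaped like the nested query above (not real GitHub data).
sample = {
    "data": {
        "repository": {
            "id": "...",
            "description": "...",
            "issues": {
                "nodes": [
                    {"title": "First issue", "body": "...", "author": {"login": "octocat"}},
                    {"title": "Second issue", "body": "...", "author": None},
                ]
            },
        }
    }
}
print(issue_summaries(sample))
```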

When it comes to updating data, GraphQL uses the concept of a “mutation.” Unlike REST, where updates happen by PUTing or POSTing a modified copy of the resource to the same endpoint the client retrieved it from, a mutation in GraphQL is an explicit operation that the API defines. If a client wants to tweak the data, it has to know what mutations the server supports. Handily, GraphQL offers a way to discover them through a process dubbed “Schema Introspection.”

Before we chat about introspection, let’s first clarify the term “schema.” In GraphQL, an API defines a set of types that it uses to validate queries. So far with GitHub, we’ve used the Repository type, the Issue type, and the Actor type (the type behind an issue’s author field). Each type specifies the data that the type contains, and the relationships it has to other types. As a whole, these types make up the schema for the API.
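To illustrate how a schema lets a server reject bad queries before running them, here is a minimal sketch of a validator. The SCHEMA below is a hypothetical, heavily trimmed stand-in (GitHub’s real schema is far larger), and the sketch ignores arguments such as first: 20; it only checks that each selected field exists and that object fields, and only object fields, have sub-selections.

```python
from typing import Dict, List, Optional

# Hypothetical, heavily trimmed schema: type -> {field: field type}.
# Any field type that is not itself a key here is treated as a scalar.
SCHEMA: Dict[str, Dict[str, str]] = {
    "Repository": {"id": "ID", "description": "String", "issues": "IssueConnection"},
    "IssueConnection": {"nodes": "Issue"},
    "Issue": {"title": "String", "body": "String", "author": "Actor"},
    "Actor": {"login": "String"},
}


def validate(type_name: str, selection: Dict[str, Optional[dict]], path: str = "") -> List[str]:
    """Check a nested selection against SCHEMA and return any error messages."""
    errors: List[str] = []
    fields = SCHEMA[type_name]
    for name, sub in selection.items():
        where = f"{path}.{name}" if path else name
        if name not in fields:
            errors.append(f"{where}: no field '{name}' on type {type_name}")
        elif fields[name] in SCHEMA:  # object type: must select sub-fields
            if sub:
                errors.extend(validate(fields[name], sub, where))
            else:
                errors.append(f"{where}: object field needs a sub-selection")
        elif sub:
            errors.append(f"{where}: scalar field cannot have a sub-selection")
    return errors


print(validate("Repository", {"id": None, "stars": None}))
```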

With a detailed schema in hand, GraphQL also specifies that a client can query that schema using ordinary GraphQL queries. This allows a client to learn the capabilities of an API through introspection.

For GitHub, a client curious about what mutations are possible can simply ask:

POST /graphql

{
  "query": "{
    __type(name: \"Mutation\") {
      name
      kind
      description
      fields {
        name
        description
      }
    }
  }"
}
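The introspection answer is ordinary GraphQL data, so listing the available mutations is a short walk through the data key. This is a minimal sketch; mutation_names is a hypothetical helper and the sample is a made-up, truncated response rather than GitHub’s actual output.

```python
from typing import Any, Dict, List


def mutation_names(response: Dict[str, Any]) -> List[str]:
    """List mutation names from a __type(name: "Mutation") introspection response."""
    mutation_type = response["data"]["__type"]
    if mutation_type is None:  # __type is null when the schema has no such type
        return []
    return sorted(field["name"] for field in mutation_type["fields"] or [])


# Made-up, truncated sample; a real GitHub response lists many more mutations.
sample = {
    "data": {
        "__type": {
            "name": "Mutation",
            "kind": "OBJECT",
            "description": "...",
            "fields": [
                {"name": "addStar", "description": "..."},
                {"name": "addComment", "description": "..."},
            ],
        }
    }
}
print(mutation_names(sample))  # ['addComment', 'addStar']
```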

From the response, one of the mutations listed is addStar, which allows a client to add a star to a repo (or any starrable object). To actually perform the mutation, a client uses a request like this:

POST /graphql

{
  "query": "mutation {
    addStar(input:{starrableId:\"MDEwOlJlcG9zaXRvcnk1MDEzODA0MQ==\"}) {
      starrable {
        viewerHasStarred
      }
    }
  }"
}

This request specifies that the client wants to perform the addStar mutation and provides the required argument for the operation, which in this case is the ID of the repo. Note that the request prefixes the query with the keyword mutation. This is how GraphQL knows the client wants to perform a mutation. In all the previous queries, the client could also have used the query prefix, but query is assumed when the operation type is not specified. A final thing to note is that the client has full control over the response data. In this request, the client asks for the viewerHasStarred field on the repo. That is not particularly interesting in this scenario, since the mutation adds a star and we know the field will return true. However, if the client performed a mutation such as creating an issue, it could request generated values like the issue’s ID or number, as well as nested data like the repo’s total count of open issues.
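The default-to-query rule can be captured in a few lines. This is a minimal sketch, not a GraphQL parser: operation_type is a hypothetical helper that only looks at how a single-operation document starts, and it ignores comments, fragments, and documents containing several operations.

```python
import re


def operation_type(document: str) -> str:
    """Return 'query', 'mutation', or 'subscription' for a one-operation document.

    Simplified sketch: it ignores comments, fragments, and documents that
    contain several operations.
    """
    text = document.lstrip()
    if text.startswith("{"):
        return "query"  # shorthand form: no keyword, so query is assumed
    match = re.match(r"(query|mutation|subscription)\b", text)
    if match is None:
        raise ValueError("document does not start with an operation")
    return match.group(1)


print(operation_type('{ repository(owner: "zapier", name: "transformer") { id } }'))
print(operation_type('mutation { addStar(input: {starrableId: "..."}) { starrable { viewerHasStarred } } }'))
```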

APIs of the Future

Hopefully these case studies illustrate how API design is evolving in the SaaS industry. This is not to say that graph APIs are the future and REST is dead. Architectures like GraphQL come with their own set of challenges. What’s good is that the space is growing, providing more options so that the next time you find yourself needing to build an API, you can weigh the tradeoffs of each design and choose the solution that fits.

