How does ChatGPT work?

Here's the human-written answer for how ChatGPT works.

By Harry Guinness · September 6, 2023

ChatGPT has been a household name for less than a year, but the algorithms working in the background of the popular AI tool have actually been powering a whole range of apps and services since 2020. So to understand how ChatGPT works, we need to start by talking about the underlying language engine that powers it.

The GPT in ChatGPT is mostly two related algorithms: GPT-3.5 Turbo and GPT-4, though the latter is only available in ChatGPT for ChatGPT Plus subscribers. The GPT bit stands for Generative Pre-trained Transformer, and the number is just the version of the algorithm. The GPT models were developed by OpenAI (the company behind ChatGPT and the image generator DALL·E 2), but they power everything from Bing's AI features to writing tools like Jasper and Copy.ai. In fact, most of the AI text generators available at the moment use GPT-3, GPT-3.5, or GPT-4—though they tend to keep quiet about which one they're using.

ChatGPT brought GPT into the limelight because it made the process of interacting with an AI text generator simple and—most importantly—free to everyone. Plus, it's a chatbot, and people have loved a good chatbot since SmarterChild.

While GPT-3.5 and GPT-4 are the most popular large language models (LLMs) right now, over the next few years, there's likely to be a lot more competition. Google, for example, has Bard—its AI chatbot—which is powered by its own language engine, Pathways Language Model (PaLM 2). Meta, Facebook's parent company, released Llama 2, its latest LLM. And then there are other options aimed at large companies, like Writer's Palmyra LLMs and Anthropic's Claude. Still, at least for now, OpenAI's offering is the de facto industry standard. It's just the easiest tool for people to get their hands on.  

So the answer to "how does ChatGPT work?" is basically: GPT-3.5 and GPT-4. But let's dig a little deeper.

With Zapier, you can connect ChatGPT to thousands of other apps to bring AI into all your business-critical workflows.

What is ChatGPT?

ChatGPT is an app built by OpenAI. Using the GPT language models, it can answer your questions, write copy, draft emails, hold a conversation, explain code in different programming languages, translate natural language to code, and more—or at least try to—all based on the natural language prompts you feed it. It's a chatbot, but a really, really good one.

Examples, capabilities, and limitations of ChatGPT (image from ChatGPT)

While it's cool to play around with if, say, you want to write a Shakespearean sonnet about your pet or get a few ideas for subject lines for some marketing emails, it's also good for OpenAI. It's a way to get a lot of data from real users and serves as a fancy demo for the power of GPT, which could otherwise feel a little fuzzy unless you were deep into machine learning. (That data collection got ChatGPT blocked in Italy in early 2023, though the Italian regulators' concerns have now been resolved.)

Right now, ChatGPT offers two GPT models. The default, GPT-3.5, is less powerful but available to everyone for free. The more advanced GPT-4 is limited to ChatGPT Plus subscribers, and even they only get a limited number of messages. (It's 25 messages every three hours at present, but that could change.)

One of ChatGPT's big features is that it can remember the conversation you're having with it. This means it can glean context from whatever you've asked it previously and then use that to inform its conversation with you. You're also able to ask for reworks and corrections, and it will refer back to whatever you'd been discussing before. It makes interacting with the AI feel like a genuine back-and-forth. 

If you want to really get a feel for it, go and spend five minutes playing with ChatGPT now (it's free!), and then come back to read about how it works. 

How does ChatGPT work?

ChatGPT works by attempting to understand your prompt and then spitting out strings of words that it predicts will best answer your question, based on the data it was trained on. While that might sound relatively simple, it belies the complexity of what's going on under the hood. 

Supervised vs. unsupervised learning

Let's actually talk about that training. The P in GPT stands for "pre-trained," and it's a super important part of why GPT is able to do what it can do. 

Before GPT, the best-performing AI models used "supervised learning" to develop their underlying algorithms. They were trained with manually labeled data, like a database with photos of different animals paired with a text description of each animal written by humans. This kind of training data, while effective in some circumstances, is incredibly expensive to produce. Even now, there just isn't that much data suitably labeled and categorized to be used to train LLMs.

Instead, GPT employed generative pre-training, where it was given a few ground rules and then fed vast amounts of unlabeled data—near enough the entire open internet. It was then left "unsupervised" to crunch through all this data and develop its own understanding of the rules and relationships that govern text. 
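
To make that a little more concrete, here's a minimal sketch, in PyTorch with toy sizes and a stand-in model (not anything OpenAI actually uses), of what that self-supervised objective looks like: the model simply learns to predict each next token from the tokens that came before it, with no human-written labels involved.

```python
# A minimal sketch of self-supervised next-token pre-training.
# The "model" and sizes here are illustrative stand-ins, not OpenAI's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_size, embed_dim = 1000, 64             # toy sizes; real GPTs are vastly larger
toy_model = nn.Sequential(                   # a stand-in for a real transformer
    nn.Embedding(vocab_size, embed_dim),
    nn.Linear(embed_dim, vocab_size),
)

tokens = torch.randint(0, vocab_size, (1, 16))    # a "sentence" of 16 token IDs
inputs, targets = tokens[:, :-1], tokens[:, 1:]   # predict token t+1 from tokens up to t

logits = toy_model(inputs)                        # shape: (1, 15, vocab_size)
loss = F.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))
loss.backward()   # gradients nudge the weights toward better next-token guesses
```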

Of course, you don't really know what you're going to get when you use unsupervised learning, so GPT is also "fine-tuned" to make its behavior more predictable and appropriate. There are a few ways this is done (which I'll get to), but it often uses forms of supervised learning. 

Transformer architecture

All this training is intended to create a deep learning neural network—a complex, many-layered, weighted algorithm modeled after the human brain—which allows ChatGPT to learn patterns and relationships in the text data and create human-like responses by predicting what text should come next in any given sentence.

This network uses something called transformer architecture (the T in GPT), which was proposed in a research paper back in 2017. It's absolutely essential to the current boom in AI models.

While it sounds—and is—complicated when you explain it, the transformer model fundamentally simplified how AI algorithms were designed. It allows for the computations to be parallelized (or done at the same time), which means significantly reduced training times. Not only did it make AI models better, but it made them quicker and cheaper to produce.

At the core of transformers is a process called "self-attention." Older recurrent neural networks (RNNs) read text from left to right. This is fine when related words and concepts are beside each other, but it makes things complicated when they're at opposite ends of the sentence. (It's also a slow way to compute things as it has to be done sequentially.)

Transformers, however, read every word in a sentence at once and compare each word to all the others. This allows them to direct their "attention" to the most relevant words, no matter where they are in the sentence. And it can be done in parallel on modern computing hardware. 

Of course, this is all vastly simplifying things. Transformers don't work with words, they work with "tokens," which are chunks of text encoded as a vector (a list of numbers that gives each token a position and direction in a many-dimensional space). The closer two token-vectors are in space, the more related they are. Similarly, attention is encoded as a vector, which allows transformer-based neural networks to remember important information from earlier in a paragraph.
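
If you're curious what "comparing every word to every other word" actually looks like, here's a bare-bones sketch of the scaled dot-product self-attention at the heart of transformers, with a single head, random stand-in weights, and no masking. It's nothing like a production model, but the basic shape of the computation is the same.

```python
# A bare-bones sketch of scaled dot-product self-attention (one head, no masking).
# Weights are random stand-ins for illustration, not trained values.
import torch
import torch.nn.functional as F

seq_len, d_model = 5, 8                  # 5 token-vectors, 8 dimensions each
x = torch.randn(seq_len, d_model)        # stand-ins for embedded tokens

# These projections are learned in a real model; random here for illustration.
W_q, W_k, W_v = (torch.randn(d_model, d_model) for _ in range(3))
Q, K, V = x @ W_q, x @ W_k, x @ W_v

scores = Q @ K.T / d_model ** 0.5        # every token compared to every other token at once
attention = F.softmax(scores, dim=-1)    # each row sums to 1: where each token "looks"
output = attention @ V                   # each token becomes a weighted mix of the others

print(attention)
```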

And that's before we even get into the underlying math of how this works. While it's beyond the scope of this article to get into it, Machine Learning Mastery has a few explainers that dive into the technical side of things.

Tokens

How text is understood by AI models is also important, so let's look a little deeper at tokens. GPT-3 was trained on roughly 500 billion tokens, which allows its language models to more easily assign meaning and predict plausible follow-on text by mapping them in vector space. Many words map to single tokens, though longer or more complex words often break down into multiple tokens. On average, tokens are roughly four characters long. OpenAI has stayed quiet about the inner workings of GPT-4, but we can safely assume it was trained on much the same dataset, if not a larger one, since it's even more powerful.

Block of text broken down into GPT-3 tokens and characters.
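
You can poke at tokenization yourself with OpenAI's open source tiktoken library. The snippet below uses the cl100k_base encoding (the one used by the GPT-3.5 and GPT-4 family; older GPT-3 models used slightly different encodings) to show how a sentence breaks down into token IDs.

```python
import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("cl100k_base")
text = "Zapier is a web-based automation tool."
token_ids = enc.encode(text)

print(len(text), "characters ->", len(token_ids), "tokens")
for token_id in token_ids:
    # Common words map to a single token; rarer ones get split into pieces.
    print(token_id, repr(enc.decode([token_id])))
```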

All the tokens came from a massive corpus of data written by humans. That includes books, articles, and other documents across all different topics, styles, and genres—and an unbelievable amount of content scraped from the open internet. Basically, it was allowed to crunch through the sum total of human knowledge to develop the network it uses to generate text.

Based on all that training, GPT-3's neural network has 175 billion parameters, or variables, that allow it to take an input—your prompt—and then, based on the values and weightings it gives to the different parameters (and a small amount of randomness), output whatever it thinks best matches your request. OpenAI hasn't said how many parameters GPT-4 has, but it's a safe guess that it's more than 175 billion and less than the once-rumored 100 trillion. Regardless of the exact number, more parameters doesn't automatically mean better. Some of GPT-4's increased power probably comes from having more parameters than GPT-3, but a lot is probably down to improvements in how it was trained.

Reinforcement learning from human feedback (RLHF)

Of course, GPT's initial neural network was entirely unsuitable for public release. It was trained on the open internet with almost no guidance, after all. So, to further refine ChatGPT's ability to respond to a variety of different prompts in a safe, sensible, and coherent way, it was optimized for dialogue with a technique called reinforcement learning from human feedback (RLHF).

Essentially, OpenAI created some demonstration data that showed the neural network how it should respond in typical situations. From that, they created a reward model with comparison data (where two or more model responses were ranked by AI trainers), so the AI could learn which was the best response in any given situation. While not pure supervised learning, RLHF allows networks like GPT to be fine-tuned effectively.   
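
As a rough illustration, reward models like this are typically trained with a pairwise comparison loss: if human trainers ranked response A above response B, the model gets nudged to score A higher. The sketch below uses placeholder scores rather than a real reward model, but it shows the shape of that objective.

```python
# A sketch of the pairwise loss commonly used to train an RLHF reward model.
# The scores are placeholders, not outputs of a real model.
import torch
import torch.nn.functional as F

reward_chosen = torch.tensor([1.3], requires_grad=True)    # the reply humans preferred
reward_rejected = torch.tensor([0.4], requires_grad=True)  # the reply they ranked lower

# -log(sigmoid(r_chosen - r_rejected)): small when the model already agrees
# with the human ranking, large when it doesn't.
loss = -F.logsigmoid(reward_chosen - reward_rejected).mean()
loss.backward()
print(float(loss))   # in practice this loss trains the reward model, which then
                     # steers the chat model during reinforcement learning
```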

A graphic from ChatGPT showing how GPT-3 was trained

Natural language processing (NLP)

All this effort is intended to make GPT as effective as possible at natural language processing (NLP). NLP is a huge bucket category that encompasses many aspects of artificial intelligence, including speech recognition, machine translation, and chatbots, but it can be understood as the process through which AI is taught to understand the rules and syntax of language, programmed to develop complex algorithms to represent those rules, and then made to use those algorithms to carry out specific tasks.

Since I've covered the training and algorithm development side of things, let's look at how NLP enables GPT to carry out certain tasks—in particular, responding to user prompts. 

It's important to understand that, for all this discussion of tokens, ChatGPT is generating predictions of what words, sentences, and even whole paragraphs or stanzas could follow your prompt. It's not the predictive text on your phone bluntly guessing the next word; it's attempting to create fully coherent responses to any prompt. This is what transformers bring to NLP.

In the end, the simplest way to imagine it is like one of those "finish the sentence" games you played as a kid. ChatGPT starts by taking your prompt, breaking it down into tokens, and then using its transformer-based neural network to try to understand what the most salient parts of it are, and what you are really asking it to do. From there, the neural network kicks into gear again and generates an appropriate output sequence of tokens, relying on what it learned from its training data and fine-tuning.
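
Here's that loop in toy form. The next_token_probs function below is a made-up stand-in for the real neural network (it just returns a uniform distribution), but the overall structure of tokenize, predict, sample, append, repeat is the same idea.

```python
# A toy sketch of autoregressive generation. Everything here is illustrative,
# not OpenAI's code: the "vocabulary" is tiny and the "model" is a dummy.
import random

vocab = ["Zapier", "is", "a", "web-based", "automation", "tool", ".", "<end>"]

def next_token_probs(tokens):
    # A real model returns a probability for every token in its vocabulary,
    # conditioned on everything generated so far. Uniform here, purely for brevity.
    return [1 / len(vocab)] * len(vocab)

prompt = ["Zapier", "is"]
output = list(prompt)
while output[-1] != "<end>" and len(output) < 12:
    probs = next_token_probs(output)
    next_token = random.choices(vocab, weights=probs, k=1)[0]   # sample, don't just pick the top
    output.append(next_token)

print(" ".join(output))
```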

For example, when I gave ChatGPT using GPT-3 the prompt "Zapier is…", it responded:

"Zapier is a web-based automation tool that allows users to connect different web applications together in order to automate repetitive tasks and improve workflows."

That's the kind of sentence you can find in hundreds of articles describing what Zapier does, so it makes sense that it's the kind of thing that it spits out here. But when my editor gave it the same prompt, it said:

"Zapier is a web-based automation tool that allows users to connect different web applications and automate workflows between them."

That's pretty similar, but it isn't exactly the same response. "What is Zapier?", "What does Zapier do?", and "Describe Zapier" all get similar results too, presumably because they occupy similar positions in vector space. GPT understands that the most salient word here is Zapier, and that all the others are just asking for a short summary in slightly different ways.

That randomness (which you can control in some GPT apps with a setting called "temperature") ensures that ChatGPT isn't just answering every single prompt with what amounts to a stock answer. It's running each prompt through the entire neural network each time, and rolling a couple of dice here and there to keep things fresh. Its understanding of natural language also allows it to parse the subtle differences between "What is Zapier?" and "What does Zapier do?" While they're fundamentally similar questions, you'd expect the answers to be slightly different. Whatever way you ask things, ChatGPT is not likely to start claiming that Zapier is a color from Mars, but it will mix up the following words based on their relative likelihoods.
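
If you want to see what that temperature setting actually does, here's a small sketch using made-up scores for three candidate next tokens. Lowering the temperature concentrates the probability on the top choice; raising it spreads the probability out.

```python
# How the "temperature" knob reshapes a next-token distribution.
# The scores are invented for illustration.
import numpy as np

logits = np.array([4.0, 3.5, 1.0])   # made-up scores for, say, "tool", "platform", "color"

def softmax_with_temperature(logits, temperature):
    scaled = logits / temperature
    exps = np.exp(scaled - scaled.max())   # subtract the max for numerical stability
    return exps / exps.sum()

for t in (0.2, 1.0, 2.0):
    print(t, softmax_with_temperature(logits, t).round(3))
# At 0.2, almost all the probability lands on the top token; at 2.0, the
# alternatives get sampled far more often.
```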

(For what it's worth, when running on GPT-4, ChatGPT said: "Zapier is a web-based automation tool that allows users to integrate and automate tasks between various online applications and services." Much the same!)

Spitting out words, not knowledge

For all my talk of "understanding" and "comprehending," you have to remember that these are simply the most useful words we have for describing how AIs operate. GPT doesn't truly understand English, but it has a very detailed map of how many concepts relate to each other. The makers of ChatGPT are the first to say that it can produce incorrect (and possibly harmful) information, though they're working hard to fix it.

Here's a tame example. When I asked ChatGPT using GPT-3 and GPT-4 to reply to "Harry Guinness is…" GPT-3 first replied by asking which Harry Guinness. Great question—there are literally two of us! So I said, "Harry Guinness the writer" (that's me). GPT-4 guessed who I was straight away, though both responses were fascinating:

GPT-3: Harry Guinness is a freelance writer and journalist based in Ireland. He has written for a variety of publications, including The New York Times, The Guardian, The Huffington Post, and Popular Mechanics. He covers topics ranging from technology and photography to travel and culture.

GPT-4: Harry Guinness is an Irish writer, photographer, and technology expert. He has written for various publications, including How-To Geek, Lifehacker, and Tuts+. His work primarily focuses on technology, tutorials, and tips, as well as photography and image editing. Guinness has built a reputation for creating detailed guides that help readers understand complex subjects or solve problems in a straightforward manner.

GPT-3's first and last lines are pulled almost verbatim from my various websites and author bios around the web (although I normally list myself as a freelance writer and photographer, not a journalist). But the list of publications is basically made up. I've written for The New York Times, but not for The Guardian, The Huffington Post, or Popular Mechanics (I do write regularly for Popular Science, so that might be where that came from).

GPT-4 gets the photographer part right and actually lists some publications I've written for, which is impressive, though they're not the ones I'd be most proud of. It's a great example of how OpenAI has been able to increase the accuracy of GPT-4 relative to GPT-3, though it might not always offer the most correct answer. 

But let's go back to GPT-3 as its error provides an interesting example of what's going on behind the scenes in ChatGPT. It doesn't actually know anything about me. It's not even copy/pasting from the internet and trusting the source of the information. Instead, it's simply predicting a string of words that will come next based on the billions of data points it has.

For example: The New York Times is grouped far more often with The Guardian and The Huffington Post than it is with the places I've written for, like Wired, Outside, The Irish Times, and, of course, Zapier. So when it has to work out what should follow on from The New York Times, it doesn't pull from the published information about me; it pulls that list of large publications from all the training data it has (or really, considers where they're mapped in vector space). It's very clever and looks plausible, but it isn't true.

GPT-4 does a much better job and nails the publications, but the rest of what it says really just feels like plausible follow-on sentences. I don't think it has any great appreciation for my reputation: it's just saying the kind of thing a bio says. It's far better at hiding how it works than GPT-3, though it's actually using much the same technique.

Still, it's very impressive how much GPT has already improved. For now, GPT-4 is locked behind a premium subscription, so most ChatGPT content you see will rely on GPT-3, but that may change over the next while. Who knows what GPT-5 will bring.

What is the ChatGPT API?

OpenAI doesn't have a just-us attitude with its technology. The company has an API platform that allows developers to integrate the power of ChatGPT into their own apps and services (for a price, of course).
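
For example, here's roughly what a minimal call looks like with OpenAI's official Python package (this uses the pre-1.0 openai interface that was current when this article was last updated; newer versions of the package use a client object instead):

```python
import openai

openai.api_key = "YOUR_API_KEY"   # placeholder: use your own key

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",        # or "gpt-4" if your account has access
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Zapier is..."},
    ],
    temperature=0.7,              # the same randomness knob discussed earlier
)

print(response["choices"][0]["message"]["content"])
```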

Zapier uses the ChatGPT API to power its own ChatGPT integration, which lets you connect ChatGPT to thousands of other apps and add AI to your business-critical workflows. Here are some examples to get you started, but you can trigger ChatGPT from basically any app.

Start a conversation with ChatGPT when a prompt is posted in a particular Slack channel (Slack + ChatGPT)

You can also make use of OpenAI's other models—like DALL·E and Whisper—with Zapier's OpenAI integration. Automate workflows that involve image generation and audio transcription, straight from the apps you're already using.

Zapier is the leader in workflow automation—integrating with 6,000+ apps from partners like Google, Salesforce, and Microsoft. Use interfaces, data tables, and logic to build secure, automated systems for your business-critical workflows across your organization's technology stack. Learn more.

This article was originally published in February 2023. The most recent update was in September 2023.
