The best large language models (LLMs) in 2024

These are the most significant, interesting, and popular LLMs you can use right now.

By Harry Guinness · August 5, 2024

Large language models (LLMs) are the main kind of text-handling AIs, and they're popping up everywhere. ChatGPT is the most famous tool that openly uses an LLM, but Google uses one to generate AI answers in Search, and Apple is launching the LLM-powered Apple Intelligence on its devices later this year. And that's before you consider any of the other chatbots, text generators, and other tools built on top of LLMs.

LLMs have been simmering in research labs since the late 2010s, but after the release of ChatGPT (which showcased the power of GPT), they've burst out of the lab and into the real world.

We're now into the third and fourth generation of LLMs, and with that, they're increasingly useful and powerful. We also have the first generations of large multimodal models (LMMs), which are able to handle other input and output modalities, like images, audio, and video, as well as text—which complicates things even more. So here, I'll break down some of the most important LLMs and LMMs on the scene right now.

  • The best LLMs

  • What is an LLM?

  • What is an open source LLM?

  • How do LLMs work?

  • What can LLMs be used for?

  • Why are there so many LLMs?

  • What to expect from LLMs in the future

The best LLMs in 2024

There are dozens of major LLMs, and hundreds that are arguably significant for some reason or other. Listing them all would be nearly impossible, and in any case, it would be out of date within days because of how quickly LLMs are being developed. (I'm updating this list for the second time this year, and there are plenty of new versions of multiple models to talk about, as well as a few to add.)

Take the word "best" with a grain of salt here: I've tried to narrow things down by offering a list of the most significant, interesting, and popular LLMs (and LMMs), not necessarily the ones that outperform on benchmarks (though most of these do). I've also mostly focused on LLMs that you can use—rather than ones that are the subjects of super interesting research papers or just teased in marketing materials—since we like to keep things practical around here.

Click on any app in the list below to learn more about it.

| LLM | Developer | Multimodal? | Access |
|---|---|---|---|
| GPT | OpenAI | Yes | Chatbot and API |
| Gemini | Google | Yes | Chatbot and API |
| Gemma | Google | No | Open |
| Llama | Meta | No | Chatbot and open |
| Claude | Anthropic | Yes | Chatbot and API |
| Command | Cohere | No | API |
| Falcon | Technology Innovation Institute | No | Open |
| DBRX | Databricks and Mosaic | No | Open |
| Mixtral 8x7B and 8x22B | Mistral AI | No | Open source |
| Phi-3 | Microsoft | No | Open |
| Grok | xAI | No | Chatbot and open |


What is an LLM?

An LLM, or large language model, is a general-purpose AI text generator. It's what's behind the scenes of all AI chatbots, AI writing generators, and most other AI-powered features like summarized search answers.

LLMs are supercharged auto-complete. Stripped of fancy interfaces and other workarounds, what they do is take a prompt and generate an answer using a string of plausible follow-on text. The chatbots built on top of LLMs aren't looking for keywords so they can answer with a canned response—instead, they're doing their best to understand what's being asked and reply appropriately.

This is why LLMs have really taken off: the same models (with or without a bit of extra training) can be used to respond to customer queries, write marketing materials, summarize meeting notes, and do a whole lot more.

But LLMs can only work with text, which is why LMMs are starting to crop up: they can incorporate images, handwritten notes, audio, video, and more. While not as readily available as LLMs, they have the potential to offer a lot more real-world functionality.

What is an open source LLM?

There are three major categories of LLM: proprietary, open, and open source.

Proprietary models like GPT-4o and Claude 3.5 are some of the most popular and powerful models available, but they're developed and operated by private companies. The source code, training strategies, model weights, and even details like the number of parameters they have are all kept secret. The only ways to access these models are through a chatbot or app built with them, or through an API. You can't just run GPT-4o on your own server.

Open and open source models are more freely available. You can download Llama 3 and Gemma 2 from Hugging Face and other model platforms and run them on your own devices, and even re-train them with your own data to create your own model. Developers can build their own chatbots and apps on top of them. You can even dig deep into things like the model weights and system architecture to understand how they work (as best as anyone can).

So what's the difference between open and open source? Well, companies like Meta and Google say that Llama 3 and Gemma 2 are open as though it's the same as open source, but there is a major distinction.

Open source licenses are incredibly permissive. At most, you have to give attribution to the original developers and, under some licenses, agree to make anything you build with the software open source as well. If you want to build a multi-billion dollar company off open source software or create a crime chatbot that tells people how to get away with heists, you're absolutely free to do so. The police might have some issues with the latter project, but you wouldn't be breaking any software licenses.

Open licenses are still permissive, but they have some additional limits. For example, Llama 3's license allows commercial use up to 700 million monthly users and blocks certain uses. You or I could build something with it, but Apple and Google can't. Similarly, Gemma 2's prohibited use policy, among other things, bans "facilitating or encouraging users to commit any type of crimes." Understandably, Google doesn't want to see unsavory bots "powered by Google Gemma" plastered all over the news.

How do LLMs work?

Early LLMs, like GPT-1, would fall apart and start to generate nonsense after a few sentences, but today's LLMs, like GPT-4, can generate thousands of words that all make sense.

To get to this point, LLMs were trained on huge corpuses of data. The specifics vary a little bit between the different LLMs—depending on how careful the developers are to fully acquire the rights to the materials they're using—but as a general rule, you can assume that they've been trained on something like the entire public internet and every major book that's ever been published at a bare minimum. This is why LLMs can generate text that sounds so authoritative on such a wide variety of subjects.

From this training data, LLMs are able to model the relationship between different words (or really, fractions of words called tokens) using high-dimensional vectors. This is where things get very complicated and mathy, but the basics are that every individual token ends up with a unique ID and that similar concepts are grouped together. This is then used to generate a neural network, a kind of multi-layered algorithm loosely based on how the human brain works, and that's at the core of every LLM.
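To make the idea of token IDs and grouped vectors a little more concrete, here's a minimal sketch. Everything in it is made up for illustration: the five-word vocabulary, the three-dimensional vectors, and the function names are all assumptions. Real models learn sub-word tokens and vectors with hundreds or thousands of dimensions from their training data.

```python
import math

# Hypothetical toy vocabulary: each token gets a unique ID.
VOCAB = {"apple": 0, "pie": 1, "crumble": 2, "mac": 3, "ipad": 4}

# Hypothetical hand-picked vectors, one per token ID. Related concepts
# (desserts, gadgets) are deliberately placed close together.
EMBEDDINGS = [
    [0.5, 0.5, 0.1],  # apple (sits between the two groups)
    [0.9, 0.1, 0.0],  # pie
    [0.8, 0.2, 0.1],  # crumble
    [0.1, 0.9, 0.2],  # mac
    [0.0, 0.8, 0.3],  # ipad
]


def tokenize(text):
    """Map each known lowercase word to its token ID (unknown words are skipped)."""
    return [VOCAB[word] for word in text.lower().split() if word in VOCAB]


def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity(word_a, word_b):
    """Compare two vocabulary words by the similarity of their vectors."""
    return cosine_similarity(EMBEDDINGS[VOCAB[word_a]], EMBEDDINGS[VOCAB[word_b]])
```

With these made-up numbers, `similarity("pie", "crumble")` comes out much higher than `similarity("pie", "ipad")`, which is the "similar concepts are grouped together" idea in miniature.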

The neural network has an input layer, an output layer, and multiple hidden layers, each with multiple nodes. It's these nodes that compute what words should follow on from the input, and the connections between nodes have different weights. For example, if the input string contains the word "Apple," the neural network will have to decide to follow up with something like "Mac" or "iPad," something like "pie" or "crumble," or something else entirely. When we talk about how many parameters an LLM has, we're basically counting those weights, which depends on how many layers and nodes there are in the underlying neural network. In general, the more parameters, the more complex the text a model is able to understand and generate.
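Here's a rough sketch of both ideas: counting parameters and choosing a follow-on word. It assumes a plain fully connected network, where the parameters are one weight per connection plus one bias per node. Real LLMs use transformer layers, so treat this as an illustration rather than how any particular model is built. The function names and the scores are hypothetical.

```python
import math


def count_parameters(layer_sizes):
    """Count weights and biases in a fully connected network.

    layer_sizes lists the number of nodes per layer, input layer first.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out  # one weight per connection between layers
        total += n_out         # one bias per node in the receiving layer
    return total


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def next_word_probabilities(candidates, scores):
    """Pair each candidate word with its probability of coming next."""
    return dict(zip(candidates, softmax(scores)))
```

For example, `count_parameters([4, 8, 8, 4])` gives 148 for a tiny network with two hidden layers of eight nodes. And given made-up output scores such as `next_word_probabilities(["pie", "Mac", "iPad", "crumble"], [2.0, 1.0, 1.0, 0.5])`, the highest-scoring word ("pie") ends up with the highest probability.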

LMMs are even more complex because they also have to incorporate data from additional modalities, but they're typically trained and structured in much the same way.

Of course, an AI model trained on the open internet with little to no direction sounds like the stuff of nightmares. And it probably wouldn't be very useful either, so at this point, LLMs undergo further training and fine-tuning to guide them toward generating safe and useful responses. One of the major ways this works is by adjusting the weights of the inputs and outputs of different nodes, though there are other aspects of it too.


All this is to say that while LLMs are black boxes, what's going on inside them isn't magic. Once you understand a little about how they work, it's easy to see why they're so good at answering certain kinds of questions. It's also easy to understand why they tend to make up (or hallucinate) random things.

What can LLMs be used for?

LLMs are powerful mostly because they're able to be generalized to so many different situations and uses. The same core LLM (sometimes with a bit of fine-tuning) can be used to do dozens of different tasks. While everything they do is based around generating text, the specific ways they're prompted to do it change what features they appear to have.

Here are some of the tasks LLMs are commonly used for:

  • General-purpose chatbots (like ChatGPT and Google Gemini)

  • Summarizing search results and other information from around the web

  • Customer service chatbots that are trained on your business's docs and data

  • Translating text from one language to another

  • Converting text into computer code, or one language into another

  • Generating social media posts, blog posts, and other marketing copy

  • Sentiment analysis

  • Moderating content

  • Correcting and editing writing

  • Data analysis

And hundreds of other things. We're only in the early days of the current AI revolution.
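As a small illustration of how prompting alone changes what the same model appears to do, here's a sketch that builds different prompts for a few of the tasks above. The template wording, the task names, and `build_prompt` are all hypothetical, and the code stops short of calling any real model or API.

```python
# Hypothetical prompt templates: the underlying model stays the same,
# only the instructions wrapped around the user's text change.
PROMPT_TEMPLATES = {
    "summarize": "Summarize the following text in one sentence:\n\n{text}",
    "translate": "Translate the following text into {target_language}:\n\n{text}",
    "sentiment": (
        "Classify the sentiment of the following text as positive, "
        "negative, or neutral:\n\n{text}"
    ),
}


def build_prompt(task, text, **options):
    """Fill in the template for a task; raises KeyError for unknown tasks."""
    template = PROMPT_TEMPLATES[task]
    return template.format(text=text, **options)
```

Calling `build_prompt("translate", "Hello", target_language="French")` and `build_prompt("sentiment", "Hello")` produces two very different requests from the same input text, which you'd then send to whichever LLM you're using.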

But there are also plenty of things that LLMs can't do, but that other kinds of AI models can. A few examples:

  • Interpret images

  • Generate images

  • Convert files between different formats

  • Create charts and graphs

  • Perform math and other logical operations

Of course, some LLMs and chatbots appear to do some of these things. But in most cases, there's another AI service stepping in to assist—or you're actually using an LMM.

One last thing before diving in: I've mentioned when big-name companies use these models, but a lot of AI-powered apps don't list what LLMs they rely on. Some we can guess at or it's clear from their marketing materials, but for lots of them, we just don't know.

With all that context, let's move on to the LLMs themselves.

The best LLMs in 2024

GPT

  • Developer: OpenAI

  • Parameters: More than 175 billion

  • Context window: 128,000 tokens

  • Access: API

OpenAI's Generative Pre-trained Transformer (GPT) models kickstarted the latest AI hype cycle. There are two main models currently available: GPT-4o and GPT-4o mini. Both are multimodal models, so they can also handle images and audio. All the different versions of GPT are general-purpose AI models with an API, and they're used by a diverse range of companies (including Microsoft, Duolingo, Stripe, Descript, Dropbox, and Zapier) to power countless different tools. Still, ChatGPT is probably the most popular demo of its powers.

You can also connect Zapier to GPT or ChatGPT, so you can use GPT straight from the other apps in your tech stack. Here's more about how to automate ChatGPT, or you can get started with one of these pre-made workflows.

  • Generate conversations in ChatGPT with new emails in Gmail (Gmail + ChatGPT)

  • Create ChatGPT conversations from new tl;dv transcripts (tl;dv + ChatGPT)

  • Create ChatGPT conversations from new Microsoft Outlook emails (Microsoft Outlook + ChatGPT)

Gemini 

  • Developer: Google

  • Parameters: Nano available in 1.8 billion and 3.25 billion versions; others unknown

  • Context window: Up to 2 million tokens

  • Access: API

Google Gemini is a family of AI models from Google. The four models—Gemini 1.0 Nano, Gemini 1.5 Flash, Gemini 1.5 Pro, and Gemini 1.0 Ultra—are designed to operate on different devices, from smartphones to dedicated servers. While capable of generating text like an LLM, the Gemini models are also natively able to handle images, audio, video, code, and other kinds of information. They're optimized for a long context window, which means they can process larger volumes of text.
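To get a feel for what a context window means in practice, here's a back-of-the-envelope sketch that checks whether a piece of text is likely to fit. The four-characters-per-token figure is only a rough rule of thumb for English text and is an assumption here, since every model uses its own tokenizer. The window sizes are the ones listed for each model family in this article, and the function names are hypothetical.

```python
import math

# Assumption: roughly four characters per token for English text.
# Real token counts depend on each model's own tokenizer.
CHARS_PER_TOKEN = 4

# Context window sizes (in tokens) as listed in this article.
CONTEXT_WINDOWS = {
    "GPT": 128_000,
    "Gemini": 2_000_000,
    "Claude": 200_000,
    "Falcon": 8_000,
}


def estimate_tokens(text):
    """Roughly estimate how many tokens a piece of text will use."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def fits_in_context(text, model):
    """Check whether the estimated token count fits the model's window."""
    return estimate_tokens(text) <= CONTEXT_WINDOWS[model]
```

By this estimate, a 40,000-character document (about 10,000 tokens) would overflow an 8,000-token window but fit comfortably in a 128,000-token one.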

Gemini 1.5 Pro and 1.5 Flash also power AI features throughout Google's apps, like Docs and Gmail, as well as Google's chatbot, which is confusingly also called Gemini (formerly Bard). Gemini 1.5 Pro and 1.5 Flash are available to developers through Google AI Studio or Vertex AI, and Gemini Nano and Ultra are due out later in 2024.

With Zapier's Google Vertex AI and Google AI Studio integrations, you can access Gemini from all the apps you use at work. Here are a few examples to get you started, or you can learn more about how to automate Google AI Studio.

  • Send prompts to Google Vertex AI from Google Sheets and save the responses (Google Sheets + Google Vertex AI)

  • Create a Slack assistant with Google Vertex AI (Slack + Google Vertex AI)

  • Label incoming emails automatically with Google AI Studio (Gemini) (Gmail + Google AI Studio (Gemini))

  • Send prompts in Google AI Studio (Gemini) for new or updated rows in Google Sheets (Google Sheets + Google AI Studio (Gemini))

Gemma

  • Developer: Google

  • Parameters: 2 billion, 9 billion, and 27 billion

  • Context window: 8,192 tokens

  • Access: Open

Google Gemma is a family of open AI models from Google based on the same research and technology it used to develop Gemini. The latest version, Gemma 2, is available in three sizes: 2 billion, 9 billion, and 27 billion parameters.

Llama

  • Developer: Meta

  • Parameters: 8 billion, 70 billion, and 405 billion

  • Context window: 128,000 tokens

  • Access: Open

Llama 3.1 is a family of open LLMs from Meta, the parent company of Facebook and Instagram. In addition to powering most AI features throughout Meta's apps, it's one of the most popular and powerful open LLMs, and you can download the source code yourself from GitHub. Because it's free for research and commercial uses, a lot of other LLMs use Llama 3.1 (or a previous version of Llama) as a base.

There are 8 billion, 70 billion, and 405 billion parameter versions available. Meta's previous model family, Llama 2, is still available in 7 billion, 13 billion, and 70 billion parameter versions, though they're far less powerful.

Claude

  • Developer: Anthropic

  • Parameters: Unknown 

  • Context window: 200,000 tokens

  • Access: API

Claude is arguably one of the most important competitors to GPT. Its three models—Claude 3 Haiku, Claude 3.5 Sonnet, and Claude 3 Opus—are designed to be helpful, honest, harmless, and crucially, safe for enterprise customers to use. As a result, companies like Slack, Notion, and Zoom have all partnered with Anthropic.

Like all the other proprietary LLMs, Claude is only available as an API, though it can be further trained on your data and fine-tuned to respond how you need. You can also connect Claude to Zapier so you can automate Claude from all your other apps. Here are some pre-made workflows to get you started.

  • Write AI-generated email responses with Claude and store in Gmail (Gmail + Anthropic (Claude))

  • Create AI-generated posts in WordPress with Claude (Google Sheets + Anthropic (Claude) + WordPress)

  • Generate an AI-analysis of Google Form responses and store in Google Sheets (Google Forms + Anthropic (Claude) + Google Sheets)

Command

  • Developer: Cohere 

  • Parameters: Unknown

  • Context window: Up to 128,000 tokens

  • Access: API

Like Claude 3, Cohere's Command models are designed for enterprise users. Command R and Command R+ offer an API and are optimized for retrieval augmented generation (RAG) so that organizations can have the model respond accurately to specific queries from employees and customers. 

As a result, companies like Oracle, Accenture, Notion, and Salesforce use Cohere's models.
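If you're wondering what retrieval augmented generation looks like in outline, here's a minimal sketch: find the documents most relevant to a question, then paste them into the prompt so the model answers from them. The word-overlap scoring, the prompt wording, and all the function names are assumptions made for illustration. Production systems typically use vector search instead, and none of this is Cohere's actual API.

```python
import re


def words(text):
    """Split text into a set of lowercase words, ignoring punctuation."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def score(query, document):
    """Count how many distinct query words also appear in the document."""
    return len(words(query) & words(document))


def retrieve(query, documents, top_k=2):
    """Return the top_k documents sharing the most words with the query."""
    ranked = sorted(documents, key=lambda doc: score(query, doc), reverse=True)
    return ranked[:top_k]


def build_rag_prompt(query, documents, top_k=2):
    """Put the retrieved documents in front of the question."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents, top_k))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )
```

The resulting prompt would then be sent to the model, which is how a general-purpose LLM can answer accurately about a specific company's refund policy or internal docs.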

Falcon 

  • Developer: Technology Innovation Institute

  • Parameters: 11 billion

  • Context window: 8,000 tokens

  • Access: Open

Falcon is a family of open LLMs that have consistently performed well in various AI benchmarks. The latest version, Falcon 2, has 11 billion parameters and performs similarly to other small open models like Llama 3 8B and Gemma 7B. It's released under a permissive Apache 2.0 license, so it's suitable for commercial and research use.

DBRX

  • Developer: Databricks and Mosaic

  • Parameters: 132 billion

  • Context window: 32,000 tokens

  • Access: Open 

Databricks' DBRX LLM is the successor to Mosaic's MPT-7B and MPT-30B LLMs. It's one of the most powerful open LLMs. Interestingly, it's not built on top of Meta's Llama model, unlike a lot of other open models. 

DBRX surpasses or equals previous generation closed LLMs like GPT-3.5 on most benchmarks.

Mixtral 8x7B and 8x22B

  • Developer: Mistral

  • Parameters: 45 billion and 141 billion

  • Context window: Up to 64,000 tokens

  • Access: Open source

Mistral's Mixtral 8x7B and 8x22B models use a series of sub-systems (an approach known as a sparse mixture of experts, where only some of the sub-systems handle each token) to efficiently outperform larger models. Despite having significantly fewer parameters (and thus being capable of running faster or on less powerful hardware), they're able to beat other models like Llama 2 and GPT-3.5 in some benchmarks. They're also released under an Apache 2.0 license.
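Here's a toy sketch of how routing between sub-systems (often called experts) can work. For each input, a router scores the experts, only the top few actually run, and their outputs are blended. The eight simple experts, the hand-written router scores, and the function names are all hypothetical. In a real model, the experts are neural network layers and the router's scores are learned.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def top_k_indices(scores, k):
    """Return the positions of the k highest scores (ties keep list order)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


def mixture_of_experts(x, experts, router_scores, k=2):
    """Run only the k best-scoring experts on x and blend their outputs."""
    chosen = top_k_indices(router_scores, k)
    weights = softmax([router_scores[i] for i in chosen])
    output = sum(weight * experts[i](x) for weight, i in zip(weights, chosen))
    return output, chosen


# Hypothetical experts: eight functions that multiply the input by 1 to 8.
EXPERTS = [lambda x, m=m: m * x for m in range(1, 9)]
```

With `mixture_of_experts(2.0, EXPERTS, [0, 0, 5, 0, 0, 5, 0, 0])`, only the third and sixth experts run. The other six are skipped entirely, which is where the efficiency comes from.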

Mistral has also released more direct GPT competitors called Mistral Large 2 and Mistral NeMo, a 12 billion parameter model developed in collaboration with NVIDIA.

Phi-3

  • Developer: Microsoft

  • Parameters: 3.8 billion, 7 billion, and 14 billion

  • Context window: Up to 128,000 tokens

  • Access: Open 

Microsoft's Phi-3 family of small language models is optimized for performance at small size. The 3.8 billion parameter Mini, 7 billion parameter Small, and 14 billion parameter Medium all outperform larger models on language tasks.

The models are available through Azure AI Studio, Hugging Face, and other open model platforms.

Grok

  • Developer: xAI

  • Parameters: Unknown

  • Context window: 128,000 tokens

  • Access: Chatbot and open

Grok, an AI model and chatbot trained on data from X (formerly Twitter), doesn't really warrant a place on this list on its own merits as it's not very popular nor is it dramatically better than any other model available. Still, I'm listing it here because it was developed by xAI, the AI company founded by Elon Musk. While it might not be making waves in the AI scene, it's still getting plenty of media coverage, so it's worth knowing it exists.

Why are there so many LLMs?

Until a year or two back, LLMs were limited to research labs and tech demos at AI conferences. Now, they're powering countless apps and chatbots, and there are hundreds of different models available that you can run yourself (if you have the computer skills). How did we get here?

Well, there are a few factors in play. Some of the big ones are:

  • With GPT-3 and ChatGPT, OpenAI demonstrated that AI research had reached the point where it could be used to build practical tools—so lots of other companies started doing the same. 

  • LLMs take a lot of computing power to train, but it can be done in a matter of weeks or months.

  • There are lots of open models that can be re-trained or adapted into new models without the need to develop a whole new model.

  • There's a lot of money being thrown at AI companies, so there are big incentives for anyone with the skills and knowledge to develop any kind of LLM to do so.

What to expect from LLMs in the future

I think we're going to see a lot more LLMs in the near future, especially from major tech companies. Apple, Amazon, IBM, Intel, and NVIDIA all have LLMs under development, in testing, or available for customers to use. They're not as buzzy as the models I listed above, nor are regular people ever likely to use them directly, but I think it's reasonable to expect large enterprises to start deploying them widely, both internally and for things like customer support.

I also think we're going to see a lot more efficient LLMs tailored to run on smartphones and other lightweight devices. Google has already hinted at this with Gemini Nano, which runs some features on the Google Pixel 8 Pro, and Apple Intelligence is due on Apple devices later this year. There has also been a lot more attention given to smaller models that are able to outperform their size, like Mistral's Mixtral 8x22B.

The other big thing that's coming is large multimodal models or LMMs. These combine text generation with other modalities, like images and audio, so you can ask a chatbot what's going on in an image or have it respond with audio. GPT-4o and Google's Gemini models are two of the first LMMs that are widely deployed, though their full powers are still rolling out—we're definitely going to see more.

Other than that, who can tell? Three years ago, I definitely didn't think we'd have powerful AIs like ChatGPT available for free. Maybe in a few years, we'll have artificial general intelligence (AGI).

Related reading:

  • The best AI productivity tools

  • The best AI courses for beginners

  • Claude vs. ChatGPT: What's the difference?

This article was originally published in January 2024. The most recent update was in August 2024.
