What is Llama 2 and why does it matter?

By Harry Guinness · August 16, 2023

Llama 2 is Meta's open source large language model (LLM). It's basically the Facebook parent company's response to OpenAI's GPT models and Google's AI models like PaLM 2—but with one key difference: it's freely available for almost anyone to use for research and commercial purposes. 

That's a pretty big deal, and it could blow the whole AI space wide open. Let me explain. 

What is Llama 2?

Llama 2 is a family of LLMs like GPT-3 and PaLM 2. While there are some technical differences between it and other LLMs, you would really need to be deep into AI for them to mean much. All these LLMs were developed and work in essentially the same way: they all use the same transformer architecture and the same development ideas, like pretraining and fine-tuning.

When you enter a text prompt or provide Llama 2 with text input in some other way, it attempts to predict the most plausible follow-on text using its neural network, a cascading algorithm with billions of variables (called "parameters") that's loosely modeled after the human brain. The values of those parameters, often called weights, are set during training and determine how plausible the model considers each possible continuation. By picking among the most plausible options with a small bit of randomness thrown in, Llama 2 can generate incredibly human-like responses.
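To make the "predict, then add a little randomness" idea concrete, here's a minimal sketch of how a next word could be picked from scored candidates. It isn't Llama 2's actual code: the candidate words, their scores, and the function names are all made up for illustration, and the `temperature` setting is one common way to control how much randomness gets mixed in.

```python
import math
import random

def softmax(scores, temperature=1.0):
    """Turn raw scores into probabilities; a lower temperature sharpens them."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_next_token(candidates, scores, temperature=1.0, rng=random):
    """Pick one candidate at random, weighted by its probability."""
    probs = softmax(scores, temperature)
    return rng.choices(candidates, weights=probs, k=1)[0]

# Hypothetical scores a model might assign after the prompt "The cat sat on the"
candidates = ["mat", "sofa", "roof", "moon"]
scores = [4.0, 3.0, 2.0, 0.5]

probs = softmax(scores)
rng = random.Random(0)  # fixed seed so repeated runs pick the same words
picks = [sample_next_token(candidates, scores, 0.8, rng) for _ in range(5)]

print(dict(zip(candidates, (round(p, 3) for p in probs))))
print(picks)
```

The most plausible word usually wins, but not always, which is why the same prompt can produce different responses from one run to the next.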

How to try Llama 2

Llama 2 doesn't yet have a flashy, easy-to-use demo application like ChatGPT or Google Bard. For now, the best way to try it out is through Hugging Face, the platform that's become the go-to hub for open source AI models. Through Hugging Face, you can try out the following versions of Llama 2:

  • Llama 2 7B Chat

  • Llama 2 13B Chat

  • Llama 2 70B Chat

That "Chat" at the end indicates that they're using a fine-tuned version of each model called Llama-2-chat, which is optimized for chatbot-like dialogue—similar to how ChatGPT is a fine-tuned, chatbot-optimized version of GPT.

You'll also notice three different sizes: 7B with seven billion parameters, 13B with 13 billion parameters, and 70B with 70 billion parameters. While all are optimized for speed, the smaller sizes will run significantly faster on lower-spec hardware, even if they aren't quite as effective at generating plausible or accurate text.
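A quick back-of-the-envelope calculation shows why size matters so much for hardware. The sketch below estimates how much space the parameters alone take up, assuming each one is stored as a 16-bit (2-byte) number. That storage format is an assumption for illustration, and real files and running models need some extra room on top.

```python
# Rough space needed just to hold the model's parameters.
# Assumption: each parameter is stored as a 16-bit (2-byte) number.
BYTES_PER_PARAMETER = 2

# Parameter counts for the three sizes described in the text.
MODEL_SIZES = {"7B": 7e9, "13B": 13e9, "70B": 70e9}

def weights_size_gb(parameter_count, bytes_per_parameter=BYTES_PER_PARAMETER):
    """Return the size of the parameters in gigabytes (1 GB = 10**9 bytes)."""
    return parameter_count * bytes_per_parameter / 1e9

for name, count in MODEL_SIZES.items():
    print(f"{name}: about {weights_size_gb(count):.0f} GB")
```

Under that assumption, the smallest model needs roughly 14 GB and the largest roughly 140 GB, which is why the 70B version is out of reach for most laptops.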

Bear in mind that right out of the box, Llama 2 simply isn't as good as ChatGPT for many tasks, especially if you're using GPT-4. Both the foundation models and the fine-tuned chat models are designed to be further trained to meet your specific needs (more on that in a second). 

How does Llama 2 work?

To create its neural network, Llama 2 was trained with 2 trillion "tokens" from publicly available sources like Common Crawl (an archive of billions of webpages), Wikipedia, and public domain books from Project Gutenberg. Each token is a word or semantic fragment that allows the model to assign meaning to text and plausibly predict follow-on text. If the words "Apple" and "iPhone" consistently appear together, the model is able to learn that the two concepts are related, and that they are distinct from "apple," "banana," and "fruit."
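Here's a toy illustration of the "words that appear together" idea. It simply counts how often two tokens show up in the same sentence of a tiny, made-up corpus. Real models don't keep explicit counts like this (they pick up these relationships implicitly during training, and they typically split text into sub-word fragments rather than whole words), so treat it as a sketch of the intuition only.

```python
from collections import Counter
from itertools import combinations

# A tiny, made-up corpus; real training data runs to trillions of tokens.
sentences = [
    "Apple announced a new iPhone today",
    "the iPhone is made by Apple",
    "I ate an apple and a banana",
    "banana and apple are fruit",
]

def tokenize(text):
    """Naive whitespace tokenizer that keeps capitalization."""
    return text.split()

# Count each unordered pair of distinct tokens once per sentence.
pair_counts = Counter()
for sentence in sentences:
    unique_tokens = sorted(set(tokenize(sentence)))
    pair_counts.update(combinations(unique_tokens, 2))

def together(a, b):
    """Return how many sentences contain both tokens."""
    return pair_counts[tuple(sorted((a, b)))]

print(together("Apple", "iPhone"))   # the company and its product
print(together("apple", "banana"))   # the fruit and another fruit
print(together("Apple", "banana"))   # the company and a fruit
```

Even in four sentences, "Apple" keeps company with "iPhone" while lowercase "apple" keeps company with "banana," and the two never cross over.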

Of course, training an AI model on the open internet is a recipe for racism and other horrendous content, so the developers also employed other training strategies, including reinforcement learning from human feedback (RLHF), to optimize the model for safe and helpful responses. With RLHF, human testers rank different responses from the AI model to steer it toward generating more appropriate outputs. The chat versions were also fine-tuned with specific data to make them better at responding to dialogue in a natural way.
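To give a feel for how human rankings can become a training signal, here's a minimal sketch of one common approach: expand each ranking into "preferred vs. rejected" pairs, then penalize a scoring model whenever it rates the rejected response higher. The example responses, the scores, and the function names are hypothetical, and this is a simplified illustration rather than Meta's exact recipe.

```python
import math
from itertools import combinations

def ranking_to_pairs(ranked_responses):
    """Expand a best-to-worst ranking into (preferred, rejected) pairs."""
    return list(combinations(ranked_responses, 2))

def pairwise_loss(score_preferred, score_rejected):
    """-log(sigmoid(difference)): small when the preferred response scores higher."""
    difference = score_preferred - score_rejected
    return math.log(1 + math.exp(-difference))

# Hypothetical ranking from a human tester, best first.
ranking = ["polite refusal", "vague answer", "offensive answer"]
pairs = ranking_to_pairs(ranking)

# Hypothetical scores from a reward model that is still being trained.
scores = {"polite refusal": 2.0, "vague answer": 0.5, "offensive answer": -1.5}
losses = [pairwise_loss(scores[a], scores[b]) for a, b in pairs]

print(pairs)
print([round(loss, 3) for loss in losses])
```

The loss shrinks as the gap between a preferred response and a rejected one grows, so training nudges the scoring model to agree with the human testers. Those scores can then be used to steer the language model itself.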

But even these models are just intended to be a base to build from. If you want to create an LLM to generate article summaries in your company's particular brand style or voice, you can train Llama 2 with dozens, hundreds, or even thousands of examples and create one that does just that. Similarly, you can further fine-tune one of the chat-optimized models to respond to your customer support requests by providing it with your FAQs and other relevant information like chat logs. 
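In practice, "providing it with your FAQs" usually starts with reshaping that material into training examples. Here's a small sketch of that preparation step, turning FAQ entries into prompt/response pairs saved as JSON Lines (one JSON object per line). The FAQ content and the `prompt` and `response` field names are assumptions for illustration; the exact format depends on the fine-tuning tool you use.

```python
import json

# Hypothetical FAQ entries; in practice these would come from your own docs.
faqs = [
    {"question": "How do I reset my password?",
     "answer": "Click 'Forgot password' on the sign-in page and follow the email link."},
    {"question": "Can I change my plan?",
     "answer": "Yes, open Settings, choose Billing, and pick a new plan."},
]

def to_training_example(faq):
    """Shape one FAQ entry as a prompt/response pair (field names are assumptions)."""
    return {"prompt": faq["question"].strip(), "response": faq["answer"].strip()}

def to_jsonl(examples):
    """Serialize examples as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(example) for example in examples)

examples = [to_training_example(faq) for faq in faqs]
jsonl_text = to_jsonl(examples)
print(jsonl_text)
```

The more examples like these you can collect, the more closely the fine-tuned model can mirror your own answers and tone.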

Llama vs. GPT, Bard, and other AI models: How do they compare?

In the research paper describing how they developed Llama 2, the researchers compare its performance on various benchmarks (like Massive Multitask Language Understanding, or MMLU, and the TriviaQA reading comprehension dataset) to other open source and closed source models, including the big names like GPT-3.5, GPT-4, PaLM, and PaLM 2. In short, the 70B versions of Llama 2 outperform the other open source LLMs and are generally as good as GPT-3.5 and PaLM on most benchmarks, but don't perform as well as GPT-4 or PaLM 2.

And that kind of tracks with my testing. I found Llama 2 was more likely to "hallucinate" or just make things up when given simple prompts. Though in support of what Meta set out to do, I couldn't trick it into saying anything egregious. 

ChatGPT describing Joanna Stern accurately
Llama describing Joanna Stern, including some inaccurate information
Joanna Stern has never worked for Wired, and calling her a YouTube personality is a big stretch—she hosts some Wall Street Journal videos on the platform. Wired has a YouTube series called Technique Critique, but it doesn't have a host. Everything else is also a bit suspect.  

Similarly, it was suitably silly when asked to write some poetry.

Llama writing a poem about pepperoni pizza

Though ChatGPT was a bit more creative.

ChatGPT writing a poem about pepperoni pizza

Of course, this isn't a fair test for Llama 2. It's not really trying to be a direct ChatGPT competitor—it's something a little different. 

Why Llama matters

Most of the LLMs you've heard of (OpenAI's GPT-3 and GPT-4, Google's PaLM and PaLM 2, Anthropic's Claude) are closed source. Researchers and businesses can use the official APIs to access them and even fine-tune versions of their models so they give tailored responses, but they can't really get their hands dirty or understand what's going on inside.

With Llama 2, though, you can read the research paper detailing exactly how the model was created and trained. You can also download the model right now, and as long as you have the technical chops, get it running on your computer or even dig into its code. (Though be warned: the smallest version is more than 13 GB.)

And much more usefully, you can also get it running on Microsoft Azure, Amazon Web Services, and other cloud infrastructures through platforms like Hugging Face, where you can train it on your own data to generate the kind of text you need. Just be sure to check out Meta's guide to responsibly using Llama.

By being so open with Llama, Meta is making it significantly easier for other companies to develop AI-powered applications that they have more control over. The only real limit in the license is that companies with more than 700 million monthly active users have to ask for special permission to use Llama, so it's off limits for the likes of Apple, Google, and Amazon unless Meta agrees.

And really, that's quite exciting. So many of the big developments in computing over the past 70 years have been built on top of open research and experimentation, and now AI looks set to be one of them. While Google and OpenAI are always going to be players in the space, they won't be able to build the kind of commercial moat or consumer lock-in that Google has in search and advertising. 

By letting Llama out into the world, Meta has made it likely that there will always be a credible alternative to closed source AIs.
