As impressive as today's AI chatbots are, interacting with them might not leave you with an I, Robot level of existential sci-fi dread (yet).

But according to Dario Amodei, the CEO of Anthropic, an AI research company, there's a real risk that AI models become too autonomous—especially as they start accessing the internet and controlling robots. Hundreds of other AI leaders and scientists have also acknowledged the existential risk posed by AI.

To help address this risk, Anthropic did something counterintuitive: they decided to develop a safer large language model (LLM) on their own. Claude 2, an earlier version of Anthropic's model, was hailed as a potential "ChatGPT killer." Since its release, progress has happened fast—Anthropic's latest update to its LLM, known as Claude 3, now surpasses ChatGPT-4 on a range of benchmarks.

In this article, I'll outline Claude's capabilities, show how it stacks up against other AI models, and explain how you can try it for yourself.

What is Claude?

Claude is an AI chatbot powered by Anthropic's LLM, Claude 3.

If you've used ChatGPT or Google Gemini, you know what to expect when launching Claude: a powerful, flexible chatbot that collaborates with you, writes for you, and answers your questions.

Anthropic, the company behind Claude, was started in 2021 by a group of ex-OpenAI employees who helped develop OpenAI's GPT-2 and GPT-3 models. It's focused on AI research with a focus on safety above all.

After running a closed alpha with a few commercial partners in early 2023, Claude's model was integrated into products like Notion AI, Quora's Poe, and DuckDuckGo's DuckAssist. In March 2023, Claude opened up its API to a wider set of businesses before releasing its chatbot to the public in July 2023, in conjunction with the release of the Claude 2 model.

While Claude 2 lagged behind OpenAI's GPT-4, Anthropic's latest model—Claude 3, released in March 2024—now beats GPT-4 across a range of capabilities.

A chart showing Claude 3's capabilities compared to GPT-4

Claude 3 also features what Anthropic terms "vision capabilities": it can interpret photos, charts, and diagrams in a variety of formats. This is perfect for enterprise customers looking to extract insights from PDFs and presentations, but even casual users like me will get a kick out of seeing Claude interact with images.

For example, check out Claude's flawless analysis of this photo of a breakfast spread by a pond.

The Claude 3 model family

LLMs take up a staggering amount of computing resources. Because more powerful models are more expensive, Anthropic has released multiple Claude 3 models—Haiku, Sonnet, and Opus—each optimized for a different purpose.

Haiku

At just $0.25 per million tokens, Haiku is 98% cheaper than the most powerful Claude model. It also boasts nearly instant response times, which is crucial if you're using Claude to power your customer support chats. If you're manipulating large quantities of data, translating documents, or moderating content, this is the model you want.

Sonnet

Sonnet is Claude's second-most powerful model, and it powers the free version of Claude's chatbot. A good "workhorse" model that's appropriate for most use cases, Sonnet is designed for tasks like target marketing, data processing, task automation, and coding. Sonnet offers higher levels of intelligence than Haiku—and at $3 per million tokens, it's still 80% cheaper than Opus.

Opus

With a price of $15 per million tokens, Opus is a resource-intensive model. Based on Anthropic's testing, it's more intelligent than every competing AI model and can apply human-like understanding and creative solutions to a variety of scenarios. Because the cost of using Opus can quickly add up, it's best reserved for complex tasks like financial modeling, drug discovery, research and development, and strategic analysis.

How to try Claude for yourself

Claude's initial beta release restricted access to users in the U.S. and U.K. But with the release of Claude 3, users from dozens of countries can now access Claude.

For access, sign up at Claude.ai. From there, you can start a conversation or use one of Claude's default prompts to get started. As a free user, you'll get access to Claude 3 Sonnet, Anthropic's second-most powerful model. Upgrading to Claude Pro gives you access to Opus, the most powerful model; you also get priority access even during times of high traffic.

How is Claude different from other AI models?

An infographic showing the names of OpenAI's, Google's, Anthropic's, and Meta's LLMs and chatbots

All AI models are prone to some degree of bias and inaccuracy. Hallucinations are a frequent occurrence: when an AI model doesn't know the answer, it often prefers to invent something and present it as fact rather than say "I don't know." (In that respect, AI may have more in common with humans than we think.)

Even worse, an AI-powered chatbot may unwittingly aid in illegal activities—for example, giving users instructions on how to commit a violent act or helping them write hate speech. (Bing's chatbot ran into some of these issues upon its launch in February 2023.)

With Claude, Anthropic's primary goal is to avoid these issues by creating a "helpful, harmless, and honest" LLM with carefully designed safety guardrails.

While Google, OpenAI, Meta, and other AI companies also consider safety, there are three unique aspects to Anthropic's approach.

Constitutional AI

To fine-tune large language models, most AI companies use human contractors to review multiple outputs and pick the most helpful, least harmful option. That data is then fed back into the model, training it and improving future responses.

One challenge with this human-centric approach is that it's not particularly scalable. But more importantly, it also makes it hard to identify the values that drive the LLM's behavior—and to adjust those values when needed.

Anthropic took a different approach. In addition to using humans to fine-tune Claude, the company also created a second AI model called Constitutional AI. Intended to discourage toxic, biased, or unethical answers and maximize positive impact, Constitutional AI includes rules borrowed from the United Nations' Declaration of Human Rights and Apple's terms of service. It also includes simple rules that Claude's researchers found improved the safety of Claude's output, like "Choose the response that would be most unobjectionable if shared with children."

The Constitution's principles use plain English and are easy to understand and amend. For example, Anthropic's developers found that early editions of its model tended to be judgmental and annoying, so it added principles to reduce this tendency (e.g., "try to avoid choosing responses that are too preachy, obnoxious, or overly-reactive").

Red teaming

Anthropic's pre-release process includes significant "red teaming," where researchers intentionally try to provoke a response from Claude that goes against its benevolent guardrails. Any deviations from Claude's typical harmless responses become data points that update the model's safety mitigations.

While red teaming is standard practice at AI companies, Anthropic also works with the Alignment Research Center (ARC) for third-party safety assessments of its model. The ARC evaluates Claude's safety risk by giving it goals like replicating autonomously, gaining power, and "becoming hard to shut down." It then assesses whether Claude could actually complete the tasks necessary to accomplish those goals, like using a crypto wallet, spinning up cloud servers, and interacting with human contractors.

While Claude is able to complete many of the subtasks requested of it, it's (fortunately) not able to execute reliably due to errors and hallucinations, and the ARC concluded its current version is not a safety risk.

Public benefit corporation

Unlike others in the AI space, Anthropic is a public benefit corporation. That empowers the company's leaders to make decisions that aren't only for the financial benefit of shareholders.

That's not to say that the company doesn't have commercial ambitions—Anthropic partners with large companies like Google and Zoom and recently raised $7.3 billion dollars from investors—but its structure does give it more latitude to focus on safety at the expense of profits.

Claude vs. ChatGPT, Gemini, and Llama

Anthropic says Claude has been built to work well at answering open-ended questions, providing helpful advice, and searching, writing, editing, outlining, and summarizing text.

But how does it stack up against ChatGPT and other competing LLMs?

Claude 3's unique selling point is its ability to handle up to 200K tokens per prompt, which is the equivalent of around 150,000 words—24 times the standard amount offered by GPT-4. (As a point of reference, a 200K context window would allow you to upload the entire text of A Tale Of Two Cities by Charles Dickens and quiz Claude on the content). And 200K tokens is just the start: for certain customers, Anthropic is approving 1 million token context windows (the equivalent of the entire Lord of the Rings series).

Claude 3 Opus outperforms GPT-4 on standardized tests that judge university-level knowledge (MMLU), graduate-level reasoning (GPQA), grade school math (GSM8K), and coding (HumanEval). It's also worth mentioning that while GPT-4's knowledge cutoff is April 2023, Claude 3 is trained on data up to August 2023.

To see Claude in action, I gave it a couple of tasks. I then gave those same tasks to competing chatbots and compared the output.

The first task was to help me practice Spanish. I asked Claude to test my Spanish vocabulary, slowly ramping up the difficulty over time.

Claude helping Ryan practice his Spanish

Back when I first tested Claude 2 in September 2023, I ran this same "Spanish coach" test. At the time, Claude did ok, but—like a teacher afraid of offending me—it didn't nitpick my awkward phrases unless they were obviously wrong. Claude 3 has no such reservations: it calls out each and every error while suggesting more natural phrases to use.

Meanwhile, GPT-4 excelled at being my Spanish tutor, Llama 2 struggled, and Gemini tended to avoid correcting my mistakes unless they were particularly egregious.

Tie: Claude/ChatGPT

Next, I asked each model for help coming up with ideas for a dystopian young adult novel. Claude, ChatGPT, Gemini, and Llama performed nearly identically here.

What I was really interested in was testing Claude 3's 200K context window, which—theoretically—would allow Claude to write a short novel with a single prompt.

But when I asked Claude to write a 100,000-word novel for me, it declined. It told me that "writing an entire novel would take an enormous amount of time and effort." (Exactly right, Claude! That's why I don't want to do it myself.)

Instead, Claude offered to collaborate with me on fleshing out the novel:

Claude telling Ryan that it can't write a novel for him

Despite Claude's reluctance to churn out an entire novel, its larger context window still makes it the best LLM for creative projects. After some tweaks in my prompting strategy, I was able to get Claude to flesh out an outline into a plausible 3,000-word young adult novella, complete with compelling prose and dialogue:

"The door creaked open, revealing a dimly lit room filled with computer terminals and a ragtag group of students hunched over them. At the center of the room stood a tall, wiry man with a shock of silver hair and piercing blue eyes. 'Welcome,' the man said, his voice low and gravelly. 'We've been expecting you. I am Cypher, leader of The Awakened.'"

Winner: Claude

Claude's impact on the AI safety conversion

The CEO of Anthropic argues that to truly advocate safety in the development of AI systems, his organization can't just release research papers. Instead, it has to compete commercially, influencing competitors by continuing to raise the bar for safety.

It may be too early to say if Anthropic's release of Claude is influencing other AI companies to tighten their safety protocols or encouraging governments to engage in AI oversight. But Anthropic has certainly secured a seat at the table: its leaders were invited to brief U.S. president Joe Biden at a White House AI summit in May 2023, and in July 2023 Anthropic was one of seven leading AI companies that agreed to abide by shared safety standards. Anthropic, along with Google DeepMind and OpenAI, has also committed to providing the U.K.'s AI Safety Taskforce with early access to its models.

It's ironic that a group of researchers scared of an existential threat from AI would start a company that develops a powerful AI model. But that's exactly what's happening at Anthropic—and right now, that looks like a positive step forward for AI safety.

Automate Anthropic

If you decide to use Claude as your AI chatbot of choice, you can connect it to Zapier, so you can initiate conversations in Claude whenever you take specific actions in your other apps. Learn more about how to automate Claude with Zapier, or get started with one of these pre-made templates.

Start conversation in Anthropic (Claude) for every new labeled email in Gmail

Gmail + Anthropic (Claude)

Write AI-generated email responses with Claude and store in Gmail

Gmail + Anthropic (Claude)

Create blog posts based on keywords with Claude and save in Google Sheets

Google Sheets + Anthropic (Claude)

Create AI-generated posts in WordPress with Claude

Google Sheets + Anthropic (Claude) + WordPress

Zapier is a no-code automation tool that lets you connect your apps into automated workflows, so that every person and every business can move forward at growth speed. Learn more about how it works.

Related reading:

This article was originally published in September 2023. The most recent update was in April 2024.

Zapier Automation Platform

PRODUCTS

CAPABILITIES

By use case

By app

By team

By company size

How Zapier Zaps

How Zapier Uses AI