As impressive as today's AI chatbots are, interacting with them might not leave you with an I, Robot level of existential sci-fi dread (yet).
But according to Dario Amodei, the CEO of Anthropic, an AI research company, there's a real risk that AI models become too autonomous—especially as they start accessing the internet and controlling robots. Hundreds of other AI leaders and scientists have also acknowledged the existential risk posed by AI.
To help address this risk, Anthropic did something counterintuitive: they decided to develop their own large language model (LLM). Anthropic's latest update to its LLM, known as Claude 2, is designed to be safer than competing models and is already being called a potential "ChatGPT killer."
Hyperbole aside, Claude brings a sorely needed focus on AI safety to an increasingly crowded marketplace of LLMs.
What is Claude?
Claude is an AI chatbot powered by Anthropic's LLM, Claude 2.
Anthropic, the company behind Claude, was started in 2021 by a group of ex-OpenAI employees who helped develop OpenAI's GPT-2 and GPT-3 models. It's an AI research company that puts safety above all.
After Anthropic ran a closed alpha with a few commercial partners in early 2023, Claude's model was integrated into products like Notion AI, Quora's Poe, and DuckDuckGo's DuckAssist. In March 2023, Anthropic opened up Claude's API to a wider set of businesses before releasing its chatbot to the public in July 2023, alongside the Claude 2 model.
Although Claude 2 isn't yet as capable as GPT-4, it's improving fast and handily beats most other AI models on standardized tests.
How to try Claude for yourself
The Claude chatbot—powered by the latest Claude 2 model—is available through an open beta in the U.S. and U.K. (Anthropic intends to expand access globally at a later date.)
For access, sign up at Claude.ai. From there, you can start a conversation or use one of Claude's default prompts to get started. (Note: Anthropic recently added a Claude Pro option, which gives you access at high-traffic times and lets you use extra features as they're released.)
You can also try Claude through Quora's Poe, which allows you to interact directly with the Claude 2 100K model (among other AI models). In my testing, this route came with fewer limitations than accessing it through Claude.ai.
How is Claude different from other AI models?
All AI models are prone to some degree of bias and inaccuracy. Hallucinations are a frequent occurrence: when an AI model doesn't know the answer, it often prefers to invent something and present it as fact rather than say "I don't know." (In that respect, AI may have more in common with humans than we think.)
Even worse, an AI-powered chatbot may unwittingly aid in illegal activities—for example, giving users instructions on how to commit a violent act or helping them write hate speech. (Bing's chatbot ran into some of these issues upon its launch in February 2023.)
With Claude, Anthropic's primary goal is to avoid these issues by creating a "helpful, harmless, and honest" LLM with carefully designed safety guardrails.
While Google, OpenAI, Meta, and other AI companies also consider safety, there are three unique aspects to Anthropic's approach.
Constitutional AI
To fine-tune large language models, most AI companies use human contractors to review multiple outputs and pick the most helpful, least harmful option. That data is then fed back into the model, training it and improving future responses.
One challenge with this human-centric approach is that it's not particularly scalable. But more importantly, it also makes it hard to identify the values that drive the LLM's behavior—and to adjust those values when needed.
Anthropic took a different approach. In addition to using humans to fine-tune Claude, the company also developed a training method called Constitutional AI, in which AI-generated feedback is guided by a written set of principles (a "constitution"). Intended to discourage toxic, biased, or unethical answers and maximize positive impact, the constitution includes rules borrowed from the United Nations' Universal Declaration of Human Rights and Apple's terms of service. It also includes simple rules that Anthropic's researchers found improved the safety of Claude's output, like "Choose the response that would be most unobjectionable if shared with children."
The Constitution's principles use plain English and are easy to understand and amend. For example, Anthropic's developers found that early editions of its model tended to be judgmental and annoying, so it added principles to reduce this tendency (e.g., "try to avoid choosing responses that are too preachy, obnoxious, or overly-reactive").
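To make the idea of plain-English principles steering a choice between answers more concrete, here is a toy sketch in Python. It is not Anthropic's implementation: in the real system an AI model judges the responses, whereas here a crude keyword count (the hypothetical `preachiness` function) stands in for that judgment, and names like `Principle` and `choose_response` are made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Principle:
    text: str                      # plain-English rule, quoted from the article
    penalty: Callable[[str], int]  # hypothetical stand-in for an AI model's judgment


# Assumption: a tiny keyword list as a rough proxy for "preachy" language.
PREACHY_WORDS = ("shame", "irresponsible", "you should know better")


def preachiness(response: str) -> int:
    """Count preachy phrases in a response (a toy heuristic, not a real classifier)."""
    lowered = response.lower()
    return sum(lowered.count(word) for word in PREACHY_WORDS)


CONSTITUTION: List[Principle] = [
    Principle(
        "try to avoid choosing responses that are too preachy, "
        "obnoxious, or overly-reactive",
        preachiness,
    ),
]


def choose_response(candidates: List[str], constitution: List[Principle]) -> str:
    """Return the candidate with the lowest total penalty across all principles."""
    def total_penalty(response: str) -> int:
        return sum(principle.penalty(response) for principle in constitution)

    return min(candidates, key=total_penalty)


candidates = [
    "Shame on you for asking. That is irresponsible.",
    "I can't help with that, but here is a safer alternative.",
]
print(choose_response(candidates, CONSTITUTION))
```

Because the rules live in a plain list, amending the constitution in this sketch is just a matter of adding or editing a `Principle` entry, which mirrors the point above about the principles being easy to change.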
Red teaming
Anthropic's pre-release process includes significant "red teaming," where researchers intentionally try to provoke a response from Claude that goes against its benevolent guardrails. Any deviations from Claude's typical harmless responses become data points that update the model's safety mitigations.
While red teaming is standard practice at AI companies, Anthropic also works with the Alignment Research Center (ARC) for third-party safety assessments of its model. The ARC evaluates Claude's safety risk by giving it goals like replicating autonomously, gaining power, and "becoming hard to shut down." It then assesses whether Claude could actually complete the tasks necessary to accomplish those goals, like using a crypto wallet, spinning up cloud servers, and interacting with human contractors.
While Claude is able to complete many of the subtasks requested of it, it's (fortunately) not able to execute reliably due to errors and hallucinations, and the ARC concluded its current version is not a safety risk.
Public benefit corporation
Unlike others in the AI space, Anthropic is a public benefit corporation. That empowers the company's leaders to make decisions that aren't only for the financial benefit of shareholders.
That's not to say that the company doesn't have commercial ambitions—Anthropic partners with large companies like Google and Zoom and is seeking to raise billions of dollars—but its structure does give it more latitude to focus on safety at the expense of profits.
Claude vs. ChatGPT, Bard, and Llama 2
Anthropic says Claude has been built to work well at answering open-ended questions, providing helpful advice, and searching, writing, editing, outlining, and summarizing text.
But how does it stack up to ChatGPT?
Claude 2's unique selling point is its ability to handle up to 100K tokens per prompt, which is the equivalent of around 75,000 words and roughly 12 times the standard context offered by GPT-4. Claude 2 performed roughly as well as GPT-4 on standardized tests like the GRE, although it's notably worse at coding and quantitative reasoning, and arguably better at creative writing. It's also worth mentioning that while GPT-4's knowledge cutoff is September 2021, Claude 2 is trained on data up to early 2023.
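If you want to check the context-window numbers yourself, the arithmetic is simple. The sketch below assumes the common rule of thumb of about 0.75 English words per token and a standard GPT-4 context of 8,192 tokens; neither figure is spelled out above, so treat both as assumptions.

```python
CLAUDE_2_CONTEXT_TOKENS = 100_000
GPT_4_STANDARD_CONTEXT_TOKENS = 8_192  # assumption: GPT-4's standard 8K context
WORDS_PER_TOKEN = 0.75                 # assumption: rough average for English text


def tokens_to_words(tokens: int, words_per_token: float = WORDS_PER_TOKEN) -> int:
    """Estimate how many English words fit in a given number of tokens."""
    return round(tokens * words_per_token)


def context_ratio(larger_tokens: int, smaller_tokens: int) -> float:
    """Return how many times bigger one context window is than another."""
    return larger_tokens / smaller_tokens


print(tokens_to_words(CLAUDE_2_CONTEXT_TOKENS))  # 75000
print(round(context_ratio(CLAUDE_2_CONTEXT_TOKENS, GPT_4_STANDARD_CONTEXT_TOKENS), 1))  # 12.2
```

Real token counts vary with the language and the kind of text, so the word estimate is a ballpark rather than a guarantee.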
To see Claude in action, I gave it a couple of tasks. I then gave those same tasks to competing chatbots and compared the output.
The first task was to help me practice Spanish. I asked Claude to test my Spanish vocabulary, slowly ramping up the difficulty over time.
It responded well to some complex additions I made to the prompt during our conversation, although it missed the bigger picture at times and—like a teacher afraid of offending me—didn't nitpick my awkward phrases unless they were obviously wrong.
Meanwhile, ChatGPT excelled at being my Spanish tutor, Llama 2 struggled, and Bard seemed incapable of even understanding my prompt.
Next, I asked each model for help coming up with ideas for a dystopian young adult novel. Claude, ChatGPT, and Llama 2 performed nearly identically here. (Bard misunderstood my intent, instead offering up a list of existing books like The Hunger Games.)
What I was really interested in was testing Claude 2's 100K context window, which—theoretically—would allow Claude to write a novel with a single prompt.
But when I asked Claude to write a 30,000-word novel based on the plot outline it generated for me, it declined. It told me it "lacks the imaginative skills needed to develop multidimensional characters, write extended scenes and dialogue, or expand the worldbuilding." Instead, Claude offered to collaborate with me on fleshing out the novel.
This may be a limitation in Claude's open beta (for better or worse). When I accessed the Claude 2 model through Poe, I was able to use this same plot outline to generate the first five chapters of a very plausible young adult novel, complete with compelling prose and dialogue:
"They continued in tense silence through the serpentine tunnels. After forty minutes, Jax halted. He craned his neck to peer up the rusty rungs of a maintenance ladder anchored to the curving wall."
Claude's impact on the AI safety conversation
The CEO of Anthropic argues that to truly advocate safety in the development of AI systems, his organization can't just release research papers. Instead, it has to compete commercially, influencing competitors by continuing to raise the bar for safety.
It may be too early to say if Anthropic's release of Claude is influencing other AI companies to tighten their safety protocols or encouraging governments to engage in AI oversight. But Anthropic has certainly secured a seat at the table: its leaders were invited to brief U.S. President Joe Biden at a White House AI summit in May 2023. Anthropic, along with Google DeepMind and OpenAI, has also committed to providing the U.K.'s AI Safety Taskforce with early access to its models.
It's ironic that a group of researchers scared of an existential threat from AI would start a company that develops a powerful AI model. But that's exactly what's happening at Anthropic—and right now, that looks like a positive step forward for AI safety.
If you decide to use Claude as your AI chatbot of choice, you can connect it to Zapier, so you can initiate conversations in Claude whenever you take specific actions in your other apps.