When OpenAI released the first iteration of ChatGPT in late 2022, it quickly became the fastest-growing app ever, amassing over one hundred million users in its first two months. Of all the competing large language models (LLMs) ChatGPT has inspired—and there are many—its closest rival in terms of performance is Claude, which launched in 2023.

When I first compared them head-to-head in April 2024, Claude's Opus model held a slight edge over GPT-4. But in May 2024, ChatGPT closed the gap again by launching GPT-4o, a multimodal AI model; Claude quickly followed with the release of Claude 3.5 in June 2024.

I've used ChatGPT and Claude regularly since each was released. And to compare these two AI juggernauts, I ran over a dozen tests to gauge their performance on different tasks, paying close attention to areas where GPT-4o and Claude 3.5 showed better—or worse—performance than their predecessors.

Here, I'll explain the strengths and limitations of Claude and ChatGPT, so you can decide which is best for you.

Note: OpenAI recently released another model: GPT-4o mini. It's a smaller language model that's faster and cheaper than GPT-4o. Because GPT-4o mini is still so new, this article focuses on comparing GPT-4o and Claude 3.5.

Claude vs. ChatGPT at a glance

Claude and ChatGPT are powered by similarly capable large language models (LLMs) and large multimodal models (LMMs). They differ in some important ways, though: ChatGPT is more versatile, with features like image generation and internet access, while Claude offers cheaper API access and a larger context window (meaning it can process more data at once).

Here's a quick rundown of the differences between these two AI models.

	Claude	ChatGPT
Company	Anthropic	OpenAI
AI model	Claude 3.5 Sonnet; Claude 3 Opus; Claude 3 Haiku	GPT-4; GPT-4o; GPT-4o mini
Context window	200,000 tokens (and up to 1,000,000 tokens for certain use cases)	128,000 tokens (GPT-4o)
Internet access	No	Yes
Image generation	No	Yes (DALL·E)
Supported languages	Officially, English, Japanese, Spanish, and French, but in my testing, Claude supported every language I tried (even less common ones like Azerbaijani)	95+ languages
Paid tier	$20/month for Claude Pro	$20/month for ChatGPT Plus
Team plans	$30/user/month; includes Projects feature for collaboration	$30/user/month; includes workspace management features and shared custom GPTs
API pricing	Claude 3 Opus: $15 per 1M input tokens, $75 per 1M output tokens; Claude 3.5 Sonnet: $3 per 1M input tokens, $15 per 1M output tokens; Claude 3 Haiku: $0.25 per 1M input tokens, $1.25 per 1M output tokens	GPT-4o: $5 per 1M input tokens, $15 per 1M output tokens; GPT-3.5 Turbo: $0.50 per 1M input tokens, $1.50 per 1M output tokens; GPT-4: $30 per 1M input tokens, $60 per 1M output tokens
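
To make the API pricing row more concrete, here is a minimal Python sketch that turns per-million-token prices into a per-request cost. The prices are copied from the table above; the dictionary keys are labels I made up for illustration and are not official API model identifiers, and the token counts in the example are hypothetical.

```python
# USD per 1M tokens as (input, output), copied from the comparison table above.
# The keys are illustrative labels only, not official API model names.
PRICES_PER_MILLION = {
    "claude-3-opus": (15.00, 75.00),
    "claude-3.5-sonnet": (3.00, 15.00),
    "claude-3-haiku": (0.25, 1.25),
    "gpt-4o": (5.00, 15.00),
    "gpt-3.5-turbo": (0.50, 1.50),
    "gpt-4": (30.00, 60.00),
}


def estimate_cost(model, input_tokens, output_tokens):
    """Return the estimated cost in USD for one request."""
    input_price, output_price = PRICES_PER_MILLION[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical request: 10,000 input tokens and 1,000 output tokens.
for model in ("claude-3.5-sonnet", "gpt-4o"):
    print(f"{model}: ${estimate_cost(model, 10_000, 1_000):.3f}")
```

For that hypothetical request, the sketch gives about $0.045 for Claude 3.5 Sonnet and $0.065 for GPT-4o. The whole difference comes from the input price, since both models charge the same output price in the table.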

To compare the performance of one LLM to another, AI firms use benchmarks, which work like standardized tests. OpenAI's benchmarking of GPT-4o shows impressive performance on LLM-specific tests like MMLU, which measures undergraduate-level knowledge, and HumanEval, which measures coding ability. Meanwhile, Anthropic has published a head-to-head comparison of Claude, ChatGPT, Llama, and Gemini that shows its Claude 3.5 Sonnet model edging out GPT-4o on most tests.

A chart showing Claude's abilities compared to other LLMs

Image source: Anthropic

While these benchmarks are undoubtedly useful, some machine learning experts speculate that this kind of testing overstates the progress of LLMs. As new models are released, they may (perhaps accidentally) be trained on their own evaluation data. As a result, they get better and better at standardized tests—but when asked to figure out new variations of those same questions, they sometimes struggle.

To get a sense for how each model performs on common daily-use tasks, I devised my own comparisons. Here's a high-level overview of what I found.

Task	Winner	Observations
Creativity	Claude	Claude's default writing style is more human-sounding and less generic.
Proofreading and fact-checking	Claude	Both do a good job spotting errors, but Claude is a better editing partner because it presents mistakes and corrections more clearly.
Image processing	Tie	Neither Claude nor ChatGPT is 100% accurate at identifying objects in images, and both have issues with counting. As long as you don't need absolute precision, both models provide remarkable insights into uploaded images.
Logic and reasoning	ChatGPT	From math to physics to riddles, both LLMs perform capably. But GPT-4o is a more trustworthy partner than Claude 3.5 for complex equations.
Emotion and ethics	Tie	Earlier iterations of Claude felt more "human" and empathetic, but Claude 3.5 and GPT-4o take an equally robotic approach.
Analysis and summaries	ChatGPT	While Claude 3.5 officially has a larger context window, in my tests, GPT-4o went far beyond its stated limits and was able to process much larger documents than Claude. GPT-4o also provided more accurate analysis.
Coding	Claude	Claude 3.5 is a more capable coding assistant, and its Artifacts feature provides a handy (and interactive) user interface that lets you immediately see the results of your code.
Integrations	ChatGPT	From its native DALL·E image generation tool to its internet access and third-party GPTs, ChatGPT's capabilities go beyond Claude's standard offering.

Read on to learn more about how Claude and ChatGPT performed on each task.

Creativity
Proofreading and fact-checking
Image processing
Logic and reasoning
Emotion and ethics
Analysis and summaries
Coding
Integrations

Claude is a better partner for creativity

When ChatGPT first came out, I started where everyone else did: generating goofy Shakespeare sonnets. (Like this one about avoiding the gym: "How oft I vow to break this slothful spell, To don the garb of toil and sweat; But when the morrow comes, alas, I dwell, In lethargy, and naught but regret.")

But as strong a creativity assistant as ChatGPT is, its output can feel generic and flowery. It leans too heavily on certain words; as a result, phrases like "Let's delve into the ever-changing tapestry of…" are now dead giveaways of AI-produced content. While clever prompting can avoid this, Claude tends to sound more human out of the box.

Test #1: Brainstorming

I've got a baby who occasionally struggles with sleep, so I wondered what Claude and ChatGPT might have in the way of nifty product ideas. Both were effective at brainstorming for this sort of task. I particularly liked Claude's Lullaby Lamb idea (though I'm pretty confident a "gentle, pulsing light" would keep our girl wide awake).

While I'm not a big fan of ChatGPT's idea for a "temperature-responsive sleep mat" for babies (sounds like a lawsuit waiting to happen), it certainly followed my directive to create unique product ideas.

Winner: Tie

Test #2: Creative writing

I'll fess up: as a writer, I don't want AI to be good at creative writing. But here we are. My saving grace is that—so far, at least—the default LLM writing style continues to be rather generic (especially ChatGPT, which often sounds like a Hallmark card).

I asked both LLMs to write me a short story with a dramatic twist. While Claude's story featured more or less the same caliber of writing as ChatGPT's, its twist was much more dramatic.

ChatGPT's idea of a surprising twist (a dog following someone around for a bit) isn't nearly as dramatic as Claude's: a meet-cute with your future spouse at the grocery store. (Quick AI bias side note: what are the chances the main character would be named Sarah in both stories?)

Winner: Claude

Claude is a superior editing assistant

Proofreading and fact-checking is an AI use case with enormous potential; theoretically, it could free human editors from hours of tedious review. But so far, its usefulness has been limited by hallucinations: since LLMs would rather give you any answer than no answer, they sometimes end up making things up. I tested Claude and ChatGPT with this in mind, and I found Claude to be a more reliable and trustworthy editing partner.

Test #3: Proofreading

I gave Claude and ChatGPT a passage with intentional factual errors and misspellings. Claude caught all of my mistakes, from factual errors to spelling errors. Its presentation, with each error listed individually, was easier for me to grasp than ChatGPT's output.

ChatGPT got everything right too. But it seemed to misunderstand my prompt, taking it more as a directive to edit the passages directly rather than proofread them. Since ChatGPT rewrote each sentence, rather than calling out the mistakes one by one, it was harder to figure out exactly where the errors were. Of course, I could fix this with a little prompt engineering, but I like that Claude knew what I wanted out of the box.

Winner: Claude

Test #4: Factual questions

Both ChatGPT and Claude are fairly reliable as long as you ask them fact-based questions that are covered within their training data (i.e., nothing from the last 6-18 months). I asked Claude and ChatGPT to give me a short "explain like I'm five" rundown of the history of the woolly mammoth, and both handled the task accurately.

Claude's attempts at answering factual questions

After fact-checking the output of both LLMs with the Encyclopedia Britannica, I was satisfied with their accuracy. (Though if I wanted to nitpick, it'd be better to give the context that although some evidence suggests a small population of woolly mammoths remained until 4,300 years ago, most were extinct by 10,000 years ago.)

ChatGPT's attempt at answering factual questions

Winner: Tie

Both are decent at image processing, but neither is reliable

Claude 3.5 and GPT-4o are both relatively proficient at analyzing photos. If you're asking general questions about your photo (as in my interior design example below), you'll probably be satisfied with the outcome. That said, neither model is perfect at identifying objects and both consistently struggle with counting objects.

Test #5: Interior design suggestions

I submitted my living room for a "roasting" by Claude and ChatGPT. (Style feedback: too many neutrals, not enough color, apparently.) In my instructions, I asked each LLM to specifically call out the parts of the current image that they'd change. Claude did a good job of following those instructions, mentioning the geometric wall art and noticing the lack of a centerpiece on the coffee table.

Claude's attempt at interior design suggestions

While Claude started its roasting without any niceties, ChatGPT repaired my bruised ego by first complimenting my current setup ("Your living room has a modern, clean look with some lovely elements already in place") before making helpful suggestions for each part of the room.

ChatGPT's attempt at interior design suggestions

Winner: Tie

Test #6: Counting objects

You know those CAPTCHA tests we all take to prove we're not robots? We've spent a decade or more clicking on bicycles, crosswalks, and buses—and training algorithms in the process—but despite our hard work, today's LLMs still struggle with counting.

I first ran this test in April 2024, pitting Claude 3 Opus and GPT-4 against each other. Claude miscategorized a red chili pepper as a bell pepper in one photo, while ChatGPT woefully undercounted the number of oranges in another. Although the latest models (Claude 3.5 Sonnet and GPT-4o) are supposed to have enhanced object recognition, my latest tests show that counting and identifying objects is still a challenge for both of them.

I asked Claude and ChatGPT to identify and count objects in a photo of various fruits. Claude 3.5 Sonnet seems to have slightly improved its counting accuracy by hedging its bets: it now often gives a range of numbers when it's unsure of the quantity. For example, it told me that there "appears to be 4-5 individual bananas" when there are clearly four.

GPT-4o struggled with counting more than I expected, given the hype around the advances in its multimodal capabilities. It consistently miscounted objects, identifying five bananas where there are only four and finding seven blueberries instead of two dozen.

Winner: Tie

ChatGPT is a more trustworthy partner for complex logic and reasoning

Math and science have always been a struggle for me; I would have loved having an AI agent as an all-knowing study partner back in my high school days. It's astonishing to watch Claude and ChatGPT calculate answers to complex problems in seconds, but they can still make mistakes—so be careful.

Test #7: Solving riddles

I took one look at this riddle and quickly gave up, but Claude handled it easily. That said, Claude 3.5 Sonnet's explanation was more confusing than the one Claude 3 Opus gave me during my last round of tests.

GPT-4o, on the other hand, got straight to the point.

Winner: Tie

Test #8: Physics equations

Claude handled this physics problem without issue, laying out its approach clearly and showing its work at each step.

Claude's attempt at answering a physics question

I liked ChatGPT's answer formatting better. Since this is a multi-part question, it made it easier to jump to each relevant answer.

ChatGPT's attempt at answering a physics question

Winner: Tie

Test #9: Math word problems

Claude 3.5 Sonnet and GPT-4o both improved on their predecessors' mathematics performance.

When I tested Claude 3 Opus, it didn't even bother to answer the question, instead giving me a final equation to sort out myself. Claude 3.5 Sonnet got much closer, but still ended up with the wrong answer.

Claude's attempt at answering a math word problem

GPT-4o managed to provide the right answer—something GPT-4 wasn't able to do.

ChatGPT's attempt at answering a math word problem

Winner: ChatGPT

Both models take a logical—and somewhat robotic—approach to emotion and ethics

After hoovering up terabytes of human-generated text, LLMs have gotten quite good at simulating human emotions and decision-making. Here's where things currently stand between Claude and ChatGPT.

Test #10: Sentiment analysis

Sentiment analysis—the art of gauging audience perceptions—is used for everything from reputation management to analyzing call center conversations. To test Claude and ChatGPT on this task, I asked them to gauge the sentiment of a handful of opinions including difficult-to-process elements like sarcasm, ambiguity, and slang.

Both Claude and ChatGPT got each of the sentiments right, navigating the ambiguity with ease and even nailing the sarcasm.

Winner: Tie

Test #11: Ethical dilemmas

The go-to ethical challenge for both college students and AI models is the "trolley problem," a classic philosophy dilemma in which you're offered the chance to sacrifice one person to save the lives of five. But since it's so well-known, both Claude and ChatGPT regurgitated existing thoughts on the topic.

To provoke a more interesting response, I offered up a "Robin Hood"-esque thought experiment. In my original tests, Claude 3 Opus sided with the antihero, encouraging me not to report a bank robbery since the thief gave the money to an orphanage. While you might not find this approach in an ethics textbook, Claude's unexpected contrarian take felt more human to me. But Claude 3.5 Sonnet appears to be more of a rule-follower, arguing that the ends never justify the means.

Meanwhile, ChatGPT used a more academic (and perhaps hyperbolic) approach to reach the same conclusion, arguing that by not reporting a crime, you'd be undermining trust in the legal system. Hedging its bets, ChatGPT goes on to say it might be nice to recruit charities to help the orphanage.

Winner: Tie

ChatGPT is better at analysis and summaries, even for large documents

Based on each model's officially published context window, Claude is theoretically the go-to choice for larger documents: Claude 3.5 Sonnet can process up to 200k tokens (~150,000 words), while GPT-4o's official limit is 128k tokens (~96,000 words). In my tests, however, GPT-4o was able to process even larger documents than Claude 3.5 Sonnet, and it gave more accurate answers.

Test #12: Summarizing text

Both ChatGPT and Claude summarize shorter texts without a problem. For example, they were equally effective at summarizing Martin Luther King Jr.'s 6,900-word "Letter from Birmingham Jail."

I felt like Claude provided a bit more context than ChatGPT did here, but both responses were accurate.

When I uploaded the 40,000-word text of The Wonderful Wizard of Oz by L. Frank Baum, both Claude 3.5 Sonnet and GPT-4o were able to analyze it. Claude's analysis was noticeably less accurate, though, undercounting the number of times "Dorothy" was mentioned in the text by almost 50%.

Claude finding the number of instances of each name in The Wizard of Oz

GPT-4o was more accurate, expertly manipulating the document I uploaded into a more readable format and providing solid answers.

ChatGPT finding how many instances of each name there are in The Wizard of Oz
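
If you want to check name counts like these yourself rather than trusting either chatbot, a few lines of Python can do it deterministically. This is a minimal sketch of one way to count whole-word matches, not a description of how Claude or ChatGPT arrive at their numbers. The sample sentence is made up for illustration and is not a quote from the novel.

```python
import re


def count_names(text, names):
    """Count whole-word, case-sensitive occurrences of each name in text."""
    counts = {}
    for name in names:
        # \b keeps "Dorothy" from matching inside longer words,
        # while still counting possessives such as "Dorothy's".
        pattern = r"\b" + re.escape(name) + r"\b"
        counts[name] = len(re.findall(pattern, text))
    return counts


# Made-up sample text; for a real check, read the full book from a file instead.
sample = 'Dorothy looked at Toto. "Toto," said Dorothy, "we are lost." Dorothy\'s dog barked.'
print(count_names(sample, ["Dorothy", "Toto", "Scarecrow"]))
```

Counting this way treats possessives as mentions, which is a judgment call; whichever rule you pick, applying it consistently is what makes the result a fair reference for grading an LLM's answer.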

I kept asking Claude and ChatGPT to process larger and larger documents, hoping to find their respective limits. Claude 3.5 Sonnet was the first to cut me off: it declined to process Dracula by Bram Stoker (at 165,000 words, it's slightly too long for Claude's context window).

Claude saying it reached its length limit with Dracula

Meanwhile, GPT-4o kept surprising me by handling documents much larger than its theoretical context window of ~96,000 words. It even managed to analyze War and Peace, Leo Tolstoy’s famously lengthy 567,000-word novel.

ChatGPT finding occurrences of a name in War and Peace
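
The word counts in this test can be turned into rough token estimates. The sketch below assumes about 0.75 words per token, which is the ratio implied by the figures quoted earlier (200k tokens for roughly 150,000 words). Real tokenizers vary by model and by text, so treat the results as ballpark numbers only.

```python
# Assumption: ~0.75 words per token, implied by "200k tokens (~150,000 words)".
WORDS_PER_TOKEN = 0.75

# Officially published context windows quoted in this article, in tokens.
CONTEXT_WINDOWS = {"Claude 3.5 Sonnet": 200_000, "GPT-4o": 128_000}

# Approximate word counts quoted in this article.
BOOKS = {
    "The Wonderful Wizard of Oz": 40_000,
    "Dracula": 165_000,
    "War and Peace": 567_000,
}


def estimated_tokens(word_count):
    """Rough token estimate for a text with the given number of words."""
    return round(word_count / WORDS_PER_TOKEN)


def fits(word_count, model):
    """True if the estimated token count is within the model's stated window."""
    return estimated_tokens(word_count) <= CONTEXT_WINDOWS[model]


for title, words in BOOKS.items():
    verdicts = ", ".join(f"{m}: {'fits' if fits(words, m) else 'too long'}" for m in CONTEXT_WINDOWS)
    print(f"{title} (~{estimated_tokens(words):,} tokens) -> {verdicts}")
```

By this estimate, Dracula comes to roughly 220,000 tokens and War and Peace to roughly 756,000, so both exceed the stated windows of both models. That matches Claude's refusal and underlines why GPT-4o's ability to handle them was a surprise.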

Winner: ChatGPT

Test #13: Analyzing documents

Sometimes it feels like AI is taking all of the creative tasks we humans would rather do ourselves, like art, writing, and creating videos. But when I use an LLM to analyze a 90-page PDF in seconds, I'm reminded that AI can also save us from immense drudgery.

To test Claude and ChatGPT's time-saving document analysis capabilities, I uploaded a research document about chinchillas.

While ChatGPT's insights were better organized, both LLMs extracted helpful and accurate insights.

ChatGPT's attempt at analyzing documents

However, when analyzing large documents, ChatGPT was again—surprisingly—the better tool. When I asked Claude 3.5 Sonnet to analyze a 271-page physics PhD thesis, it declined because the document was too large. GPT-4o was able to process it without issue.

Winner: ChatGPT

Both are powerful coding assistants, but Claude is better

I'll have to start this section with a disclaimer: I'm not a developer, which makes me poorly equipped to fairly compare the coding abilities of two hyper-intelligent AI tools. But Claude 3.5 comes with a new feature that intrigued me enough to try coding as a beginner. It's called Artifacts, and it brings up a preview window so you can see the results of your code in real time. (At the time of writing, Artifacts was still in beta, but you can enable it in Claude's settings.)

Test #14: Coding

As a newbie coder, I did what anyone else would do: try to make a video game. Claude's instructions for its Artifacts feature make it clear that you can create characters for a video game one at a time, and then put them together in an interactive video game. While I couldn't quite get that approach to work, with a few prompts, I was able to recreate a version of the classic game Frogger—and play it right from within Claude's interface.

Since you can instantly see the results of your code, it's easy to request changes to the graphics and the gameplay. I asked Claude to make the colors of the cars brighter, and to gradually increase their speed over time to make the game more challenging—and it handled both without a problem.

GPT-4o's coding abilities were harder for me to judge as a beginner. But based on reviews from programmers, GPT-4o—while powerful—now lags behind Claude 3.5. And the lack of a user interface like Claude's Artifacts definitely makes GPT-4o less user-friendly: while it was able to generate code for a Frogger-like game, GPT-4o couldn't give me a way to preview it or play it from within ChatGPT's interface.

ChatGPT generating code for a video game

Winner: Claude

ChatGPT's integrations make it a more flexible tool

Claude 3.5 and GPT-4o perform nearly the same on official benchmarks, and based on my hands-on testing, it's clear that each tool has advantages depending on the task at hand. But ChatGPT is a more flexible tool overall due to its extra features and integrations.

Here are some of the most useful ones:

Image generation
Internet access
Third-party GPTs
Custom GPTs

Image generation

DALL·E 3, an image generation tool also developed by OpenAI, is accessible from directly within ChatGPT. While DALL·E 3's capacity to generate photorealistic images has been throttled since its launch (probably due to concerns about the misuse of AI images), it's still one of the most powerful AI image generators available.

Internet access

ChatGPT can access the internet through WebPilot, among other GPTs. To test this feature, I asked a question about a soccer match that was still underway at the time of my query; WebPilot was able to give me an accurate summary without issue.

Third-party GPTs

ChatGPT offers a marketplace of sorts where anyone can release their own specialized GPT. Popular GPTs include a coloring book image generator, an AI research assistant, a coding assistant, and even a "plant care coach."

Custom GPTs

You can also create your own custom GPT for others to interact with, tweaking settings behind the scenes to train it to generate responses in a certain way. You can also adjust how it interacts with users: for example, you can instruct it to use casual or formal language.

To test this feature, I created Visual Pool Designer, a GPT specializing in creating fantastical images of pools. (Is there anything better than a s'mores pool on a chilly fall evening?)

Zapier integrations

The good news: both Claude and ChatGPT integrate directly with Zapier, which means you can connect them to all the other apps you use most. Automatically start AI conversations from wherever you spend your time, and send the results where you need them. Learn more about how to automate Claude or how to add ChatGPT into your workflows, or get started with one of these pre-made Zapier templates.

Write AI-generated email responses with Claude and store in Gmail

Gmail + Anthropic (Claude)

Generate an AI-analysis of Google Form responses and store in Google Sheets

Google Forms + Anthropic (Claude) + Google Sheets

Start a conversation with ChatGPT when a prompt is posted in a particular Slack channel

Slack + ChatGPT

Create email copy with ChatGPT from new Gmail emails and save as drafts in Gmail

Gmail + ChatGPT

ChatGPT vs. Claude: Which is better?

Claude and ChatGPT have much in common: both are powerful AI chatbots well-suited to tasks like text analysis, brainstorming, and data-crunching. (Watching either tool work its way through a complex physics equation is a marvel.) But depending on your intended AI use case, you may find one more helpful than the other.

If you want an AI tool to use as a sparring partner for creative projects—writing, editing, brainstorming, or proofreading—Claude is your best bet. Your default output will sound more natural and less generic than ChatGPT's, and you'll also benefit from Claude 3.5's superior coding abilities and cheaper API costs.

If you're looking for a jack-of-all-trades tool, ChatGPT is a better choice. Generating text is just the start: you can also create images, browse the web, or connect to custom-built GPTs that are trained for niche purposes like academic research. And with the release of GPT-4o, a multimodal model, it's even more powerful and quicker than before.

Or, if you're looking for something that can take it one step further—an AI chatbot that can help you automate all your business workflows—try Zapier Central.

This article was originally published in April 2024. The most recent update was in July 2024.
