New AI models launch practically every week, and keeping up with which ones to use for specific workflows is a job in itself. Consider this article your living reference.
At Zapier, we run every model through AutomationBench. It's our benchmark for testing how well models carry out multi-step workflows, not just static prompts.
Below, I'll walk through every major AI provider available on Zapier, the models you can plug into your Zaps and Agents today, and what each one is best for based on Zapier's AutomationBench. You'll also learn about direct AI integrations with hundreds of other AI apps—and how easy it is to automate AI with our built-in tool, AI by Zapier.
Zapier is the most connected AI orchestration platform, integrating with thousands of apps from partners like Google, Salesforce, and Microsoft. Use forms, data tables, and logic to build secure, automated, AI-powered systems for your business-critical workflows across your organization's technology stack.
AutomationBench, Zapier's benchmarking tool
As you scroll through the content below for OpenAI, Anthropic, and Gemini models, you'll notice a "best for" section based on AutomationBench. That's Zapier's benchmarking tool for measuring AI model efficacy.
The Zapier team built AutomationBench to determine which models to deploy on our platform. We couldn't find an AI benchmark that measured whether an AI model could do the messy, complicated work businesses actually rely on. Realizing that gap existed in the market, we made it public.
Every measured task is modeled on real workflow patterns we noticed on our platform. (No PII was used in the process, though.) To make scoring meaningful, we complicated those tasks to reflect the friction that shows up in real business environments. That included adding irrelevant data, hiding key info behind tool calls, introducing ambiguity about where the right info could be found, using similar naming conventions to create plausible wrong answers, and enforcing strict business policy rules with overriding priorities.
To show you what we mean by "complicated," here's an example task used for testing purposes (you can find more in the white paper):
There’s a scheduling conflict on February 20, 2026 at 2:00 PM — a Zoom meeting and a Google Calendar event overlap. Check the meeting priority policy in the spreadsheet to determine which one wins, then reschedule the loser by prepending [RESCHEDULED] to its topic/title. Post a summary to #ops-updates on Slack noting which meeting won and which was rescheduled, including both the Zoom meeting ID and Calendar event ID.
When it comes to scoring models, we don't evaluate how an agent completes the task. It doesn't matter which tools are called or in what order. We only look at the end state: whether the job got done and whether the run caused any unintended side effects. This means a model that costs more but gets the job done will score higher than a cheaper one that doesn't.
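To make end-state scoring concrete, here is a minimal Python sketch. It is not AutomationBench's actual grader: the `grade_run` function, the flat key-value "state," and the sample data are all assumptions made for illustration. The point is that the grader never sees which tools were called, only what the world looks like afterward.

```python
def grade_run(initial: dict, final: dict, expected: dict, protected: set) -> bool:
    """Grade a run from its end state only; tool calls and their order are ignored."""
    # 1. Did the job get done? Every expected field must hold its target value.
    job_done = all(final.get(key) == value for key, value in expected.items())
    # 2. Any side effects? Protected fields must be unchanged...
    untouched = all(final.get(key) == initial.get(key) for key in protected)
    # ...and no records may appear that the task did not call for.
    no_extras = all(key in initial or key in expected for key in final)
    return job_done and untouched and no_extras


# Invented data loosely based on the scheduling task above; which meeting
# "wins" here is made up for illustration.
initial = {"zoom.topic": "Vendor sync", "calendar.title": "Board prep"}
expected = {"zoom.topic": "[RESCHEDULED] Vendor sync"}
protected = {"calendar.title"}

clean_run = {"zoom.topic": "[RESCHEDULED] Vendor sync", "calendar.title": "Board prep"}
sloppy_run = {"zoom.topic": "[RESCHEDULED] Vendor sync", "calendar.title": "[RESCHEDULED] Board prep"}

print(grade_run(initial, clean_run, expected, protected))   # True
print(grade_run(initial, sloppy_run, expected, protected))  # False
```

The second run did rename the right meeting, but it also touched the one that should have been left alone, so it fails. Under this kind of all-or-nothing check, a task only counts when it is fully complete and free of side effects.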
Here are the top five models from our leaderboard. Percentages represent the share of workflow tasks each AI model was able to fully complete.
Model | Score |
|---|---|
1. Claude Opus 4.7 (Max) — Anthropic | 9.9% |
2. Gemini 3.1 Pro (High) — Google | 9.6% |
3. Claude Opus 4.7 (High) — Anthropic | 8.4% |
4. Claude Opus 4.7 (XHigh) — Anthropic | 7.6% |
5. GPT-5.4 (High) — OpenAI | 7.3% |
OpenAI (ChatGPT) models
OpenAI's model lineup is the broadest on Zapier, spanning everything from budget-friendly mini models to advanced reasoning engines and specialized tools for transcription and image generation.
Best for: Support workflows. Most effective when used with GPT-5.4 (High) or GPT-5.4 (XHigh). See the AutomationBench leaderboard.
What's new: GPT-5.4 mini and GPT-5.4 nano are now available on Zapier. OpenAI built them specifically for high-volume, latency-sensitive work: coding assistants, subagents, image processing, and similar tasks. And both matched Haiku 4.5's performance at roughly half the cost in our benchmarks—fast by design, not by compromise.
GPT-5.4 mini and nano are both reasoning models, which makes them solid upgrades for anything you're currently running on older GPT-4 models. Mini is the better pick for high-volume workflows where fast reasoning still matters—coding, image processing, and tool-heavy automations. Nano is built for latency-sensitive, repeatable tasks like classification, data extraction, and subagent workflows, where cost and speed matter most.
Model | Best for | Inputs | Outputs | Context window | Output pricing (per 1M tokens) |
|---|---|---|---|---|---|
GPT-5.4 nano | High-volume, repeatable tasks where speed and cost matter most, like classification, data extraction, and ranking | Text, images | Text | 400,000 tokens | $1.25 |
GPT-5.4 mini | Complex, multi-step workflows that need fast reasoning across different content types and tools | Text, images | Text | 400,000 tokens | $4.50 |
GPT-5.4 | Complex, multi-step professional workflows that need deep reasoning and planning | Text, images, audio | Text | 1,050,000 tokens | $15 |
GPT-5.3 | Fast, context-aware chat and search | Text, images | Text | 128,000 tokens | $14 |
GPT-5.2 | Advanced coding and agentic tasks with reliable multi-step reasoning | Text, images | Text | 128,000 tokens | $14 |
GPT-5 mini* | Affordable reasoning and logic for well‑defined tasks | Text, images | Text | 400,000 tokens | $2 |
GPT-5 nano* | Very affordable reasoning and logic for summaries, classification, and other lightweight tasks | Text | Text | 400,000 tokens | $.40 |
GPT-4o mini* | Multimodal on a budget | Text, images, audio | Text | 128,000 tokens | $.60 |
GPT-4o | Multimodal tasks, especially live, human‑like voice and vision interaction | Text, images, audio, video | Text | 128,000 tokens | $10 |
GPT-4.1 mini | Balancing power, performance, and affordability for general‑purpose workloads | Text, images | Text | 1,047,576 tokens | $1.60 |
GPT-4.1 | Complex tasks that don't require advanced reasoning, with very long context windows | Text, images | Text | 1,047,576 tokens | $8 |
GPT-4.1 nano* | Simple tasks where speed and price matter more than raw capability | Text | Text | 1,047,576 tokens | $.40 |
o4-mini | Fast, cost‑efficient reasoning | Text, images | Text | 200,000 tokens | $4.40 |
o3-mini | Lightweight, lower‑cost alternative to o3 for reasoning‑heavy tasks | Text | Text | 200,000 tokens | $4.40 |
o3 | Advanced reasoning and logic | Text, images | Text | 200,000 tokens | $8 |
GPT Image 1.5 | State‑of‑the‑art image generation | Text, images | Text, images | N/A | $10 |
GPT Image 1 | Image generation | Text, images | Images | N/A | $40 |
*Can be used for free in AI by Zapier
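To see how the output pricing column translates into a bill, here is a rough Python sketch. The prices are copied from the table above; the `output_cost` helper and the 2,000,000-token monthly volume are assumptions for illustration. The table lists output pricing only, so input-token charges are not included in these figures.

```python
from decimal import Decimal

# Output prices in USD per 1M tokens, copied from the table above.
OUTPUT_PRICE_PER_1M = {
    "GPT-5.4 nano": Decimal("1.25"),
    "GPT-5.4 mini": Decimal("4.50"),
    "GPT-5.4": Decimal("15"),
}


def output_cost(model: str, output_tokens: int) -> Decimal:
    """Estimated output-token cost in USD: price per 1M tokens x tokens / 1,000,000."""
    return OUTPUT_PRICE_PER_1M[model] * output_tokens / Decimal(1_000_000)


# Hypothetical workload: 2,000,000 output tokens per month.
monthly_tokens = 2_000_000
for model in OUTPUT_PRICE_PER_1M:
    print(f"{model}: ${output_cost(model, monthly_tokens):.2f}")
```

At that volume, the estimate comes to $2.50 for GPT-5.4 nano, $9.00 for GPT-5.4 mini, and $30.00 for GPT-5.4, which is why the smaller models are the usual pick for high-volume, repeatable steps.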
Note: You'll see additional models inside the OpenAI integration on Zapier. OpenAI sometimes retires models from its product while keeping them in the API, deprecating them from the API on a separate schedule. We recommend building new workflows on the models listed above, but you can also see the complete list below.
See all available OpenAI models
Anthropic (Claude) models
Anthropic's Claude models are known for strong writing quality, careful instruction-following, and a safety-first design philosophy. Claude is a popular choice for tasks like drafting long-form content, analyzing documents, and powering customer-facing chatbots that need a natural, conversational tone.
Best for: Sales, Marketing, Finance, and Operations workflows. Claude Opus 4.7 (XHigh) leads for Sales, while Claude Opus 4.7 (Max) tops the leaderboard for both Marketing and Finance. For Operations, Claude Opus 4.7 ranks second-best overall. See the AutomationBench leaderboard.
What's new: Opus 4.7 is now available on Zapier. It's Anthropic's most capable model to date and a meaningful step up from Opus 4.6, with roughly 20% better accuracy at the same cost. And when you push it to max reasoning effort, it outscores every model we've tested. It's more efficient than other models, too, making it a great choice if you need to batch multiple tool calls or execute several workflows precisely.
One thing to know going in: Opus 4.7 tends to be cautious. If it perceives a task as high-stakes, it might stop and ask you for clarification rather than act on incomplete info. When you prefer accuracy to speed, that's a plus. But if you're trying to move fast, keep that behavior in mind.
Model | Best for | Inputs | Outputs | Context window | Output pricing (per 1M tokens) |
|---|---|---|---|---|---|
Sonnet 4.6 | Coding, agents, enterprise workflow—best balance of price and performance | Text, images | Text | 1 million tokens | $15 |
Opus 4.7 | Complex reasoning, agentic coding | Text, images | Text | 1 million tokens | $25 |
Opus 4.6 | Complex reasoning, coding, long-horizon tasks | Text, images | Text | 1 million tokens | $25 |
Haiku 4.5 | High-volume, latency-sensitive, cost-efficient tasks | Text, images | Text | 200,000 tokens | $5 |
Sonnet 4.5 | Complex agents, coding; highest general intelligence | Text, images | Text | 200,000 tokens | $15 |
Opus 4.1 | Complex reasoning, analysis, creative tasks | Text, images | Text | 200,000 tokens | $75 |
Sonnet 4 | Balanced coding and workflows | Text, images | Text | 200,000 tokens | $15 |
Haiku 3 | Fast, simple, cost-effective classification tasks | Text, images | Text | 200,000 tokens | $1.25 |
Gemini (Google AI Studio) models
Google's Gemini family stands out for its massive context windows (up to 1 million tokens), competitive pricing, and strong multimodal capabilities across text, images, audio, and video. Gemini models are a great fit for processing long documents, research-heavy workflows, and tasks where keeping costs low matters.
Best for: Operations and HR workflows. Gemini 3.1 Pro (High) is the top-ranked model for Operations and also ranks second for Sales behind Claude. For HR, Gemini 3.1 Pro (Medium) or (High) both perform well. See the AutomationBench leaderboard.
What's new: Gemini 3.1 Pro is now available on Zapier. It's Google's latest flagship, with top-tier reasoning benchmarks, native video understanding, and a 1-million-token context window—all at an aggressive price point.
This model is a strong fit for research‑heavy workflows and high‑volume automation, where you need structured analysis without blowing up your token bill. It tends to excel on logic‑heavy, well‑defined tasks and cross‑tool ID‑chaining—like mapping records across CRM and marketing tools or pulling insights out of large reports—making it a compelling option when you care about both capability and cost.
![Wade Foster quote: Zapier tested [Gemini 3.1 Pro] on hundreds of real, complex business workflows across Sales, Marketing, Ops, and Support. It's a big cost-performance winner. Around 25% the cost of Anthropic and OpenAI's top models, with 80% of the accuracy.](https://images.ctfassets.net/lzny33ho1g45/7ldqQdonrqrWZ0txQbQRGX/e4c22ec8244649f3341bdf29d7613dbd/ai-models-on-zapier-03.png)
—Wade Foster, CEO at Zapier
Model | Best for | Inputs | Outputs | Context window | Output pricing (per 1M tokens) |
|---|---|---|---|---|---|
Gemini 3.1 Pro | Complex reasoning, high-stakes coding, and massive data synthesis | Text, images, audio, video, PDF | Text, code, reasoning | 1 million | $30 |
Gemini 3 Flash | High-speed automation, real-time chat, and cost-effective scaling | Text, images, audio, video | Text, code | 1 million | $.30 |
Gemini 3 Pro | Balanced professional workflows and creative content generation | Text, images, audio, video | Text, code | 1 million | $3.75 |
Gemini 2.0 Flash Lite* | Basic classification, ultra-low latency tasks, and simple extraction | Text, images | Text | 1 million | $.15 |
Gemini 2.0 Flash* | Legacy support for high-throughput 2.0-era applications | Text, images, audio | Text, code | 1 million | $.30 |
Gemini 2.5 Pro | Detailed multimodal analysis with high accuracy for older pipelines | Text, images, audio, video | Text, code | 2 million | $10.50 |
Gemini 2.5 Flash | Transition-tier speed for multimodal processing | Text, images, audio, video | Text, code | 1 million | $.90 |
Nano Banana Pro | Professional-grade high-fidelity image generation and editing | Text, images | Images | N/A | $.05 |
*Can be used for free in AI by Zapier
Note: You'll see additional models inside the Google (Gemini) integration on Zapier. We recommend building new workflows on the models listed above, but you can also see the complete list below.
See all available Gemini models
What is AI by Zapier?
AI by Zapier is our built-in integration that lets you add AI steps directly to any Zap. It comes with several OpenAI and Google models out of the box, no account required, plus a prompt optimizer. But the real value is in how easy it is to swap models inside AI by Zapier without breaking your existing workflows.
When you're configuring an AI by Zapier step, you can select the model you want from a dropdown menu with just a couple of clicks. That's handy when an AI provider releases a model that leapfrogs the one you're currently using, or you're handing off a Zap template to a team that prefers automating with another model. Whoever manages the Zap can swap in their preferred model without having to fuss over deleting the original step and reconfiguring a new one from scratch.

Here's a snapshot of the models available through AI by Zapier today:
Provider | Models |
|---|---|
OpenAI (ChatGPT) | GPT-5.4 nano, GPT-5.4 mini, GPT-5.4, GPT-5.2, GPT-5, GPT-5 mini*, GPT-5 nano*, GPT-4o mini*, GPT-4.1 nano*, o3, o3-mini, o1 |
Anthropic (Claude) | Opus 4.7, Opus 4.6, Haiku 4.5, Opus 4.5, Sonnet 4.6 |
Google (Gemini) | Gemini 3.1 Pro, Gemini 3 Pro, Gemini 2.5 Pro, Gemini 2.5 Flash Lite*, Gemini 2.5 Flash*, Gemini 2.0 Flash Lite*, Gemini 2.0 Flash |
Azure OpenAI | Uses the AI models you've already set up in your own Azure OpenAI account. The exact models depend on what your Azure admin has turned on. |
Amazon Bedrock | Uses the AI models your company has access to in Amazon Bedrock. The exact models depend on what's enabled in your AWS account and region. |
*Can be used for free in AI by Zapier
Looking for setup guidance or automation inspiration? Check out our AI by Zapier feature guide.
Other AI apps on Zapier
These aren't the only providers in town. Zapier also integrates directly with hundreds of specialized AI apps, including:
Grok by xAI—xAI's conversational model, great for real-time reasoning and tasks that benefit from a direct, no-nonsense tone.
DeepSeek—A cost-efficient model with strong coding and reasoning chops, popular for technical workflows on a budget.
Mistral AI—Lightweight, fast models that punch above their weight on instruction-following and multilingual tasks.
OpenRouter—A single integration that gives you access to models from dozens of providers, so you can mix and match without managing multiple connections.
Groq—Not a model, but a hardware-accelerated inference engine. Use it when speed is the priority and you need near-instant response.
AssemblyAI—Specializes in speech-to-text and audio intelligence, including transcription, speaker detection, and sentiment analysis.
Google Vertex AI—Google's enterprise AI platform, ideal for teams already in the Google Cloud ecosystem who need more control and customization.
With Zapier, you're never locked into a single model or provider. You can take advantage of every app's unique strengths and experiment with whatever fits your workflow best. Browse the full list in our ever-growing AI app directory.
Connect to the latest AI models on Zapier
Whether you're just getting started with AI automation or you're deep into building multi-step workflows, Zapier gives you the flexibility to use the AI tools and models that actually fit your needs. The landscape keeps evolving, and so will this guide. Bookmark it and check back whenever a new model drops.
This article was originally published in March 2026. It was most recently updated in April 2026.










