How to choose the right AI model for your workflows: A deep-dive into workflow and agent performance
Wednesday, June 24
1 PM ET
"We don't pick one model and hope it's the best at everything. We route different work to different models—and we measure the difference." — Bryan Helmig, CTO & Co-Founder, Zapier
Every new model release brings up the same question: should I switch?
Underneath that question are more specific ones: For which workflows? For which agents? And is the latest model actually worth it if I'm going to burn through more tokens?
Most teams are stuck guessing—or burning tokens on frontier models for work a cheaper one handles fine.
AutomationBench is how Zapier can definitively answer those questions. It's the execution benchmark frontier labs, like Anthropic, cite in their model system cards—measuring whether models can complete hard, real business workflows. AutomationBench evaluates models across six business domains (Sales, Marketing, Operations, Support, Finance, and HR), selected based on the most common use-case patterns across the 3.7M companies and 2B monthly tasks Zapier sees.
Join technical leaders from Zapier for a live session on how to pick the right model for different roles and workflows, featuring:
– Side-by-side output comparisons across frontier providers on the same business tasks
– Cost-conscious routing: where you need premium models vs where you're wasting budget
– Practical re-evaluation patterns when dot releases ship—so you're testing impact, not vibes
– AutomationBench results as proof: how Zapier (and labs like Anthropic) measure execution on hard workflows
You’ll walk away with a better understanding of when to use Opus 4.8, ChatGPT 5.5, or Gemini 3.5 Flash.
If you’re a head of AI, an innovation or IT leader, or a builder running agents at your organization, this webinar is for you.
Save your seat. Recording shared with registrants.




This webinar will be packed with live examples, real cost data, and side-by-side use case breakdowns. We'll go deep on:
– What AutomationBench actually measures: Why it's the benchmark labs use to validate model quality on business workflows (not abstract reasoning or coding tests)
– Side-by-side model comparisons on the same workflows: See how outputs, accuracy, and costs differ across OpenAI, Anthropic, and Google on real tasks
– Domain-by-domain winners: Which models excel at department-specific workflows (HR vs. marketing vs. sales)—and where you're overpaying for capability you don't need
– Cost vs performance tradeoffs: Real examples of $6/task vs $0.65/task on hard execution work—and how to route workflows to avoid token burn
– Monday takeaway: A repeatable framework for auditing your workflows, mapping them to models, and setting re-eval triggers when new releases ship
Can’t attend live?
Register today, and we'll send you a recording after the webinar ends.
Register now