Live Model Rankings 2026

Best AI Model for Agents 2026

Which model should you run in your agent? Compare the models people actually use — Claude, GPT-5, Gemini, DeepSeek, Qwen, GLM, Kimi and more — ranked by output quality, agentic ability, speed, reliability, and value for money, with live pricing from OpenRouter. 25,378 community ratings so far — add yours.

Live Ranking

The Best AI Models for Agents, Ranked

Overall ranks by capability for agents. It is weighted toward agentic ability and output quality and calibrated against public benchmarks and leaderboards (SWE-bench, Terminal-Bench, Artificial Analysis, LMArena). Price isn't in this score: switch to Value for $ to rank by quality per dollar instead.
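The page doesn't publish the Value for $ formula. The simplest reading of "quality per dollar" is overall score divided by listed price per 1M tokens. The sketch below is a minimal example under that assumption, using a few rows copied from the ranking. The `Model` class and `value_per_dollar` function are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Model:
    name: str
    score: int          # overall score out of 100, from the ranking
    price_per_m: float  # listed USD per 1M tokens, from the ranking

# A handful of rows copied from the ranking below.
MODELS = [
    Model("Claude Opus 4.8", 94, 11.00),
    Model("Claude Sonnet 4.6", 91, 6.60),
    Model("GLM 5.2", 89, 1.61),
    Model("DeepSeek V4 Pro", 88, 0.57),
    Model("DeepSeek V4 Flash", 83, 0.12),
]

def value_per_dollar(m: Model) -> float:
    """Assumed metric: score points per $1 of listed per-1M-token price."""
    return m.score / m.price_per_m

def rank_by_value(models: list[Model]) -> list[Model]:
    """Sort models from best to worst quality per dollar."""
    return sorted(models, key=value_per_dollar, reverse=True)

for m in rank_by_value(MODELS):
    print(f"{m.name:<18} {value_per_dollar(m):7.1f}")
```

A plain ratio heavily rewards the cheapest models, so the site's Value for $ view may use a different weighting. Treat this as an illustration only.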

1. Claude Opus 4.8 · Anthropic · 1M ctx · $$$$
The reference point for output quality in agentic coding: the model most coding agents reach for on hard tasks.
Score 94/100 · $11/1M tokens · community score 94
2. GPT-5.5 · OpenAI · 1.1M ctx · $$$$
OpenAI's flagship: frontier reasoning and the model behind Codex's hardest runs.
Score 93/100 · $13/1M tokens · community score 93
3. Claude Sonnet 4.6 · Anthropic · 1M ctx · $$$$
The default daily driver for agents: most of Opus's quality at roughly 60% of the price.
Score 91/100 · $6.60/1M tokens · community score 91
4. GPT-5.4 · OpenAI · 1.1M ctx · $$$$
The value flagship of the GPT-5 line: most of 5.5's ability for less than half the price.
Score 90/100 · $6.25/1M tokens · community score 90
5. Gemini 3.1 Pro · Google · 1M ctx · $$$
Google's flagship: frontier quality with a massive multimodal 1M-token context.
Score 89/100 · $5.00/1M tokens · community score 89
6. GLM 5.2 · Z.AI · 1M ctx · $$$
Z.AI's flagship open-weight model: strong agentic coding with a 1M context.
Score 89/100 · $1.61/1M tokens · community score 89
7. Gemini 3.5 Flash · Google · 1M ctx · $$$
Fast, capable, and cheap: a favourite default for high-throughput agents.
Score 89/100 · $3.75/1M tokens · community score 89
8. Qwen3.7 Max · Alibaba (Qwen) · 1M ctx · $$$
Qwen's top hosted model: frontier-class quality with a 1M context at a fair price.
Score 89/100 · $2.00/1M tokens · community score 89
9. GPT-5.3-Codex · OpenAI · 400K ctx · $$$
OpenAI's coding-tuned model and the default behind Codex CLI's local runs.
Score 89/100 · $5.42/1M tokens · community score 89
10. Kimi K2.7 Code · Moonshot AI · 262K ctx · $$
An open-weight model tuned for agentic coding and a community favourite for cheap, capable dev work.
Score 88/100 · $1.35/1M tokens · community score 88
11. DeepSeek V4 Pro · DeepSeek · 1M ctx · $$
Open-weight, near-frontier quality at roughly a tenth of the price of Sonnet 4.6 or GPT-5.4.
Score 88/100 · $0.57/1M tokens · community score 88
12. Kimi K2.6 · Moonshot AI · 262K ctx · $$
Moonshot's flagship open-weight generalist: broad capability with a long context.
Score 87/100 · $1.49/1M tokens · community score 87
13. MiniMax M3 · MiniMax · 1M ctx · $$
An open-weight model built for agents: strong tool use with a 1M context at low cost.
Score 85/100 · $0.57/1M tokens · community score 85
14. Claude Haiku 4.5 · Anthropic · 200K ctx · $$$
Anthropic's fast, cheap model for high-volume agent steps and subagents.
Score 84/100 · $2.20/1M tokens · community score 84
15. Qwen3.7 Plus · Alibaba (Qwen) · 1M ctx · $$
The value workhorse of the Qwen line: cheap and fast, with a 1M context.
Score 84/100 · $0.61/1M tokens · community score 84
16. Grok 4.3 · xAI · 1M ctx · $$$
xAI's flagship: frontier quality at a surprisingly low price, with a 1M context.
Score 84/100 · $1.63/1M tokens · community score 84
17. DeepSeek V4 Flash · DeepSeek · 1M ctx · $
One of the cheapest capable open models, strong for high-volume agent loops.
Score 83/100 · $0.12/1M tokens · community score 83
18. Devstral 2 · Mistral · 262K ctx · $$
Mistral's coding-and-agents specialist, built specifically for SWE-style agent tasks.
Score 83/100 · $0.88/1M tokens · community score 83
19. Grok 4.20 · xAI · 2M ctx · $$$
The 2M-token member of the Grok line, built for very large agent contexts.
Score 83/100 · $1.63/1M tokens · community score 83
20. GPT-5.4 Mini · OpenAI · 400K ctx · $$$
OpenAI's cheap, fast model for high-volume agent calls.
Score 82/100 · $1.88/1M tokens · community score 82
21. GLM 4.7 · Z.AI · 203K ctx · $$
The refinement of GLM 4.6: slightly stronger coding at the same friendly price.
Score 82/100 · $0.81/1M tokens · community score 82
22. Qwen3 235B A22B · Alibaba (Qwen) · 262K ctx · $
A large open-weight MoE you can self-host, capable and extremely cheap to run hosted.
Score 82/100 · $0.09/1M tokens · community score 82
23. Mistral Large 3 · Mistral · 262K ctx · $$
Mistral's open-weight flagship: a capable European generalist at a fair price.
Score 80/100 · $0.80/1M tokens · community score 80
24. MiniMax M2.1 · MiniMax · 205K ctx · $$
The proven agent workhorse of the MiniMax line, with excellent value for tool-heavy runs.
Score 80/100 · $0.49/1M tokens · community score 80
25. GLM 4.6 · Z.AI · 203K ctx · $$
The breakout open-weight coding model: exceptional value that won over the agent community.
Score 80/100 · $0.82/1M tokens · community score 80
26. DeepSeek V3.2 · DeepSeek · 131K ctx · $$
The proven open-weight value model and a staple of budget agent stacks.
Score 77/100 · $0.26/1M tokens · community score 77
27. Gemini 3.1 Flash Lite · Google · 1M ctx · $$
A low-cost 1M-context model built for high-volume calls.
Score 77/100 · $0.63/1M tokens · community score 77
28. Llama 4 Maverick · Meta · 1M ctx · $$
Meta's open-weight MoE: cheap and fast, with a 1M context and the broadest tooling support.
Score 76/100 · $0.28/1M tokens · community score 76
29. DeepSeek R1 (0528) · DeepSeek · 164K ctx · $$
The open-weight reasoning model that put long chain-of-thought in everyone's hands.
Score 75/100 · $0.99/1M tokens · community score 75
30. Codestral · Mistral · 256K ctx · $$
Mistral's fast code-completion model, cheap and quick for inline assistance.
Score 72/100 · $0.48/1M tokens · community score 72
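Prices in the ranking are quoted per 1M tokens, so a rough cost for one agent run is tokens used × price ÷ 1,000,000. The sketch below applies that formula. It assumes each listed figure can be treated as a single per-token rate, because the list does not split input and output pricing. The 3M-token run size is a hypothetical example, and `run_cost` is an illustrative name.

```python
# Listed USD per 1M tokens, copied from the ranking above.
PRICE_PER_M = {
    "Claude Opus 4.8": 11.00,
    "Claude Sonnet 4.6": 6.60,
    "DeepSeek V4 Pro": 0.57,
    "DeepSeek V4 Flash": 0.12,
}

def run_cost(model: str, tokens: int) -> float:
    """Estimated USD cost of `tokens` tokens at the model's listed per-1M rate."""
    return tokens * PRICE_PER_M[model] / 1_000_000

# Hypothetical agent run that consumes 3M tokens in total.
for name in PRICE_PER_M:
    print(f"{name:<18} ${run_cost(name, 3_000_000):.2f}")
```

Real bills depend on the input/output mix and any caching discounts, so treat the output as an order-of-magnitude comparison.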

The model is half the story — the agent is the other half

The model picks the moves; the agent runs the loop, the tools, and the guardrails. Once you've chosen a model, see which agent gets the most out of it.
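One way to picture that split: the model only proposes the next action, while the agent owns the loop, executes the tools, and enforces limits. The sketch below shows that division of labour. `fake_model`, the `TOOLS` table, and the step cap are all hypothetical stand-ins, not any real agent's API.

```python
from collections.abc import Callable

# Hypothetical tool registry: the agent, not the model, executes these.
TOOLS: dict[str, Callable[[str], str]] = {
    "echo": lambda arg: arg,
    "upper": lambda arg: arg.upper(),
}

def fake_model(history: list[str]) -> tuple[str, str]:
    """Stand-in for a model call: returns the next (action, argument)."""
    if len(history) == 1:
        return ("upper", history[0])
    return ("finish", history[-1])

def run_agent(task: str, max_steps: int = 5) -> str:
    history = [task]
    for _ in range(max_steps):              # guardrail: bounded number of steps
        action, arg = fake_model(history)   # the model picks the move
        if action == "finish":
            return arg
        if action not in TOOLS:             # guardrail: only allow known tools
            raise ValueError(f"unknown tool: {action}")
        history.append(TOOLS[action](arg))  # the agent runs the tool
    raise RuntimeError("step limit reached")

print(run_agent("hello"))  # HELLO
```

Swapping `fake_model` for a stronger model changes the moves it proposes. The loop, the tool whitelist, and the step limit still belong to the agent.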


Frequently Asked Questions

Which AI model is best for agents?

For raw quality, Claude Opus 4.8 and GPT-5.5 lead. For the best balance of quality and price, Claude Sonnet 4.6 and GPT-5.4 are the common defaults. For value, open-weight models such as DeepSeek V4 Pro and GLM 5.2, along with the Qwen3.7 line, win: they deliver near-frontier output at a fraction of the cost. The right pick depends on the agent and the task; our live community votes show what people actually run.
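The quality-versus-price trade-off above can be framed as a simple rule: take the highest-scoring model whose listed price fits a cap. The sketch below applies that rule to scores and prices copied from the ranking. The caps are arbitrary examples, and `best_under` is a hypothetical helper, not a feature of the site.

```python
# (name, overall score, listed USD per 1M tokens) from the ranking above.
MODELS = [
    ("Claude Opus 4.8", 94, 11.00),
    ("GPT-5.5", 93, 13.00),
    ("Claude Sonnet 4.6", 91, 6.60),
    ("GPT-5.4", 90, 6.25),
    ("GLM 5.2", 89, 1.61),
    ("DeepSeek V4 Pro", 88, 0.57),
    ("DeepSeek V4 Flash", 83, 0.12),
]

def best_under(max_price: float):
    """Highest-scoring model whose listed price is <= max_price, or None."""
    affordable = [m for m in MODELS if m[2] <= max_price]
    return max(affordable, key=lambda m: m[1], default=None)

for cap in (20.0, 7.0, 2.0, 0.5):
    pick = best_under(cap)
    print(cap, pick[0] if pick else "none")
```

A score-only rule ignores task fit and agent compatibility, which the answer above notes matter just as much.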