Which model should you run in your agent? Compare the models people actually use — Claude, GPT-5, Gemini, DeepSeek, Qwen, GLM, Kimi and more — ranked by output quality, agentic ability, speed, reliability, and value for money, with live pricing from OpenRouter. 25,378 community ratings so far — add yours.
Overall ranks by capability for agents — weighted toward agentic ability and output quality, calibrated against public benchmarks (SWE-bench, Terminal-Bench, Artificial Analysis, LMArena). Price isn't in this score: switch to Value for $ to rank by quality per dollar instead. Tap any model for the full breakdown.
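To make the difference between the two rankings concrete, here is a minimal sketch. It is not the site's actual formula: the weights, the `Model` fields, the price unit, and the two placeholder models are all assumptions for illustration. The page only says that Overall leans toward agentic ability and output quality and leaves price out, while Value for $ ranks by quality per dollar.

```python
from dataclasses import dataclass

# Hypothetical weights: the page only says the score is weighted toward
# agentic ability and output quality. The exact numbers are assumed.
OVERALL_WEIGHTS = {
    "agentic": 0.40,
    "quality": 0.35,
    "reliability": 0.15,
    "speed": 0.10,
}

@dataclass
class Model:
    name: str
    scores: dict[str, float]   # per-dimension scores, 0 to 100
    usd_per_mtok: float        # assumed unit: USD per million tokens

def overall(m: Model) -> float:
    """Weighted capability score. Price is deliberately left out."""
    return sum(w * m.scores[d] for d, w in OVERALL_WEIGHTS.items())

def value_per_dollar(m: Model) -> float:
    """Quality per dollar: capability divided by price."""
    return overall(m) / m.usd_per_mtok

def rank(models, key):
    """Sort models from best to worst under the given scoring function."""
    return sorted(models, key=key, reverse=True)

# Placeholder entries, not real measurements or real prices.
models = [
    Model("model-a", {"agentic": 90, "quality": 90, "reliability": 90, "speed": 90}, 10.0),
    Model("model-b", {"agentic": 70, "quality": 70, "reliability": 70, "speed": 70}, 1.0),
]

by_overall = [m.name for m in rank(models, overall)]
by_value = [m.name for m in rank(models, value_per_dollar)]
print(by_overall, by_value)
```

With these made-up numbers the stronger but pricier model leads on Overall, and the cheaper one leads once price enters the score, which is the trade-off the two views are meant to expose.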
Pick one you know — score it in 10 seconds. Only rate what you've run.
Average community score (out of 100) per dimension, from 25,378 ratings by people who've run these models in real agents.
Output quality: correctness and depth of what it produces
Agentic ability: tool calls, instruction-following, and multi-step tasks
Speed: tokens per second and time-to-first-token
Value for money: how much capability you get per dollar
Reliability: consistent results, with fewer refusals, loops, and format breaks
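A per-dimension community average can be computed along these lines. This is a sketch under assumptions: the `(model, dimension, score)` rating shape and the sample values are hypothetical, and the page does not say whether the real aggregation applies any weighting or filtering beyond a plain mean.

```python
from collections import defaultdict
from statistics import mean

# Each rating is (model, dimension, score out of 100). Placeholder values.
ratings = [
    ("model-a", "quality", 80),
    ("model-a", "quality", 90),
    ("model-a", "speed", 60),
    ("model-b", "quality", 70),
]

def average_scores(ratings):
    """Group ratings by (model, dimension) and take the plain mean."""
    buckets = defaultdict(list)
    for model, dimension, score in ratings:
        buckets[(model, dimension)].append(score)
    return {key: mean(values) for key, values in buckets.items()}

averages = average_scores(ratings)
print(averages)
```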
Most agents let you choose the model. These guides rank the best picks for each one — by quality, value, and how well they drive that agent.
Pick one from each column to see specs, pricing, a verdict, and live community votes.
The model picks the moves; the agent runs the loop, the tools, and the guardrails. Once you've chosen a model, see which agent gets the most out of it.
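The split between model and agent can be sketched in a few lines. Everything here is illustrative: `scripted_model` stands in for a real LLM call, and the tool names, the `finish` action, and the step limit are assumptions rather than any particular agent's design.

```python
from typing import Callable

# Tools the agent exposes. The model can only request these by name.
TOOLS: dict[str, Callable[[str], str]] = {
    "upper": lambda text: text.upper(),
    "reverse": lambda text: text[::-1],
}

def scripted_model(history: list[str]) -> tuple[str, str]:
    """Stand-in for an LLM call: returns (action, argument)."""
    if len(history) == 1:
        return ("upper", history[0])
    if len(history) == 2:
        return ("reverse", history[-1])
    return ("finish", history[-1])

def run_agent(task: str, model, max_steps: int = 5) -> str:
    history = [task]
    for _ in range(max_steps):              # guardrail: bounded loop
        action, arg = model(history)        # the model picks the move
        if action == "finish":
            return arg
        if action not in TOOLS:             # guardrail: reject unknown tools
            history.append(f"error: unknown tool {action}")
            continue
        history.append(TOOLS[action](arg))  # the agent runs the tool
    raise RuntimeError("step limit reached")

result = run_agent("abc", scripted_model)
print(result)
```

Swapping `scripted_model` for a different function changes which moves get picked, while the loop, the tool table, and the limits stay the same. That is why the same model can behave differently from one agent to the next.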