Models Reference

Last updated: 2026-05-20 | Auto-synced daily

API pricing, context windows, and SWE-Bench scores for coding AI models.
Compiled May 2026.

Anthropic (Claude)

Current as of May 2026. Source: platform.claude.com

Model	Input /1M	Output /1M	Batch (50% off)	Cache Writes (5m) /1M	Cache Hits /1M
Opus 4.7	$5.00	$25.00	✓	$6.25	$0.50
Opus 4.6	$5.00	$25.00	✓	$6.25	$0.50
Opus 4.5	$5.00	$25.00	✓	$6.25	$0.50
Sonnet 4.6	$3.00	$15.00	✓	$3.75	$0.30
Haiku 4.5	$1.00	$5.00	✓	$1.25	$0.10

Opus 4.7: 87.6% SWE-Bench Verified (#2). Opus 4.5: 80.9%. Opus 4.6: 80.8%. Sonnet 4.6: 79.6%.
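
To make the per-token columns concrete, the following sketch estimates the cost of a single request from the table above. The function name, the dictionary keys, and the token counts are illustrative assumptions (the keys are not official API model identifiers), and it assumes each token class (fresh input, cache write, cache hit, output) is billed independently at its listed rate.

```python
# USD per 1M tokens, copied from the table above. The dictionary keys are
# illustrative labels, not official API model identifiers.
CLAUDE_PRICES = {
    "opus-4.7":   {"input": 5.00, "output": 25.00, "cache_write": 6.25, "cache_hit": 0.50},
    "opus-4.6":   {"input": 5.00, "output": 25.00, "cache_write": 6.25, "cache_hit": 0.50},
    "opus-4.5":   {"input": 5.00, "output": 25.00, "cache_write": 6.25, "cache_hit": 0.50},
    "sonnet-4.6": {"input": 3.00, "output": 15.00, "cache_write": 3.75, "cache_hit": 0.30},
    "haiku-4.5":  {"input": 1.00, "output": 5.00,  "cache_write": 1.25, "cache_hit": 0.10},
}


def claude_request_cost(model, input_tokens, output_tokens,
                        cache_write_tokens=0, cache_hit_tokens=0):
    """Estimate the USD cost of one request.

    Assumes every token class is billed separately at its own rate.
    """
    p = CLAUDE_PRICES[model]
    total = (
        input_tokens * p["input"]
        + output_tokens * p["output"]
        + cache_write_tokens * p["cache_write"]
        + cache_hit_tokens * p["cache_hit"]
    )
    return total / 1_000_000


# Hypothetical request: 20K fresh input, 80K served from cache, 2K output.
cost = claude_request_cost("sonnet-4.6", 20_000, 2_000, cache_hit_tokens=80_000)
print(f"${cost:.4f}")  # $0.1140
```

The same 100K-token prompt sent without any cache hits would cost $0.33 under these assumptions, so the cache-hit rate dominates the bill for long, repeated prompts.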

Google Gemini

Current as of May 2026. Source: ai.google.dev, SWE-Bench

Current Models

Model	Input /1M	Output /1M	Context	Max Output	SWE-Bench Verified	Notes
Gemini 3.1 Pro Preview	$2.00 ($4.00 >200K)	$12.00 ($18.00 >200K)	2M	16K	80.6%	Preview. Top-tier reasoning. 2M ctx
Gemini 3.1 Flash-Lite Preview	$0.25	$1.50	1M	64K	—	Fast, high-volume agentic tasks
Gemini 2.5 Pro	$1.25 ($2.50 >200K)	$10.00 ($15.00 >200K)	2M	64K	—	Complex reasoning, coding, long docs
Gemini 2.5 Flash	$0.30	$2.50	1M	64K	—	Balanced cost and capability
Gemini 2.5 Flash-Lite	$0.10	$0.40	1M	64K	—	Lowest-cost current Gemini route

Batch / Flex Pricing (50% off)

Model	Batch Input /1M	Batch Output /1M
Gemini 3.1 Pro (≤200K)	$1.00	$6.00
Gemini 3.1 Flash-Lite	$0.125	$0.75
Gemini 2.5 Pro (≤200K)	$0.625	$5.00
Gemini 2.5 Flash	$0.15	$1.25
Gemini 2.5 Flash-Lite	$0.05	$0.20
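
The two-tier Pro pricing and the batch discount can be combined in a small calculator. This is a sketch under stated assumptions: the class and function names are hypothetical, the whole request is assumed to be billed at the higher rate once the prompt exceeds 200K tokens, and the 50% batch discount is applied to the long tier as well even though the batch table only lists ≤200K prices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TieredPrice:
    """USD per 1M tokens for a model with a long-prompt surcharge."""
    input_short: float
    output_short: float
    input_long: float
    output_long: float
    threshold: int = 200_000  # prompt size above which the long tier applies


# Rates copied from the Current Models table.
GEMINI_31_PRO = TieredPrice(2.00, 12.00, 4.00, 18.00)
GEMINI_25_PRO = TieredPrice(1.25, 10.00, 2.50, 15.00)


def gemini_cost(price, prompt_tokens, output_tokens, batch=False):
    """Estimate the USD cost of one request.

    Assumption: the tier is chosen by prompt size and applies to both
    input and output tokens of that request.
    """
    long_tier = prompt_tokens > price.threshold
    in_rate = price.input_long if long_tier else price.input_short
    out_rate = price.output_long if long_tier else price.output_short
    cost = (prompt_tokens * in_rate + output_tokens * out_rate) / 1_000_000
    return cost * 0.5 if batch else cost
```

For example, a 100K-token prompt with 4K output tokens on Gemini 3.1 Pro comes to about $0.248 at standard rates and $0.124 in batch mode, consistent with the halved prices in the batch table.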

Deprecated

Model	Input /1M	Output /1M	Note
Gemini 2.0 Flash	$0.10	$0.40	Shutdown Jun 1 2026

Gemini 3.1 Pro is a preview model (restrictive rate limits). Free tier available for development and small projects. Gemini 3.1 Pro scores 80.6% on SWE-Bench Verified — competitive with Claude Opus 4.6 (80.8%) and DeepSeek V4 Flash (79%).

DeepSeek

Current as of May 2026. Source: api-docs.deepseek.com

DeepSeek V4 is the current flagship, launched March 2026. 671B total params, 37B active MoE, 1M context. SWE-Bench Verified: V4 Pro Max / V4 Pro 80.6%, V4 Flash 79%. V4 Flash is the default workhorse; V4 Pro is premium (75% off until May 31 2026).

New: DeepSeek V4 Pro Max

Released Apr 24 2026. 1.6T params, 49B active MoE, 1M context, open-weight on HuggingFace. 80.6% SWE-Bench Verified. Available at V4 Pro pricing (same API endpoint).

Model	Cache Hit Input /1M	Cache Miss Input /1M	Output /1M	Context	Notes
deepseek-v4-flash	$0.0028	$0.14	$0.28	1M	Default route. 384K max output
deepseek-v4-pro (promo)	$0.003625	$0.435	$0.87	1M	75% off until May 31 2026 15:59 UTC
deepseek-v4-pro (full)	$0.0145	$1.74	$3.48	1M	Full price after promo ends

Cache hit prices reduced to 1/10 of launch price from Apr 26 2026. Older aliases deepseek-chat and deepseek-reasoner map to V4 Flash (non-thinking / thinking) and retire after Jul 24 2026. New accounts get 5M free tokens.
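
Because V4 Pro pricing changes when the promo ends, cost estimates need to be time-aware. The sketch below selects promo or full rates from a timestamp and blends cache-hit and cache-miss input pricing. Function names and the `cache_hit_ratio` parameter are illustrative assumptions, as is treating the promo as ending exactly at the listed minute.

```python
from datetime import datetime, timezone

# Promo end time as listed in the table (May 31 2026 15:59 UTC).
PROMO_END = datetime(2026, 5, 31, 15, 59, tzinfo=timezone.utc)
PROMO_DISCOUNT = 0.75

# Full deepseek-v4-pro prices in USD per 1M tokens, copied from the table.
V4_PRO_FULL = {"cache_hit": 0.0145, "cache_miss": 1.74, "output": 3.48}


def v4_pro_rates(now):
    """Return the V4 Pro rate card in effect at `now` (timezone-aware)."""
    if now < PROMO_END:
        return {k: v * (1 - PROMO_DISCOUNT) for k, v in V4_PRO_FULL.items()}
    return dict(V4_PRO_FULL)


def deepseek_request_cost(rates, input_tokens, output_tokens, cache_hit_ratio=0.0):
    """Estimate USD cost, splitting input tokens into cache hits and misses."""
    hit_tokens = input_tokens * cache_hit_ratio
    miss_tokens = input_tokens - hit_tokens
    total = (
        hit_tokens * rates["cache_hit"]
        + miss_tokens * rates["cache_miss"]
        + output_tokens * rates["output"]
    )
    return total / 1_000_000
```

Under these assumptions, 1M input tokens at a 50% cache-hit ratio plus 100K output tokens costs roughly $0.31 during the promo and four times that afterwards.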

Legacy Models

Model	Input /1M	Output /1M	Cache Hit	Context	Notes
DeepSeek V3.2 (Chat)	$0.28	$0.42	$0.028	128K	Previous gen, still available
DeepSeek R1	$0.55	$2.19	$0.14	128K	Dedicated reasoning model

DeepSeek V3.2: 73.0% SWE-Bench Verified. R1: chain-of-thought reasoning, ~96% cheaper than OpenAI o1. DeepSeek web chat at chat.deepseek.com is free for individual users.

OpenAI (ChatGPT)

Current as of May 2026. Source: openai.com/api/pricing

GPT-5 Family (Current Flagship)

Model	Input /1M	Output /1M	Cached Input	Context	Notes
GPT-5.5 (≤272K)	$5.00	$30.00	$0.50	1M	88.7% SWE-Bench (#1), 58.6% SWE-Bench Pro. Flagship reasoning + coding
GPT-5.5 (>272K)	$10.00	$45.00	$1.00	1M	Long context tier >272K tokens
GPT-5.5 Pro	$30.00	$180.00	—	1M	Premium tier for research-grade problems
GPT-5.4 (≤272K)	$2.50	$15.00	$0.25	1M	~80% SWE-Bench Verified. 59.1% SWE-Bench Pro
GPT-5.4 (>272K)	$5.00	$22.50	$0.50	1M	Long context tier >272K tokens
GPT-5.4 Mini	$0.75	$4.50	$0.075	400K	Affordable reasoning. Supports reasoning effort control
GPT-5.4 Nano	$0.20	$1.25	—	400K	Fastest, cheapest 5.4 tier. Ideal for summaries, classification
GPT-5.3 Codex	$1.75	$14.00	—	400K	85.0% SWE-Bench Verified (#3). 56.8% SWE-Bench Pro. Coding specialist

GPT-4.1 Family (Production Workhorse)

Model	Input /1M	Output /1M	Cached Input	Context	Notes
GPT-4.1	$2.00	$8.00	$0.50	1M	Recommended production model. Strong coding + long context
GPT-4.1 Mini	$0.40	$1.60	$0.10	1M	Good balance of power and affordability
GPT-4.1 Nano	$0.10	$1.40	—	1M	Cheapest OpenAI model. Classification, extraction, routing

o-Series (Reasoning Models)

Model	Input /1M	Output /1M	Cached Input	Context	Notes
o4-mini	$1.10	$4.40	$0.275	200K	Best-value reasoning. Math, science, complex logic
o3	$2.00	$8.00	—	—	Flagship reasoning. Chain-of-thought built in

Batch API saves 50% on all models. Prompt caching discounts: up to 90% off (GPT-5.5), 75% off (GPT-4.1). GPT-5.5 scores 88.7% SWE-Bench Verified and 58.6% SWE-Bench Pro. GPT-5.4 scores ~80% SWE-Bench Verified and 59.1% SWE-Bench Pro. GPT-5.5 Pro tier ($30/$180) is available for research-grade problems. GPT-4.1 is OpenAI’s recommended production default for most workloads.
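
The caching discounts quoted above can be derived directly from the Input and Cached Input columns. The sketch below does that arithmetic; the dictionary and function names are hypothetical, and rows that list no cached price are omitted.

```python
# (input price, cached-input price) in USD per 1M tokens, copied from the
# tables above. Rows without a cached price are omitted.
OPENAI_ROWS = {
    "GPT-5.5 (<=272K)": (5.00, 0.50),
    "GPT-5.4 (<=272K)": (2.50, 0.25),
    "GPT-5.4 Mini": (0.75, 0.075),
    "GPT-4.1": (2.00, 0.50),
    "GPT-4.1 Mini": (0.40, 0.10),
    "o4-mini": (1.10, 0.275),
}


def cache_discount_pct(input_price, cached_price):
    """Percentage saved on an input token that is served from cache."""
    return (1 - cached_price / input_price) * 100


for name, (full, cached) in OPENAI_ROWS.items():
    print(f"{name}: {cache_discount_pct(full, cached):.0f}% off")
```

With the listed prices, the GPT-5 family rows work out to 90% and the GPT-4.1 and o4-mini rows to 75%, which agrees with the summary figures.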

MiniMax

Current as of May 2026. Source: platform.minimax.io, OpenRouter

Coding Models

Model	Input /1M	Output /1M	Context	Max Output	SWE-Bench	Speed / Notes
M2.7	$0.279	$1.20	205K	131K	—	Released Mar 18 2026
M2.5 Standard	$0.15	$1.20	256K	—	80.2%	~50 TPS
M2.5 Lightning	$0.30	$2.40	256K	—	80.2%	~100 TPS

M2.5 Standard: one of the best-value coding models. Caching is automatic (no configuration needed). Its 80.2% SWE-Bench score is close to Claude Opus 4.6 (80.8%). OpenCode Go estimates: M2.5 ~6,300 req/5h, M2.7 ~3,400 req/5h.

Subscription Plans

Plan	Price	Description
Token Plan	Subscription	Quotas for individual builders and Teams
Credits	Prepaid	Same resource coverage as Token Plan
Pay-as-you-go	Per-token	Standard API endpoint billing

Qwen (Alibaba)

Current as of May 2026. Source: DashScope direct pricing

Current Gen (Qwen3.6)

Model	Input /1M	Output /1M	Context	SWE-Bench	Notes
Qwen3.6 Plus	$0.325	$1.95	1M	78.8% Verified	Apr 2 2026. Hybrid attention + MoE. Reasoning by default
Qwen3.6 Flash	$0.25	$1.50	1M	—	Cost-optimized tier
Qwen3.6 Max Preview	$1.30	$7.80	256K	SWE-Bench Pro #1	Apr 20 2026. Closed-weights flagship. Leads SWE-Bench Pro, Terminal-Bench 2.0, SkillsBench, SciCode

Qwen3.6 Plus: within 2 points of Claude Opus 4.6 (80.8%) at roughly 1/15th the input price ($0.325 vs $5.00 per 1M tokens). 1M native context, 65K max output. Reasoning enabled by default (no mode toggle). Qwen3.6-27B (dense, Apache 2.0): 77.2% SWE-Bench Verified, a strong self-hosting option. Qwen3.6-Max-Preview (Apr 20 2026): First closed-weights Qwen flagship. $1.30/$7.80 per MTok. 256K context. Tops SWE-Bench Pro + 5 other coding benchmarks at launch.

Previous Gen (Qwen3.5)

Model	Input /1M	Output /1M	Context	Notes
Qwen3.5 Plus	$0.26	$1.56	1M	Feb 2026 release. 65K max output
Qwen3.5 397B A17B	Free	Free	262K	Open-weight MoE flagship

Qwen-Max (Legacy Flagship)

Model	Input /1M	Output /1M	Context
qwen3-max (0-32K)	$1.20	$6.00	252K
qwen3-max (32K-128K)	$2.40	$12.00	252K
qwen3-max (128K-252K)	$3.00	$15.00	252K
qwen-max (older)	$1.60	$6.40	—

All Qwen models support native tool-calling, JSON-mode, and OpenAI-compatible API shapes. Batch calling: 50% off. Context caching discounts available on supported models.
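
The qwen3-max brackets in the legacy table amount to a lookup by input length. A minimal sketch follows, assuming the bracket is chosen by the request's input token count, that "K" means 1,000 tokens, that each upper bound is inclusive, and that both input and output are billed at the bracket's rates. The function name is hypothetical.

```python
from bisect import bisect_left

# Upper bound of each bracket in input tokens, and the matching
# (input, output) rates in USD per 1M tokens from the qwen3-max rows.
BRACKET_LIMITS = [32_000, 128_000, 252_000]
BRACKET_RATES = [(1.20, 6.00), (2.40, 12.00), (3.00, 15.00)]


def qwen3_max_cost(input_tokens, output_tokens):
    """Estimate the USD cost of one qwen3-max request."""
    if input_tokens > BRACKET_LIMITS[-1]:
        raise ValueError("input exceeds the 252K context listed in the table")
    # bisect_left finds the first bracket whose upper bound is >= input_tokens,
    # so a request of exactly 32,000 tokens stays in the first bracket.
    idx = bisect_left(BRACKET_LIMITS, input_tokens)
    in_rate, out_rate = BRACKET_RATES[idx]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000
```

A 10K-token prompt with 1K output tokens lands in the first bracket (about $0.018), while a 100K-token prompt with the same output lands in the second (about $0.252).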

GLM / Z.ai

Current as of May 2026. Source: docs.z.ai

Flagship Models (GLM-5 Series)

Model	Context	SWE-Bench	Input /1M	Output /1M	Cached Input	License
GLM-5.1	203K	Pro 58.4% (best-in-class)	$1.40	$4.40	$0.26	MIT, 754B params
GLM-5	202K	Verified 77.8%	$1.00	$3.20	$0.20	MIT, 744B/40B MoE
GLM-5-Turbo	202K	—	$1.20	$4.00	$0.24	Proprietary

GLM-5.1 (Apr 7 2026): 8-hour autonomous runs, 1,700 agentic steps. Surpasses GPT-5.4 and Claude Opus 4.6 on SWE-Bench Pro. GLM-5: 744B params, 40B active MoE, 28.5T token pretraining.

Previous Gen (GLM-4 Series)

Model	Context	Input /1M	Output /1M	Cached Input	Notes
GLM-4.7	128K	$0.60	$2.20	$0.11	73.8% SWE-Bench Verified
GLM-4.7-FlashX	203K	$0.07	$0.40	$0.01	Fast inference variant
GLM-4.6	128K	$0.60	$2.20	$0.11	Previous generation
GLM-4.5-X	128K	$2.20	$8.90	$0.45	Premium tier
GLM-4.5	128K	$0.60	$2.20	$0.11	Standard tier
GLM-4.5-Air	128K	$0.20	$1.10	$0.03	Lightweight, Haiku-class
GLM-4.5-AirX	128K	$1.10	$4.50	$0.22	Fast Air variant
GLM-4-32B-0414-128K	128K	$0.10	$0.10	—	Budget open-weight

Free Models

Model	Context	Input	Output
GLM-4.7-Flash	203K	Free	Free
GLM-4.5-Flash	—	Free	Free

Vision Models

Model	Input /1M	Output /1M	Cached Input
GLM-5V-Turbo	$1.20	$4.00	$0.24
GLM-4.6V	$0.30	$0.90	$0.05
GLM-4.6V-FlashX	$0.04	$0.40	$0.004
GLM-OCR	$0.03	$0.03	—
GLM-4.6V-Flash	Free	Free	Free

Xiaomi MiMo

Current as of May 2026. V2 launched Mar 18 2026, V2.5 launched Apr 22 2026. Source: mimo-v2.com

Model	Input /1M	Output /1M	Context	Modalities	Notes
MiMo-V2-Pro (≤256K)	$1.00	$3.00	1M	Text	78.0% SWE-Bench. 1T params, 42B active
MiMo-V2-Pro (256K–1M)	$2.00	$6.00	1M	Text	Long-context tier
MiMo-V2.5-Pro (≤256K)	$1.00 ($0.20 cached)	$3.00	1M	Text	Apr 22 2026. MIT license. 1T params. 57.2% SWE-Bench Pro
MiMo-V2.5-Pro (256K–1M)	$2.00	$6.00	1M	Text	Long-context tier
MiMo-V2-Omni	~$1.00	~$3.00	256K	Text, Image, Audio, Video	Multimodal flagship
MiMo-V2-Flash	$0.10	$0.30	256K	Text	Open-source foundation model
MiMo-V2-TTS	Free	Free	—	Audio	Limited time promo

API at platform.xiaomimimo.com. OpenAI-compatible. Credit plans available: Lite $6/mo, Standard $16/mo, Pro $50/mo, Max $100/mo.

Kimi / Moonshot AI (K2.6)

Current as of May 2026. Source: kimi.com, OpenRouter

Both models: 1T params, 32B active MoE, 384 experts, MIT license.

Model	Cache Hit /1M	Cache Miss /1M	Output /1M	Context	SWE-Bench
kimi-k2.6	$0.16	$0.95	$4.00	262K	Verified 80.2%, Pro 58.6%, BrowseComp 83.2%
kimi-k2.5	—	$0.40	$1.90	256K	Verified 76.8%, BrowseComp 78.4%

K2.6: 300 parallel sub-agents, 4,000+ tool calls, 12+ hr continuous execution. K2.5: 100 parallel sub-agents.

Membership Plans

Plan	Price/mo	Agent Usage
Adagio	Free	6
Moderato	$15	60
Allegretto	$31	150
Allegro	$79	360
Vivace	$159	720

OpenCode Go

Source: docs.openclaw.ai. Usage limits are expressed in dollar value: $12 per 5 hours, $30 per week, $60 per month.

Available Models

Model Ref	Name
opencode-go/glm-5	GLM-5
opencode-go/glm-5.1	GLM-5.1
opencode-go/kimi-k2.5	Kimi K2.5
opencode-go/kimi-k2.6	Kimi K2.6 (3x limits)
opencode-go/deepseek-v4-pro	DeepSeek V4 Pro
opencode-go/deepseek-v4-flash	DeepSeek V4 Flash
opencode-go/mimo-v2-omni	MiMo V2 Omni
opencode-go/mimo-v2-pro	MiMo V2 Pro
opencode-go/mimo-v2.5	MiMo V2.5
opencode-go/mimo-v2.5-pro	MiMo V2.5 Pro
opencode-go/minimax-m2.5	MiniMax M2.5
opencode-go/minimax-m2.7	MiniMax M2.7
opencode-go/qwen3.5-plus	Qwen3.5 Plus
opencode-go/qwen3.6-plus	Qwen3.6 Plus
opencode-go/qwen3.6-max-preview	Qwen3.6 Max Preview

Request Estimates (May 19 2026)

Model	Per 5h	Per Week	Per Month
GLM-5.1	880	2,150	4,300
GLM-5	1,150	2,880	5,750
Kimi K2.5	1,850	4,630	9,250
MiMo-V2-Pro	1,290	3,225	6,450
MiMo-V2.5-Pro	1,290	3,225	6,450
MiMo-V2-Omni	2,150	5,450	10,900
Qwen3.6 Plus	3,300	8,200	16,300
Qwen3.6 Max Preview	820	2,050	4,100
MiniMax M2.7	3,400	8,500	17,000
MiniMax M2.5	6,300	15,900	31,800
Qwen3.5 Plus	10,200	25,200	50,500
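
Dividing the dollar-value limits by the request estimates gives an implied average cost per request for each model. The sketch below shows that arithmetic for three rows of the table; the names are hypothetical, and the result is only an average implied by the estimates, not a published per-request price.

```python
# Dollar-value limits stated for OpenCode Go.
LIMITS_USD = {"5h": 12.0, "week": 30.0, "month": 60.0}

# Request estimates copied from three rows of the table above.
ESTIMATES = {
    "GLM-5.1":      {"5h": 880,    "week": 2_150,  "month": 4_300},
    "MiniMax M2.5": {"5h": 6_300,  "week": 15_900, "month": 31_800},
    "Qwen3.5 Plus": {"5h": 10_200, "week": 25_200, "month": 50_500},
}


def implied_cost_per_request(model, window="5h"):
    """Average USD per request implied by the limit and the estimate."""
    return LIMITS_USD[window] / ESTIMATES[model][window]


for model in ESTIMATES:
    print(f"{model}: ${implied_cost_per_request(model):.4f} per request")
```

On these figures a GLM-5.1 request averages a little over a cent, while a Qwen3.5 Plus request averages roughly a tenth of a cent.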

MiniMax M2.5: 80.2% SWE-Bench — near Claude Opus 4.6 (80.8%).

Notes

BytePlus ModelArk: Quota shared across Claude Code, Cursor, Cline, Codex CLI, Kilo Code, Roo Code, OpenCode
GitHub Copilot: Premium requests shared across all features; extra $0.04 each on Pro/Pro+
Claude Code: Exact request counts not published — only relative multipliers
GLM quota multipliers: Peak hours drain 3x quota; off-peak 2x; GLM-4.7/4.5-Air always 1x
MiMo: Pure credit pool with no 5h usage windows; credits expire at month-end
Kimi: API billed separately — not included in membership
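
The GLM quota multipliers in the notes above can be expressed as a small helper. This is a sketch only: the function name is hypothetical, the 3x/2x multipliers are assumed to apply to every GLM model other than the two listed as always 1x, and the caller must decide what counts as peak because the note does not define peak hours.

```python
# Models the note lists as always draining quota at 1x.
ALWAYS_1X = {"GLM-4.7", "GLM-4.5-Air"}


def glm_quota_units(model, requests, peak):
    """Quota units consumed by `requests` calls to `model`.

    `peak` is supplied by the caller because the note does not say which
    hours count as peak.
    """
    if model in ALWAYS_1X:
        multiplier = 1
    else:
        multiplier = 3 if peak else 2
    return requests * multiplier
```

For example, 10 GLM-5.1 requests would consume 30 units at peak and 20 off-peak, while 10 GLM-4.7 requests consume 10 units at any time.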

Benchmark Note: SWE-Bench Verified measures a model’s ability to resolve real-world GitHub issues from code repositories. Not all providers publish scores, so the tables above list a score only where verified data is available.