THE GOOD ARCHITECT

CLARITY. NOT HYPE.

We break down emerging technologies so engineers can build with confidence.

29 SEP

Anthropic Unveils Claude Sonnet 4.5: The New Coding Champion

AI & ML, Developer Tools

Anthropic just dropped Sonnet 4.5, claiming it's the best coding model out there.

SWE-bench Verified Scores

Claude Sonnet 4.5: 77.2%
Claude Opus 4.1: 74.5%
GPT-5: 68%

Key Features

  • 200K context window: Read approximately 500 pages of code at once
  • Agent SDK with state persistence: Remembers context between API requests
  • Multi-step workflows: Chain tasks like "find bugs, write tests, fix them"

Pricing per Million Tokens

Claude Sonnet 4.5: $3 input / $15 output
Claude Opus 4.1: $15 input / $75 output
GPT-5: $1.25 input / $10 output

Our Take:

Claude Sonnet 4.5 is the new coding leader. It scores 77.2% on SWE-bench, ahead of GPT-5 at 68% and Opus 4.1 at 74.5%.

You'll pay a bit more for that performance. Sonnet's output rate is 50% higher ($15 vs $10 per million tokens), and since coding workloads are output-heavy, you can expect total costs to be roughly 1.5 times GPT-5's.
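That cost ratio can be sanity-checked with a few lines of Python. The per-million-token prices are the ones quoted above; the token split (200K in, 800K out) is an illustrative assumption for an output-heavy coding workload, not a measured figure.

```python
# Per-million-token prices from the pricing section above (USD).
PRICES = {  # model: (input_rate, output_rate)
    "sonnet-4.5": (3.00, 15.00),
    "opus-4.1": (15.00, 75.00),
    "gpt-5": (1.25, 10.00),
}

def cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Total cost in USD for one workload at the given model's rates."""
    inp, out = PRICES[model]
    return (input_tokens * inp + output_tokens * out) / 1_000_000

# Assumed output-heavy workload: 200K tokens in, 800K tokens out.
sonnet = cost("sonnet-4.5", 200_000, 800_000)  # 0.60 + 12.00 = 12.60
gpt5 = cost("gpt-5", 200_000, 800_000)         # 0.25 +  8.00 =  8.25
print(f"Sonnet/GPT-5 cost ratio: {sonnet / gpt5:.2f}")  # ≈ 1.53
```

The more output-dominated the workload, the closer the ratio drifts toward the pure output-rate ratio of 1.5.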

Even so, the value is clear. Sonnet outperforms Opus while being five times cheaper, making it the smarter choice when quality matters.

Recommendation:

Use Sonnet 4.5 when accuracy and reliability are the priority.

Choose GPT-5 if you need to control costs and for simpler tasks.

Avoid Opus 4.1 since it's more expensive and less capable for most tasks.

Architect Playbook: When to Choose Each Model

Choose Claude Sonnet 4.5 When:

  • Quality matters more than cost
    You need the best coding performance available (77.2% SWE-bench)
  • Complex code generation
    Multi-step workflows, large refactoring, architectural changes
  • Large codebases
    The 200K context window handles entire repos (1M-token option in beta for select enterprise and tier-4 API users)
  • Budget: Mid-tier
    You can afford 1.5x GPT-5 pricing for better results

Choose GPT-5 When:

  • Cost-sensitive projects
    You need good-enough quality at lower price
  • High-volume API calls
    Scale matters and costs add up quickly
  • Simple code tasks
    Routine debugging, basic generation, standard patterns
  • Budget: Tight
    Roughly 30-40% cheaper than Sonnet (depending on your input/output mix) for acceptable results

Choose Claude Opus 4.1 When:

  • Only for GUI automation
    Opus scores 61.4% on OSWorld (GUI tasks) vs Sonnet's 44%. For coding, Sonnet beats Opus at one-fifth the cost.
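The playbook above boils down to a small decision function. This is a sketch of the article's rules only; the task and budget labels are made-up names for illustration, not anything from a real API.

```python
def pick_model(task: str, budget: str) -> str:
    """Map (task type, budget) to a model per the playbook rules above."""
    if task == "gui-automation":
        return "claude-opus-4.1"    # 61.4% vs 44% on OSWorld
    if budget == "tight" or task == "simple":
        return "gpt-5"              # good-enough quality, lower price
    return "claude-sonnet-4.5"      # best coding quality (77.2% SWE-bench)

print(pick_model("refactor", "mid"))          # claude-sonnet-4.5
print(pick_model("simple", "mid"))            # gpt-5
print(pick_model("gui-automation", "tight"))  # claude-opus-4.1
```

Note the ordering: the GUI-automation check comes first because it is the only case where Opus wins regardless of budget.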

Questions?

Common questions about Claude Sonnet 4.5, GPT-5, and Opus 4.1 performance, pricing, and use cases.

Is Claude Sonnet 4.5 better than GPT-5 for coding?

Yes, Claude Sonnet 4.5 scores 77.2% on SWE-bench Verified compared to GPT-5's 68%. However, it costs more ($3/$15 vs $1.25/$10 per million tokens). Choose Sonnet 4.5 when quality matters more than cost.

How much does Claude Sonnet 4.5 cost compared to GPT-5?

Claude Sonnet 4.5: $3 input / $15 output per million tokens
GPT-5: $1.25 input / $10 output per million tokens

Since coding is mostly output-heavy, expect Sonnet's total costs to run roughly 1.5 times GPT-5's.

Should I choose Claude Sonnet 4.5 or Opus 4.1?

Claude Sonnet 4.5 (77.2% SWE-bench) beats Opus 4.1 (74.5%) at 5x less cost. Sonnet costs $3/$15 per million tokens while Opus costs $15/$75. Choose Opus only for GUI automation where it excels (61.4% vs 44% on OSWorld).

What is Agent SDK with state persistence?

The Agent SDK allows Claude to remember context between separate API requests. Instead of starting fresh each time, it maintains state across a session. This means it can handle multi-step tasks like "find bugs, write tests, fix them" without losing context between steps.
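The pattern can be simulated locally in a few lines: keep a message history that survives across "requests" so each step sees everything before it. This is a sketch of the concept only; it does not call the real Agent SDK, and the `AgentSession` class and its placeholder replies are invented for illustration.

```python
class AgentSession:
    """Minimal local stand-in for state persistence across requests."""

    def __init__(self) -> None:
        self.history: list[dict] = []  # state carried between steps

    def step(self, instruction: str) -> list[dict]:
        """Record one step; a real SDK would send history + instruction to the model."""
        self.history.append({"role": "user", "content": instruction})
        # Placeholder reply; a real model call would go here.
        self.history.append({"role": "assistant", "content": f"done: {instruction}"})
        return self.history

session = AgentSession()
session.step("find bugs in parser.py")
session.step("write tests for the bugs you found")  # sees step 1's context
session.step("fix the bugs and re-run the tests")   # sees steps 1 and 2
print(len(session.history))  # 6 messages: full context preserved
```

Without the persisted `history`, step 2's "the bugs you found" would be meaningless; with it, each instruction can refer back to earlier results.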

What is the 200K context window good for?

The 200K context window lets you process approximately 500 pages of code at once, enabling:

  • Whole-codebase analysis and refactoring
  • Comprehensive code reviews across multiple files
  • Large-scale architectural changes without chunking
  • End-to-end test generation from full spec documents
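The "approximately 500 pages" figure is easy to back out: it implies around 400 tokens per page of code, which is an assumption in this sketch, not an official conversion.

```python
CONTEXT_TOKENS = 200_000
TOKENS_PER_PAGE = 400  # assumption: a dense page of code

pages = CONTEXT_TOKENS // TOKENS_PER_PAGE
print(pages)  # 500
```

Real token counts per page vary with language and formatting, so treat this as an order-of-magnitude estimate.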

1M Token Window (Beta): For tier-4 and enterprise users with custom rate limits, a 1-million-token context window is available in beta (use context-1m-2025-08-07 header). This allows processing 5x larger codebases but costs more (2x input, 1.5x output rates). Available on Claude API, Amazon Bedrock, and Google Cloud Vertex AI.
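Here is what the beta premium means in dollars, applying the 2x input / 1.5x output multipliers stated above to Sonnet's $3/$15 base rates. The sample workload (a 1M-token repo in, 100K tokens out) is an illustrative assumption.

```python
BASE_INPUT, BASE_OUTPUT = 3.00, 15.00                        # USD per million tokens
BETA_INPUT = BASE_INPUT * 2      # 6.00 at long-context rates
BETA_OUTPUT = BASE_OUTPUT * 1.5  # 22.50 at long-context rates

def beta_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one request at the 1M-context beta rates."""
    return (input_tokens * BETA_INPUT + output_tokens * BETA_OUTPUT) / 1_000_000

# Assumed workload: feed a 1M-token repo, get 100K tokens of analysis back.
print(f"${beta_cost(1_000_000, 100_000):.2f}")  # 6.00 + 2.25 = $8.25
```

The same workload at standard rates would be impossible in one request anyway (it exceeds 200K tokens), which is the trade the premium pays for.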

Sources & References