This feature is eligible for Zero Data Retention (ZDR). When your organization has a ZDR arrangement, data sent through this feature is not stored after the API response is returned.
The effort parameter allows you to control how eager Claude is about spending tokens when responding to requests. This gives you the ability to trade off between response thoroughness and token efficiency, all with a single model. The effort parameter is available on all supported models with no beta header required.
The effort parameter is supported by Claude Opus 4.8, Claude Mythos Preview, Claude Opus 4.7, Claude Opus 4.6, Claude Sonnet 4.6, and Claude Opus 4.5.
For Claude Opus 4.6 and Sonnet 4.6, effort replaces budget_tokens as the recommended way to control thinking depth. Combine effort with adaptive thinking (thinking: {type: "adaptive"}) for the best experience. While budget_tokens is still accepted on Opus 4.6 and Sonnet 4.6, it is deprecated and will be removed in a future model release. At high (default) and max effort, Claude will almost always think. At lower effort levels, it may skip thinking for simpler problems.
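As a minimal sketch of combining the two settings described above, the helper below builds keyword arguments for client.messages.create with adaptive thinking and an explicit effort level. The helper name build_adaptive_request, the model ID claude-opus-4-6, and the max_tokens value are assumptions for illustration; the thinking and output_config shapes follow the snippets on this page.

```python
def build_adaptive_request(prompt, effort="high", model="claude-opus-4-6", max_tokens=16000):
    """Return kwargs for client.messages.create (hypothetical helper).

    The model ID and max_tokens defaults are illustrative assumptions.
    """
    return {
        "model": model,
        "max_tokens": max_tokens,
        # Adaptive thinking: the model decides when and how much to think.
        "thinking": {"type": "adaptive"},
        # Effort is the recommended control for thinking depth.
        "output_config": {"effort": effort},
        "messages": [{"role": "user", "content": prompt}],
    }


request = build_adaptive_request("Summarize this changelog.", effort="medium")

# With the SDK installed and an API key configured, this could be sent as:
#   import anthropic
#   response = anthropic.Anthropic().messages.create(**request)
```

Keeping the request as a plain dictionary makes it easy to log or compare the effort setting across runs before any call is made.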
By default, Claude uses high effort, spending as many tokens as needed for excellent results. You can raise the effort level to max for the absolute highest capability, or lower it to be more conservative with token usage, optimizing for speed and cost while accepting some reduction in capability.
Setting effort to "high" produces exactly the same behavior as omitting the effort parameter entirely.
The effort parameter affects all tokens in the response, including:
This approach has two major advantages:
| Level | Description | Typical use case |
|---|---|---|
| max | Absolute maximum capability with no constraints on token spending. Available on Claude Opus 4.8, Claude Mythos Preview, Claude Opus 4.7, Claude Opus 4.6, and Claude Sonnet 4.6. | Tasks requiring the deepest possible reasoning and most thorough analysis |
| xhigh | Extended capability for long-horizon work. Available on Claude Opus 4.8 and Claude Opus 4.7. | Long-running agentic and coding tasks (over 30 minutes) with token budgets in the millions |
| high | High capability. Equivalent to not setting the parameter. | Complex reasoning, difficult coding problems, agentic tasks |
| medium | Balanced approach with moderate token savings. | Agentic tasks that require a balance of speed, cost, and performance |
| low | Most efficient. Significant token savings with some capability reduction. | Simpler tasks that need the best speed and lowest costs, such as subagents |
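
The availability notes in the table can be encoded as a small local check, sketched below. This is only an illustration: check_effort and the two lookup tables are hypothetical names, the model names are display names rather than API model IDs, and the full model list is taken from the supported-model sentence earlier on this page. How the API itself responds to an unsupported combination is not described here.

```python
# Availability as listed in the table above. Model names are display
# names used for illustration, not API model IDs.
ALL_MODELS = frozenset({
    "Claude Opus 4.8", "Claude Mythos Preview", "Claude Opus 4.7",
    "Claude Opus 4.6", "Claude Sonnet 4.6", "Claude Opus 4.5",
})
LEVEL_MODELS = {
    "max": ALL_MODELS - {"Claude Opus 4.5"},
    "xhigh": frozenset({"Claude Opus 4.8", "Claude Opus 4.7"}),
    "high": ALL_MODELS,
    "medium": ALL_MODELS,
    "low": ALL_MODELS,
}


def check_effort(model: str, effort: str) -> str:
    """Return effort unchanged if the table lists it for model, else raise."""
    if effort not in LEVEL_MODELS:
        raise ValueError(f"Unknown effort level: {effort!r}")
    if model not in LEVEL_MODELS[effort]:
        raise ValueError(f"{effort!r} is not listed for {model}")
    return effort
```

Running a check like this before building a request surfaces a mismatched level early, for example when the same configuration is reused across several models.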
Effort is a behavioral signal, not a strict token budget. At lower effort levels, Claude will still think on sufficiently difficult problems, but it will think less than it would at higher effort levels for the same problem.
Sonnet 4.6 defaults to high effort. Explicitly set effort when using Sonnet 4.6 to avoid unexpected latency:
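A minimal sketch of request parameters with effort set explicitly for Sonnet 4.6 might look like the following. The model ID claude-sonnet-4-6, the prompt, and the choice of medium are assumptions for illustration; pick the level that matches your own latency and quality needs.

```python
# Request parameters with effort set explicitly instead of relying on the
# default ("high"). The model ID and the "medium" level are illustrative.
sonnet_request = {
    "model": "claude-sonnet-4-6",  # assumed model ID
    "max_tokens": 4096,
    "messages": [
        {"role": "user", "content": "Classify this support ticket by urgency."}
    ],
    "output_config": {"effort": "medium"},
}

# With the SDK installed and an API key configured, this could be sent as:
#   import anthropic
#   response = anthropic.Anthropic().messages.create(**sonnet_request)
```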
Start with xhigh for coding and agentic use cases, and use high as the minimum for most intelligence-sensitive workloads. Step down to medium for cost-sensitive workloads, or up to max only when your evals show measurable headroom at xhigh.
The API default is high. To use xhigh, set effort explicitly; the value you pass overrides the default.
| Effort | Guidance for Claude Opus 4.7 |
|---|---|
| low | Efficient, but best for short, scoped tasks. Pair low with explicit checklists if your task has multiple sections. |
| medium | The drop-in for the average workflow where you want good results while reducing costs. |
| high | Advanced use cases that still need a balance of intelligence and token consumption. This is often the sweet spot balancing quality and token efficiency. |
| xhigh | The recommended starting point for coding and agentic work, and for exploratory tasks such as repeated tool calling, detailed web search, and knowledge-base search. Expect meaningfully higher token usage than high. |
| max | Reserve for genuinely frontier problems. On most workloads max adds significant cost for relatively small quality gains, and on some structured-output or less intelligence-sensitive tasks it can lead to overthinking. |
Claude Opus 4.7 also respects effort levels more strictly than Claude Opus 4.6, especially at low and medium. At lower effort levels, the model scopes its work to what was asked rather than going above and beyond. If you observe shallow reasoning on complex problems with Claude Opus 4.7, raise effort rather than prompting around it. If you must keep effort low for latency, add targeted guidance like "This task involves multi-step reasoning. Think carefully before responding."
When running Claude Opus 4.7 at xhigh or max effort, set a large max_tokens so the model has room to think and act across subagents and tool calls. Starting at 64k tokens and tuning from there is a reasonable default.
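The starting-point guidance above can be sketched as a small lookup that also raises max_tokens at xhigh and max. The workload category names, the starting_params helper, and the 4096 fallback are assumptions for illustration, and 64k is interpreted here as 64,000 tokens; treat the result as a first setting to tune against your own evals.

```python
# Starting effort per workload type, following the guidance above.
# The category names are hypothetical labels, not API values.
STARTING_EFFORT = {
    "coding": "xhigh",
    "agentic": "xhigh",
    "intelligence_sensitive": "high",
    "cost_sensitive": "medium",
}


def starting_params(workload: str, default_max_tokens: int = 4096) -> dict:
    """Return a starting effort and max_tokens for a workload type."""
    effort = STARTING_EFFORT[workload]
    # At xhigh or max, leave room to think and act across tool calls.
    max_tokens = 64_000 if effort in ("xhigh", "max") else default_max_tokens
    return {"max_tokens": max_tokens, "output_config": {"effort": effort}}


coding_params = starting_params("coding")
```

The returned dictionary can be merged into the rest of a request, for example `{"model": ..., "messages": ..., **coding_params}`.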
The guidance for Claude Opus 4.7 above also applies to Claude Opus 4.8. Start with xhigh for coding and agentic use cases, use high for most other intelligence-sensitive workloads, and step down to medium or low only when you've measured that the lower level holds quality on your evals.
The default is high on all surfaces, including the Claude API and Claude Code. Set effort explicitly to use a different level; the value you pass overrides the default.
When running Claude Opus 4.8 at xhigh or max effort, set a large max_tokens so the model has room to think and act across subagents and tool calls. Starting at 64k tokens and tuning from there is a reasonable default.

    import anthropic

    # The client reads the API key from the ANTHROPIC_API_KEY environment variable.
    client = anthropic.Anthropic()

    response = client.messages.create(
        model="claude-opus-4-8",
        max_tokens=4096,
        messages=[
            {
                "role": "user",
                "content": "Analyze the trade-offs between microservices and monolithic architectures",
            }
        ],
        # Set the effort level explicitly; omitting it is equivalent to "high".
        output_config={"effort": "medium"},
    )

    # Print the text of the first content block in the response.
    print(response.content[0].text)

Claude Code's ultracode mode: ultracode appears in Claude Code's effort menu, but it is not an additional API effort level. The values documented on this page are the complete set the API accepts. Ultracode pairs the xhigh effort level with standing permission for Claude Code to launch multi-agent workflows, granted through mid-conversation system messages. To build similar behavior with the API, see Build an orchestration mode.
When using tools, the effort parameter affects both the explanations around tool calls and the tool calls themselves. Lower effort levels tend to:
Higher effort levels may:
The effort parameter works alongside extended thinking. Its behavior depends on the model:
- Claude Opus 4.8: adaptive thinking (thinking: {type: "adaptive"}), where effort is the recommended control for thinking depth. Manual extended thinking (thinking: {type: "enabled", budget_tokens: N}) is not supported and returns a 400 error. The model decides when and how much to think based on each request, so it triggers thinking only as needed. At high, xhigh, and max effort, Claude almost always thinks deeply. At lower levels, it may skip thinking for simpler problems. Set thinking: {type: "adaptive"} to enable thinking; without it, requests run without thinking.
- Claude Mythos Preview: no thinking configuration is required, and thinking: {type: "disabled"} is rejected. Effort controls thinking depth the same way as on Opus 4.7 and Opus 4.6.
- Claude Opus 4.7: adaptive thinking (thinking: {type: "adaptive"}), where effort is the recommended control for thinking depth. Manual extended thinking (thinking: {type: "enabled", budget_tokens: N}) is no longer supported on Opus 4.7; use adaptive thinking with effort instead. At high, xhigh, and max effort, Claude almost always thinks deeply. At lower levels, it may skip thinking for simpler problems.
- Claude Opus 4.6: adaptive thinking (thinking: {type: "adaptive"}), where effort is the recommended control for thinking depth. While budget_tokens is still accepted on Opus 4.6, it is deprecated and will be removed in a future release. At high and max effort, Claude almost always thinks deeply. At lower levels, it may skip thinking for simpler problems.
- Claude Sonnet 4.6: manual extended thinking (thinking: {type: "enabled", budget_tokens: N}) is still functional but deprecated.
- Claude Opus 4.5: manual extended thinking (thinking: {type: "enabled", budget_tokens: N}), where effort works alongside the thinking token budget. Set the effort level for your task, then set the thinking token budget based on task complexity.

The effort parameter can be used with or without extended thinking enabled. When used without thinking, it still controls overall token spend for text responses and tool calls.
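The notes above that name Opus 4.7 and Opus 4.6 explicitly can be turned into a local pre-flight check, sketched below. The check_thinking helper is hypothetical and uses display names rather than API model IDs; it only mirrors the statements on this page and does not replace the API's own validation.

```python
import warnings


def check_thinking(model, thinking):
    """Validate a thinking config locally before sending (hypothetical helper).

    `thinking` is None (no thinking configured) or a dict such as
    {"type": "adaptive"} or {"type": "enabled", "budget_tokens": 8000}.
    """
    manual = thinking is not None and thinking.get("type") == "enabled"
    if manual and model == "Claude Opus 4.7":
        # Manual extended thinking is no longer supported on Opus 4.7.
        raise ValueError(
            "Manual extended thinking is not supported on Opus 4.7; "
            "use adaptive thinking with effort instead."
        )
    if manual and model == "Claude Opus 4.6":
        # Still accepted on Opus 4.6, but deprecated.
        warnings.warn(
            "budget_tokens is deprecated on Opus 4.6; "
            "prefer adaptive thinking with effort.",
            DeprecationWarning,
        )
    return thinking
```

Passing None is allowed because effort can be used without thinking; in that case it still governs token spend for text responses and tool calls.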
high, but the right starting point depends on your model and workload.