Claude Extended Thinking: How To Trigger And Use The Deep Reasoning Mode?


Claude Extended Thinking (also known as “Thinking Mode” or “Deep Reasoning”) is an inference-scaling feature developed by Anthropic that allows Claude models to pause, plan, self-reflect, and reason step by step before delivering a final answer.

Instead of generating a response instantly, the model utilizes an internal “scratchpad” to break down complex tasks. This capability is natively built into flagship Anthropic models like Claude 3.7 Sonnet and the Claude 4 series.

To learn what Claude AI is and how to use it for writing, see our step-by-step guide.

How Claude’s Extended Thinking Works?

When Extended Thinking is active, Claude divides its generation phase into two parts:

  • The Thinking Block: An internal reasoning chain where Claude maps out logic, spots its own coding bugs, or self-corrects errors. In the Claude apps, this step-by-step reasoning can be viewed through an expandable toggle, though some models may show a summarized version rather than the raw text.
  • The Text Block: The final, structured answer presented directly to you based on the insights gained during the thinking phase.
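A minimal sketch of how the two blocks can be separated in code. It assumes the response content has already been converted to plain dictionaries with a `type` field of `thinking` or `text`; the helper name `split_blocks` and the sample data are illustrative and not part of any SDK.

```python
def split_blocks(content):
    """Separate a response's content blocks into reasoning and final answer."""
    thinking_parts = []
    text_parts = []
    for block in content:
        if block.get("type") == "thinking":
            thinking_parts.append(block.get("thinking", ""))
        elif block.get("type") == "text":
            text_parts.append(block.get("text", ""))
        # Other block types (e.g. tool calls) are ignored in this sketch.
    return "\n".join(thinking_parts), "\n".join(text_parts)


# Hand-written sample data standing in for a real API response.
sample_content = [
    {"type": "thinking", "thinking": "The user wants 17 * 3. 17 * 3 = 51."},
    {"type": "text", "text": "17 * 3 = 51"},
]

reasoning, answer = split_blocks(sample_content)
print("Reasoning:", reasoning)
print("Answer:", answer)
```

Keeping the two parts separate makes it easy to log the reasoning for debugging while showing only the answer to end users.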

Core Capabilities

  • Adaptive Thinking: Instead of forcing you to guess how many tokens to allocate, newer models evaluate the complexity of your prompt and dynamically decide how much reasoning effort is required.
  • Effort Level Controls: Through the API or developer tools like Claude Code, users can set preference parameters (e.g., low, medium, high, max) to manually balance speed, token cost, and depth of intelligence.
  • Interleaved Tool Use: The model can think, execute a tool (like a web search or local file edit), review the tool’s results, and resume thinking before concluding its final answer.
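The think, act, observe cycle behind interleaved tool use can be sketched with a toy loop. Everything here is hypothetical: `fake_model` is a scripted stand-in for Claude, `TOOLS` holds a single pretend tool, and no real API is called. The point is only the control flow.

```python
def word_count(text):
    """Pretend tool: count the words in a string."""
    return len(text.split())


TOOLS = {"word_count": word_count}


def fake_model(history):
    """Scripted stand-in for the model: request a tool once, then answer."""
    tool_results = [step for step in history if step["kind"] == "tool_result"]
    if not tool_results:
        return {"kind": "tool_call", "thinking": "I should count the words first.",
                "tool": "word_count", "input": "deep reasoning takes time"}
    return {"kind": "answer", "thinking": "The tool returned a count; report it.",
            "text": f"The phrase has {tool_results[-1]['value']} words."}


def run_agent(max_steps=5):
    """Alternate between thinking and tool calls until an answer appears."""
    history = []
    for _ in range(max_steps):
        step = fake_model(history)
        history.append({"kind": "thinking", "value": step["thinking"]})
        if step["kind"] == "answer":
            return step["text"], history
        # Execute the requested tool and feed the result back for more thinking.
        result = TOOLS[step["tool"]](step["input"])
        history.append({"kind": "tool_result", "value": result})
    raise RuntimeError("No answer within the step limit")


final_text, trace = run_agent()
print(final_text)
```

The step limit is a deliberate design choice: an agent that can call tools repeatedly needs a hard stop so a confused run cannot loop forever.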

Best Use Cases (When to Turn It On)

Extended Thinking is an “inference scaling” technique, similar to OpenAI’s “o” series or DeepSeek R1, meaning that more time spent thinking correlates with higher success rates on difficult problems. It works best for:

  • Complex Coding & Architecture: Debugging intricate, multi-file repositories or designing full system codebases.
  • Advanced Math & Science: Solving multi-step physics problems, logic puzzles, or mathematical proofs.
  • Long-Horizon AI Agents: Sustaining logic and coherence across hundreds of consecutive autonomous steps.


Limitations and Costs

  • Token Costs: You are billed for “thinking tokens” at the standard output token rate, so extensive thinking increases the cost of each request.
  • Overthinking Traps: Anthropic’s research shows that forcing Claude to use extended thinking on highly intuitive or simple tasks (like creative writing or basic Q&A) can occasionally degrade output quality by up to 36%.
  • Latency: Because the model writes out its reasoning before answering, you will notice a delay before the final text response begins streaming.
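The cost point above is simple arithmetic, sketched here as a rough estimate. The function name and the per-million-token prices are placeholders chosen for illustration; check the current price list for the model you use.

```python
def estimate_request_cost(input_tokens, thinking_tokens, answer_tokens,
                          input_price_per_mtok, output_price_per_mtok):
    """Estimate the cost in dollars of one request; prices are per million tokens."""
    # Thinking tokens are billed at the same rate as the visible answer tokens.
    output_tokens = thinking_tokens + answer_tokens
    input_cost = input_tokens * input_price_per_mtok / 1_000_000
    output_cost = output_tokens * output_price_per_mtok / 1_000_000
    return input_cost + output_cost


# Placeholder prices (assumptions for this example only).
INPUT_PRICE = 3.00
OUTPUT_PRICE = 15.00

without_thinking = estimate_request_cost(1_000, 0, 500, INPUT_PRICE, OUTPUT_PRICE)
with_thinking = estimate_request_cost(1_000, 2_000, 500, INPUT_PRICE, OUTPUT_PRICE)
print(f"Without thinking: ${without_thinking:.4f}")
print(f"With thinking:    ${with_thinking:.4f}")
```

With these placeholder numbers, 2,000 thinking tokens roughly quadruple the cost of a short request, which is why thinking is best reserved for problems that need it.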

How To Trigger And Use Claude’s Deep Reasoning Mode? Step By Step Guide

To trigger and use Claude’s deep reasoning mode (officially called Extended Thinking), you must activate a toggle or configure a parameter that instructs the model to process its logic step by step before answering.

Depending on whether you are using the web interface, the developer console, or the API, here is the complete beginner’s guide to unlocking this feature.

Method 1: Using the Claude.ai Web Interface

The web interface is the easiest, no-code way for beginners to access deep reasoning.

  1. Log In: Open your browser and navigate to Claude.ai.
  2. Select a Compatible Model: Start a new chat and choose Claude 3.7 Sonnet or later from the model selector drop-down menu.
  3. Turn on Thinking Mode: Look for the “Extended Thinking” toggle icon (often represented by a lightbulb or a circular brain icon) located inside or just below the chat text box. Click it to turn it on.
  4. Enter Your Prompt: Type a complex prompt, such as an intricate coding bug or a multi-layered logic puzzle.
  5. Watch the Process: Press enter. You will see a box that says “Thinking…”. You can click this box to expand it and watch Claude’s internal scratchpad draft and refine its logic in real time before it writes the final answer.

Method 2: Using the Anthropic Developer Console

If you want precise control over how long Claude spends “thinking,” use the Anthropic Workbench.

  1. Access the Workbench: Go to the Anthropic Console Workbench.
  2. Set the Model: In the right-hand panel, select claude-3-7-sonnet (or any newer Claude model).
  3. Enable Thinking: In the model configuration settings panel on the right, find the Extended Thinking section and switch it to Enabled.
  4. Adjust the Budget: A slider or dropdown will appear allowing you to select the thinking effort level:
    Low/Medium: Best for standard logic or quicker debugging.
    High/Max: Allocates thousands of extra tokens for deep, rigorous scientific or algorithmic reasoning.
  5. Run the Prompt: Write your prompt in the center user window and click Run.
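If you later move from the Workbench to code, the effort levels above can be thought of as presets for a token budget. The sketch below is hypothetical: the `EFFORT_BUDGETS` numbers are invented for illustration and are not official presets, and `thinking_config` is a made-up helper that produces the kind of `thinking` dictionary used in the API call in Method 3.

```python
# Illustrative budgets only; these numbers are assumptions, not official presets.
EFFORT_BUDGETS = {
    "low": 1024,
    "medium": 4096,
    "high": 16384,
    "max": 32768,
}


def thinking_config(effort):
    """Translate an effort label into a `thinking` parameter dictionary."""
    try:
        budget = EFFORT_BUDGETS[effort.lower()]
    except KeyError:
        raise ValueError(f"Unknown effort level: {effort!r}") from None
    return {"type": "enabled", "budget_tokens": budget}


print(thinking_config("medium"))
```

Centralizing the presets in one table means a team can tune budgets in a single place instead of hunting for magic numbers across scripts.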

Method 3: Triggering via the API (For Developers)

If you are writing a script, you trigger deep reasoning by passing a thinking parameter object inside your API call payload.

python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-3-7-sonnet-20250219",
    max_tokens=4000,
    # 1. Trigger thinking mode by adding this block:
    thinking={
        "type": "enabled",
        "budget_tokens": 2048 # Max tokens dedicated purely to reasoning
    },
    messages=[
        {"role": "user", "content": "Optimize this complex SQL database schema for high-throughput write operations."}
    ]
)

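# response.content is a list of blocks: reasoning first, then the answer.
for block in response.content:
    if block.type == "thinking":
        print("[thinking]", block.thinking)
    elif block.type == "text":
        print(block.text)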

Note: When using the API, your max_tokens value must always be strictly greater than your budget_tokens, because max_tokens caps the thinking tokens and the final text response combined. With the values above, a 2,048-token budget leaves at least 1,952 tokens for the answer.
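The budget rule in the note above can be checked before a request is sent. This is a small sketch with a made-up helper name, `check_thinking_budget`. The 1,024-token minimum reflects Anthropic's documented lower bound for `budget_tokens` at the time of writing; treat it as an assumption if your model differs.

```python
MIN_BUDGET_TOKENS = 1024  # Assumed minimum thinking budget accepted by the API.


def check_thinking_budget(max_tokens, budget_tokens):
    """Return the worst-case room left for the final answer, or raise if invalid."""
    if budget_tokens < MIN_BUDGET_TOKENS:
        raise ValueError(f"budget_tokens must be at least {MIN_BUDGET_TOKENS}")
    if max_tokens <= budget_tokens:
        raise ValueError("max_tokens must be strictly greater than budget_tokens")
    return max_tokens - budget_tokens


print(check_thinking_budget(max_tokens=4000, budget_tokens=2048))
```

Failing fast locally gives a clearer error than waiting for the API to reject the request.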

Best Practices for Beginners

  • Don’t Use it for Everything: Avoid using deep reasoning for basic tasks like summarizing text, translating languages, or writing casual emails. It will unnecessarily slow down response times and eat up your token limits.
  • Write Open-Ended Prompts: Give Claude room to think. Prompts like “Analyze all potential edge cases of this system design and list the vulnerabilities” work much better than simple yes/no questions.
  • Review the Thinking Block: If Claude gives you an incorrect answer, expand the thinking block. Reading its thought chain helps you understand exactly where its logic went off track so you can prompt it with a correction.
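The first practice above can be turned into a simple routing rule in a script. This is only a heuristic sketch: the task labels and the helper `should_enable_thinking` are hypothetical names chosen for this example, and real workloads will need their own categories.

```python
# Task categories are hypothetical labels chosen for this example.
DEEP_TASKS = {"debugging", "system_design", "math_proof", "multi_step_agent"}
SIMPLE_TASKS = {"summarize", "translate", "casual_email"}


def should_enable_thinking(task_type):
    """Suggest whether deep reasoning is worth the extra latency and tokens."""
    if task_type in DEEP_TASKS:
        return True
    if task_type in SIMPLE_TASKS:
        return False
    # Unknown task: default to off, and turn it on if answers look shallow.
    return False


for task in ("debugging", "translate"):
    print(task, "->", should_enable_thinking(task))
```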

How Startups Use Claude Extended Thinking To Automate Full Stack Software Tasks?

Startups use Claude’s Extended Thinking to move from simple “code generation” to autonomous, agentic software engineering. By giving the model a dedicated reasoning token budget, they build workflows where an AI agent acts like a self-correcting junior developer rather than a passive autocomplete tool.

Here is exactly how startups deploy this capability across full-stack engineering tasks.

The Autonomous “Code-Review-Fix” Loop

Traditional AI models generate code in a single burst. If that code contains a syntax error or architectural flaw, the user must manually copy the error back into the prompt. Startups use Extended Thinking to loop internally before delivering a final file.

  • How it works: Startups run tools like Claude Code inside their development environments.
  • The Extended Thinking action: When given a task (e.g., “Add a new Stripe webhook endpoint”), the model uses its thinking block to draft the code, trace through it mentally, anticipate edge cases (like network timeouts), and revise its own draft before writing a single line to the actual repository.

Multi-File Architectural Refactoring

Standard LLMs struggle with large-scale refactoring because changing a database schema requires simultaneous, coordinated updates to backend routes, API controllers, and frontend state management.

  • How it works: Startups pass their entire code graph or relevant folder trees to the model via the API.
  • The Extended Thinking action: The model utilizes its extended reasoning budget to map out dependencies. In its hidden scratchpad, it creates an internal checklist:
    “First, I need to modify the Prisma schema file.”
    “Next, I must update the TypeScript interfaces to prevent compilation errors.”
    “Then, I need to adjust the React useFetch hooks to handle the new payload.”
  • The Result: It executes these edits across multiple files in a single, coherent pass, so the full-stack system doesn’t break.
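The checklist above is essentially a dependency ordering, which can be sketched with Python's standard `graphlib` module (Python 3.9 or later). The file paths are hypothetical and simply mirror the three steps in the checklist; this illustrates the ordering idea and is not how Claude plans internally.

```python
from graphlib import TopologicalSorter

# Each file maps to the set of files that must be edited before it.
# File names are hypothetical and mirror the checklist above.
edit_dependencies = {
    "prisma/schema.prisma": set(),
    "src/types/user.ts": {"prisma/schema.prisma"},
    "src/hooks/useFetch.ts": {"src/types/user.ts"},
}

# static_order() yields each file only after all of its prerequisites.
edit_order = list(TopologicalSorter(edit_dependencies).static_order())
for position, path in enumerate(edit_order, start=1):
    print(f"{position}. {path}")
```

An explicit ordering like this is also useful for a human reviewer, who can confirm that the schema change lands before the code that depends on it.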

Test-Driven Development (TDD) Automation

Startups must move fast without breaking production code, making automated testing crucial but tedious to write manually.

  • How it works: A developer creates a basic feature ticket. The Extended Thinking agent is tasked with writing the tests first, then writing the feature code until all tests pass.
  • The Extended Thinking action: The model generates a Jest or Cypress test suite. It then writes the feature code. Using interleaved tool use, it executes the test command in a secure sandbox, reads the failure logs from the terminal output, reasons through the error inside its thinking block, adjusts the feature code, and re-runs the tests until every test passes.
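The test, fix, re-run cycle described above can be sketched as a bounded loop. Both callbacks are hypothetical: in a real setup `run_tests` would invoke the test runner in a sandbox and `propose_fix` would ask the model for a patch. Here they are replaced with toy stand-ins so the example runs on its own.

```python
def run_until_green(run_tests, propose_fix, max_attempts=5):
    """Re-run tests, asking for a fix after each failure, until they pass.

    run_tests() returns (passed, log); propose_fix(log) applies a change.
    Both are hypothetical callbacks supplied by the caller.
    """
    for attempt in range(1, max_attempts + 1):
        passed, log = run_tests()
        if passed:
            return attempt
        propose_fix(log)
    raise RuntimeError(f"Tests still failing after {max_attempts} attempts")


# Toy stand-ins: the "code" is a dict, and the "fix" corrects one value.
state = {"tax_rate": 0.0}


def fake_run_tests():
    if state["tax_rate"] == 0.2:
        return True, "1 passed"
    return False, "AssertionError: expected tax_rate 0.2"


def fake_propose_fix(log):
    state["tax_rate"] = 0.2


attempts = run_until_green(fake_run_tests, fake_propose_fix)
print(f"Tests passed on attempt {attempts}")
```

The attempt cap matters in practice: without it, a fix that never converges would burn tokens indefinitely.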

Self-Healing CI/CD Pipelines

When a deployment fails on platforms like GitHub Actions or Vercel, startups use Extended Thinking to automatically intercept and patch the breaking build.

  • How it works: A webhook triggers an API call to Claude whenever a production build fails, passing the entire raw terminal error log.
  • The Extended Thinking action: Instead of guessing at a generic fix, the model spends thousands of reasoning tokens tracing the exact dependency mismatch or environment variable error. It self-corrects its assumptions, generates a patch on a new git branch, and automatically submits a Pull Request to fix the build.
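The webhook step above boils down to turning a build log into an API request. The sketch below only assembles the request arguments and sends nothing; the helper name `build_fix_request`, the prompt wording, the log-trimming limit, and the token numbers are all assumptions made for illustration.

```python
def build_fix_request(build_log, max_log_lines=200, budget_tokens=4096):
    """Assemble keyword arguments for a Messages API call about a failed build."""
    # Errors usually sit at the end of a log, so keep only the last lines.
    tail = "\n".join(build_log.splitlines()[-max_log_lines:])
    prompt = (
        "The following CI build failed. Identify the root cause and "
        "propose a minimal patch.\n\n" + tail
    )
    return {
        "model": "claude-3-7-sonnet-20250219",
        "max_tokens": budget_tokens + 4000,  # Leave room for the answer itself.
        "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
        "messages": [{"role": "user", "content": prompt}],
    }


sample_log = "step 1 ok\nstep 2 ok\nError: Cannot find module 'left-pad'"
request = build_fix_request(sample_log, max_log_lines=2)
print(request["messages"][0]["content"])
```

The resulting dictionary can be passed to the client from Method 3 as `client.messages.create(**request)`. Trimming the log keeps the prompt focused and avoids paying input tokens for thousands of lines of successful steps.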

Startup Tech Stack Breakdown

To orchestrate this, engineering teams typically pair Claude’s API with specific developer tools:

| Component | How Startups Implement It | Role in the Workflow |
| --- | --- | --- |
| Agent Framework | LangGraph or CrewAI | Controls the high-level loop and tool permissions. |
| Execution Sandbox | E2B Data Sandbox or Docker | Provides a secure, isolated cloud environment to run code. |
| Reasoning Engine | claude-3-7-sonnet (Thinking Enabled) | Evaluates terminal outputs, plans file edits, and tracks logic state. |

 
