Choosing an LLM model setting

In Agent Designer, you can tailor your agent's performance to your specific use case.

Use the Model Configuration section to set how the agent processes requests, balancing response speed with reasoning power and high-quality output.

You can choose between two different modes for your agent:

Standard - Provides responses with moderate to high latency. Best for well-defined tasks with predictable output structure where intermediate reasoning steps do not improve accuracy.
Fast - Provides responses with low latency. Best for simple tasks like extraction, sentiment analysis, summarization, and formatting.

Configure the Large Language Model (LLM) based on the agent’s requirements. For example, if your agent performs simple, single-turn tasks, such as code and text formatting, you can select the Fast setting to reduce latency.

What is Extended Thinking?

You can combine Extended Thinking with Standard or Fast mode to enhance your agent's responses.

Use Extended Thinking when you need the agent to:

Perform complex tasks that require multi-step deductive or inferential reasoning, complex code debugging, or planning under ambiguity
Coordinate across multiple tools and data sources

Extended Thinking allows the agent to reason internally (“thinking blocks”), using step-by-step analysis and problem-solving before generating its final output.

In the agent trace, the Rationale section provides detailed reasoning that can help you troubleshoot your agent's tasks and instructions. Extended Thinking uses adaptive thinking to decide automatically when and how much to think, based on the complexity of the action.

When Extended Thinking is on and the agent has no guardrails set, the chat interface streams the agent's plan in real time instead of a static "Thinking..." message. With guardrails set, the interface shows the standard "Thinking..." indicator instead.

Keep in mind that Extended Thinking increases latency. Test your agent to find the right balance between response depth and speed for your use case. Refer to Testing and Troubleshooting an agent to learn more about testing agents.

Which model configuration is best for my agent?

As you iteratively test your agent, decide which configuration is best for your use case. We recommend you start with the default Standard configuration with Extended Thinking turned on.

Review the following decision matrix to help you decide on a model setting.

Model setting decision matrix

Model setting	Recommended use cases	Performance and considerations
Standard + Extended Thinking (default, recommended)	Complex reasoning tasks; multi-step analysis; tasks requiring multi-step deductive or inferential reasoning; complex code reasoning and debugging; planning and decision-making under ambiguity; multi-hop question answering	Moderate to high latency; high token usage; full chain-of-thought reasoning is visible in the Rationale trace; best when correctness and reasoning depth matter more than speed
Standard	Code generation for well-defined, moderately complex tasks; data analysis with a known schema and clear aggregation logic; structured content generation (reports, documentation); single-turn tasks where output structure is predictable	Moderate to high latency; standard token usage; no reasoning trace is visible in Rationale; suited for tasks where output structure is well-defined and intermediate reasoning steps do not improve accuracy
Fast + Extended Thinking	Multi-document synthesis; multi-step logic requiring speed; tool-calling agents coordinating across multiple sources in one turn; guided troubleshooting with conditional diagnostic paths	Low to moderate latency; high token usage; reasoning is visible in the Rationale trace; use when the agent must reason across multiple inputs but response speed is a UX priority
Fast	Sentiment analysis and intent classification; field extraction and text formatting; summarization; simple, single-step code generation (e.g. format conversion snippets); single-hop Q&A; simple classification; real-time chat for simple, well-scoped user intents	Low latency; standard token usage; no reasoning trace; chain-of-thought adds overhead without accuracy benefit for pattern-matching tasks; avoid for open-ended Q&A, multi-hop reasoning, or non-trivial code generation
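As a rough way to read the matrix, the four settings can be seen as the answers to two questions: does the task need multi-step reasoning, and is response speed a priority? The sketch below encodes that reading in Python. It is a simplification for illustration only: `ModelSetting` and `suggest_setting` are hypothetical names, not part of Agent Designer or any product API, and real tasks may not reduce cleanly to two yes/no answers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSetting:
    """Hypothetical representation of a Model Configuration choice."""
    mode: str                 # "Standard" or "Fast"
    extended_thinking: bool   # whether Extended Thinking is turned on

    def label(self) -> str:
        # Render the setting the way the decision matrix names it.
        if self.extended_thinking:
            return f"{self.mode} + Extended Thinking"
        return self.mode


def suggest_setting(needs_multi_step_reasoning: bool,
                    speed_is_priority: bool) -> ModelSetting:
    """Map two yes/no answers onto one row of the decision matrix.

    Assumption: mode follows the speed requirement, and Extended
    Thinking follows the need for multi-step reasoning.
    """
    mode = "Fast" if speed_is_priority else "Standard"
    return ModelSetting(mode=mode, extended_thinking=needs_multi_step_reasoning)


# Example: a debugging agent where correctness matters more than speed.
print(suggest_setting(needs_multi_step_reasoning=True,
                      speed_is_priority=False).label())
```

Treat the result as a starting point for testing, not a final answer; the recommended use cases in the matrix carry more nuance than two flags can express.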

Important considerations

Default configuration: The default model setting is Standard with Extended Thinking turned on.
When Extended Thinking mode is off, the rationale does not show the agent's reasoning. Instead, it displays “Applying optimizations to think and respond faster.”
When Extended Thinking is on and the agent has no guardrails set, the chat interface streams the agent's plan in real time instead of a static "Thinking..." message. With guardrails set, the interface shows the standard "Thinking..." indicator instead.
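The display behavior described in these considerations can be summarized as a small function. This is a sketch for illustration; `describe_display` is a hypothetical name, not a product API. One case is an assumption rather than something stated above: that the chat interface shows the same "Thinking..." indicator when Extended Thinking is off.

```python
def describe_display(extended_thinking: bool, guardrails_set: bool) -> dict:
    """Hypothetical summary of what appears in the trace and the chat interface."""
    if extended_thinking:
        rationale = "agent reasoning"
    else:
        # Message shown in the rationale when Extended Thinking is off.
        rationale = "Applying optimizations to think and respond faster."

    if extended_thinking and not guardrails_set:
        chat = "plan streamed in real time"
    else:
        # With guardrails set, the standard indicator is shown.
        # Assumption: the same indicator appears when Extended Thinking is off.
        chat = "Thinking..."

    return {"rationale": rationale, "chat": chat}


# Example: Extended Thinking on, guardrails configured.
print(describe_display(extended_thinking=True, guardrails_set=True))
```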

Existing agents and backward compatibility with Model Configuration

We’ve updated the legacy setting, Quick Inference, to Fast mode. AI agents that were in draft or in production before March 14, 2026 retain their current settings. No changes are made to agents already built. If the agent previously had:

Quick Inference turned on - Agent Model Configuration is now Fast with Extended Thinking turned off.
Quick Inference turned off - Agent Model Configuration is now Standard with Extended Thinking turned on.
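The mapping from the legacy setting to the new Model Configuration can be written as a two-case lookup. The function below is a sketch for illustration only; `map_quick_inference` and the dictionary keys are hypothetical names, not a product API.

```python
def map_quick_inference(quick_inference_on: bool) -> dict:
    """Return the Model Configuration that corresponds to a legacy Quick Inference value."""
    if quick_inference_on:
        # Quick Inference on -> Fast, Extended Thinking off.
        return {"mode": "Fast", "extended_thinking": False}
    # Quick Inference off -> Standard, Extended Thinking on.
    return {"mode": "Standard", "extended_thinking": True}


# Example: an agent that previously had Quick Inference turned on.
print(map_quick_inference(True))
```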

Setting the model configuration for your agent

To set the agent's LLM configuration:

Navigate to Agentstudio > Agent Garden.
Open an existing agent or select Create New Agent. Refer to Building an agent for a complete step-by-step tutorial on building an agent in the Agent Designer.
In Profile, select one of the following options in the Model Configuration section:
- Standard - Provides responses with moderate to high latency. Best for well-defined tasks with predictable output structure, such as code generation for moderately complex tasks, data analysis with known schemas, and structured content generation.
- Fast - Provides responses with low latency. Best for pattern-matching tasks like sentiment analysis, field extraction, text formatting, simple classification, and single-hop Q&A. Avoid for open-ended Q&A, multi-hop reasoning, or non-trivial code generation.
Optional: Select Extended Thinking when you want the agent to perform a deep, step-by-step analysis before generating a response. Extended Thinking increases latency and token usage. Use it with Standard mode for complex reasoning tasks where correctness and reasoning depth matter more than speed. Use it with Fast mode when the agent must reason across multiple inputs but response speed is a UX priority. The agent’s reasoning appears in the Rationale section of the agent trace. Review What is Extended Thinking? for details.

Next steps

Once you've selected a model configuration, test your agent to confirm it performs as expected for your use case. When Extended Thinking is turned on, review the Rationale section in the agent trace to verify that the agent's reasoning is as expected, or to troubleshoot unexpected outputs.

Refer to Testing and troubleshooting agents for guidance on interpreting results and refining your configuration.
Refer to Best practices for creating tasks and instructions to further optimize your agent's responses.
Refer to Building an agent if you haven't completed your agent setup yet.