Choosing an LLM model setting
In Agent Designer, you can tailor your agent's performance to your specific use case.
Use the Model Configuration section to set how the agent processes requests, balancing response speed against reasoning depth and output quality.
You can choose between two modes for your agent:
- Standard - Provides responses with moderate to high latency. Best for well-defined tasks with predictable output structure where intermediate reasoning steps do not improve accuracy.
- Fast - Provides responses with low latency. Best for simple tasks like extraction, sentiment analysis, summarization, and formatting.
Configure the Large Language Model (LLM) based on the agent’s requirements. For example, if your agent performs simple, single-turn tasks, such as code and text formatting, you can select the Fast setting to reduce latency.
What is Extended Thinking?
Extended Thinking allows the agent to reason internally (“thinking blocks”), using step-by-step analysis and problem-solving before it generates its final output. You can combine Extended Thinking with Standard or Fast mode to enhance your agent's responses.
Use Extended Thinking when you need the agent to:
- Perform complex tasks that require multi-step deductive or inferential reasoning, complex code debugging, or planning under ambiguity
- Coordinate across multiple tools and data sources
Extended Thinking uses adaptive thinking: the agent automatically decides when to think, and how much, based on the complexity of the action. In the agent trace, the Rationale section shows this detailed reasoning, which can help you troubleshoot your agent's tasks and instructions.
Keep in mind that Extended Thinking increases latency. Test your agent to find the right balance between response depth and speed for your use case. Refer to Testing and Troubleshooting an agent to learn more about testing agents.
Which model configuration is best for my agent?
As you iteratively test your agent, decide which configuration best fits your use case. We recommend starting with the default configuration: Standard with Extended Thinking turned on.
Review the following decision matrix to help you decide on a model setting.
Model setting decision matrix
| Model setting | Recommended use cases | Performance and considerations |
|---|---|---|
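| Standard + Extended Thinking (Default recommended) | Complex tasks that require multi-step deductive or inferential reasoning, complex code debugging, or planning under ambiguity; tasks that coordinate across multiple tools and data sources | Prioritizes correctness and reasoning depth over speed. Moderate to high latency, further increased by Extended Thinking, plus higher token usage. The Rationale section shows the agent's reasoning. |
| Standard | Well-defined tasks with predictable output structure where intermediate reasoning steps do not improve accuracy, such as code generation for moderately complex tasks, data analysis with known schemas, and structured content generation | Moderate to high latency. The Rationale section does not show the agent's reasoning. |
| Fast + Extended Thinking | Tasks where the agent must reason across multiple inputs but response speed is a UX priority | Low base latency, increased by Extended Thinking, plus higher token usage. The Rationale section shows the agent's reasoning. |
| Fast | Simple pattern-matching tasks such as extraction, sentiment analysis, summarization, text formatting, simple classification, and single-hop Q&A | Low latency. Avoid for open-ended Q&A, multi-hop reasoning, or non-trivial code generation. The Rationale section does not show the agent's reasoning. |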
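The decision matrix can be read as a rule of thumb. The following Python sketch encodes one such reading, for illustration only. Agent Designer does not expose this logic as an API: `TaskProfile`, `ModelSetting`, and `suggest_setting` are hypothetical names, and the order in which the rules are checked (deep reasoning first, then multiple inputs, then simple and well-defined tasks, then the default) is an assumption of this sketch.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    STANDARD = "Standard"
    FAST = "Fast"


@dataclass(frozen=True)
class ModelSetting:
    mode: Mode
    extended_thinking: bool


@dataclass(frozen=True)
class TaskProfile:
    # Hypothetical task traits for this sketch; Agent Designer has no such fields.
    needs_deep_reasoning: bool = False      # multi-step reasoning, debugging, planning under ambiguity
    multiple_inputs_or_tools: bool = False  # must reason across several tools or data sources
    latency_is_priority: bool = False       # response speed is a UX priority
    simple_pattern_task: bool = False       # extraction, sentiment analysis, formatting, simple classification
    well_defined_output: bool = False       # predictable output structure, known schemas


def suggest_setting(task: TaskProfile) -> ModelSetting:
    """Rule-of-thumb reading of the decision matrix; confirm the result by testing."""
    if task.needs_deep_reasoning:
        # Correctness and reasoning depth matter more than speed.
        return ModelSetting(Mode.STANDARD, extended_thinking=True)
    if task.multiple_inputs_or_tools:
        # Use Fast + Extended Thinking only when response speed is a UX priority.
        if task.latency_is_priority:
            return ModelSetting(Mode.FAST, extended_thinking=True)
        return ModelSetting(Mode.STANDARD, extended_thinking=True)
    if task.simple_pattern_task:
        # Simple pattern-matching tasks where low latency is the goal.
        return ModelSetting(Mode.FAST, extended_thinking=False)
    if task.well_defined_output:
        # Well-defined tasks where intermediate reasoning does not improve accuracy.
        return ModelSetting(Mode.STANDARD, extended_thinking=False)
    # When unsure, start from the recommended default.
    return ModelSetting(Mode.STANDARD, extended_thinking=True)


# Example: a hypothetical field-extraction agent.
extraction = suggest_setting(TaskProfile(simple_pattern_task=True))
```

Treat the result as a starting point for testing, not as a final choice.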
Important considerations
- Default configuration: The default model setting is Standard with Extended Thinking turned on.
- When Extended Thinking is turned off, the Rationale section does not show the agent's reasoning. Instead, it displays “Applying optimizations to think and respond faster.”
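The two considerations above can be summarized in a short Python sketch. `ModelConfiguration` and `rationale_text` are hypothetical names that only model what the Model Configuration section and the agent trace show; they are not part of an Agent Designer API.

```python
from dataclasses import dataclass

# Placeholder shown in the Rationale section when Extended Thinking is off (as quoted in this article).
OPTIMIZING_MESSAGE = "Applying optimizations to think and respond faster."

VALID_MODES = ("Standard", "Fast")


@dataclass
class ModelConfiguration:
    # Hypothetical model of the Model Configuration section.
    # Defaults mirror the documented default: Standard with Extended Thinking on.
    mode: str = "Standard"
    extended_thinking: bool = True

    def __post_init__(self) -> None:
        if self.mode not in VALID_MODES:
            raise ValueError(f"Unknown mode: {self.mode!r}")


def rationale_text(config: ModelConfiguration, reasoning: str) -> str:
    """Return what the Rationale section of the agent trace shows for this configuration."""
    if config.extended_thinking:
        return reasoning  # the agent's reasoning is visible
    return OPTIMIZING_MESSAGE  # reasoning is replaced by the placeholder


default_config = ModelConfiguration()
fast_config = ModelConfiguration(mode="Fast", extended_thinking=False)
```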
Existing agents and backward compatibility with Model Configuration
We’ve replaced the legacy Quick Inference setting with Fast mode. AI agents that were in draft or in production before March 14, 2026 retain their current settings; no changes are made to agents already built. If an agent previously had:
- Quick Inference turned on - Agent Model Configuration is now Fast with Extended Thinking turned off.
- Quick Inference turned off - Agent Model Configuration is now Standard with Extended Thinking turned on.
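The mapping above depends only on the old Quick Inference toggle. The Python sketch below restates it for illustration; `ModelConfiguration`, `migrate_quick_inference`, and the sample agent names are hypothetical and not part of an Agent Designer API. The article describes this mapping as already applied to existing agents.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelConfiguration:
    # Hypothetical representation of the new Model Configuration.
    mode: str  # "Standard" or "Fast"
    extended_thinking: bool


def migrate_quick_inference(quick_inference_on: bool) -> ModelConfiguration:
    """Map the legacy Quick Inference toggle to the equivalent Model Configuration."""
    if quick_inference_on:
        # Quick Inference on -> Fast with Extended Thinking off.
        return ModelConfiguration(mode="Fast", extended_thinking=False)
    # Quick Inference off -> Standard with Extended Thinking on.
    return ModelConfiguration(mode="Standard", extended_thinking=True)


# Hypothetical inventory of legacy agents and their Quick Inference setting.
legacy_agents = {"ticket-triage": True, "release-planner": False}
migrated = {name: migrate_quick_inference(on) for name, on in legacy_agents.items()}
```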
Setting the model configuration for your agent
To set the agent's LLM configuration:
- Navigate to Agentstudio > Agent Garden.
- Open an existing agent or select Create New Agent. Refer to Building an agent for a complete step-by-step tutorial on building an agent in the Agent Designer.
- In Profile, select one of the following options in the Model Configuration section:
  - Standard - Provides responses with moderate to high latency. Best for well-defined tasks with predictable output structure, such as code generation for moderately complex tasks, data analysis with known schemas, and structured content generation.
  - Fast - Provides responses with low latency. Best for pattern-matching tasks like sentiment analysis, field extraction, text formatting, simple classification, and single-hop Q&A. Avoid for open-ended Q&A, multi-hop reasoning, or non-trivial code generation.
- Optional: Select Extended Thinking when you want the agent to perform a deep, step-by-step analysis before generating a response. Extended Thinking increases latency and token usage. Combine it with Standard mode for complex reasoning tasks where correctness and reasoning depth matter more than speed, or with Fast mode when the agent must reason across multiple inputs but response speed is a UX priority. The agent's reasoning displays in the Rationale section of the agent trace. Review What is Extended Thinking? for details.
Next steps
Once you've selected a model configuration, test your agent to confirm that it performs as expected for your use case. When Extended Thinking is turned on, review the Rationale section in the agent trace to verify the agent's reasoning or to troubleshoot unexpected outputs.
- Refer to Testing and troubleshooting agents for guidance on interpreting results and refining your configuration.
- Refer to Best practices for creating tasks and instructions to further optimize your agent's responses.
- Refer to Building an agent if you haven't completed your agent setup yet.