Constructing Agent Metrics

note

A single request may include at most 50 invocation metrics, and the total request size must not exceed 5 MB.

You can instrument your AI agent application to collect invocation metrics and export them to a centralized metrics ingestion endpoint. Once integrated, you can monitor agent performance, including latency, token usage, tool calls, errors, and guardrail activity, for analytics purposes.

The integration is framework-agnostic. Whether you use CrewAI, LangChain, LlamaIndex, the OpenAI Agents SDK, AutoGen, or any other orchestration layer, the approach is the same: wrap the agent invocation with lightweight instrumentation, collect the metrics, and send a single JSON payload to the endpoint after each run, or batch several runs and send them as an array.

Supported Frameworks: This applies to all supported agent providers, including CrewAI, LangChain, LangGraph, LlamaIndex, OpenAI Agents SDK, Microsoft AutoGen, Semantic Kernel, PydanticAI, SmolAgents, Strands Agents, Mastra, and others. See Framework-Specific Examples for framework-specific extraction guidance.

Integration Flow

The integration follows a four-stage pipeline that runs alongside your existing agent logic:

  1. Agent Invocation — Your agent runs as usual, with no changes to business logic.
  2. Collect Metrics — During execution, capture start and end timestamps, token usage from the LLM, tool call results, and any errors.
  3. Build Payload — After execution completes, structure the collected data into the resourceMetrics JSON format. Refer to the Metrics Payload Schema section.
  4. POST to Endpoint — Send the payload to the ingestion endpoint using a single HTTPS POST request.

Design Principle: The instrumentation is non-blocking — it won't pause or delay your agent's execution. Run your agent as normal and send metrics after each invocation completes. The POST request must not sit on the critical execution path and should never affect the execution flow; one way to keep it off that path is sketched below.
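A minimal sketch of keeping the POST off the critical path, assuming the Python requests library and that `payload`, `AI_METRICS_ENDPOINT`, and `HEADERS` have already been built per the sections below:

```python
import threading

import requests  # assumed HTTP client; any client works


def send_metrics(payload: dict, endpoint: str, headers: dict) -> None:
    """Fire-and-forget POST; failures are logged, never raised into agent code."""
    try:
        requests.post(endpoint, json=payload, headers=headers, timeout=10)
    except Exception as exc:  # never let metrics delivery break the agent
        print(f"metrics delivery failed: {exc}")


# After the agent invocation completes, hand the payload to a daemon thread
# so the agent's caller never waits on the network call.
threading.Thread(
    target=send_metrics,
    args=(payload, AI_METRICS_ENDPOINT, HEADERS),
    daemon=True,
).start()
```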

Metrics Payload Schema

Every POST request to the ingestion endpoint must include a JSON body containing a top-level resourceMetrics array. Each element in this array represents a single agent invocation.

Envelope

```json
{
  "resourceMetrics": [ <ResourceMetric>, ... ]
}
```

ResourceMetric Object

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| extAccountAliasId | string | Required | Your account UUID provided during onboarding. |
| providerType | string | Required | Agent framework constant (e.g., CREWAI, LANGCHAIN). Refer to Framework-Specific Examples. |
| operation | string | Required | Operation type. Use InvokeAgent for standard agent runs. |
| sessionId | string | Required | A unique UUID (v4) per run. Generate a fresh one for every invocation. |
| schemaVersion | string | Required | Payload version. Always send "1.0.0". |
| time | number | Required | Unix timestamp in milliseconds at invocation start. Example: 1775730591000 |
| extModelId | string | Optional | LLM model identifier (e.g., gpt-4o, claude-3-5-sonnet, llama3.2). |
| promptType | string | Optional | Prompt classification. Common values: CHAT, COMPLETION, Manual. |
| totalTime | number | Optional | Total wall-clock time of the invocation in milliseconds. |
| ttft | number | Optional | Time-to-first-token in milliseconds. Send 0 if streaming is not used or not measurable. |
| modelLatency | number | Optional | Time spent inside the LLM call in milliseconds, excluding tool execution time. |
| modelInvocationCount | integer | Optional | Total LLM calls made during this run. Increment if your agent loops. Default: 1 |
| inputTokenCount | integer | Optional | Total prompt/input tokens across all LLM calls in this run. Default: 0 |
| outputTokenCount | integer | Optional | Total completion/output tokens across all LLM calls in this run. Default: 0 |
| invocationServerErrors | integer | Optional | Count of HTTP 5xx errors during the agent invocation. Default: 0 |
| invocationClientErrors | integer | Optional | Count of HTTP 4xx errors during the agent invocation. Default: 0 |
| modelInvocationThrottles | integer | Optional | Count of HTTP 429 rate-limit responses from the LLM provider. Default: 0 |
| modelInvocationClientErrors | integer | Optional | LLM API-level 4xx errors. Default: 0 |
| modelInvocationServerErrors | integer | Optional | LLM API-level 5xx errors and connection failures. Default: 0 |
| modelInvocationUnknownErrors | integer | Optional | Errors not fitting other categories. Default: 0 |
| guardrailHits | integer | Optional | Number of times a guardrail, content filter, or safety check triggered. Default: 0 |
| tools | array | Optional | Array of ToolMetric objects. Omit if your agent used no tools. Refer to ToolMetric Object. |
note

All fields use camelCase. Always include fields marked as Required.
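Payload assembly is easiest to keep correct in one small helper. A minimal sketch, assuming the EXT_ACCOUNT_ALIAS_ID and PROVIDER_TYPE environment variables from the configuration table below; `build_resource_metric` is an illustrative name, not part of any SDK:

```python
import os


def build_resource_metric(session_id: str, start_epoch: int, **optional) -> dict:
    """Assemble one ResourceMetric with all required fields plus any optionals."""
    metric = {
        # Required fields
        "extAccountAliasId": os.environ["EXT_ACCOUNT_ALIAS_ID"],
        "providerType": os.environ["PROVIDER_TYPE"],
        "operation": "InvokeAgent",
        "sessionId": session_id,
        "schemaVersion": "1.0.0",
        "time": start_epoch,
    }
    # Optional fields (inputTokenCount, totalTime, tools, ...) pass straight through
    metric.update(optional)
    return metric


payload = {"resourceMetrics": [build_resource_metric(
    session_id, start_epoch, inputTokenCount=377, outputTokenCount=233)]}
```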

ToolMetric Object

Include one ToolMetric entry per tool category used.

| Field | Type | Description |
| --- | --- | --- |
| toolType | string | Category. Use "api" for REST/HTTP tools and "mcp" for MCP tools. |
| toolCalls | integer | Total number of calls attempted for this tool type. |
| successCount | integer | Number of calls that completed successfully. |
| failureCount | integer | Number of calls that resulted in an error or exception. |

Integration Guide

Follow these five steps to instrument any agent framework. The approach remains the same across frameworks — only the API calls you use to extract data will differ.

1. Track Session and Timing

Before invoking your agent, capture the start time and generate a unique session ID. After the agent completes, record the end time.

```python
import uuid, time

# Before agent call
session_id = str(uuid.uuid4())         # unique per run
start_time = time.perf_counter()       # monotonic — for duration
start_epoch = int(time.time() * 1000)  # Unix ms — for the 'time' field

# ... run your agent ...

# After agent call
end_time = time.perf_counter()
total_time = (end_time - start_time) * 1000  # milliseconds
```
tip

Use time.perf_counter() (Python) or performance.now() (JavaScript) to measure durations, because system clock changes do not affect them. Use time.time() or Date.now() only for absolute timestamps.
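The same timer can scope modelLatency to the LLM call alone, excluding tool execution. A sketch, where `llm_client` and `messages` are illustrative stand-ins for your OpenAI-compatible client and conversation state:

```python
import time

llm_start = time.perf_counter()
response = llm_client.chat.completions.create(model="gpt-4o", messages=messages)
model_latency = (time.perf_counter() - llm_start) * 1000  # ms inside the LLM call only
```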

2. Capture LLM Token Usage

Most agent frameworks expose token usage within the invocation response. Extract the counts from the response object after execution completes.

| Framework | Input Tokens | Output Tokens |
| --- | --- | --- |
| CrewAI | result.token_usage.prompt_tokens | result.token_usage.completion_tokens |
| LangChain / LangGraph | result.usage_metadata["input_tokens"] | result.usage_metadata["output_tokens"] |
| LlamaIndex | response.metadata["token_usage"].prompt_tokens | response.metadata["token_usage"].completion_tokens |
| OpenAI Agents SDK | result.raw_responses[-1].usage.input_tokens | result.raw_responses[-1].usage.output_tokens |
| Microsoft AutoGen | response.usage.prompt_tokens | response.usage.completion_tokens |
| Semantic Kernel | result.metadata["usage"].prompt_tokens | result.metadata["usage"].completion_tokens |
| PydanticAI | result.usage().request_tokens | result.usage().response_tokens |
| Mastra / AI SDK (JS) | result.usage.promptTokens | result.usage.completionTokens |
| Custom / Other | Parse usage fields from the raw LLM API response | |
note

Agentic Loops: If your agent executes multiple LLM calls in a single run (e.g., ReAct-style reasoning), aggregate token usage across all calls and report totals. Also increment modelInvocationCount for each LLM call made.
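A sketch of that aggregation for a hand-rolled loop, again assuming an OpenAI-compatible client (`llm_client`, `messages`, and `decide_if_finished` are illustrative names):

```python
input_tokens = output_tokens = 0
model_invocation_count = 0
done = False

while not done:  # your agent's reasoning loop
    response = llm_client.chat.completions.create(model="gpt-4o", messages=messages)
    model_invocation_count += 1
    # Accumulate usage across every call in the run; report only the totals.
    input_tokens += response.usage.prompt_tokens
    output_tokens += response.usage.completion_tokens
    done = decide_if_finished(response)  # hypothetical loop-exit check
```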

3. Count Tool Calls

Iterate through all tool calls executed by your agent and categorize them by type. Most frameworks expose tool calls through the final result object or event/callback hooks.

```python
# Pseudo-code — adapt field names to your framework
api_calls = api_success = api_failure = 0
mcp_calls = mcp_success = mcp_failure = 0

for tool_call in result.tool_calls:  # framework-specific iteration
    if tool_call.type == "api":
        api_calls += 1
        if tool_call.error:
            api_failure += 1
        else:
            api_success += 1
    elif tool_call.type == "mcp":
        mcp_calls += 1
        if tool_call.error:
            mcp_failure += 1
        else:
            mcp_success += 1
```

If your framework does not expose per-call success or failure details, count all calls as successful and set failureCount to 0.
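The counters then map directly onto the tools array; a sketch that emits one ToolMetric per tool category actually used:

```python
tools = []
if api_calls:
    tools.append({"toolType": "api", "toolCalls": api_calls,
                  "successCount": api_success, "failureCount": api_failure})
if mcp_calls:
    tools.append({"toolType": "mcp", "toolCalls": mcp_calls,
                  "successCount": mcp_success, "failureCount": mcp_failure})
# Omit the "tools" field from the payload entirely when the list is empty.
```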

4. Classify Errors

Wrap the agent invocation in a try/except (or equivalent) block and map any exceptions to the appropriate error counters.

```python
try:
    result = agent.run(input)

except RateLimitError:        # HTTP 429
    model_invocation_throttles = 1

except ServerError as e:      # HTTP 5xx
    if e.status_code >= 500:
        invocation_server_errors = 1

except ClientError as e:      # HTTP 4xx
    if e.status_code >= 400:
        invocation_client_errors = 1

except ConnectionError:       # Network failure
    model_invocation_server_errors = 1

except Exception:             # Anything else
    model_invocation_unknown_errors = 1
```

Always send metrics, even when an error occurs; error metrics are still valuable. If you catch an exception, set the appropriate error counter to 1, default any numeric fields you could not measure to 0, and still build and send the payload.
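A try/finally around the invocation is one simple way to guarantee delivery on both the success and failure paths. A minimal sketch, where `agent`, `user_input`, `build_payload`, `send_metrics`, `endpoint`, and `headers` are illustrative names from the surrounding steps:

```python
import threading

try:
    result = agent.run(user_input)
except Exception:
    model_invocation_unknown_errors = 1  # or a finer category, per the mapping above
    raise  # re-raise so the caller's error handling still runs
finally:
    # Executes on success and on failure alike, so metrics are always sent.
    payload = build_payload()  # hypothetical helper assembling the ResourceMetric
    threading.Thread(target=send_metrics, args=(payload, endpoint, headers),
                     daemon=True).start()
```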

5. Send Metrics to the Endpoint

After you collect and compute all metrics, assemble them into the final payload and send a POST request to the ingestion endpoint.
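A minimal synchronous sending sketch, assuming the Python requests library and the headers and variables documented in the next section (`payload` comes from the previous steps):

```python
import os

import requests

endpoint = os.environ["AI_METRICS_ENDPOINT"]
headers = {
    "Content-Type": "application/json",
    "Authorization": "Basic BOOMI_TOKEN.user@boomi.com:api-token",  # your platform API token
}
if origin := os.environ.get("AI_METRICS_ORIGIN"):  # optional; only if CORS is enforced
    headers["Origin"] = origin

response = requests.post(endpoint, json=payload, headers=headers, timeout=10)
response.raise_for_status()  # expect HTTP 200 or 202
```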

Authentication & Configuration

Required HTTP Headers

| Header | Value | Notes |
| --- | --- | --- |
| Content-Type | application/json | Always required |
| Authorization | Basic BOOMI_TOKEN.user@boomi.com:api-token | Platform API token you create for authentication |
| Origin | Your application origin | Optional. Required only if the endpoint enforces a CORS policy |

Configuration Variables

| Variable | Description | Example |
| --- | --- | --- |
| AI_METRICS_ENDPOINT | Full URL of the ingestion endpoint | https://api.example.com/v1/metrics |
| AI_METRICS_ORIGIN | Origin header value (optional) | https://app.your-agent.com |
| EXT_ACCOUNT_ALIAS_ID | Your account UUID from onboarding | 3362d163-b990-49a6-... |
| PROVIDER_TYPE | Your agent framework identifier | LANGCHAIN |

Retry on Transient Failures: Implement exponential backoff with at least 3 retry attempts. Retry the request only for the following HTTP status codes: 429, 500, 502, 503, 504
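A minimal retry sketch matching those rules, reusing the `endpoint` and `headers` values from the sending example above:

```python
import time

import requests

RETRYABLE = {429, 500, 502, 503, 504}


def post_with_retry(payload: dict, attempts: int = 3) -> bool:
    for attempt in range(attempts):
        try:
            resp = requests.post(endpoint, json=payload, headers=headers, timeout=10)
            if resp.status_code not in RETRYABLE:
                return resp.ok  # success, or a non-retryable error such as 400/401
        except requests.ConnectionError:
            pass  # treat network failures as retryable
        time.sleep(2 ** attempt)  # exponential backoff: 1s, 2s, 4s, ...
    return False
```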

Framework-Specific Examples

Provider Type Constants

Use the exact providerType constant that matches your framework:

| Constant | Framework Name |
| --- | --- |
| CUSTOM_PROVIDER | Custom / bring-your-own framework |
| ADOBE_EXPERIENCE_PLATFORM_AGENT_ORCHESTRATOR | Adobe Experience Platform Agent Orchestrator |
| AG2 | AG2 |
| AGENT_GARDEN | Agent Garden |
| AGNO | Agno |
| AISDK | AI SDK (Vercel) |
| AKKIO | Akkio |
| AUTO_GPT | Auto-GPT |
| BABYAGI | BabyAGI |
| BEDROCK | Amazon Bedrock |
| CAMEL_AI | CAMEL-AI |
| CHATFUEL | ChatFuel |
| CLOUDFLARE_AGENTS | Cloudflare Agents |
| CREWAI | CrewAI |
| DATAROBOT_NO_CODE_AI_APPS | DataRobot No-Code AI Apps |
| DIFY | Dify |
| GOOGLE_AGENT_DEVELOPMENT_KIT | Google Agent Development Kit |
| HUGGING_FACE_TRANSFORMERS_AGENTS | Hugging Face Transformers Agents |
| IBM_WATSONX_ASSISTANT | IBM Watsonx Assistant |
| KUBIYA_AI | Kubiya.ai |
| LANGCHAIN | LangChain |
| LANGFLOW | Langflow |
| LANGGRAPH | LangGraph |
| LLAMAINDEX | LlamaIndex |
| LYZR | Lyzr |
| MASTRA | Mastra |
| METAGPT | MetaGPT |
| MICROSOFT_AUTOGEN | Microsoft AutoGen |
| MICROSOFT_COPILOT_STUDIO | Microsoft Copilot Studio |
| MICROSOFT_SEMANTIC_KERNEL | Microsoft Semantic Kernel |
| N8N | n8n |
| OPENAI_AGENTS_SDK | OpenAI Agents SDK |
| OPENAI_CHATGPT_TEAM_ENTERPRISE | OpenAI / ChatGPT Team / Enterprise |
| PHIDATA | Phidata |
| PYDANTIC_AI | PydanticAI |
| RASA | Rasa |
| SALESFORCE_AGENTFORCE | Salesforce Agentforce |
| SERVICENOW_AI_AGENTS | ServiceNow AI Agents |
| SMOLAGENTS | SmolAgents |
| STRANDS_AGENTS | Strands Agents |
| SUPERAGI | SuperAGI |
| WORKDAY_AI_AGENTS | Workday AI Agents |

Token & Tool Extraction by Framework

  1. CrewAI

```python
result = crew.kickoff(inputs={"user_input": query})

# Token counts
input_tokens = result.token_usage.prompt_tokens
output_tokens = result.token_usage.completion_tokens

# Tool calls — iterate task messages
for task_output in result.tasks_output:
    for msg in (task_output.messages or []):
        for tc in (msg.get("tool_calls") or []):
            tool_name = tc.get("function", {}).get("name", "")
            # classify tool_name as api or mcp
```
  2. LangChain / LangGraph

```python
from langchain_core.callbacks import BaseCallbackHandler

class MetricsCallback(BaseCallbackHandler):
    def __init__(self):
        # Initialize all counters so the callback is usable out of the box
        self.input_tokens = self.output_tokens = 0
        self.invocation_count = self.tool_success = self.tool_failure = 0

    def on_llm_end(self, response, **kwargs):
        usage = (response.llm_output or {}).get("token_usage", {})
        self.input_tokens += usage.get("prompt_tokens", 0)
        self.output_tokens += usage.get("completion_tokens", 0)
        self.invocation_count += 1

    def on_tool_end(self, output, **kwargs):
        self.tool_success += 1

    def on_tool_error(self, error, **kwargs):
        self.tool_failure += 1
```
  3. OpenAI Agents SDK

```python
from agents import Agent, Runner

result = await Runner.run(agent, input=query)

# Token usage
usage = result.raw_responses[-1].usage
input_tokens = usage.input_tokens
output_tokens = usage.output_tokens

# Tool calls
for item in result.new_items:
    if hasattr(item, "type") and item.type == "tool_call_item":
        # classify by item.raw_item.name
        pass
```
  4. LlamaIndex

```python
from llama_index.core import Settings
from llama_index.core.callbacks import CallbackManager, TokenCountingHandler

token_counter = TokenCountingHandler()
Settings.callback_manager = CallbackManager([token_counter])

response = query_engine.query(query)

input_tokens = token_counter.prompt_llm_token_count
output_tokens = token_counter.completion_llm_token_count
```
  5. Microsoft AutoGen

```python
# After: chat_result = user_proxy.initiate_chat(...)
# Parse chat_result.chat_history for usage blocks in messages:
input_tokens = output_tokens = 0
for msg in chat_result.chat_history:
    if "usage" in msg:
        input_tokens += msg["usage"].get("prompt_tokens", 0)
        output_tokens += msg["usage"].get("completion_tokens", 0)
```
  6. Custom / OpenAI-compatible API

```python
raw = llm_client.chat.completions.create(...)
input_tokens = raw.usage.prompt_tokens
output_tokens = raw.usage.completion_tokens
```

Complete Payload Example

A fully-populated resourceMetrics payload for a single invocation with tools:

```json
{
  "resourceMetrics": [
    {
      "extAccountAliasId": "3362d163-b990-49a6-b53d-ffbbaa536ada",
      "providerType": "CREWAI",
      "operation": "InvokeAgent",
      "extModelId": "GPT",
      "promptType": "CHAT",
      "totalTime": 65526,
      "ttft": 0,
      "modelLatency": 3700,
      "modelInvocationCount": 1,
      "inputTokenCount": 377,
      "outputTokenCount": 233,
      "invocationServerErrors": 0,
      "invocationClientErrors": 0,
      "modelInvocationThrottles": 0,
      "modelInvocationClientErrors": 0,
      "modelInvocationServerErrors": 0,
      "modelInvocationUnknownErrors": 0,
      "guardrailHits": 3,
      "sessionId": "03d3987e-362a-4fa1-848f-fe34e8a7d188",
      "tools": [
        { "toolType": "api", "toolCalls": 3, "successCount": 2, "failureCount": 1 },
        { "toolType": "mcp", "toolCalls": 2, "successCount": 2, "failureCount": 0 }
      ],
      "time": 1775730591000,
      "schemaVersion": "1.0.0"
    }
  ]
}
```

Validation & Troubleshooting

Pre-Ship Checklist

  • POST request returns HTTP 200 or 202.
  • All Required fields are present and non-null.
  • time is a 13-digit Unix timestamp in milliseconds.
  • Token counts are non-negative integers (0 is valid).
  • Send metrics even when the agent throws an exception.
  • Authorization header is correctly configured.
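Most of the checklist above can be automated locally before shipping. A sketch that validates one ResourceMetric dict against the required fields from the schema table (`validate` is an illustrative helper, not part of any SDK):

```python
REQUIRED = ["extAccountAliasId", "providerType", "operation",
            "sessionId", "schemaVersion", "time"]


def validate(metric: dict) -> list[str]:
    """Return a list of problems; an empty list means the metric passes."""
    problems = [f"missing required field: {f}" for f in REQUIRED
                if metric.get(f) is None]
    t = metric.get("time")
    if not (isinstance(t, int) and len(str(t)) == 13):
        problems.append("time must be a 13-digit Unix timestamp in milliseconds")
    for f in ("inputTokenCount", "outputTokenCount"):
        v = metric.get(f, 0)
        if not (isinstance(v, int) and v >= 0):
            problems.append(f"{f} must be a non-negative integer")
    return problems
```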

Common Issues

| Symptom | Likely Cause | Fix |
| --- | --- | --- |
| HTTP 401 Unauthorized | Missing, malformed, or expired API token | Verify the Authorization header carries a valid, unexpired platform API token |
| HTTP 400 Bad Request | Missing required field or incorrect data type | Ensure all Required fields (see the ResourceMetric Object table) are included and properly typed |
| Connection timeout | Incorrect endpoint URL or network issue | Verify AI_METRICS_ENDPOINT is correct and reachable; check firewall or network restrictions |

Quick Smoke Test

Use curl to validate your endpoint and credentials without running a real agent:

```bash
curl -X POST "<AI_METRICS_ENDPOINT>" \
  -H "Content-Type: application/json" \
  -H "Authorization: Basic BOOMI_TOKEN.user@boomi.com:api-token" \
  -d '{
    "resourceMetrics": [{
      "extAccountAliasId": "<refer-from-api-specification>",
      "providerType": "<refer-from-api-specification>",
      "operation": "InvokeAgent",
      "sessionId": "test-session-001",
      "time": 1775730591000,
      "schemaVersion": "1.0.0",
      "invocationServerErrors": 0,
      "invocationClientErrors": 0,
      "modelInvocationCount": 1,
      "modelInvocationThrottles": 0,
      "modelInvocationClientErrors": 0,
      "modelInvocationServerErrors": 0,
      "modelInvocationUnknownErrors": 0,
      "guardrailHits": 0
    }]
  }'
```

A 200 or 202 response confirms your credentials and endpoint URL are correct.
