§9 Agent Loop
The agent loop enables multi-turn tool-calling workflows. It calls the LLM, inspects the response for tool calls, executes them, appends results to the conversation, and re-calls the LLM—repeating until the LLM produces a normal (non-tool-call) response.
Public API:
```
turn(path_or_agent, inputs, tools?) → result
turn_async(path_or_agent, inputs, tools?) → result
```
Both MUST emit a `turn` trace span that wraps the entire loop, including all inner `execute` and `execute_tool` spans.
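To make the control flow concrete, here is a minimal self-contained Python sketch of such a loop, with a fake LLM and a single tool; the names used here (`fake_llm`, `TOOLS`) are illustrative, not part of the spec's API:

```python
import json

MAX_ITERATIONS = 10

def fake_llm(messages):
    """Stand-in for execute_llm: requests a tool once, then answers."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool_calls": [{"id": "call_1", "name": "get_weather",
                                "arguments": '{"city": "Paris"}'}]}
    return {"content": "It is 72°F and sunny in Paris."}

TOOLS = {"get_weather": lambda args: f'72°F and sunny in {args["city"]}'}

def turn(messages):
    for _ in range(MAX_ITERATIONS):
        response = fake_llm(messages)
        calls = response.get("tool_calls")
        if not calls:                      # 5e: normal response, loop ends
            return response["content"]
        for call in calls:                 # 5d: execute each tool call
            args = json.loads(call["arguments"])
            result = TOOLS[call["name"]](args)
            messages.append({"role": "tool",
                             "tool_call_id": call["id"],
                             "content": str(result)})
    raise RuntimeError(f"Agent loop exceeded {MAX_ITERATIONS} iterations")

print(turn([{"role": "user", "content": "Weather in Paris?"}]))
```

The real loop additionally resolves the agent, merges runtime tools, applies bindings, and emits trace spans, as detailed in §9.2.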
§9.1 Constants
| Constant | Default | Notes |
|---|---|---|
| MAX_ITERATIONS | 10 | MAY be configurable at runtime |
| MAX_LLM_RETRIES | 3 | MAY be configurable at runtime (§9.10) |
§9.2 Algorithm
```
function turn(path_or_agent, inputs, tools=null) → result:
    // Step 1: Resolve agent
    if path_or_agent is a string path:
        agent = load(path_or_agent)
    else:
        agent = path_or_agent

    // Step 2: Prepare initial messages
    messages = prepare(agent, inputs)

    // Step 3: Merge runtime tools into the agent
    if tools is not null:
        merge tools into agent.tools
        merge tool handlers into tool registry

    // Step 4: Iteration counter
    iteration = 0

    // Step 5: Loop
    loop:
        // 5a. Guard against infinite loops
        if iteration >= MAX_ITERATIONS:
            raise RuntimeError(
                "Agent loop exceeded " + MAX_ITERATIONS + " iterations"
            )

        // 5b. Call the LLM (with retry — see §9.10)
        llm_attempts = 0
        loop:
            try:
                response = execute_llm(agent, messages)
                break
            catch error:
                llm_attempts += 1
                if llm_attempts >= MAX_LLM_RETRIES:
                    raise ExecuteError(
                        message: str(error),
                        messages: messages  // MUST include conversation state
                    )
                backoff = min(2^llm_attempts + jitter(), 60)
                sleep(backoff)

        // 5c. Process response
        result = process(agent, response)

        // 5d. Check for tool calls
        if result is a list of ToolCall:
            tool_calls = result
            tool_results = []

            // Execute each tool call
            for tool_call in tool_calls:
                TRACE: emit "execute_tool" span for tool_call.name

                // Look up handler — two-layer dispatch (§11.2)
                tool_def = find_tool_definition(agent, tool_call.name)

                // Layer 1: explicit name override
                handler = get_tool(tool_call.name)

                // Parse arguments (with resilient fallback — see §9.8)
                args = resilient_json_parse(tool_call.arguments)

                // Apply bindings (inject bound values from inputs)
                args = apply_bindings(tool_def, args, inputs)

                // Execute tool handler (with error safety — see §9.9)
                try:
                    if handler is not null:
                        // Name registry hit — direct call
                        tool_result = handler(args)
                    else:
                        // Layer 2: kind handler fallback
                        kind_handler = get_tool_handler(tool_def.kind)
                        if kind_handler is null:
                            raise ValueError(
                                "No handler registered for tool: " + tool_call.name
                                + " (kind: " + tool_def.kind + ")"
                            )
                        tool_result = kind_handler(tool_def, args, agent, inputs)
                catch error:
                    // Tool handler failures MUST NOT kill the agent loop (§9.9)
                    tool_result = "Error: Tool '" + tool_call.name
                        + "' failed: " + str(error)
                    emit event("error", { message: tool_result })

                tool_results.append({
                    tool_call_id: tool_call.id,
                    result: str(tool_result)
                })

            // Delegate message formatting to the executor (§9.4)
            executor = get_executor(agent.model.provider)
            text_content = extract_text_content(response)
            tool_messages = executor.FormatToolMessages(
                response, tool_calls, tool_results, text_content
            )
            append tool_messages to messages

            iteration += 1
            continue loop

        // 5e. Normal response (no tool calls) — return
        return result
```

§9.3 Streaming in Agent Mode
When streaming is enabled during agent mode, implementations SHOULD forward content chunks to the caller where possible rather than buffering the entire response. The key constraint: tool call arguments arrive incrementally and MUST be fully accumulated before tool execution.
Detection strategy: LLM streaming APIs send tool_calls deltas from the
start of a response — they do not appear after content deltas. Implementations
SHOULD use the first chunk’s delta to determine the response type:
When response is a stream:

1. Begin consuming chunks through the processor.
2. If tool_calls are detected (present in early chunks):
   - MUST accumulate ALL chunks to collect complete tool call data (function names + full argument JSON).
   - MUST NOT yield content to the caller for this iteration.
   - Execute tools, append results, re-loop.
3. If only content is detected (no tool_calls):
   - This is the final response — SHOULD yield content chunks through a PromptyStream to the caller as they arrive.
   - Return the stream (caller consumes at their pace).

This means intermediate iterations (tool calls) are buffered internally, while the final iteration (content only) is streamed through to the caller. The caller sees a normal PromptyStream for the final answer.
Implementations that cannot make this distinction from the early chunks MAY fall back to fully consuming the stream before deciding, but this is not preferred.
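As an illustration of the accumulation requirement, here is a Python sketch; the chunk shapes are loosely modeled on OpenAI-style streaming deltas and are an assumption, not the spec's wire format:

```python
def consume_stream(chunks):
    """Detect response type from the first chunk, then either accumulate
    tool-call deltas fully or pass content chunks through.

    Returns ("tool_calls", [complete calls]) or ("content", iterator).
    """
    it = iter(chunks)
    first = next(it)
    if first.get("tool_calls"):            # tool_calls appear from the start
        calls = {}                         # index → accumulated call
        for chunk in [first, *it]:         # MUST consume ALL chunks
            for d in chunk.get("tool_calls", []):
                c = calls.setdefault(d["index"],
                                     {"id": "", "name": "", "arguments": ""})
                c["id"] = d.get("id") or c["id"]
                c["name"] += d.get("name", "")
                c["arguments"] += d.get("arguments", "")  # arrives in pieces
        return "tool_calls", [calls[i] for i in sorted(calls)]

    # Content only: this is the final answer — stream through, don't buffer.
    def passthrough():
        yield first.get("content", "")
        for chunk in it:
            yield chunk.get("content", "")
    return "content", passthrough()
```

The first branch corresponds to intermediate (tool-call) iterations; the second returns a live iterator the caller consumes at its own pace.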
§9.4 Provider-Specific Tool Message Formats
Each provider has a different wire format for tool-call messages. The agent loop MUST produce messages in the correct format for the active provider.
OpenAI Chat Completions:
```
// Assistant message with tool calls
{
  "role": "assistant",
  "tool_calls": [
    {
      "id": "call_123",
      "type": "function",
      "function": {
        "name": "get_weather",
        "arguments": "{\"city\":\"Paris\"}"
      }
    }
  ]
}
```

```
// Tool result message
{
  "role": "tool",
  "content": "72°F and sunny",
  "tool_call_id": "call_123"
}
```

Anthropic:

```
// Assistant message — MUST preserve ALL content blocks (text + tool_use)
{
  "role": "assistant",
  "content": ["<original content blocks from API response>"]
}
```

```
// Tool results — ALL results in ONE user message
{
  "role": "user",
  "content": [
    { "type": "tool_result", "tool_use_id": "toolu_123", "content": "72°F and sunny" },
    { "type": "tool_result", "tool_use_id": "toolu_456", "content": "Pizza Palace" }
  ]
}
```

OpenAI Responses API:

```
// MUST include original function_call item in input
{
  "type": "function_call",
  "id": "fc_123",
  "call_id": "call_123",
  "name": "get_weather",
  "arguments": "{\"city\":\"Paris\"}"
}
```

```
// Function call output
{
  "type": "function_call_output",
  "call_id": "call_123",
  "output": "72°F and sunny"
}
```

§9.5 Bindings Injection
During tool execution, bound parameters MUST be injected into the arguments before calling the handler:
```
function apply_bindings(tool, args, inputs) → dict:
    if tool.bindings is null:
        return args

    for param_name, binding in tool.bindings:
        input_name = binding.input  // e.g., "preferred_unit"
        if input_name in inputs:
            args[param_name] = inputs[input_name]

    return args
```

Bindings MUST override any value the LLM may have generated for the same parameter name.
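The same logic in Python, as a sketch that models `bindings` as a plain dict mapping parameter name to input name (the spec's binding objects carry more structure):

```python
def apply_bindings(bindings, args, inputs):
    """Inject bound input values into tool arguments."""
    if not bindings:
        return args
    for param_name, input_name in bindings.items():
        if input_name in inputs:
            args[param_name] = inputs[input_name]  # binding wins over LLM value
    return args

args = apply_bindings(
    {"unit": "preferred_unit"},           # tool binding
    {"city": "Paris", "unit": "kelvin"},  # LLM-generated args
    {"preferred_unit": "celsius"},        # caller inputs
)
# the bound "celsius" overrides the LLM's "kelvin"
```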
§9.6 PromptyTool Execution
A PromptyTool references another .prompty file to be invoked as a tool:
```
function execute_prompty_tool(tool, args, parent_inputs) → result:
    // Resolve path relative to the parent .prompty file
    child_agent = load(tool.path)

    // Merge: LLM-provided args + bindings from parent inputs
    merged = apply_bindings(tool, args, parent_inputs)

    match tool.mode:
        "single":
            // One LLM call — no agent loop
            return run(child_agent, merged)
        "agentic":
            // Full agent loop — child may call tools too
            return turn(child_agent, merged)
```

Child PromptyTool execution MUST inherit the parent's tracer registry, producing nested trace spans that show the full call hierarchy.
§9.8 Resilient Argument Parsing
LLMs frequently produce malformed JSON in tool call arguments — markdown
code fences wrapping JSON, trailing commas, or JSON embedded in prose text.
Implementations SHOULD attempt recovery using the following fallback chain
when json_parse fails on the raw argument string:
````
function resilient_json_parse(raw_arguments) → dict:
    // Strategy 1: Direct parse
    try:
        return json_parse(raw_arguments)
    catch:
        pass

    // Strategy 2: Strip markdown code fences
    stripped = regex_replace(raw_arguments,
        /^\s*```(?:json)?\s*\n?(.*?)\n?\s*```\s*$/s, "$1")
    if stripped != raw_arguments:
        try:
            return json_parse(stripped)
        catch:
            pass

    // Strategy 3: Extract first balanced JSON block
    block = extract_first_json_block(raw_arguments)
    if block is not null:
        try:
            return json_parse(block)
        catch:
            pass

    // Strategy 4: Strip trailing commas before } or ]
    cleaned = regex_replace(raw_arguments, /,\s*([}\]])/g, "$1")
    try:
        return json_parse(cleaned)
    catch:
        pass

    // All strategies failed — return error as tool result
    return null  // caller MUST convert to error tool result
````

Requirements:
- Implementations SHOULD attempt all four strategies in order.
- When a non-direct strategy succeeds, implementations SHOULD log a warning indicating which fallback was used.
- If all strategies fail, implementations MUST NOT substitute a silent empty object (`{}`). The parse failure MUST be reported as a string tool result so the LLM can see the error and retry.
- `extract_first_json_block` MUST respect string escapes (do not match braces inside quoted strings).
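A possible Python rendering of the fallback chain; the escape-aware block extractor shown here is one way to satisfy the last requirement:

```python
import json
import re

def extract_first_json_block(text):
    """Return the first balanced {...} block, ignoring braces in strings."""
    start = text.find("{")
    if start < 0:
        return None
    depth, in_str, esc = 0, False, False
    for i, ch in enumerate(text[start:], start):
        if esc:
            esc = False                 # skip the escaped character
        elif ch == "\\":
            esc = True
        elif ch == '"':
            in_str = not in_str
        elif not in_str:
            if ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
                if depth == 0:
                    return text[start:i + 1]
    return None

def resilient_json_parse(raw):
    try:                                # Strategy 1: direct parse
        return json.loads(raw)
    except ValueError:
        pass
    # Strategy 2: strip markdown code fences
    stripped = re.sub(r"^\s*```(?:json)?\s*\n?(.*?)\n?\s*```\s*$", r"\1",
                      raw, flags=re.S)
    # Strategy 3: extract first balanced JSON block
    # Strategy 4: strip trailing commas before } or ]
    for candidate in (stripped,
                      extract_first_json_block(raw),
                      re.sub(r",\s*([}\]])", r"\1", raw)):
        if candidate and candidate != raw:
            try:
                return json.loads(candidate)
            except ValueError:
                pass
    return None  # caller converts to an error tool result
```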
§9.9 Tool Execution Error Safety
Tool handlers are user-provided code. Implementations MUST catch exceptions (or panics, in languages that distinguish them) raised by tool handlers during execution.
Requirements:
- Caught errors MUST be converted to a string tool result: `"Error: Tool '{name}' failed: {message}"`
- An `error` event (§13.1) MUST be emitted with the error details.
- The agent loop MUST NOT terminate due to a tool handler failure — the error result is fed back to the LLM, allowing the model to recover.
- For languages with both exceptions and panics (e.g., Rust), both MUST be caught.
- `ValueError` for “Tool not registered” is NOT subject to this rule — a missing handler indicates a configuration error and SHOULD still raise.
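The error boundary can be sketched in Python as follows; `emit_event` is a stand-in for the §13.1 event mechanism, and the function name is illustrative:

```python
def run_tool_safely(name, handler, args, emit_event=print):
    """Execute a user-provided tool handler without letting it kill the loop."""
    if handler is None:
        # Missing handler is a configuration error — still raises, not caught
        raise ValueError(f"No handler registered for tool: {name}")
    try:
        return str(handler(args))
    except Exception as error:            # user code may raise anything
        result = f"Error: Tool '{name}' failed: {error}"
        emit_event({"type": "error", "message": result})
        return result                     # fed back to the LLM so it can recover
```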
§9.10 LLM Call Retry
Long-running agent loops accumulate valuable state across iterations. A
transient LLM failure at iteration N should not discard the work from
iterations 1 through N-1. Implementations SHOULD retry the execute_llm
call within the agent loop before raising to the caller.
Algorithm:
```
// Inside the agent loop, replacing the direct execute_llm call:
llm_attempts = 0
loop:
    try:
        response = execute_llm(agent, messages)
        break  // success
    catch error:
        llm_attempts += 1
        if llm_attempts >= MAX_LLM_RETRIES:
            raise ExecuteError(
                message: str(error),
                messages: messages  // MUST include conversation state
            )
        backoff = min(2^llm_attempts + jitter(), 60)
        sleep(backoff)
```

Requirements:
- This retry is independent of any HTTP-level retry inside the executor.
- When all retries are exhausted, the raised error MUST include the accumulated `messages` list so the caller can resume by passing them back as thread input on a subsequent `turn()` call.
- Implementations SHOULD emit a `status` event before each retry.
- Retry MUST respect the cancellation token (§13.2) — if cancellation is signaled during a backoff wait, the loop MUST stop retrying immediately.
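One way to sketch this in Python, using `threading.Event` as the cancellation token; that choice is an assumption here, since the spec's token type is defined in §13.2:

```python
import random
import threading

MAX_LLM_RETRIES = 3

class ExecuteError(Exception):
    """Raised when retries are exhausted; carries the conversation state."""
    def __init__(self, message, messages):
        super().__init__(message)
        self.messages = messages      # caller can resume with these

def call_llm_with_retry(execute_llm, agent, messages, cancel=None):
    cancel = cancel or threading.Event()   # cancellation token (§13.2)
    attempts = 0
    while True:
        try:
            return execute_llm(agent, messages)
        except Exception as error:
            attempts += 1
            if attempts >= MAX_LLM_RETRIES:
                raise ExecuteError(str(error), messages)
            backoff = min(2 ** attempts + random.random(), 60)
            # wait() returns True if cancellation fires during the backoff,
            # so a signaled token stops retrying immediately
            if cancel.wait(timeout=backoff):
                raise ExecuteError("cancelled", messages)
```

Because `ExecuteError` carries `messages`, a caller catching it can pass that list back as thread input to resume the conversation.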