§9 Agent Loop
The agent loop enables multi-turn tool-calling workflows. It calls the LLM, inspects the response for tool calls, executes them, appends results to the conversation, and re-calls the LLM—repeating until the LLM produces a normal (non-tool-call) response.
Public API:
```
turn(path_or_agent, inputs, tools?) → result
turn_async(path_or_agent, inputs, tools?) → result
```
Both MUST emit a `turn` trace span that wraps the entire loop, including all inner `execute` and `execute_tool` spans.
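To make the control flow concrete, here is a minimal self-contained Python sketch of such a loop, with a fake LLM and a single tool; the names used here (`fake_llm`, `TOOLS`) are illustrative, not part of the spec's API:

```python
import json

MAX_ITERATIONS = 10

def fake_llm(messages):
    """Stand-in for execute_llm: requests a tool once, then answers."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool_calls": [{"id": "call_1", "name": "get_weather",
                                "arguments": '{"city": "Paris"}'}]}
    return {"content": "It is 72°F and sunny in Paris."}

TOOLS = {"get_weather": lambda args: f'72°F and sunny in {args["city"]}'}

def turn(messages):
    for _ in range(MAX_ITERATIONS):
        response = fake_llm(messages)
        calls = response.get("tool_calls")
        if not calls:                      # 5e: normal response, loop ends
            return response["content"]
        for call in calls:                 # 5d: execute each tool call
            args = json.loads(call["arguments"])
            result = TOOLS[call["name"]](args)
            messages.append({"role": "tool",
                             "tool_call_id": call["id"],
                             "content": str(result)})
    raise RuntimeError(f"Agent loop exceeded {MAX_ITERATIONS} iterations")

print(turn([{"role": "user", "content": "Weather in Paris?"}]))
```

The real loop additionally resolves the agent, merges runtime tools, applies bindings, and emits trace spans, as detailed in §9.2.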
§9.1 Constants
| Constant | Default | Notes |
|---|---|---|
| MAX_ITERATIONS | 10 | MAY be configurable at runtime |
| MAX_LLM_RETRIES | 3 | MAY be configurable at runtime (§9.10) |
§9.2 Algorithm
```
function turn(path_or_agent, inputs, tools=null) → result:
    // Step 1: Resolve agent
    if path_or_agent is a string path:
        agent = load(path_or_agent)
    else:
        agent = path_or_agent

    // Step 2: Prepare initial messages
    messages = prepare(agent, inputs)

    // Step 3: Merge runtime tools into the agent
    if tools is not null:
        merge tools into agent.tools
        merge tool handlers into tool registry

    // Step 4: Iteration counter
    iteration = 0

    // Step 5: Loop
    loop:
        // 5a. Guard against infinite loops
        if iteration >= MAX_ITERATIONS:
            raise RuntimeError(
                "Agent loop exceeded " + MAX_ITERATIONS + " iterations"
            )

        // 5b. Call the LLM (with retry — see §9.10)
        llm_attempts = 0
        loop:
            try:
                response = execute_llm(agent, messages)
                break
            catch error:
                llm_attempts += 1
                if llm_attempts >= MAX_LLM_RETRIES:
                    raise ExecuteError(
                        message: str(error),
                        messages: messages  // MUST include conversation state
                    )
                backoff = min(2^llm_attempts + jitter(), 60)
                sleep(backoff)

        // 5c. Process response
        result = process(agent, response)

        // 5d. Check for tool calls
        if result is a list of ToolCall:
            tool_calls = result
            tool_results = []

            // Execute each tool call
            for tool_call in tool_calls:
                TRACE: emit "execute_tool" span for tool_call.name

                // Look up handler — two-layer dispatch (§11.2)
                tool_def = find_tool_definition(agent, tool_call.name)

                // Layer 1: explicit name override
                handler = get_tool(tool_call.name)

                // Parse arguments (with resilient fallback — see §9.8)
                args = resilient_json_parse(tool_call.arguments)

                // Apply bindings (inject bound values from inputs)
                args = apply_bindings(tool_def, args, inputs)

                // Execute tool handler (with error safety — see §9.9)
                try:
                    if handler is not null:
                        // Name registry hit — direct call
                        tool_result = handler(args)
                    else:
                        // Layer 2: kind handler fallback
                        kind_handler = get_tool_handler(tool_def.kind)
                        if kind_handler is null:
                            raise ValueError(
                                "No handler registered for tool: " + tool_call.name
                                + " (kind: " + tool_def.kind + ")"
                            )
                        tool_result = kind_handler(tool_def, args, agent, inputs)
                catch error:
                    // Tool handler failures MUST NOT kill the agent loop (§9.9)
                    tool_result = "Error: Tool '" + tool_call.name
                        + "' failed: " + str(error)
                    emit event("error", { message: tool_result })

                tool_results.append({
                    tool_call_id: tool_call.id,
                    result: str(tool_result)
                })

            // Delegate message formatting to the executor (§9.4)
            executor = get_executor(agent.model.provider)
            text_content = extract_text_content(response)
            tool_messages = executor.FormatToolMessages(
                response, tool_calls, tool_results, text_content
            )
            append tool_messages to messages

            iteration += 1
            continue loop

        // 5e. Normal response (no tool calls) — return
        return result
```

§9.3 Streaming in Agent Mode
When streaming is enabled during agent mode, implementations SHOULD forward content chunks to the caller where possible rather than buffering the entire response. The key constraint: tool call arguments arrive incrementally and MUST be fully accumulated before tool execution.
Detection strategy: LLM streaming APIs send tool_calls deltas from the
start of a response — they do not appear after content deltas. Implementations
SHOULD use the first chunk’s delta to determine the response type:
When response is a stream:

1. Begin consuming chunks through the processor.
2. If tool_calls are detected (present in early chunks):
   - MUST accumulate ALL chunks to collect complete tool call data (function names + full argument JSON).
   - MUST NOT yield content to the caller for this iteration.
   - Execute tools, append results, re-loop.
3. If only content is detected (no tool_calls):
   - This is the final response — SHOULD yield content chunks through a PromptyStream to the caller as they arrive.
   - Return the stream (caller consumes at their pace).

This means intermediate iterations (tool calls) are buffered internally, while the final iteration (content only) is streamed through to the caller. The caller sees a normal PromptyStream for the final answer.
Implementations that cannot make this distinction from the early chunks MAY fall back to fully consuming the stream before deciding, but this is not preferred.
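As an illustration of the accumulation requirement, here is a Python sketch; the chunk shapes are loosely modeled on OpenAI-style streaming deltas and are an assumption, not the spec's wire format:

```python
def consume_stream(chunks):
    """Detect response type from the first chunk, then either accumulate
    tool-call deltas fully or pass content chunks through.

    Returns ("tool_calls", [complete calls]) or ("content", iterator).
    """
    it = iter(chunks)
    first = next(it)
    if first.get("tool_calls"):            # tool_calls appear from the start
        calls = {}                         # index → accumulated call
        for chunk in [first, *it]:         # MUST consume ALL chunks
            for d in chunk.get("tool_calls", []):
                c = calls.setdefault(d["index"],
                                     {"id": "", "name": "", "arguments": ""})
                c["id"] = d.get("id") or c["id"]
                c["name"] += d.get("name", "")
                c["arguments"] += d.get("arguments", "")  # arrives in pieces
        return "tool_calls", [calls[i] for i in sorted(calls)]

    # Content only: this is the final answer — stream through, don't buffer.
    def passthrough():
        yield first.get("content", "")
        for chunk in it:
            yield chunk.get("content", "")
    return "content", passthrough()
```

The first branch corresponds to intermediate (tool-call) iterations; the second returns a live iterator the caller consumes at its own pace.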
§9.4 Provider-Specific Tool Message Formats
Each provider has a different wire format for tool-call messages. The agent loop MUST produce messages in the correct format for the active provider.
OpenAI Chat Completions:
```
// Assistant message with tool calls
{
  "role": "assistant",
  "tool_calls": [
    {
      "id": "call_123",
      "type": "function",
      "function": {
        "name": "get_weather",
        "arguments": "{\"city\":\"Paris\"}"
      }
    }
  ]
}
```

```
// Tool result message
{
  "role": "tool",
  "content": "72°F and sunny",
  "tool_call_id": "call_123"
}
```

Anthropic:

```
// Assistant message — MUST preserve ALL content blocks (text + tool_use)
{
  "role": "assistant",
  "content": ["<original content blocks from API response>"]
}
```

```
// Tool results — ALL results in ONE user message
{
  "role": "user",
  "content": [
    { "type": "tool_result", "tool_use_id": "toolu_123", "content": "72°F and sunny" },
    { "type": "tool_result", "tool_use_id": "toolu_456", "content": "Pizza Palace" }
  ]
}
```

OpenAI Responses API:

```
// MUST include original function_call item in input
{
  "type": "function_call",
  "id": "fc_123",
  "call_id": "call_123",
  "name": "get_weather",
  "arguments": "{\"city\":\"Paris\"}"
}
```

```
// Function call output
{
  "type": "function_call_output",
  "call_id": "call_123",
  "output": "72°F and sunny"
}
```

§9.5 Bindings Injection
During tool execution, bound parameters MUST be injected into the arguments before calling the handler:
```
function apply_bindings(tool, args, inputs) → dict:
    if tool.bindings is null:
        return args

    for param_name, binding in tool.bindings:
        input_name = binding.input  // e.g., "preferred_unit"
        if input_name in inputs:
            args[param_name] = inputs[input_name]

    return args
```

Bindings MUST override any value the LLM may have generated for the same parameter name.
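The same logic in Python, as a sketch that models `bindings` as a plain dict mapping parameter name to input name (the spec's binding objects carry more structure):

```python
def apply_bindings(bindings, args, inputs):
    """Inject bound input values into tool arguments."""
    if not bindings:
        return args
    for param_name, input_name in bindings.items():
        if input_name in inputs:
            args[param_name] = inputs[input_name]  # binding wins over LLM value
    return args

args = apply_bindings(
    {"unit": "preferred_unit"},           # tool binding
    {"city": "Paris", "unit": "kelvin"},  # LLM-generated args
    {"preferred_unit": "celsius"},        # caller inputs
)
# the bound "celsius" overrides the LLM's "kelvin"
```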
§9.6 PromptyTool Execution
A PromptyTool references another .prompty file to be invoked as a tool:
```
function execute_prompty_tool(tool, args, parent_inputs) → result:
    // Resolve path relative to the parent .prompty file
    child_agent = load(tool.path)

    // Merge: LLM-provided args + bindings from parent inputs
    merged = apply_bindings(tool, args, parent_inputs)

    match tool.mode:
        "single":
            // One LLM call — no agent loop
            return run(child_agent, merged)
        "agentic":
            // Full agent loop — child may call tools too
            return turn(child_agent, merged)
```

Child PromptyTool execution MUST inherit the parent's tracer registry, producing nested trace spans that show the full call hierarchy.
§9.8 Resilient Argument Parsing
LLMs frequently produce malformed JSON in tool call arguments — markdown
code fences wrapping JSON, trailing commas, or JSON embedded in prose text.
Implementations SHOULD attempt recovery using the following fallback chain
when json_parse fails on the raw argument string:
````
function resilient_json_parse(raw_arguments) → dict:
    // Strategy 1: Direct parse
    try:
        return json_parse(raw_arguments)
    catch:
        pass

    // Strategy 2: Strip markdown code fences
    stripped = regex_replace(raw_arguments,
        /^\s*```(?:json)?\s*\n?(.*?)\n?\s*```\s*$/s, "$1")
    if stripped != raw_arguments:
        try:
            return json_parse(stripped)
        catch:
            pass

    // Strategy 3: Extract first balanced JSON block
    block = extract_first_json_block(raw_arguments)
    if block is not null:
        try:
            return json_parse(block)
        catch:
            pass

    // Strategy 4: Strip trailing commas before } or ]
    cleaned = regex_replace(raw_arguments, /,\s*([}\]])/g, "$1")
    try:
        return json_parse(cleaned)
    catch:
        pass

    // All strategies failed — return error as tool result
    return null  // caller MUST convert to error tool result
````

Requirements:
- Implementations SHOULD attempt all four strategies in order.
- When a non-direct strategy succeeds, implementations SHOULD log a warning indicating which fallback was used.
- If all strategies fail, implementations MUST NOT substitute a silent empty object (`{}`). The parse failure MUST be reported as a string tool result so the LLM can see the error and retry.
- `extract_first_json_block` MUST respect string escapes (do not match braces inside quoted strings).
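A possible Python rendering of the fallback chain; the escape-aware block extractor shown here is one way to satisfy the last requirement:

```python
import json
import re

def extract_first_json_block(text):
    """Return the first balanced {...} block, ignoring braces in strings."""
    start = text.find("{")
    if start < 0:
        return None
    depth, in_str, esc = 0, False, False
    for i, ch in enumerate(text[start:], start):
        if esc:
            esc = False                 # skip the escaped character
        elif ch == "\\":
            esc = True
        elif ch == '"':
            in_str = not in_str
        elif not in_str:
            if ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
                if depth == 0:
                    return text[start:i + 1]
    return None

def resilient_json_parse(raw):
    try:                                # Strategy 1: direct parse
        return json.loads(raw)
    except ValueError:
        pass
    # Strategy 2: strip markdown code fences
    stripped = re.sub(r"^\s*```(?:json)?\s*\n?(.*?)\n?\s*```\s*$", r"\1",
                      raw, flags=re.S)
    # Strategy 3: extract first balanced JSON block
    # Strategy 4: strip trailing commas before } or ]
    for candidate in (stripped,
                      extract_first_json_block(raw),
                      re.sub(r",\s*([}\]])", r"\1", raw)):
        if candidate and candidate != raw:
            try:
                return json.loads(candidate)
            except ValueError:
                pass
    return None  # caller converts to an error tool result
```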
§9.9 Tool Execution Error Safety
Tool handlers are user-provided code. Implementations MUST catch exceptions (or panics, in languages that distinguish them) raised by tool handlers during execution.
Requirements:
- Caught errors MUST be converted to a string tool result: `"Error: Tool '{name}' failed: {message}"`
- An `error` event (§13.1) MUST be emitted with the error details.
- The agent loop MUST NOT terminate due to a tool handler failure — the error result is fed back to the LLM, allowing the model to recover.
- For languages with both exceptions and panics (e.g., Rust), both MUST be caught.
- `ValueError` for “Tool not registered” is NOT subject to this rule — a missing handler indicates a configuration error and SHOULD still raise.
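The error boundary can be sketched in Python as follows; `emit_event` is a stand-in for the §13.1 event mechanism, and the function name is illustrative:

```python
def run_tool_safely(name, handler, args, emit_event=print):
    """Execute a user-provided tool handler without letting it kill the loop."""
    if handler is None:
        # Missing handler is a configuration error — still raises, not caught
        raise ValueError(f"No handler registered for tool: {name}")
    try:
        return str(handler(args))
    except Exception as error:            # user code may raise anything
        result = f"Error: Tool '{name}' failed: {error}"
        emit_event({"type": "error", "message": result})
        return result                     # fed back to the LLM so it can recover
```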
§9.10 LLM Call Retry
Long-running agent loops accumulate valuable state across iterations. A
transient LLM failure at iteration N should not discard the work from
iterations 1 through N-1. Implementations SHOULD retry the execute_llm
call within the agent loop before raising to the caller.
Algorithm:
```
// Inside the agent loop, replacing the direct execute_llm call:
llm_attempts = 0
loop:
    try:
        response = execute_llm(agent, messages)
        break  // success
    catch error:
        llm_attempts += 1
        if llm_attempts >= MAX_LLM_RETRIES:
            raise ExecuteError(
                message: str(error),
                messages: messages  // MUST include conversation state
            )
        backoff = min(2^llm_attempts + jitter(), 60)
        sleep(backoff)
```

Requirements:
- This retry is independent of any HTTP-level retry inside the executor.
- When all retries are exhausted, the raised error MUST include the accumulated `messages` list so the caller can resume by passing them back as thread input on a subsequent `turn()` call.
- Implementations SHOULD emit a `status` event before each retry.
- Retry MUST respect the cancellation token (§13.2) — if cancellation is signaled during a backoff wait, the loop MUST stop retrying immediately.
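One way to sketch this in Python, using `threading.Event` as the cancellation token; that choice is an assumption here, since the spec's token type is defined in §13.2:

```python
import random
import threading

MAX_LLM_RETRIES = 3

class ExecuteError(Exception):
    """Raised when retries are exhausted; carries the conversation state."""
    def __init__(self, message, messages):
        super().__init__(message)
        self.messages = messages      # caller can resume with these

def call_llm_with_retry(execute_llm, agent, messages, cancel=None):
    cancel = cancel or threading.Event()   # cancellation token (§13.2)
    attempts = 0
    while True:
        try:
            return execute_llm(agent, messages)
        except Exception as error:
            attempts += 1
            if attempts >= MAX_LLM_RETRIES:
                raise ExecuteError(str(error), messages)
            backoff = min(2 ** attempts + random.random(), 60)
            # wait() returns True if cancellation fires during the backoff,
            # so a signaled token stops retrying immediately
            if cancel.wait(timeout=backoff):
                raise ExecuteError("cancelled", messages)
```

Because `ExecuteError` carries `messages`, a caller catching it can pass that list back as thread input to resume the conversation.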