
# Agent Mode

Agent mode provides an automatic tool-calling loop. Instead of returning a single completion, the LLM can request one or more tool calls — function invocations that Prompty executes on its behalf. The results are appended to the conversation and the LLM is called again. This cycle repeats until the model produces a final text response (or a safety limit is hit).

This lets you build agents that can query databases, call APIs, search files, or perform any action you expose as a Python function — all driven by the LLM’s reasoning.

```mermaid
flowchart TD
    A["Send Messages to LLM"] --> B["Receive Response"]
    B --> C{"has tool_calls?\n(finish_reason)"}
    C -- Yes --> D["Execute Tool Functions"]
    D --> E["Append Tool Results"]
    E -.-> A
    C -- No --> F["Return Final Response"]

    G["Error Paths"] ~~~ H
    H["Bad JSON in tool args"] --> I["Send error string to LLM"]
    J["Tool function throws"] --> K["Send error string to LLM"]
    L["Max iterations exceeded"] --> M["Raise ValueError"]

    style A fill:#3b82f6,stroke:#1d4ed8,color:#fff
    style B fill:#3b82f6,stroke:#1d4ed8,color:#fff
    style C fill:#f59e0b,stroke:#d97706,color:#fff
    style D fill:#10b981,stroke:#059669,color:#fff
    style E fill:#10b981,stroke:#059669,color:#fff
    style F fill:#1d4ed8,stroke:#1e3a8a,color:#fff
    style G fill:none,stroke:none,color:#6b7280
    style H fill:#fef2f2,stroke:#ef4444,color:#ef4444
    style I fill:#fef2f2,stroke:#ef4444,color:#ef4444
    style J fill:#fef2f2,stroke:#ef4444,color:#ef4444
    style K fill:#fef2f2,stroke:#ef4444,color:#ef4444
    style L fill:#fef2f2,stroke:#ef4444,color:#ef4444
    style M fill:#fef2f2,stroke:#ef4444,color:#ef4444
```

Define one or more tool functions, then pass them to turn() along with a loaded agent. The executor calls the LLM, dispatches any tool requests to your functions, and loops until the model is done.

```python
from prompty import load, turn, tool, bind_tools

# 1. Define tool functions with @tool
@tool
def get_weather(city: str) -> str:
    """Get the current weather for a city."""
    return f"72°F and sunny in {city}"

@tool
def get_time(timezone: str) -> str:
    """Get the current time in a timezone."""
    return f"3:42 PM in {timezone}"

# 2. Load the agent prompt
agent = load("agent.prompty")

# 3. Validate handlers against the .prompty declarations
tools = bind_tools(agent, [get_weather, get_time])

# 4. Run the agent loop
result = turn(
    agent,
    inputs={"question": "What's the weather in Seattle?"},
    tools=tools,
    max_iterations=10,
    max_llm_retries=3,
)
print(result)  # "It's currently 72°F and sunny in Seattle!"
```

Agent prompts declare their tools in the frontmatter using FunctionTool entries. The LLM sees these as available functions it can call.

agent.prompty

```yaml
---
name: weather-agent
description: An agent that can check weather and time
model:
  id: gpt-4o
  provider: openai
  apiType: chat
  connection:
    kind: key
    endpoint: ${env:OPENAI_API_ENDPOINT:https://api.openai.com/v1}
    apiKey: ${env:OPENAI_API_KEY}
  options:
    temperature: 0
inputs:
  question:
    kind: string
    description: The user's question
    default: What's the weather?
tools:
  - name: get_weather
    kind: function
    description: Get the current weather for a city
    parameters:
      - name: city
        kind: string
        description: City name, e.g. "Seattle"
        required: true
    strict: true
  - name: get_time
    kind: function
    description: Get the current time in a timezone
    parameters:
      - name: timezone
        kind: string
        description: IANA timezone, e.g. "America/Los_Angeles"
        required: true
---
system:
You are a helpful assistant with access to weather and time tools.
Answer the user's question using the available tools.

user:
{{question}}
```

For async applications, use turn_async(). Your tool functions can be either sync or async — the executor detects coroutine functions automatically and awaits them.

```python
import asyncio
import prompty

async def get_weather(city: str) -> str:
    """Async weather lookup."""
    # Imagine an async HTTP call here
    return f"72°F and sunny in {city}"

async def main():
    agent = await prompty.load_async("agent.prompty")
    result = await prompty.turn_async(
        agent,
        inputs={"question": "Weather in Tokyo?"},
        tools={"get_weather": get_weather},
        max_iterations=10,
    )
    print(result)

asyncio.run(main())
```

The agent loop is designed to be resilient at three levels: malformed tool arguments, tool execution failures, and transient LLM errors. Instead of crashing, the loop recovers and feeds error information back to the LLM so the model can retry or adjust its approach.

LLMs sometimes return malformed JSON in tool call arguments — markdown code fences wrapping JSON, trailing commas, or JSON embedded in prose. Prompty uses a four-strategy fallback chain before giving up:

  1. Direct parse — try JSON.parse as-is
  2. Strip markdown fences — remove ```json ... ``` wrappers
  3. Extract first JSON block — find the first { to its matching }
  4. Strip trailing commas — remove , before } or ]

If all four strategies fail, the parse error is sent back to the LLM as a tool result string (never a silent empty {}). The model typically corrects the JSON on the next attempt.

tool message → "Error: Invalid JSON in tool arguments: Expecting ',' delimiter: line 1 column 42"
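The fallback chain can be sketched as follows. This is a hypothetical re-implementation for illustration; Prompty's actual `resilient_json_parse` may differ in details (for example, the brace scan below does not account for braces inside string values):

```python
import json
import re

def resilient_json_parse(raw: str) -> dict:
    """Parse tool-call arguments, falling back through four strategies."""
    # 1. Direct parse
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        pass
    # 2. Strip markdown fences (```json ... ```)
    stripped = re.sub(r"^```(?:json)?\s*|\s*```$", "", raw.strip())
    try:
        return json.loads(stripped)
    except json.JSONDecodeError:
        pass
    # 3. Extract the first { ... } block by counting braces
    start = raw.find("{")
    if start != -1:
        depth = 0
        for i, ch in enumerate(raw[start:], start):
            depth += ch == "{"
            depth -= ch == "}"
            if depth == 0:
                try:
                    return json.loads(raw[start:i + 1])
                except json.JSONDecodeError:
                    break
    # 4. Strip trailing commas before } or ], then give up if this fails too
    return json.loads(re.sub(r",\s*([}\]])", r"\1", raw))
```

In the real loop, a final failure here is converted into the error string shown above rather than propagating to the caller.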

If your tool function raises any exception (or panics in Rust), Prompty catches it and sends the error message back to the LLM as the tool result. The agent loop never terminates due to a tool handler failure — the model decides whether to retry with different arguments or inform the user.

tool message → "Error: Tool 'get_weather' failed: ConnectionTimeout: API unreachable"
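In sketch form, the catch-and-report behavior looks like the helper below (`safe_call` is a hypothetical name, not part of Prompty's public API):

```python
def safe_call(tools: dict, name: str, args: dict) -> str:
    """Run one tool call; any failure becomes an error string for the LLM."""
    try:
        return str(tools[name](**args))
    except Exception as e:  # deliberately broad: every failure is a tool result
        return f"Error: Tool '{name}' failed: {type(e).__name__}: {e}"
```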

Transient LLM failures (429 rate limits, 500 server errors) can derail a long and expensive agent loop. Prompty retries the LLM call with exponential backoff before giving up — preserving the conversation state accumulated across iterations.

| Parameter | Default | Description |
| --- | --- | --- |
| `max_llm_retries` | 3 | Maximum retry attempts per LLM call |

The backoff formula is min(2^attempt + jitter, 60s) — exponential with random jitter, capped at 60 seconds.
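Under that formula, the delay before retry attempt `n` works out roughly as follows (a minimal sketch; `backoff_delay` is an illustrative helper, not a Prompty function):

```python
import random

def backoff_delay(attempt: int, cap: float = 60.0) -> float:
    """Delay before retry `attempt` (1-based): min(2**attempt + jitter, cap)."""
    return min(2 ** attempt + random.random(), cap)

# Roughly: attempt 1 → ~2s, attempt 2 → ~4s, attempt 3 → ~8s, ... capped at 60s
```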

When all retries are exhausted, Prompty raises an ExecuteError that includes the full conversation history. This lets you resume a failed agent loop without losing work:

```python
from prompty import turn, ExecuteError

try:
    result = turn(
        "agent.prompty",
        inputs={"question": "Plan my trip"},
        tools=tools,
        max_llm_retries=3,
    )
except ExecuteError as e:
    print(f"Failed after retries: {e}")
    # e.messages contains the full conversation — resume later
    saved_messages = e.messages
```

If the LLM requests a tool that doesn’t exist in the tools dict, an error message is returned instead of crashing:

tool message → "Error: tool 'unknown_tool' not found in tools dict"
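A lookup with that fallback might be sketched like this (hypothetical helper, shown only to illustrate the behavior):

```python
def dispatch(tools: dict, name: str, args: dict) -> str:
    """Look up a tool by name; unknown names become an error string, not a crash."""
    fn = tools.get(name)
    if fn is None:
        return f"Error: tool '{name}' not found in tools dict"
    return str(fn(**args))
```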

If the loop runs for more than max_iterations cycles without the model producing a final response, a ValueError is raised. This prevents infinite loops when the model gets stuck in a tool-calling cycle.

```python
try:
    result = prompty.turn(agent, inputs, tools, max_iterations=5)
except ValueError as e:
    print(e)  # "Agent loop exceeded max_iterations (5)"
```
The same agent loop is available from TypeScript via @prompty/core:

```typescript
import { load, turn, tool, bindTools } from "@prompty/core";

const getWeather = tool(
  (city: string) => `72°F and sunny in ${city}`,
  {
    name: "get_weather",
    description: "Get the current weather",
    parameters: [{ name: "city", kind: "string", required: true }],
  },
);

const agent = await load("agent.prompty");
const tools = bindTools(agent, [getWeather]);

const result = await turn(agent, {
  question: "What's the weather in London?",
}, { tools, maxIterations: 10, maxLlmRetries: 3 });

console.log(result);
```

Under the hood, the agent loop in the executor follows these steps:

  1. Collect the full response — the agent loop works with both streaming and non-streaming requests. When streaming is enabled, the loop consumes the stream and accumulates tool calls from the streamed chunks. When streaming is off, it reads tool calls directly from the response. Either way, tool calls are fully collected before any are executed.

  2. Call the LLM (with retry) — send the current message list plus tool definitions via the chat completions API. If the call fails, retry with exponential backoff up to max_llm_retries times (§9.10).

  3. Check finish_reason — if the response’s finish_reason is "tool_calls", the model wants to invoke tools. If it’s "stop", the model is done.

  4. Extract tool calls — each tool call has an id, a function.name, and function.arguments (a JSON string).

  5. Parse arguments (resilient) — parse the JSON arguments using the four-strategy fallback chain (§9.8). If all strategies fail, send the error back to the LLM as a tool result.

  6. Execute (with error safety) — for each tool call, find the matching function and call it. If the function throws, catch the error and send it back to the LLM as a tool result (§9.9) — the loop continues.

  7. Append results — add the assistant’s tool-call message and one tool role message per call result back to the conversation.

  8. Repeat — go back to step 2 with the updated message list.

  9. Return — when the model produces a final response (no tool calls), pass it through the processor and return the result.

```python
# Simplified pseudocode of the agent loop (with resilience)
import random
import time

from prompty import ExecuteError
from prompty.core.tool_dispatch import resilient_json_parse

messages = prepare(agent, inputs)
for i in range(max_iterations):
    # LLM call with retry (§9.10)
    for attempt in range(max_llm_retries):
        try:
            response = client.chat.completions.create(
                model=agent.model.id, messages=messages, tools=tool_defs)
            break
        except Exception as e:
            if attempt + 1 >= max_llm_retries:
                raise ExecuteError(str(e), messages=messages)
            time.sleep(min(2 ** (attempt + 1) + random.random(), 60))

    if response.finish_reason != "tool_calls":
        return process(response)

    messages.append(response.message)
    for tool_call in response.tool_calls:
        # Resilient parsing (§9.8)
        args = resilient_json_parse(tool_call.function.arguments)
        try:
            # Error safety (§9.9) — catch tool failures
            result = tools[tool_call.function.name](**args)
        except Exception as e:
            result = f"Error: Tool '{tool_call.function.name}' failed: {e}"
        messages.append(
            {"role": "tool", "tool_call_id": tool_call.id, "content": str(result)}
        )

raise ValueError(f"Agent loop exceeded max_iterations ({max_iterations})")
```
  • Keep tool descriptions clear and concise. The LLM uses the description field to decide when to call a tool. Vague descriptions lead to incorrect or missed tool calls.

  • Use strict: true on FunctionTool. This enables OpenAI’s structured output mode for tool parameters, ensuring the model produces valid JSON matching your schema. It requires all parameters to be required and adds additionalProperties: false automatically.

  • Set a reasonable max_iterations. Most tool-using conversations complete in 2–5 iterations. Setting the limit too high risks runaway costs; setting it too low may cut off legitimate multi-step reasoning.

  • Return structured strings from tools. The LLM processes your tool’s return value as text. Returning well-formatted data (JSON, key-value pairs) helps the model extract information accurately.

  • Test with mocked tools first. Use simple stub functions that return hardcoded data while developing your prompt. Switch to real implementations once the agent’s reasoning flow is solid.
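To make the `strict: true` point concrete, here is roughly what a strict FunctionTool becomes in the chat-completions payload. The field names follow OpenAI's published tool schema; the exact mapping Prompty performs is an assumption here:

```python
# Approximate wire format of a strict function tool (OpenAI chat completions).
# The exact translation from .prompty frontmatter is an assumption.
strict_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "strict": True,
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": 'City name, e.g. "Seattle"'},
            },
            "required": ["city"],           # strict mode: every property is required
            "additionalProperties": False,  # added automatically under strict mode
        },
    },
}
```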