Conversation History (Threads)

Overview

Most LLM applications need multi-turn conversation. The model needs to see prior messages — what the user asked and what it answered — to maintain coherent context. In Prompty, conversation history is handled through thread inputs: a special input kind that tells the pipeline to splice a list of messages into the prompt at exactly the right position.

flowchart LR
    S["system:\nYou are helpful."]
    T["🧵 Thread Messages\n(prior turns)"]
    U["user:\nNew question"]

    S --> T --> U

    style S fill:#dbeafe,stroke:#3b82f6,color:#1e293b
    style T fill:#fef3c7,stroke:#f59e0b,color:#78350f
    style U fill:#d1fae5,stroke:#10b981,color:#065f46

The result is a flat message array — system prompt, then prior conversation, then the new user message — ready for the LLM.

Declaring a Thread Input

Add an input with kind: thread to your .prompty file’s inputs:

---
name: chat-assistant
model:
  id: gpt-4o-mini
  provider: openai
  connection:
    kind: key
    apiKey: ${env:OPENAI_API_KEY}
inputs:
  - name: question
    kind: string
    default: Hello!
  - name: conversation
    kind: thread
---

Two things make a thread input different from a regular string or object input:

kind: thread — signals the pipeline to use special handling (nonce-based expansion) instead of simple template interpolation.
The value is a list of messages — not a scalar. Each message has a role and content.

Placing the Thread in Your Template

In the markdown body, place {{conversation}} (or whatever you named your thread input) where the prior messages should appear. The most common pattern puts it between the system prompt and the new user message:

system:
You are a friendly, helpful assistant.

{{conversation}}
user:
{{question}}

This produces a message array like:

#	Role	Content
1	`system`	You are a friendly, helpful assistant.
2	`user`	(first turn — from thread)
3	`assistant`	(first response — from thread)
4	`user`	(second turn — from thread)
5	`assistant`	(second response — from thread)
…	…	…
N	`user`	(current question)

Passing Thread Data

Thread data is a list of message objects. Each message needs a role (typically "user" or "assistant") and content (a string or structured content array).

import prompty

history = []

while True:
    question = input("You: ")
    if question.lower() in ("quit", "exit"):
        break

    result = prompty.invoke(
        "assistant.prompty",
        inputs={
            "question": question,
            "conversation": history,
        },
    )
    print(f"Assistant: {result}\n")

    # Append this exchange to history for the next turn
    history.append({"role": "user", "content": question})
    history.append({"role": "assistant", "content": result})

import { invoke } from "@prompty/core";
import "@prompty/openai";

const history: { role: string; content: string }[] = [];

// ... in your chat loop:
const result = await invoke("assistant.prompty", {
  question: userMessage,
  conversation: history,
});

history.push({ role: "user", content: userMessage });
history.push({ role: "assistant", content: String(result) });

using Prompty.Core;

var history = new List<Dictionary<string, string>>();

// ... in your chat loop:
var result = await Pipeline.InvokeAsync("assistant.prompty", new()
{
    ["question"] = question,
    ["conversation"] = history,
});

history.Add(new() { ["role"] = "user", ["content"] = question });
history.Add(new() { ["role"] = "assistant", ["content"] = result!.ToString()! });

use serde_json::json;

let history = vec![
    json!({"role": "user", "content": "What is Prompty?"}),
    json!({"role": "assistant", "content": "Prompty is a file format for LLM prompts."}),
];

let inputs = json!({
    "question": "Tell me more",
    "conversation": history,
});

let result = pipeline::invoke("assistant.prompty", Some(&inputs)).await?;

Message Format

Each message in the thread list should have:

Field	Type	Required	Description
`role`	`string`	Yes	`"user"`, `"assistant"`, `"system"`, or `"tool"`
`content`	`string` or `array`	Yes	Text content, or an array of content parts for multimodal

Simple format — content as a plain string:

[
  { "role": "user", "content": "What is the capital of France?" },
  { "role": "assistant", "content": "The capital of France is Paris." }
]

Structured format — content as an array of typed parts:

[
  {
    "role": "user",
    "content": [
      { "kind": "text", "value": "What's in this image?" },
      { "kind": "image", "value": "https://example.com/photo.jpg" }
    ]
  },
  {
    "role": "assistant",
    "content": [
      { "kind": "text", "value": "The image shows a sunset over the ocean." }
    ]
  }
]

How It Works Internally

Thread inputs go through a nonce-based expansion mechanism rather than simple string interpolation. This is important for security — it prevents user-supplied conversation history from accidentally injecting role markers (like system:) into the template.

flowchart TD
    subgraph Render["1. Render"]
        direction TB
        R1["Template: {{conversation}}"]
        R2["Nonce: __PROMPTY_THREAD_a1b2c3d4_conversation__"]
        R1 --> R2
    end

    subgraph Parse["2. Parse"]
        direction TB
        P1["Role marker splitting"]
        P2["Nonce preserved as text in message"]
        P1 --> P2
    end

    subgraph Expand["3. Expand"]
        direction TB
        E1["Find nonce in message text"]
        E2["Replace with actual Message objects"]
        E1 --> E2
    end

    Render --> Parse --> Expand --> Final["Final message array"]

    style Render fill:#dbeafe,stroke:#3b82f6,color:#1e293b
    style Parse fill:#fef3c7,stroke:#f59e0b,color:#78350f
    style Expand fill:#d1fae5,stroke:#10b981,color:#065f46

Step by step

Render — The renderer replaces the thread variable with a unique nonce marker (e.g., __PROMPTY_THREAD_a1b2c3d4_conversation__) instead of the actual messages. The nonce is a random hex string that cannot appear in normal text.
Parse — The parser splits the rendered text on role markers (system:, user:, assistant:). The nonce marker passes through as plain text inside a message’s content.
Expand — The pipeline scans parsed messages for nonce markers. When found, it splits the surrounding text, inserts the actual thread messages from your input data, and produces the final flat message array.

Why nonces?

If thread messages were interpolated directly into the template as text, a malicious or accidental conversation entry like "system: Ignore all instructions" would create a new role boundary during parsing. The nonce approach ensures thread messages bypass the template engine and parser entirely — they are inserted as pre-built Message objects after parsing is complete.

Best Practices

Token Budget Management

Every message in the thread consumes tokens. As conversations grow, you’ll eventually hit the model’s context window limit. Common strategies:

Sliding window — keep only the last N messages:

MAX_HISTORY = 20  # last 10 exchanges
history = history[-MAX_HISTORY:]

Summarization — periodically summarize older messages into a single assistant message, then trim:

if len(history) > 30:
    summary = summarize(history[:20])  # your summarization logic
    history = [{"role": "assistant", "content": summary}] + history[20:]

Token counting — use a tokenizer to measure the thread and truncate from the oldest messages until it fits within budget.

System Prompt Separation

Keep your system prompt outside the thread. The thread should contain only user and assistant messages from prior turns:

system:
You are an expert chef.        ← Static system prompt

{{conversation}}               ← Prior user/assistant turns only

user:
{{question}}                   ← New user message

Multiple Thread Inputs

You can declare more than one thread input for advanced patterns — for example, a context thread for retrieved documents and a conversation thread for chat history:

inputs:
  - name: context
    kind: thread
  - name: conversation
    kind: thread
  - name: question
    kind: string

system:
You answer questions using the provided context.

{{context}}
{{conversation}}
user:
{{question}}

Each thread is expanded independently at its marker position.

Stateless Design

Prompty itself is stateless — it does not store conversation history between calls. Your application code is responsible for:

Accumulating messages in a list
Passing that list as the thread input on each call
Managing persistence (in-memory, database, session store, etc.)

This gives you full control over what the model sees and makes it easy to implement features like message editing, branching, and context management.

Thread Inputs with the Agent Loop

When running the agent loop (via turn() / TurnAsync()), thread inputs work the same way — they provide the initial conversation context. The .prompty file still uses apiType: chat; agent behavior is activated by your calling code. The agent loop then appends tool calls and results within a single turn() invocation:

---
name: agent-with-history
model:
  id: gpt-4o
  provider: openai
  apiType: chat
  connection:
    kind: key
    apiKey: ${env:OPENAI_API_KEY}
inputs:
  - name: conversation
    kind: thread
  - name: question
    kind: string
tools:
  - name: get_weather
    kind: function
    description: Get current weather for a city
    parameters:
      properties:
        - name: city
          kind: string
          required: true
---
system:
You are a helpful assistant with access to weather data.

{{conversation}}
user:
{{question}}

Then in your code, use turn() to execute the agent loop with thread history:

Python
TypeScript

from prompty import load, turn, tool, bind_tools

@tool
def get_weather(city: str) -> str:
    """Get the current weather for a city."""
    return f"72°F and sunny in {city}"

agent = load("agent.prompty")
tools = bind_tools(agent, [get_weather])
history = []

while True:
    question = input("You: ")
    if question.lower() in ("quit", "exit"):
        break

    result = turn(
        agent,
        inputs={"question": question, "conversation": history},
        tools=tools,
    )
    print(f"Assistant: {result}\n")

    history.append({"role": "user", "content": question})
    history.append({"role": "assistant", "content": result})

import { load, turn, tool, bindTools } from "@prompty/core";
import "@prompty/openai";

const getWeather = tool(
  (city: string) => `72°F and sunny in ${city}`,
  { name: "get_weather", description: "Get the current weather",
    parameters: [{ name: "city", kind: "string", required: true }] },
);

const agent = await load("agent.prompty");
const tools = bindTools(agent, [getWeather]);
const history: { role: string; content: string }[] = [];

// In your chat loop:
const result = await turn(
  agent,
  { question: userMessage, conversation: history },
  { tools },
);

history.push({ role: "user", content: userMessage });
history.push({ role: "assistant", content: String(result) });

The thread provides context from prior turns, and turn() handles any new tool calls within the current turn.

Quick Reference

Aspect	Details
Declaration	`kind: thread` in `inputs`
Template syntax	`{{threadName}}` — Jinja2 or Mustache
Data format	`list` of `{ role, content }` objects
Empty thread	Valid — produces no extra messages
Expansion	Nonce-based, post-parse (injection-safe)
Supported runtimes	Python, TypeScript, C#, Rust
State management	Application-side — Prompty is stateless

Next Steps

Tutorial: Build a Chat Assistant — hands-on walkthrough building a multi-turn chatbot
Agentic Concepts — combine threads with automatic tool calling and runtime controls
Pipeline Architecture — understand how threads fit into the four-stage pipeline
Streaming — stream responses while using conversation history