
Pipeline Architecture

Prompty processes .prompty files through a five-stage pipeline. Each stage is defined by a protocol/interface — concrete implementations are discovered at runtime via a registry. This design means you can swap any stage without touching the others: use a different template engine, a custom parser, or your own LLM provider.

```mermaid
flowchart TD
    File[".prompty file"]
    File --> Load

    subgraph Pipeline["Five-Stage Pipeline"]
        direction TB
        Load["load()
        Parse frontmatter + body
        Resolve references"]
        Render["render()
        Fill template variables
        (Jinja2 / Mustache)"]
        Parse["parse()
        Split on role markers
        → Message list"]
        Execute["execute()
        Call LLM provider
        (OpenAI / Azure / Anthropic)"]
        Process["process()
        Extract content from
        API response"]

        Load --> Render
        Render --> Parse
        Parse --> Execute
        Execute --> Process
    end

    Process --> Result["Final Result"]

    style File fill:#eff6ff,stroke:#3b82f6,color:#1d4ed8
    style Load fill:#f0fdf4,stroke:#10b981,color:#065f46
    style Render fill:#fffbeb,stroke:#f59e0b,color:#92400e
    style Parse fill:#fffbeb,stroke:#f59e0b,color:#92400e
    style Execute fill:#fef2f2,stroke:#ef4444,color:#991b1b
    style Process fill:#fef2f2,stroke:#ef4444,color:#991b1b
    style Result fill:#f0fdf4,stroke:#10b981,color:#065f46
    style Pipeline fill:none,stroke:#94a3b8,stroke-dasharray:5 5
```
| Step | Function | What happens |
| --- | --- | --- |
| Load | `load()` | Parses the `.prompty` file — splits YAML frontmatter from the markdown body, resolves `${env:}` / `${file:}` references, and returns a typed `Prompty` object. |
| Render | `render()` | Fills in template variables (Jinja2 or Mustache) using the provided inputs. Produces a single string with role markers still embedded. |
| Parse | `parse()` | Splits the rendered string on role markers (`system:`, `user:`, `assistant:`) into a structured `list[Message]`. |
| Execute | `execute()` | Sends the messages to the LLM provider (OpenAI, Azure, Anthropic) via the appropriate SDK. Returns the raw API response. |
| Process | `process()` | Extracts clean output from the raw response — a string for chat, vectors for embeddings, a URL for images, or parsed JSON for structured output. |

The Renderer takes a Prompty object and a dictionary of inputs, then renders the template (the instructions field) with those values. The result is a single rendered string containing role markers and filled-in variables.

| Property | Value |
| --- | --- |
| Registration key | `agent.template.format.kind` |
| Built-in implementations | `Jinja2Renderer` (`"jinja2"`), `MustacheRenderer` (`"mustache"`) |
| Input | `Prompty` + dict of inputs |
| Output | `str` — rendered template |

The renderer also handles thread markers — when an input has kind: thread, the renderer emits special nonce markers that the pipeline later expands into Message objects for conversation history.

```
system:
You are an AI assistant helping {{ firstName }}.
user:
{{ question }}
```
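The renderer's contract can be illustrated with a toy stand-in for Jinja2/Mustache. This is a sketch, not Prompty's implementation: `render_vars` is a hypothetical name, and it only handles simple `{{ name }}` placeholders, not loops or filters.

```python
import re


def render_vars(template: str, inputs: dict) -> str:
    """Toy template renderer: fill {{ name }} placeholders from inputs."""
    def substitute(match: re.Match) -> str:
        return str(inputs[match.group(1)])
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


template = "system:\nYou are an AI assistant helping {{ firstName }}.\nuser:\n{{ question }}"
rendered = render_vars(template, {"firstName": "Jane", "question": "What is AI?"})
# Role markers pass through untouched; only the variables are filled in.
```

Note that the role markers survive rendering — splitting them apart is the parser's job, not the renderer's.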

The Parser takes the rendered string and splits it into a structured list of messages using role markers — lines ending with a colon that indicate who is speaking.

| Property | Value |
| --- | --- |
| Registration key | `agent.template.parser.kind` |
| Built-in implementations | `PromptyChatParser` (`"prompty"`) |
| Input | `str` — rendered template |
| Output | `list[Message]` — structured message objects |

Recognized role markers:

- `system:` → `{ role: "system", content: "..." }`
- `user:` → `{ role: "user", content: "..." }`
- `assistant:` → `{ role: "assistant", content: "..." }`
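In sketch form, this is a line-oriented split on marker lines. The following is a simplified illustration of the idea, not the actual `PromptyChatParser` (`split_roles` and this `Message` dataclass are hypothetical names):

```python
import re
from dataclasses import dataclass


@dataclass
class Message:
    role: str
    content: str


ROLE_MARKER = re.compile(r"^(system|user|assistant):\s*$")


def split_roles(rendered: str) -> list[Message]:
    """Split a rendered string into messages at role-marker lines (sketch)."""
    messages: list[Message] = []
    role, lines = None, []
    for line in rendered.splitlines():
        marker = ROLE_MARKER.match(line)
        if marker:
            if role is not None:
                messages.append(Message(role, "\n".join(lines).strip()))
            role, lines = marker.group(1), []
        elif role is not None:
            lines.append(line)  # text before the first marker is dropped
    if role is not None:
        messages.append(Message(role, "\n".join(lines).strip()))
    return messages
```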

The Executor takes the list of messages and calls the LLM provider. It handles API type dispatch — routing to the appropriate SDK method based on agent.model.apiType.

| Property | Value |
| --- | --- |
| Registration key | `agent.model.provider` |
| Built-in implementations | `OpenAIExecutor` (`"openai"`), `FoundryExecutor` (`"foundry"`, aliased as `"azure"`) |
| Input | `list[Message]` + `Prompty` (for config) |
| Output | Raw SDK response object |

API type dispatch:

| apiType | SDK method | Use case |
| --- | --- | --- |
| `"chat"` (default) | `chat.completions.create()` | Conversational prompts |
| `"embedding"` | `embeddings.create()` | Text → vector embeddings |
| `"image"` | `images.generate()` | DALL-E image generation |
| `"responses"` | `responses.create()` | OpenAI Responses API (latest features) |
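The dispatch itself amounts to a table lookup keyed by `apiType`. A minimal sketch, using a stub client shaped like the OpenAI SDK (`dispatch` and the stub are illustrative, not Prompty's internals):

```python
from types import SimpleNamespace


def dispatch(client, api_type: str, **kwargs):
    """Route to the SDK method matching the agent's apiType (sketch)."""
    routes = {
        "chat": lambda: client.chat.completions.create(**kwargs),
        "embedding": lambda: client.embeddings.create(**kwargs),
        "image": lambda: client.images.generate(**kwargs),
        "responses": lambda: client.responses.create(**kwargs),
    }
    if api_type not in routes:
        raise ValueError(f"unknown apiType: {api_type!r}")
    return routes[api_type]()


# Stub client standing in for the real SDK, just to show the routing.
stub = SimpleNamespace(
    chat=SimpleNamespace(completions=SimpleNamespace(create=lambda **kw: ("chat", kw))),
    embeddings=SimpleNamespace(create=lambda **kw: ("embedding", kw)),
    images=SimpleNamespace(generate=lambda **kw: ("image", kw)),
    responses=SimpleNamespace(create=lambda **kw: ("responses", kw)),
)
```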

The Processor takes the raw SDK response and extracts clean, usable content. What “clean” means depends on the response type.

| Property | Value |
| --- | --- |
| Registration key | `agent.model.provider` |
| Built-in implementations | `OpenAIProcessor` (`"openai"`), `FoundryProcessor` (`"foundry"`, aliased as `"azure"`) |
| Input | Raw SDK response + `Prompty` |
| Output | Processed result (string, list, dict, parsed JSON, etc.) |

Processing by response type:

| Response type | Output |
| --- | --- |
| Chat completion | `str` — the message content |
| Embedding | `list[float]` or `list[list[float]]` |
| Image | `str` — URL or base64 data |
| Streaming | `PromptyStream` / `AsyncPromptyStream` iterator |
| Structured output | Parsed `dict` matching `outputs` |
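A simplified sketch of that branching, using stub response objects shaped like the OpenAI SDK's (illustrative only — the real processors also handle streaming and structured output, and `extract` is a hypothetical name):

```python
from types import SimpleNamespace


def extract(response, api_type: str):
    """Pull clean content out of a raw SDK response (sketch)."""
    if api_type == "chat":
        return response.choices[0].message.content
    if api_type == "embedding":
        vectors = [item.embedding for item in response.data]
        return vectors[0] if len(vectors) == 1 else vectors
    if api_type == "image":
        return response.data[0].url
    raise ValueError(f"unhandled apiType: {api_type!r}")


# Stubs mimicking the shape of chat and embedding responses.
chat_response = SimpleNamespace(
    choices=[SimpleNamespace(message=SimpleNamespace(content="Hello!"))]
)
embed_response = SimpleNamespace(
    data=[SimpleNamespace(embedding=[0.1, 0.2, 0.3])]
)
```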

You don’t always need the full pipeline. Prompty provides convenience functions that map to specific stage groupings:

```mermaid
flowchart LR
    subgraph invoke ["invoke() — full pipeline"]
        direction LR
        L["Load"] --> R["Render"] --> P["Parse"] --> E["Execute"] --> PR["Process"]
    end

    style invoke fill:#eff6ff,stroke:#3b82f6
    style L fill:#f0fdf4,stroke:#10b981,color:#065f46
    style R fill:#fffbeb,stroke:#f59e0b,color:#92400e
    style P fill:#fffbeb,stroke:#f59e0b,color:#92400e
    style E fill:#fef2f2,stroke:#ef4444,color:#991b1b
    style PR fill:#fef2f2,stroke:#ef4444,color:#991b1b
```

The individual functions map to subsets of this pipeline:

| Function | Stages | Description |
| --- | --- | --- |
| `render()` | Renderer only | Fill template variables → rendered string |
| `parse()` | Parser only | Split role markers → `Message[]` |
| `prepare()` | Render + Parse | Render, parse, and expand thread markers |
| `run()` | Execute + Process | Send to LLM and extract result |
| `invoke()` | All five stages | Load → Render → Parse → Execute → Process |
```python
from prompty import load, prepare, invoke
from prompty.core.pipeline import render, parse, run

agent = load("chat.prompty")
inputs = {"firstName": "Jane", "question": "What is AI?"}

# Render only — fill the template variables
rendered = render(agent, inputs)

# Parse only — split the rendered string into messages
messages = parse(agent, rendered)

# Render + Parse — render, parse, and expand thread markers
messages = prepare(agent, inputs)

# Execute + Process — send to the LLM and extract the result
result = run(agent, messages)

# Full pipeline — load + prepare + run
result = invoke("chat.prompty", inputs=inputs)
```

Each runtime discovers stage implementations through a registry. The mechanism varies by language, but the concept is the same: register implementations by key, and the pipeline looks them up at runtime.

Python uses entry points — the same mechanism that powers CLI tools and pytest plugins. Each implementation registers itself under a group name in pyproject.toml. The discovery module caches lookups so entry points are resolved once per key.

`pyproject.toml`:

```toml
[project.entry-points."prompty.renderers"]
jinja2 = "prompty.renderers.jinja2:Jinja2Renderer"
mustache = "prompty.renderers.mustache:MustacheRenderer"

[project.entry-points."prompty.parsers"]
prompty = "prompty.parsers.prompty:PromptyChatParser"

[project.entry-points."prompty.executors"]
openai = "prompty.providers.openai.executor:OpenAIExecutor"
foundry = "prompty.providers.foundry.executor:FoundryExecutor"
azure = "prompty.providers.foundry.executor:FoundryExecutor"
anthropic = "prompty.providers.anthropic.executor:AnthropicExecutor"

[project.entry-points."prompty.processors"]
openai = "prompty.providers.openai.processor:OpenAIProcessor"
foundry = "prompty.providers.foundry.processor:FoundryProcessor"
azure = "prompty.providers.foundry.processor:FoundryProcessor"
anthropic = "prompty.providers.anthropic.processor:AnthropicProcessor"
```
| Group | Resolved from | Example keys |
| --- | --- | --- |
| Renderers | `agent.template.format.kind` | `jinja2`, `mustache` |
| Parsers | `agent.template.parser.kind` | `prompty` |
| Executors | `agent.model.provider` | `openai`, `foundry`, `azure`, `anthropic` |
| Processors | `agent.model.provider` | `openai`, `foundry`, `azure`, `anthropic` |

You can write your own implementation for any stage by implementing the corresponding protocol and registering it as an entry point.

Each protocol defines sync and async methods. Here’s an example custom executor:

```python
from __future__ import annotations

from prompty.core.types import Message


class AnthropicExecutor:
    """Executor for the Anthropic Claude API."""

    def execute(self, agent, messages: list[Message]) -> object:
        import anthropic

        client = anthropic.Anthropic()
        return client.messages.create(
            model=agent.model.id,
            max_tokens=1024,  # required by the Anthropic Messages API
            messages=[{"role": m.role, "content": m.content} for m in messages],
        )

    async def execute_async(self, agent, messages: list[Message]) -> object:
        import anthropic

        client = anthropic.AsyncAnthropic()
        return await client.messages.create(
            model=agent.model.id,
            max_tokens=1024,  # required by the Anthropic Messages API
            messages=[{"role": m.role, "content": m.content} for m in messages],
        )
```

In your package’s pyproject.toml:

```toml
[project.entry-points."prompty.executors"]
anthropic = "my_package.executor:AnthropicExecutor"
```

After installing/registering the provider, any .prompty file with model.provider: "anthropic" will automatically route to your executor.

`claude-chat.prompty`:

```yaml
model:
  id: claude-sonnet-4-20250514
  provider: anthropic
  connection:
    kind: key
    apiKey: ${env:ANTHROPIC_API_KEY}
```

```python
from prompty import invoke

result = invoke("claude-chat.prompty", inputs={"question": "Hello!"})
```