# Pydantic Logfire: the full overview

Logfire is general-purpose observability with first-class AI support. It is the point where the Pydantic Stack converges. Your LLM calls are one part of a larger application, so Logfire instruments the whole app (traces, metrics, logs) and adds views built for the AI-specific parts on top of the same data.

The story is a loop, not a feature list:

build (Pydantic AI) -> observe (Logfire) -> evaluate (Pydantic Evals) -> optimize (Logfire) -> control cost (Pydantic AI Gateway)

Every stage reports back into Logfire, so the loop closes in one place. One identity, one audit trail, one bill. Underneath it is standard OpenTelemetry, so there is no lock-in, and you can run arbitrary SQL over every trace, span, log, and metric.

Maturity labels in this document are honest. `GA` means generally available. `Preview` means shipped but flag-gated or advisory. `Beta` means available with rough edges. Where something is not exposed over the MCP server yet, it says so.

---

## The Pydantic Stack

Logfire is one part of a larger system. The stack has five products:

- **Pydantic** -- data validation. The type layer the rest of the stack is built on. https://pydantic.dev/
- **Pydantic AI** -- the agent framework. A small stable core plus a Harness of optional capabilities. https://pydantic.dev/docs/ai/overview/
- **Pydantic Logfire** -- observability with first-class AI support. Where the stack converges. https://pydantic.dev/docs/logfire/
- **Pydantic Evals** -- evaluation. The same Evaluator classes run offline in CI and online in production. https://pydantic.dev/docs/ai/evals/evals/
- **Pydantic AI Gateway** -- routing and cost control. Now inside Logfire. https://pydantic.dev/docs/ai/overview/gateway/

"Building agent stacks" means composition over a small stable core. The core stays small and stable; everything else lives in the Harness, where it can move fast.

---

## The improvement loop

These capabilities turn the traces you already send into a way to make the agent better.

### Online evaluations (GA)

The job: you shipped an agent and want to know if its answers are getting worse in production, without blocking the user while you check.

Background evaluators (an LLM judge, or an assertion/heuristic) grade production calls after they return. Per-evaluator pass-rate trends stream into an Evals live-monitoring view. Results ride on OpenTelemetry spans as `gen_ai.evaluation.result`, so you query the scores with SQL like any other telemetry. This is what "your evals become a query surface, not just a dashboard" means. Use the same Evaluator classes you wrote for offline evals; a `sample_rate` keeps eval cost bounded. SDK: `pydantic_evals.online` `@evaluate(...)` plus the `OnlineEvaluation` agent capability.
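
To make the role of `sample_rate` concrete, here is a minimal stdlib-only sketch of sampled background grading. It is an illustration under stated assumptions, not the `pydantic_evals.online` API: the `SampledEvaluator` class, its fields, and the toy "non-empty output" check are all hypothetical.

```python
import random
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SampledEvaluator:
    """Hypothetical stand-in for an online evaluator (not the real SDK class)."""

    name: str
    sample_rate: float  # fraction of calls to grade, between 0.0 and 1.0
    rng: random.Random = field(default_factory=random.Random)
    results: list = field(default_factory=list)

    def maybe_grade(self, output: str) -> None:
        # Skip most calls so evaluation cost scales with sample_rate, not traffic.
        if self.rng.random() >= self.sample_rate:
            return
        # Toy heuristic check; a real evaluator could be an LLM judge.
        self.results.append(len(output.strip()) > 0)

    def pass_rate(self) -> Optional[float]:
        # None means "nothing was sampled yet", which differs from a 0% pass rate.
        if not self.results:
            return None
        return sum(self.results) / len(self.results)


# Seeded so the run is reproducible; 1000 calls, 10% of them empty.
evaluator = SampledEvaluator(name="non_empty", sample_rate=0.1, rng=random.Random(0))
outputs = ["answer"] * 900 + [""] * 100
for out in outputs:
    evaluator.maybe_grade(out)

graded = len(evaluator.results)
print(f"graded {graded} of {len(outputs)} calls, pass rate {evaluator.pass_rate()}")
```

Only the sampled calls pay the evaluation cost, and the pass rate is an estimate over that sample.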

Docs: https://pydantic.dev/docs/ai/evals/online-evaluation/

### Offline evals: datasets and experiments (GA)

The job: before you ship a prompt change, run it against a fixed set of cases and see whether the scores went up or down.

A dataset-centric workflow: datasets hold cases, experiments run evaluators across them, and you compare results, inspect schema, and apply labels. The UI can help generate cases. Everything exports to the pydantic-evals format, so the same suite runs in code, in CI, or on demand. The dataset you build in the UI is the same one your CI runs against.

Docs: https://pydantic.dev/docs/ai/evals/evals/ and https://pydantic.dev/docs/ai/evals/getting-started/quick-start/

### Managed prompts and variables (GA)

The job: change a prompt in production without a redeploy, roll it out to a fraction of traffic first, and roll back in one click if it misbehaves.

Typed runtime config that lives outside your code. Variables support versioning, labeled / canary / A-B rollouts, and instant rollback. Prompts are managed variables, composed with `@{fragment}@` references and `{{handlebars}}` templating. Trace baggage records which version served each request, so you can attribute an outcome (or an eval score) back to a specific prompt version. SDK: `logfire.var()`; a UI editor; authorable over MCP with the `variable_*` and `prompt_*` tools.
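
The two template syntaxes can be pictured with a small sketch. This is not Logfire's renderer: `render_prompt` is a hypothetical helper that assumes fragment names are simple identifiers, that fragments do not nest, and that `{{name}}` is plain substitution (real Handlebars supports much more).

```python
import re

# @{fragment}@ references another managed fragment by name (assumed: word characters only).
FRAGMENT_REF = re.compile(r"@\{(\w+)\}@")
# {{name}} is treated here as simple variable substitution.
TEMPLATE_VAR = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_prompt(template: str, fragments: dict, values: dict) -> str:
    """Expand fragment references first, then fill in template variables."""
    expanded = FRAGMENT_REF.sub(lambda m: fragments[m.group(1)], template)
    return TEMPLATE_VAR.sub(lambda m: values[m.group(1)], expanded)


fragments = {"tone": "Be concise and polite."}
prompt = render_prompt(
    "@{tone}@ You are helping {{customer}}.",
    fragments,
    {"customer": "Acme"},
)
print(prompt)
```

Expanding fragments before variables means a shared fragment can itself contain `{{...}}` placeholders that are filled per request.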

Docs: https://pydantic.dev/docs/logfire/manage/managed-variables/

### Optimizer (Preview)

The job: your eval scores dropped and you do not want to read 200 traces by hand to figure out what to change.

An AI agent inside Logfire reads recent failing or low-quality traces and proposes an improved managed-variable value or agent system prompt. Each proposal is justified against the specific trace evidence it read, so a human checks the reasoning before applying.

Maturity: the managed-variable optimizer has shipped. The agent system-prompt optimizer is advisory and flag-gated. The optimizer is UI and backend only; it is not exposed over the MCP server yet.

Docs: https://pydantic.dev/docs/logfire/manage/managed-variables/

### Workflows, also called Logfire agents (Preview)

The job: have an agent check "what errored in the last hour" every hour and post a summary, instead of you watching dashboards.

User-defined, scheduled Pydantic AI agents that run an observability task against your project telemetry through the Logfire MCP toolset. They persist memory across runs and emit a markdown report or structured SRE findings, and can notify channels or fire alerts.

Maturity: shipped on staging. The workflows runner is UI and backend only; it is not exposed over the MCP server yet.

Docs: https://pydantic.dev/docs/logfire/

---

## Foundational observability

The base layer. Standard OpenTelemetry underneath, arbitrary SQL on top, and a live view of every trace. This part is not AI-specific.

### Arbitrary SQL over telemetry (GA)

The job: a handful of slow requests is hurting p95 latency, and you want to slice them by route, model, and customer with a join, not a fixed dashboard filter.

Query all of your telemetry with real SQL on the Explore page or through the query API. The engine is Apache DataFusion with Postgres-like syntax, so you write the query you would write against a database, including joins and aggregates, over traces, spans, logs, and metrics. This is the signature "only Logfire" differentiator, and most other capabilities here write spans you can query the same way.
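
The shape of such a query can be sketched locally. The example below uses the stdlib `sqlite3` module purely as a stand-in engine so it runs anywhere; it is not DataFusion, and the `spans` and `customers` tables and their columns are hypothetical, not Logfire's schema (see the SQL reference for the real one).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables: one row per span, plus a lookup table to join against.
conn.executescript(
    """
    CREATE TABLE spans (route TEXT, model TEXT, customer_id INTEGER, duration_ms REAL);
    CREATE TABLE customers (id INTEGER PRIMARY KEY, plan TEXT);
    INSERT INTO customers VALUES (1, 'free'), (2, 'enterprise');
    INSERT INTO spans VALUES
      ('/chat',   'model-a', 1,  120.0),
      ('/chat',   'model-a', 2,  900.0),
      ('/chat',   'model-b', 2, 1500.0),
      ('/search', 'model-a', 1,   80.0);
    """
)

# Slice latency by route, model, and customer plan: a join plus an aggregate.
query = """
SELECT s.route, s.model, c.plan,
       COUNT(*)           AS calls,
       MAX(s.duration_ms) AS slowest_ms
FROM spans AS s
JOIN customers AS c ON c.id = s.customer_id
GROUP BY s.route, s.model, c.plan
ORDER BY slowest_ms DESC
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The point is the workflow rather than the engine: the slicing dimensions are chosen in the query itself, not limited to the filters a dashboard happens to offer.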

Docs: https://pydantic.dev/docs/logfire/observe/explore/ , SQL reference https://pydantic.dev/docs/logfire/reference/sql/

### Traces as logs, live view (GA)

The job: something just broke and you want to watch requests arrive in real time, then click into the one that failed.

A progressive, streaming live view over WebSocket. Traces render as a live log stream that you filter by level, service, scope, and tags, then drill into a single span. It can also turn a natural-language question into a SQL search. Live and historical traces sit in the same view. Your first instrumented run shows up here within seconds, which is how you confirm setup worked.

Docs: https://pydantic.dev/docs/logfire/observe/live/

### OpenTelemetry-wrapping SDK (GA)

The job: get good tracing without betting your stack on a proprietary agent, and instrument the whole app rather than only the LLM calls.

The Logfire SDK is a thin ergonomic layer over OpenTelemetry. You get standard OTel underneath, so existing OTel instrumentation works and you keep ecosystem compatibility, with no vendor lock-in. It handles traces, metrics, and logs for the whole application. For Pydantic AI, one call, `logfire.instrument_pydantic_ai()`, captures agent runs, tool usage, and execution flow.

Docs: https://pydantic.dev/docs/logfire/ , Pydantic AI integration https://pydantic.dev/docs/logfire/integrations/llms/pydanticai/

### Query API (GA)

Export telemetry programmatically over HTTP. Region endpoints at `https://logfire-us.pydantic.dev/v2/query` and `https://logfire-eu.pydantic.dev/v2/query`. Python clients (`LogfireQueryClient`, `AsyncLogfireQueryClient`) return JSON, Arrow, or CSV, and there is a PEP 249 (DB API 2.0) interface for pandas, marimo, and Jupyter SQL magic.
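
As a rough sketch of calling the HTTP endpoint directly, the snippet below only builds a request and never sends it. The region URLs come from the text above; the `sql` query parameter, the bearer read token, and the `Accept` values are assumptions to check against the Query API docs, and `build_query_request` is a hypothetical helper. In practice the Python clients named above are likely the more convenient route.

```python
import urllib.parse
import urllib.request

# Region endpoints as listed in this document.
REGION_ENDPOINTS = {
    "us": "https://logfire-us.pydantic.dev/v2/query",
    "eu": "https://logfire-eu.pydantic.dev/v2/query",
}


def build_query_request(
    region: str,
    sql: str,
    read_token: str,
    accept: str = "application/json",
) -> urllib.request.Request:
    """Build (but do not send) a GET request for a SQL query.

    Assumed, not confirmed by this document: the `sql` parameter name,
    bearer-token auth, and format selection through the Accept header.
    """
    url = REGION_ENDPOINTS[region] + "?" + urllib.parse.urlencode({"sql": sql})
    headers = {"Authorization": f"Bearer {read_token}", "Accept": accept}
    return urllib.request.Request(url, headers=headers)


request = build_query_request("eu", "SELECT 1", read_token="<read-token>", accept="text/csv")
print(request.get_method(), request.full_url)
```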

Docs: https://pydantic.dev/docs/logfire/manage/query-api/

---

## AI and LLM views

Views built for the AI-specific parts of your app, on top of the same telemetry.

### LLMs and providers page (GA)

The job: know which model costs the most, which fails most often, and which is slowest, broken down by provider.

A per-(provider, model) breakdown: calls, error rate, latency, throughput, token mix, cost, truncation rate, and tool-call rate. Drill into a model for a timeseries and recent calls, each linking back to the live view. This is where you compare candidate models on real cost and latency before you commit.

Docs: https://pydantic.dev/docs/logfire/observe/llms/

### LLM Playground (GA)

The job: test a prompt against a few models side by side before wiring it into the agent.

An interactive playground, part of the Gateway, for trying prompts and models. Because it runs through the Gateway, the calls are routed and observed the same way your production calls are, so behavior matches. Prototype here, then promote the winner into a managed variable.

Docs: https://pydantic.dev/docs/ai/overview/gateway/

---

## AI Gateway, now inside Logfire

The routing and cost-control layer. The gateway merge makes convergence literal: one identity, one audit trail, one bill.

### Multi-provider routing (GA)

The job: switch providers, or fail over when one is down, without changing application code or juggling several SDKs.

A multi-provider proxy with cross-provider routing and bring-your-own-key, reaching OpenAI, Anthropic, Google, Groq, and AWS Bedrock through one path. Routing groups handle failover and load balancing. Adaptive routing is in preview.
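
Failover within a routing group can be pictured as trying providers in order until one answers. The sketch below is a local illustration only, not the Gateway's routing logic; `ProviderDown`, `call_with_failover`, and the two fake providers are hypothetical.

```python
from typing import Callable


class ProviderDown(Exception):
    """Hypothetical error raised when a provider is unavailable."""


def call_with_failover(providers: list, prompt: str) -> tuple:
    """Try each (name, call) pair in order; return the first successful reply."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except ProviderDown as exc:
            # Remember why this provider failed, then move on to the next one.
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


def flaky_provider(prompt: str) -> str:
    raise ProviderDown("503 from upstream")


def healthy_provider(prompt: str) -> str:
    return f"echo: {prompt}"


providers: list = [("primary", flaky_provider), ("backup", healthy_provider)]
served_by, reply = call_with_failover(providers, "hi")
print(served_by, reply)
```

The application code calls one function and never learns which provider served it, which is the property the proxy gives you without changing application code.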

### Hierarchical spend limits (GA)

The job: set a hard ceiling so a runaway agent or a single user cannot run up an unbounded bill.

Spend caps that nest from organization down through project, user, and session, with caching and OIDC trust policies. You set a ceiling and watch spend against it in real time, for example $3.42 used of a $20 cap. Set a session cap before you let an autonomous agent loop.
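
One way to think about nested caps: a charge is accepted only if every level above it stays under its own ceiling. The sketch below is an assumption about the semantics, written for illustration; `SpendScope` is hypothetical and the dollar amounts other than the $3.42 and $20 example are invented.

```python
from decimal import Decimal


class SpendScope:
    """Hypothetical spend scope with a cap and an optional parent scope."""

    def __init__(self, name: str, cap: str, parent: "SpendScope | None" = None):
        self.name = name
        self.cap = Decimal(cap)
        self.spent = Decimal("0")
        self.parent = parent

    def chain(self):
        # Walk from this scope up to the root (session -> user -> project -> org).
        scope = self
        while scope is not None:
            yield scope
            scope = scope.parent

    def try_charge(self, amount: str) -> bool:
        cost = Decimal(amount)
        # Reject the charge if any level in the chain would exceed its cap.
        if any(s.spent + cost > s.cap for s in self.chain()):
            return False
        for s in self.chain():
            s.spent += cost
        return True


org = SpendScope("org", "100.00")
project = SpendScope("project", "50.00", parent=org)
user = SpendScope("user", "20.00", parent=project)
session = SpendScope("session", "5.00", parent=user)

first = session.try_charge("3.42")   # fits under every cap
second = session.try_charge("2.00")  # would push the session past 5.00
print(first, second, f"${user.spent} used of ${user.cap}")
```

Decimal arithmetic avoids float rounding when comparing spend against a cap.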

### Keyless tool runs and BYOK (GA)

The job: run a coding agent or CLI tool without scattering provider API keys across machines.

`logfire gateway launch` runs coding tools through the Gateway with no API keys on disk. Keys live behind the Gateway, so access is one identity with one audit trail.

### PII guardrails (Beta)

The job: stop personal data from reaching a provider, or redact it, before the call leaves your trust boundary.

Guardrails at the Gateway that redact or block PII, using Presidio or regex rules. They run at the proxy, so the policy sits at the trust boundary and applies to every app routing through the Gateway. Behind a beta flag.
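
The regex side of this can be sketched in a few lines. This is an illustration, not the Gateway's implementation and not Presidio: the two patterns are deliberately simple and will miss many real-world formats, and `redact` and `apply_guardrail` are hypothetical names.

```python
import re

# Deliberately simple patterns; production rules need far more care.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def redact(text: str) -> tuple:
    """Replace each match with a <LABEL> placeholder and report what was found."""
    found = []
    for label, pattern in PII_PATTERNS.items():
        if pattern.search(text):
            found.append(label)
            text = pattern.sub(f"<{label}>", text)
    return text, found


def apply_guardrail(text: str, mode: str = "redact") -> str:
    """Either pass a redacted prompt through or refuse the call outright."""
    clean, found = redact(text)
    if found and mode == "block":
        raise PermissionError(f"blocked: prompt contains {', '.join(found)}")
    return clean


clean, labels = redact("Contact jane@example.com or +1 415 555 0100.")
print(clean, labels)
```

Running the check in one proxy rather than in each application is what makes the policy apply uniformly.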

Docs for the Gateway: https://pydantic.dev/docs/ai/overview/gateway/ , migration https://pydantic.dev/docs/logfire/manage/gateway-migration/

---

## Operate and monitor

The day-two surface, all driven by the same SQL.

### Alerts (GA)

SQL queries over your telemetry that notify a channel when their conditions are met. Choose whether to fire on any results, on transitions, only when a problem begins, or when result data changes. Channels include Slack and Discord webhooks and Opsgenie. Because an alert is just a query, anything you can query you can alert on, including a falling eval pass-rate. Alerts v2 is flag-gated. Docs: https://pydantic.dev/docs/logfire/observe/alerts/
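
The four firing options can be read as a comparison between the previous and current query results. The sketch below encodes one plausible interpretation; the mode names and `should_notify` are hypothetical, and the exact semantics should be checked against the alerts docs.

```python
def should_notify(mode: str, previous_rows: list, current_rows: list) -> bool:
    """Decide whether an alert run should notify, given the last two result sets."""
    had_results = bool(previous_rows)
    has_results = bool(current_rows)
    if mode == "any_results":
        # Fire on every run that returns rows.
        return has_results
    if mode == "transitions":
        # Fire when the alert starts or stops returning rows.
        return had_results != has_results
    if mode == "problem_begins":
        # Fire only on the empty-to-non-empty edge.
        return has_results and not had_results
    if mode == "data_changes":
        # Fire whenever the returned rows differ from the previous run.
        return previous_rows != current_rows
    raise ValueError(f"unknown mode: {mode}")


row_a = {"evaluator": "helpfulness", "pass_rate": 0.71}
row_b = {"evaluator": "helpfulness", "pass_rate": 0.64}

print(should_notify("problem_begins", [], [row_a]))       # first failing run
print(should_notify("problem_begins", [row_a], [row_b]))  # still failing
print(should_notify("data_changes", [row_a], [row_b]))    # rows differ
```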

### Issues (GA)

Logfire detects exceptions and groups similar ones by fingerprint into issues. Each issue moves through open, resolved, and ignored states, supports bulk actions and AI-assisted debugging, and can notify channels. Thousands of raw exceptions collapse into a handful of triageable items. Docs: https://pydantic.dev/docs/logfire/observe/issues/
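
Grouping by fingerprint can be sketched as hashing the stable parts of an exception. The normalization below (replacing numbers and hex ids) and the choice of fields are assumptions for illustration, not Logfire's actual fingerprinting rules.

```python
import hashlib
import re
from collections import defaultdict

# Numbers and hex ids vary per occurrence, so mask them before hashing.
VOLATILE = re.compile(r"0x[0-9a-fA-F]+|\d+")


def fingerprint(exc_type: str, message: str, top_frame: str) -> str:
    """Hash the exception type, a normalized message, and the raising frame."""
    normalized = VOLATILE.sub("<n>", message)
    raw = "|".join((exc_type, normalized, top_frame))
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()[:12]


exceptions = [
    ("KeyError", "user 42 not found", "app/users.py:get_user"),
    ("KeyError", "user 977 not found", "app/users.py:get_user"),
    ("TimeoutError", "upstream timed out after 30s", "app/llm.py:call_model"),
]

issues = defaultdict(list)
for exc_type, message, frame in exceptions:
    issues[fingerprint(exc_type, message, frame)].append(message)

for key, messages in issues.items():
    print(key, len(messages))
```

The two `KeyError` occurrences differ only in the user id, so they land in the same group.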

### Dashboards (GA)

Dashboards in the Perses format, with panels backed by SQL, per-panel variables, and layout groups. Standard dashboards (usage, exceptions, web-server metrics, LLM tokens and costs, system metrics) ship ready to use; custom dashboards are fully editable. Full CRUD is available over the MCP server. Docs: https://pydantic.dev/docs/logfire/observe/dashboards/

### Scheduled queries and saved searches (GA)

Saved SQL that runs on a schedule, in the same system as alerts, so a recurring query (a daily cost rollup, a periodic health check) can drive a notification.

### Metrics and infrastructure (GA)

A metrics page with `metric_*` SQL functions, Services RED metrics (rate, errors, duration, with p95 and p99 sparklines), and Hosts and Kubernetes pages for infrastructure telemetry. This is the general-purpose half of Logfire that has nothing to do with LLMs, and it lets you correlate an LLM latency spike with host pressure without switching tools. Docs: https://pydantic.dev/docs/logfire/instrument/add-metrics/
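
For readers new to RED metrics, here is a minimal sketch of how rate, error ratio, and tail latency fall out of raw span data. It uses a nearest-rank percentile and synthetic spans; it does not use the `metric_*` SQL functions, and `red_metrics` is a hypothetical helper.

```python
def percentile(sorted_values: list, pct: int) -> float:
    """Nearest-rank percentile over an already sorted list (pct is 1..100)."""
    # Integer ceiling division avoids float rounding surprises.
    rank = max(1, (pct * len(sorted_values) + 99) // 100)
    return sorted_values[rank - 1]


def red_metrics(spans: list, window_seconds: float) -> dict:
    """Compute Rate, Errors, Duration from (duration_ms, is_error) pairs."""
    durations = sorted(duration for duration, _ in spans)
    errors = sum(1 for _, is_error in spans if is_error)
    return {
        "rate_per_s": len(spans) / window_seconds,
        "error_ratio": errors / len(spans),
        "p95_ms": percentile(durations, 95),
        "p99_ms": percentile(durations, 99),
    }


# Synthetic data: 100 spans lasting 1..100 ms; every 25th one is an error.
spans = [(float(ms), ms % 25 == 0) for ms in range(1, 101)]
metrics = red_metrics(spans, window_seconds=50)
print(metrics)
```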

---

## Automation surface for agents

Logfire is itself addressable by agents.

### Logfire MCP server (GA)

A hosted MCP server that exposes most of the platform as tools: `query_*` (SQL over telemetry), `dashboard_*`, `alert_*`, `channel_*`, `issue_*`, `prompt_*`, `variable_*`, `schedule_*`, `project_*`, and a dev session. Connect it from clients like Cursor, Claude, and VS Code. An agent can run SQL, build a dashboard, and edit a managed prompt through one server.

Not over MCP yet: the Optimizer and the Workflows runner are UI and backend only.

Docs: https://pydantic.dev/docs/logfire/guides/mcp-server/

---

## How to build an agent stack with Logfire

A concrete path from a bare agent to a self-improving, cost-capped stack. Each step uses a capability above.

### Step 1: instrument the agent

Build with Pydantic AI. Call `logfire.instrument_pydantic_ai()`. Agent runs stream into Logfire as OpenTelemetry traces. Because the SDK wraps OTel, the rest of the application (HTTP, database, queues) is instrumented the same way. Uses: OpenTelemetry-wrapping SDK.

### Step 2: see traces, then query them

Watch the first runs in the live view to confirm setup. Move to Explore and write SQL over the spans to answer real questions (slowest route, most expensive model, error rate by customer). Check cost, latency, and token mix on the LLMs page. Uses: live view, SQL over telemetry, LLMs page.

### Step 3: add online evals

Attach `OnlineEvaluation` as an agent capability with a `sample_rate`. Evaluators grade live calls in the background and write `gen_ai.evaluation.result` spans, so pass-rate becomes a query and, if you want, an alert. Reuse the Evaluator classes from your offline dataset so online and offline scoring match. Uses: online evaluations, offline evals.

### Step 4: externalize prompts as managed variables

Move system prompts and config out of code into `logfire.var()`. Version them, roll out by label or canary, and roll back in one click. Each request records the variable version that served it, so a later eval score attributes to a specific prompt. Uses: managed prompts and variables.
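
A canary rollout with version attribution can be sketched as deterministic bucketing. This is an illustration under stated assumptions, not how `logfire.var()` resolves values: `pick_version`, the version labels, and the `prompt_version` attribute key are hypothetical.

```python
import hashlib


def pick_version(request_key: str, stable: str, canary: str, canary_fraction: float) -> str:
    """Map a key to a bucket in 0..9999 so the same key always gets the same version."""
    digest = hashlib.sha256(request_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 10_000
    return canary if bucket < canary_fraction * 10_000 else stable


# Send roughly 10% of users to the new prompt version.
assignments = {
    f"user-{i}": pick_version(f"user-{i}", "v3", "v4", canary_fraction=0.10)
    for i in range(1000)
}
canary_share = sum(v == "v4" for v in assignments.values()) / len(assignments)

# Record the served version alongside the request so a later eval score
# can be attributed to it (hypothetical attribute name).
trace_attributes = {"prompt_version": assignments["user-7"]}
print(f"canary share: {canary_share:.2%}", trace_attributes)
```

Hashing a stable key rather than drawing a random number per request keeps each user on one version for the whole rollout, which makes before-and-after comparisons cleaner.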

### Step 5: let the optimizer suggest fixes

When eval scores drop, the optimizer reads the failing traces and proposes an improved variable value or system prompt, justified against the evidence. You review the proposal and the traces behind it before applying. Schedule a workflow to run routine investigation (for example, triage errors hourly) on its own. Uses: optimizer, workflows. Both are preview and UI/backend only today.

### Step 6: route and cap cost through the Gateway

Point the agent at the Gateway for cross-provider routing and BYOK. Set hierarchical spend caps (org, project, user, session) so cost stays bounded. Turn on PII guardrails once real user data flows. Run local coding tools with `logfire gateway launch` so no provider keys sit on disk. Routing and spend report back into Logfire, which closes the loop. Uses: multi-provider routing, hierarchical spend limits, keyless tool runs, PII guardrails.

### What you end up with

An agent whose every run is a queryable trace, whose output is scored in production, whose prompts change without a redeploy, whose regressions come with a proposed fix grounded in evidence, and whose cost is capped and routed across providers, all observed in one place. That is the loop, closed in Logfire.

---

## Canonical docs

- Logfire: https://pydantic.dev/docs/logfire/
- Pydantic AI: https://pydantic.dev/docs/ai/overview/
- Pydantic Evals: https://pydantic.dev/docs/ai/evals/evals/
- Pydantic AI Gateway: https://pydantic.dev/docs/ai/overview/gateway/
- The company and the stack: https://pydantic.dev/
