2026-04-21
8 picks

LLM evals & observability stack — ship changes with confidence

LangSmith, Langfuse, Braintrust, Arize Phoenix, Helicone, Galileo, Patronus — where teams look when prompts become production systems.

This cluster groups the platforms people reach for once an LLM app stops being a toy: trace every call, replay failures, score with LLM-as-judge or a human rubric, and watch cost and P95 latency. Each card opens a neutral blurb and a link to the vendor. Before committing, compare each platform's depth of LangChain/LlamaIndex integration, self-host vs. SaaS deployment, and dataset-management model against your actual stack.
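To make "watch cost and latency at P95" concrete, here is a minimal vendor-neutral sketch of the aggregation these platforms run over logged calls. The `TraceRecord` fields, the `summarize` helper, and the per-1k-token prices are illustrative assumptions, not any listed vendor's schema or pricing.

```python
import math
from dataclasses import dataclass


@dataclass
class TraceRecord:
    """One logged LLM call. Field names are hypothetical, not a vendor schema."""
    latency_ms: float
    prompt_tokens: int
    completion_tokens: int


def percentile(values, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def summarize(records, usd_per_1k_prompt, usd_per_1k_completion):
    """Aggregate call count, P95 latency, and total token cost for a batch of traces."""
    cost = sum(
        r.prompt_tokens / 1000 * usd_per_1k_prompt
        + r.completion_tokens / 1000 * usd_per_1k_completion
        for r in records
    )
    return {
        "calls": len(records),
        "p95_latency_ms": percentile([r.latency_ms for r in records], 95),
        "total_cost_usd": round(cost, 6),
    }


# Example batch: 20 calls with latencies from 100 ms to 2000 ms (made-up data).
batch = [TraceRecord(100.0 * i, 1000, 500) for i in range(1, 21)]
summary = summarize(batch, usd_per_1k_prompt=0.01, usd_per_1k_completion=0.03)
```

Nearest-rank is only one of several percentile definitions. Hosted dashboards may interpolate, so small differences from a vendor's reported P95 are expected.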

Tools in this cluster

LangSmith
LangChain’s eval and trace platform for LLM apps—datasets, scorers, live monitoring, and human review with the deepest LangChain/LangGraph integration.
Evals / Observability
Langfuse
Open-source LLM observability and eval platform with traces, datasets, scorers, and prompt management—self-host via Docker to keep data on your own network.
Evals / Observability
Braintrust
Evaluation and observability platform for AI products, with datasets, scorers, experiment comparison, and a prompt playground. See the official site for pricing, supported regions, and data handling.
Evals / Observability
Arize Phoenix
Open-source LLM tracing and evaluation tool from Arize AI, with OpenTelemetry-based instrumentation and the option to run it locally or self-host. See the official site for pricing, supported regions, and data handling.
Evals / Observability
Helicone
Open-source LLM observability platform that logs requests through a proxy or async integration to track cost, latency, and usage. See the official site for pricing, supported regions, and data handling.
Evals / Observability
Galileo
Evaluation and observability platform for generative AI applications. See the official site for features, pricing, supported regions, and data handling.
Evals / Observability
Patronus AI
Automated evaluation platform for LLM outputs, focused on detecting hallucinations and other failure modes. See the official site for features, pricing, supported regions, and data handling.
Evals / Observability
Weights & Biases
ML experiment tracking and model management platform; its Weave toolkit adds tracing and evaluation for LLM apps. See the official site for pricing, supported regions, and data handling.
Learning / Data

Related MCP servers

Configure in Claude, Cursor, or Zed (any MCP client) to give agents access to external tools, data sources, or execution environments.
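As a rough illustration of what configuring a server looks like, the sketch below builds a client config entry and prints it as JSON. It assumes the `mcpServers` layout used by Claude Desktop and Cursor; Zed and other clients use their own settings keys, so check each client's docs. The `mcp_server_entry` helper, the `readonly_agent` role, and the `analytics` database are hypothetical, and the npm package name for the reference Postgres server should be confirmed upstream before use.

```python
import json


def mcp_server_entry(command, args, env=None):
    """Build one stdio server entry: the command to launch plus its arguments."""
    entry = {"command": command, "args": list(args)}
    if env:
        entry["env"] = dict(env)
    return entry


config = {
    "mcpServers": {
        # Key name is arbitrary; it is how the client labels the server.
        "postgres": mcp_server_entry(
            "npx",
            [
                "-y",
                "@modelcontextprotocol/server-postgres",  # confirm current package name upstream
                "postgresql://readonly_agent@localhost:5432/analytics",  # hypothetical role and database
            ],
        ),
    }
}

# Paste the printed JSON into the client's MCP config file.
print(json.dumps(config, indent=2))
```

Keeping credentials out of the connection string (for example via the `env` argument or a local password file) avoids leaving secrets in a config file that may be synced or shared.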

PostgreSQL
Official
Reference MCP server for read-only Postgres access and schema introspection—ideal for data-analysis agents. Lock down with a read-only role and schema allow-list.
Database / Data
stdio
Model Context Protocol
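The "read-only role and schema allow-list" advice for the PostgreSQL server can be sketched as a small generator for the SQL an administrator would review and run. This is one possible setup under stated assumptions, not the server's own mechanism: the `read_only_role_sql` helper and the role, database, and schema names are hypothetical.

```python
import re

# Accept only plain lowercase identifiers so names can be interpolated safely.
_IDENT = re.compile(r"^[a-z_][a-z0-9_]*$")


def _ident(name):
    """Return name unchanged if it is a plain identifier, else raise."""
    if not _IDENT.match(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


def read_only_role_sql(role, database, allowed_schemas):
    """Return SQL statements that create a login role limited to SELECT on the allowed schemas."""
    role, database = _ident(role), _ident(database)
    statements = [
        f"CREATE ROLE {role} LOGIN;",
        f"ALTER ROLE {role} SET default_transaction_read_only = on;",
        f"GRANT CONNECT ON DATABASE {database} TO {role};",
    ]
    for schema in allowed_schemas:
        schema = _ident(schema)
        statements.append(f"GRANT USAGE ON SCHEMA {schema} TO {role};")
        statements.append(f"GRANT SELECT ON ALL TABLES IN SCHEMA {schema} TO {role};")
    return statements


for stmt in read_only_role_sql("readonly_agent", "analytics", ["reporting"]):
    print(stmt)
```

The `GRANT SELECT ON ALL TABLES` statement covers only tables that exist when it runs, and default privileges on the `public` schema may still apply, so review the resulting grants before pointing an agent at the database.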
Sentry
Official
Sentry: official MCP server from Model Context Protocol—confirm version, auth scopes, and transport with the upstream docs before production use.
Cloud / DevOps
stdio
Model Context Protocol