LLM evals & observability stack — ship changes with confidence
LangSmith, Langfuse, Braintrust, Arize Phoenix, Helicone, Galileo, Patronus — where teams look when prompts become production systems.
Tools in this cluster
- LangSmith
LangChain’s eval and trace platform for LLM apps—datasets, scorers, live monitoring, and human review with the deepest LangChain/LangGraph integration.
Evals / Observability
- Langfuse
Open-source LLM observability and eval platform with traces, datasets, scorers, and prompt management—self-host via Docker to keep data on your own network.
Evals / Observability
- Braintrust
Evaluation and observability platform for LLM apps, with evals, logging, and a prompt playground. See the official site for pricing, supported regions, and data handling.
Evals / Observability
- Arize Phoenix
Open-source LLM tracing and evaluation tool from Arize AI, built on OpenTelemetry. See the official site for pricing, supported regions, and data handling.
Evals / Observability
- Helicone
Open-source LLM observability platform that logs requests and tracks cost and latency. See the official site for pricing, supported regions, and data handling.
Evals / Observability
- Galileo
Evaluation and observability platform for generative AI applications. See the official site for features, pricing, supported regions, and data handling.
Evals / Observability
- Patronus AI
Automated evaluation platform for LLM outputs, including hallucination detection. See the official site for features, pricing, supported regions, and data handling.
Evals / Observability
- Weights & Biases
ML experiment-tracking platform whose Weave toolkit adds tracing and evaluation for LLM apps. See the official site for pricing, supported regions, and data handling.
Learning / Data
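Several entries above mention datasets and scorers. As a rough, tool-agnostic sketch of that loop (not any listed vendor's API), the Python below runs a task over a small dataset and averages a scorer's results. The names `Example`, `exact_match`, `run_eval`, and `fake_model` are hypothetical and exist only for illustration; `fake_model` stands in for a real LLM call.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Example:
    """One dataset row: an input prompt and the answer we expect."""
    input: str
    expected: str


def exact_match(output: str, expected: str) -> float:
    """Score 1.0 when output and expected match after trimming and lowercasing."""
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0


def run_eval(
    task: Callable[[str], str],
    dataset: list[Example],
    scorer: Callable[[str, str], float],
) -> dict:
    """Run the task on every example, score each output, and report the mean."""
    rows = []
    for example in dataset:
        output = task(example.input)
        rows.append(
            {
                "input": example.input,
                "output": output,
                "score": scorer(output, example.expected),
            }
        )
    mean = sum(row["score"] for row in rows) / len(rows) if rows else 0.0
    return {"mean_score": mean, "rows": rows}


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call; one answer is deliberately wrong."""
    answers = {"capital of France?": "Paris", "2 + 2?": "5"}
    return answers.get(prompt, "")


DATASET = [
    Example("capital of France?", "paris"),
    Example("2 + 2?", "4"),
]

if __name__ == "__main__":
    report = run_eval(fake_model, DATASET, exact_match)
    print(report["mean_score"])  # 0.5: one of the two examples matches
```

Re-running the same dataset after a prompt or model change and comparing `mean_score` is the basic regression check that hosted eval platforms build on, typically alongside trace storage and human review.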
Related MCP servers
Configure these servers in Claude, Cursor, Zed, or any other MCP client to give agents access to external tools, data sources, or execution environments.
- PostgreSQL (Official)
Reference MCP server for read-only Postgres access and schema introspection—ideal for data-analysis agents. Lock down with a read-only role and schema allow-list.
Database / Data | stdio | Model Context Protocol
- Sentry (Official)
Official MCP server for Sentry from the Model Context Protocol project. Confirm version, auth scopes, and transport against the upstream docs before production use.
Cloud / DevOps | stdio | Model Context Protocol
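The PostgreSQL entry above recommends a read-only role and a schema allow-list. One way that lock-down might look is sketched below in Python: it generates standard PostgreSQL `GRANT` statements for a dedicated role, then builds a client config entry that connects as that role. The role, database, schema, and host names are hypothetical. The `mcpServers` layout and the `@modelcontextprotocol/server-postgres` package name are assumptions based on commonly shown stdio setups, so check them against your client's and the server's current docs.

```python
import json
import re

# Conservative pattern for unquoted PostgreSQL identifiers.
_IDENT = re.compile(r"[a-z_][a-z0-9_]*")


def _ident(name: str) -> str:
    """Reject anything that is not a plain lowercase identifier."""
    if not _IDENT.fullmatch(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


def readonly_role_sql(role: str, database: str, schemas: list[str]) -> list[str]:
    """Build SQL that creates a read-only role limited to allow-listed schemas."""
    role, database = _ident(role), _ident(database)
    statements = [
        f"CREATE ROLE {role} LOGIN;",  # set the password out of band
        f"ALTER ROLE {role} SET default_transaction_read_only = on;",
        f"GRANT CONNECT ON DATABASE {database} TO {role};",
    ]
    for schema in map(_ident, schemas):
        statements.append(f"GRANT USAGE ON SCHEMA {schema} TO {role};")
        statements.append(f"GRANT SELECT ON ALL TABLES IN SCHEMA {schema} TO {role};")
    return statements


def mcp_server_entry(role: str, host: str, database: str) -> dict:
    """Assumed client config shape: launch the server over stdio as the new role."""
    url = f"postgresql://{_ident(role)}@{host}/{_ident(database)}"
    return {
        "mcpServers": {
            "postgres": {
                "command": "npx",
                # Package name is an assumption; verify it upstream.
                "args": ["-y", "@modelcontextprotocol/server-postgres", url],
            }
        }
    }


if __name__ == "__main__":
    for statement in readonly_role_sql("mcp_reader", "appdb", ["analytics", "reporting"]):
        print(statement)
    print(json.dumps(mcp_server_entry("mcp_reader", "localhost", "appdb"), indent=2))
```

The generated statements only add privileges. Grants the role inherits through `PUBLIC` stay in place, so review those separately, and supply the password through your usual secrets mechanism instead of embedding it in the connection URL.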