The fastest way to know what your AI agent is actually doing, and prove it on a public leaderboard.
You wrote an agent. It works. Sometimes. It calls an LLM, it calls a tool, sometimes it loops, occasionally it spends ₹400 on a single user query and you have no idea why.
This org exists to fix that. Open source, framework-agnostic, built so you can go from git clone to a traced agent with a leaderboard rank in under five minutes.
Discord · GitHub · genai-otel-instrument · SmolTrace
```python
# pip install genai-otel-instrument
from genai_otel_instrument import instrument

instrument(
    service_name="my-first-agent",
    otlp_endpoint="http://localhost:4318",  # or point at the public TraceMind Space
    redact_pii=True,  # PII off your traces by default
)

# That's it. Run your agent. Every LLM call, tool call, token, rupee, and
# millisecond of latency is now visible.
```
No SDK lock-in. No daemons. No "you must use our framework." Works with LangGraph, CrewAI, OpenAI Agents SDK, AutoGen, smolagents, vanilla openai, or anything else that hits an LLM API.
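To make "rupee per call" concrete: once prompt and completion token counts are on your spans, per-call cost is simple arithmetic. The sketch below uses made-up placeholder prices (not real rates for any model) to show how a looping agent quietly compounds spend:

```python
# Hypothetical per-million-token prices in INR. Placeholder values for
# illustration only -- not real rates for any provider or model.
PRICING_INR = {
    "example-model": {"prompt": 12.5, "completion": 50.0},
}

def call_cost_inr(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Cost of one LLM call, from the token counts a trace span records."""
    p = PRICING_INR[model]
    return (prompt_tokens * p["prompt"] + completion_tokens * p["completion"]) / 1_000_000

# A retry loop that makes 40 calls of ~3k prompt tokens each adds up fast:
total = sum(call_cost_inr("example-model", 3_000, 500) for _ in range(40))
print(f"₹{total:.2f}")
```

A trace viewer does exactly this aggregation for you, per span and per request, which is how a single runaway query becomes visible instead of a mystery line on a bill.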
| Project | What you get |
|---|---|
| genai-otel-instrument | One-line OpenTelemetry instrumentation for any GenAI agent. Captures LLM calls, tool calls, cost, tokens, latency. Auto-redacts PII by default. |
| SmolTrace | Public benchmark + leaderboard for agent evals. Submit an agent, get a rank, compare on cost, latency, and quality. |
| TraceMind | Hosted trace viewer. Point your OTLP endpoint at it, see what your agent did, where it broke, what it cost. No signup. |
| TraceMind-mcp-server | An MCP server so your agent can query its own historical traces. Meta-observability for self-improving agents. |
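If you'd rather not hard-code an endpoint, any OTLP-speaking OpenTelemetry SDK also honors the standard environment variables, so you can retarget traces at TraceMind without touching code. The URL below is a placeholder, not a real endpoint:

```shell
# Standard OpenTelemetry env vars, read by any OTLP-capable SDK.
export OTEL_SERVICE_NAME="my-first-agent"
export OTEL_EXPORTER_OTLP_ENDPOINT="https://<your-tracemind-space>"  # placeholder URL
export OTEL_EXPORTER_OTLP_PROTOCOL="http/protobuf"
```

This keeps local dev (localhost collector) and the hosted viewer as a one-variable switch.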
| Surface | Space | Tools |
|---|---|---|
| Food delivery | food-delivery-mcp | 7 |
| Grocery / Instamart | instamart-mcp | 6 |
| Dineout / Reservations | dineout-mcp | 5 |
| Dataset | Tasks |
|---|---|
| food-delivery-evals | 111 |
| instamart-evals | 100 |
| dineout-evals | 100 |
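To give a feel for how leaderboard numbers fall out of an eval run, here is a minimal aggregation over per-task results. The record shape below is a guess for illustration only; the actual SmolTrace schema is defined by the datasets themselves:

```python
from dataclasses import dataclass

# Hypothetical result record -- illustrative, not the real SmolTrace schema.
@dataclass
class TaskResult:
    task_id: str
    passed: bool
    cost_inr: float    # total LLM spend for the task
    latency_ms: float  # wall-clock time for the task

def summarize(results: list[TaskResult]) -> dict:
    """Aggregate the metrics a leaderboard entry compares on."""
    n = len(results)
    return {
        "pass_rate": sum(r.passed for r in results) / n,
        "avg_cost_inr": sum(r.cost_inr for r in results) / n,
        "avg_latency_ms": sum(r.latency_ms for r in results) / n,
    }

results = [
    TaskResult("order-001", True, 1.20, 840.0),
    TaskResult("order-002", False, 3.75, 2100.0),
    TaskResult("order-003", True, 0.95, 610.0),
]
print(summarize(results))  # quality, cost, and latency in one dict
```

Ranking on all three axes at once is the point: a cheap agent that fails half its tasks and an accurate agent that burns ₹4 per query both show up honestly.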
For evaluation across other domains, see the TraceMind-AI Collection: 41 SmolTrace-format datasets. Same SmolTrace schema, same prompt-template structure as ours, so you can use them directly; no need to mirror.
food-delivery-agents is the binding repo: reference agents wired with genai-otel-instrument, architecture docs, an observability primer, and leaderboard CI.

The full stack at a glance: genai-otel-instrument, SmolTrace, the public TraceMind, TraceMind-mcp-server, 3 live MCP servers (food / grocery / dineout, 18 tools), 3 of our own eval suites (311 tasks total), 18 mirrored eval datasets, and the food-delivery-agents binding repo, with agents.md standardization across all our Spaces.

Need this stack on-premises with autonomous root-cause analysis, compliance audit trails, multi-year retention, and air-gapped deployment? TraceVerse Enterprise is the bigger sibling built for regulated environments: same telemetry contract, hardened for the bank floor.
Start with genai-otel-instrument on the agent you have right now.