TraceVerse Community

The fastest way to know what your AI agent is actually doing, and prove it on a public leaderboard.

You wrote an agent. It works. Sometimes. It calls an LLM, it calls a tool, sometimes it loops, occasionally it spends ₹400 on a single user query and you have no idea why.

This org exists to fix that. Open source, framework-agnostic, built so you can go from git clone to a traced agent with a leaderboard rank in under five minutes.

🔗 Discord · GitHub · genai-otel-instrument · SmolTrace


Get a traced agent in 30 seconds

# pip install genai-otel-instrument
from genai_otel_instrument import instrument

instrument(
    service_name="my-first-agent",
    otlp_endpoint="http://localhost:4318",   # or point at the public TraceMind Space
    redact_pii=True,                         # PII off your traces by default
)

# That's it. Run your agent. Every LLM call, tool call, token, rupee, and
# millisecond of latency is now visible.

No SDK lock-in. No daemons. No "you must use our framework." Works with LangGraph, CrewAI, OpenAI Agents SDK, AutoGen, smolagents, vanilla openai, or anything else that hits an LLM API.
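To make "every LLM call, tool call, token, and millisecond" concrete, here is a minimal standard-library sketch of the kind of record an instrumentation layer captures per call. It is not how genai-otel-instrument is implemented; the `traced` decorator, the `SPANS` list, and the `fake_llm` function are hypothetical names used only for illustration.

```python
import functools
import time

# Hypothetical in-memory sink; a real setup exports spans over OTLP instead.
SPANS = []


def traced(kind):
    """Record name, kind, latency, and token usage for each wrapped call."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            result = fn(*args, **kwargs)
            latency_ms = (time.perf_counter() - start) * 1000.0
            SPANS.append({
                "name": fn.__name__,
                "kind": kind,
                "latency_ms": latency_ms,
                # Token usage is read from the result when the callee reports it.
                "tokens": result.get("tokens") if isinstance(result, dict) else None,
            })
            return result
        return wrapper
    return decorator


@traced("llm")
def fake_llm(prompt):
    # Stand-in for a real model call; a whitespace word count is a crude token proxy.
    return {"text": prompt.upper(), "tokens": len(prompt.split())}


fake_llm("where is my order")
print(SPANS[0]["name"], SPANS[0]["kind"], SPANS[0]["tokens"])
```

The point of the sketch is that the agent code itself does not change: the wrapper observes each call from the outside, which is the same idea behind one-line instrumentation.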


What we ship

Libraries

Project | What you get
genai-otel-instrument | One-line OpenTelemetry instrumentation for any GenAI agent. Captures LLM calls, tool calls, cost, tokens, latency. Auto-redacts PII by default.
SmolTrace | Public benchmark + leaderboard for agent evals. Submit an agent, get a rank, compare on cost, latency, and quality.
TraceMind | Hosted trace viewer. Point your OTLP endpoint at it, see what your agent did, where it broke, what it cost. No signup.
TraceMind-mcp-server | An MCP server so your agent can query its own historical traces. Meta-observability for self-improving agents.
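As an illustration of what redacting PII before a trace leaves the process can look like, here is a small regex-based sketch. It is an assumption for illustration, not the redaction logic genai-otel-instrument actually uses; the `redact` function and both patterns are hypothetical and deliberately narrow (email addresses and bare 10-digit numbers only).

```python
import re

# Hypothetical, intentionally simple patterns; real PII detection covers far more cases.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE = re.compile(r"\b\d{10}\b")


def redact(text):
    """Replace email addresses and 10-digit numbers with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)


print(redact("Contact asha@example.com or 9876543210 about order 42"))
```

Applying a step like this to prompt and completion text before export keeps raw identifiers out of the trace store while leaving the rest of the span intact.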

Live MCP servers (3 servers · 18 tools · synthetic data · no API key)

Surface | Space | Tools
Food delivery | food-delivery-mcp | 7
Grocery / Instamart | instamart-mcp | 6
Dineout / Reservations | dineout-mcp | 5
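MCP servers speak JSON-RPC 2.0, and tools are discovered and invoked with the `tools/list` and `tools/call` methods. The sketch below only builds the request payloads and sends nothing over the network. The tool name `search_restaurants` and its `city` argument are hypothetical; the actual tool names for each Space come from that server's `tools/list` response.

```python
import json
from itertools import count

# Monotonic request ids, as JSON-RPC expects a unique id per request.
_ids = count(1)


def jsonrpc_request(method, params=None):
    """Build a JSON-RPC 2.0 request object for an MCP server."""
    msg = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        msg["params"] = params
    return msg


# Ask the server which tools it exposes.
list_tools = jsonrpc_request("tools/list")

# Invoke one tool by name; the name and arguments here are placeholders.
call_tool = jsonrpc_request(
    "tools/call",
    {"name": "search_restaurants", "arguments": {"city": "Bengaluru"}},
)

print(json.dumps(list_tools))
print(json.dumps(call_tool, indent=2))
```

A real session starts with an `initialize` handshake before these calls, and an MCP client library normally handles that along with the transport.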

Eval datasets (SmolTrace-format)

Cross-domain SmolTrace datasets

For evaluation across other domains, see the TraceMind-AI Collection, 41 SmolTrace-format datasets covering:

Same SmolTrace schema, same prompt-template structure as ours. Use them directly; there is no need to mirror them.

Reference agents + docs


What you'll get from this stack


Who this is for


What we believe

  1. Observability is a precondition for serious agent work. You cannot improve what you cannot see.
  2. Evaluation should be reproducible and public. Benchmarks that live in private notebooks help no one.
  3. Cost and latency are first-class signals. Quality without cost discipline is a research demo, not a product.
  4. The toolkit must work the same on localhost as in production. No magic that only kicks in on day 30.
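To show what treating cost and latency as first-class signals can mean in practice, here is a sketch that rolls per-call token counts up into a per-query cost and checks it against a budget. The prices, the `LlmCall` dataclass, and the `summarize` helper are all assumptions for illustration; substitute your provider's real rates.

```python
from dataclasses import dataclass

# Hypothetical prices in rupees per 1,000 tokens; not any provider's actual rates.
PRICE_PER_1K_INPUT = 0.25
PRICE_PER_1K_OUTPUT = 1.00


@dataclass
class LlmCall:
    input_tokens: int
    output_tokens: int
    latency_ms: float


def call_cost(call):
    """Cost of one LLM call in rupees under the assumed prices."""
    return (
        (call.input_tokens / 1000) * PRICE_PER_1K_INPUT
        + (call.output_tokens / 1000) * PRICE_PER_1K_OUTPUT
    )


def summarize(calls, budget):
    """Aggregate cost and latency for one user query and flag budget overruns."""
    total_cost = sum(call_cost(c) for c in calls)
    total_latency = sum(c.latency_ms for c in calls)
    return {
        "cost": total_cost,
        "latency_ms": total_latency,
        "over_budget": total_cost > budget,
    }


calls = [LlmCall(2000, 500, 800.0), LlmCall(4000, 1000, 1200.0)]
report = summarize(calls, budget=2.0)
print(report)
```

With a per-query summary like this, a runaway loop shows up as a cost and latency outlier next to the quality score rather than as a surprise on the bill.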

Community


Roadmap


Production-grade companion

Need this stack on-premises with autonomous root-cause analysis, compliance audit trails, multi-year retention, and air-gapped deployment? TraceVerse Enterprise is the bigger sibling built for regulated environments: same telemetry contract, hardened for the bank floor.


Get involved