Replay & Audit
The hardest part of building AI agents isn’t making them work in development; it’s debugging them in production when they fail in unexpected ways. StateBase’s replay and audit system gives you complete visibility into every decision your agent makes.
The Production Debugging Problem
Traditional debugging doesn’t work for AI agents. When a session fails, you need to:
- Reproduce the exact conversation that led to the failure
- Understand why the agent made each decision
- Test fixes without affecting live users
Replay: Time-Travel Debugging
Replay lets you recreate the exact state of a conversation at any point in time, then fork it to test fixes.
How It Works
Every session in StateBase stores:
- All turns (input/output pairs)
- All state versions (snapshots after each update)
- All traces (which operations were performed)
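As a rough sketch of how these three records enable time travel, the snippet below models a session as a plain Python object (the field names and methods here are illustrative, not the real StateBase schema): because every state update appends a full snapshot plus a trace, the state at any earlier version can be recovered exactly.

```python
from dataclasses import dataclass, field

@dataclass
class Session:
    """Illustrative model of what a StateBase session stores."""
    turns: list = field(default_factory=list)           # (input, output) pairs
    state_versions: list = field(default_factory=list)  # snapshot after each update
    traces: list = field(default_factory=list)          # one audit record per operation

    def update_state(self, new_state: dict, reasoning: str) -> None:
        # Never mutate in place: append a full snapshot plus an audit trace.
        self.state_versions.append(dict(new_state))
        self.traces.append({"action": "update_state", "reasoning": reasoning})

    def state_at(self, version: int) -> dict:
        """Recover the exact state at any point in the session's history."""
        return dict(self.state_versions[version])

s = Session()
s.update_state({"step": "greeting"}, reasoning="User opened the chat")
s.update_state({"step": "checkout"}, reasoning="User asked to buy")
print(s.state_at(0))  # {'step': 'greeting'}
```

Because snapshots are append-only, recovering version 0 costs a single lookup; no log replay is needed.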
Replay in the Dashboard
The StateBase Dashboard provides a visual replay interface:
- Navigate to the failed session
- Click the “Replay” tab
- Scrub through the conversation timeline
- Click “Fork from here” to create a debug session
- Test your fix in the forked session
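Under the hood, “Fork from here” amounts to copying the session’s history up to the chosen version into a fresh session. A minimal sketch, assuming sessions are modeled as plain dicts (the real `sessions.fork()` API may differ):

```python
import copy

def fork_session(session: dict, version: int) -> dict:
    """Create a debug session that starts from an earlier state version.

    The original session is left untouched, so production data stays intact.
    """
    return {
        "turns": copy.deepcopy(session["turns"][: version + 1]),
        "state_versions": copy.deepcopy(session["state_versions"][: version + 1]),
        "traces": [{"action": "fork", "fork_version": version}],
    }

prod = {
    "turns": ["t0", "t1", "t2"],
    "state_versions": [{"v": 0}, {"v": 1}, {"v": 2}],
    "traces": [],
}
debug = fork_session(prod, version=1)
print(debug["state_versions"][-1])  # {'v': 1}
```

The deep copy matters: the fork must not share mutable state with the production session, or testing a fix could corrupt live data.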
Audit Trails: Understanding Decisions
Every operation in StateBase creates an audit trace that explains why something happened.
What Gets Traced?
| Operation | Trace Created | Information Logged |
|---|---|---|
| sessions.create() | ✅ | agent_id, user_id, initial_state |
| sessions.update_state() | ✅ | reasoning, state_diff, actor |
| sessions.add_turn() | ✅ | input, output, reasoning, metadata |
| memory.add() | ✅ | content, type, tags, session_id |
| sessions.rollback() | ✅ | from_version, to_version, reason |
| sessions.fork() | ✅ | source_session, fork_version |
Viewing Traces
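The idea behind trace viewing is easy to sketch: treat each trace as a record and filter by the fields in the table above. The record shapes and helper below are illustrative, not the StateBase API:

```python
traces = [
    {"action": "update_state", "reasoning": "User confirmed shipping address"},
    {"action": "rollback", "reason": "API timeout on payment call",
     "from_version": 4, "to_version": 3},
    {"action": "update_state", "reasoning": "Retrying payment after timeout"},
]

def filter_traces(traces, action=None, text=None):
    """Filter traces by action type and/or a substring of their reasoning."""
    out = []
    for t in traces:
        if action and t["action"] != action:
            continue
        blob = (t.get("reasoning", "") + t.get("reason", "")).lower()
        if text and text.lower() not in blob:
            continue
        out.append(t)
    return out

print(len(filter_traces(traces, action="rollback")))  # 1
print(len(filter_traces(traces, text="timeout")))     # 2
```

Searching reasoning text across traces is how you find every session hit by the same root cause, e.g. all “API timeout” incidents.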
The Reasoning Field: Your Debug Log
Every state update and turn should include a reasoning field:
Reasoning Best Practices
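One way to make the habit stick is a lightweight guard that rejects reasoning strings too vague to help later. The length threshold and banned phrases below are illustrative choices, not StateBase rules:

```python
VAGUE = {"update", "fix", "change", "n/a", "misc"}

def validate_reasoning(reasoning: str) -> bool:
    """Return True if a reasoning string will actually help during debugging."""
    text = reasoning.strip().lower()
    return len(text) >= 15 and text not in VAGUE

# "fix" is useless six months later; the second string explains the decision.
assert not validate_reasoning("fix")
assert validate_reasoning("Rolled back to v3 because the payment API timed out")
```

A check like this can run client-side before every `sessions.update_state()` call so that no state change lands without a usable explanation.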
Debugging Patterns
Pattern 1: Root Cause Analysis
When a session fails, work backwards through the traces: start from the final trace, read each reasoning field, and diff adjacent state versions until you find the first point where the state diverged from what you expected.
Pattern 2: Comparative Analysis
Compare a successful session with a failed one and look for the point where their traces diverge:
Pattern 3: Regression Testing
After fixing a bug, replay the original failure to confirm it’s fixed:
Compliance & Audit Requirements
For regulated industries (healthcare, finance), StateBase’s audit trails provide compliance-ready logs:
HIPAA Compliance
SOC 2 Compliance
GDPR Right to Explanation
Performance Monitoring
Use traces to measure agent performance:
Common Metrics to Track
| Metric | How to Calculate | Healthy Range |
|---|---|---|
| Avg Response Time | sum(latency_ms) / count(turns) | < 2000ms |
| Rollback Rate | count(rollbacks) / count(sessions) | < 2% |
| Tool Call Success Rate | successful_calls / total_calls | > 95% |
| Session Completion Rate | completed / total_sessions | > 80% |
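The formulas in the table compute directly from logged turn metadata and traces. A minimal sketch, with illustrative record shapes rather than the real StateBase export format:

```python
# Sample records as they might be exported from turn metadata and traces.
turns = [{"latency_ms": 1200}, {"latency_ms": 1800}, {"latency_ms": 900}]
sessions = [{"completed": True}, {"completed": True}, {"completed": False}]
traces = [{"action": "update_state"}, {"action": "rollback"}]
tool_calls = [{"ok": True}, {"ok": True}, {"ok": True}, {"ok": False}]

avg_response_ms = sum(t["latency_ms"] for t in turns) / len(turns)
rollback_rate = sum(1 for t in traces if t["action"] == "rollback") / len(sessions)
tool_success_rate = sum(1 for c in tool_calls if c["ok"]) / len(tool_calls)
completion_rate = sum(1 for s in sessions if s["completed"]) / len(sessions)

print(round(avg_response_ms))  # 1300
print(tool_success_rate)       # 0.75
```

In this sample the rollback rate (1/3) would trip the < 2% threshold from the table, which is exactly the kind of anomaly worth alerting on.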
Instant Replay: The Killer Feature
StateBase’s Instant Replay lets you fork any session from any point in time with one click in the Dashboard:
- Open a session in the Dashboard
- Navigate to the “State History” tab
- Click “Fork” next to any state version
- A new session is created, starting from that exact state
- Test your fix in the forked session
This unlocks several workflows:
- Debug production issues without touching live sessions
- A/B test prompts on real user conversations
- Train new models on historical data
- Reproduce edge cases for regression testing
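Regression testing against a fork can be as simple as re-running the recorded failing inputs through the patched logic and checking the outputs. The handler and recorded turns below are illustrative stand-ins:

```python
def fixed_handler(user_input: str) -> str:
    """Stand-in for the patched agent logic under test."""
    if "refund" in user_input.lower():
        return "refund_initiated"
    return "handled"

# Turns recorded from the originally failing production session.
recorded_failure = [
    {"input": "I want a REFUND", "expected": "refund_initiated"},
    {"input": "hello", "expected": "handled"},
]

def replay(turns, handler) -> bool:
    """Replay recorded inputs and confirm the fix produces the expected outputs."""
    return all(handler(t["input"]) == t["expected"] for t in turns)

print(replay(recorded_failure, fixed_handler))  # True
```

The same harness can run in CI: every fixed production failure becomes a permanent regression test built from its recorded turns.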
Best Practices
✅ Do This
- Always include reasoning in state updates and turns
- Log metadata (tool calls, latency, model used) for analytics
- Use forking for debugging (never modify production sessions)
- Set up alerts on high rollback rates or slow response times
- Archive traces for compliance (7 years for HIPAA)
❌ Avoid This
- Don’t skip turn logging (you’ll regret it when debugging)
- Don’t log sensitive data in reasoning fields (use metadata with encryption)
- Don’t delete traces (they’re your audit trail)
- Don’t ignore rollback patterns (they indicate systemic issues)
Dashboard Features
The StateBase Dashboard provides visual tools for replay and audit:
Session Timeline
- Visual timeline of all turns and state changes
- Hover to preview state at any point
- Click to fork from any version
Trace Explorer
- Filter by action type (state updates, tool calls, rollbacks)
- Search by reasoning (find all “API timeout” traces)
- Export to CSV for external analysis
Performance Dashboard
- Real-time metrics (latency, success rate, rollback rate)
- Alerts for anomalies
- Historical trends (compare this week vs last week)
Next Steps
- Failure Modes: Learn common agent failure patterns
- Debugging Demo: Watch a real debugging session
- Production Playbook: Incident response strategies
Key Takeaway: Replay and audit aren’t just debugging tools—they’re your insurance policy for production AI. When (not if) your agent fails, you’ll have everything you need to understand why and fix it fast.