Cost Control Setup
Set up budgets, cost optimizer rules, and alerts to keep your AI spending under control. Time: ~15 minutes.
What You Will Set Up
- Per-agent budget limits with automatic enforcement
- Cost optimizer rules to route to cheaper models when possible
- Spending alerts so you know before you hit limits
- A complete cost monitoring dashboard
Why Cost Control Matters
AI API costs add up quickly, especially with GPT-4-class models. A single misconfigured agent loop can burn through hundreds of dollars, so set up budgets and alerts from day one of any production deployment.
Step-by-Step Guide
Set Up a Daily Budget
Create a daily spending limit for an agent. With action_on_exceed set to "block", as in the request below, the agent is stopped from running once the limit is reached.
curl -X POST http://localhost:8000/api/v1/budgets \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"name": "production-daily",
"agent_id": "AGENT_ID",
"period": "daily",
"limit_usd": 25.00,
"action_on_exceed": "block"
}'
# action_on_exceed options:
# "block" - Stops the agent from running (safest)
# "throttle" - Slows down execution rate
# "alert_only" - Sends alert but allows continued runs
Set Up a Monthly Budget
Add a monthly budget as a safety net in addition to the daily limit.
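To see how the two limits interact, it helps to work the arithmetic. With the example values used in this guide ($25 per day, $500 per month), the monthly limit is reached after 20 days of spending at the full daily limit, so it caps the month below the $750 that a 30-day month at the daily limit alone would allow. A minimal sketch of that check follows; the function name is illustrative and not part of the API.

```python
def days_until_monthly_cap(daily_limit_usd: float, monthly_limit_usd: float) -> float:
    """Days of spending at the full daily limit before the monthly limit is hit."""
    if daily_limit_usd <= 0:
        raise ValueError("daily_limit_usd must be positive")
    return monthly_limit_usd / daily_limit_usd


# Example values from this guide.
DAILY_LIMIT_USD = 25.00
MONTHLY_LIMIT_USD = 500.00

days = days_until_monthly_cap(DAILY_LIMIT_USD, MONTHLY_LIMIT_USD)
daily_only_ceiling = DAILY_LIMIT_USD * 30  # what a 30-day month could cost without a monthly cap

print(f"Monthly cap reached after {days:.0f} full-spend days")
print(f"Daily limit alone would allow up to ${daily_only_ceiling:.2f} in 30 days")
```

If the result is 30 or more, the monthly budget never triggers before the daily one does and adds little protection.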
curl -X POST http://localhost:8000/api/v1/budgets \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"name": "production-monthly",
"agent_id": "AGENT_ID",
"period": "monthly",
"limit_usd": 500.00,
"action_on_exceed": "block"
}'
Check Current Spending
Monitor how much an agent has spent in the current period.
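Once you have the usage response (the request and a sample response appear later in this step), the derived fields can be recomputed from limit_usd and used_usd as a sanity check. The sketch below parses the sample response locally rather than calling the API; summarize_usage is a hypothetical helper, and the rounding to two decimals is an assumption based on the sample values.

```python
import json

# Sample body copied from the response shown in this step.
SAMPLE_RESPONSE = """
{
  "budget_name": "production-daily",
  "period": "daily",
  "limit_usd": 25.00,
  "used_usd": 3.47,
  "remaining_usd": 21.53,
  "percentage_used": 13.88,
  "resets_at": "2025-01-16T00:00:00Z"
}
"""


def summarize_usage(raw_json: str) -> dict:
    """Recompute remaining budget and percentage used from a usage response."""
    usage = json.loads(raw_json)
    limit = usage["limit_usd"]
    used = usage["used_usd"]
    if limit <= 0:
        raise ValueError("limit_usd must be positive")
    return {
        "remaining_usd": round(limit - used, 2),
        "percentage_used": round(used / limit * 100, 2),
    }


summary = summarize_usage(SAMPLE_RESPONSE)
print(summary)
```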
curl http://localhost:8000/api/v1/budgets/BUDGET_ID/usage \
-H "Authorization: Bearer YOUR_API_KEY"
# Response:
# {
# "budget_name": "production-daily",
# "period": "daily",
# "limit_usd": 25.00,
# "used_usd": 3.47,
# "remaining_usd": 21.53,
# "percentage_used": 13.88,
# "resets_at": "2025-01-16T00:00:00Z"
# }
Create Cost Optimizer Rules
Automatically route simple queries to cheaper models to save money.
# Rule: Use GPT-4o-mini for short, simple queries
curl -X POST http://localhost:8000/api/v1/cost/rules \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"name": "simple-query-savings",
"agent_id": "AGENT_ID",
"condition": {
"input_tokens_less_than": 100,
"complexity": "low"
},
"action": {
"override_model": "gpt-4o-mini"
},
"description": "Route simple queries to cheaper model"
}'
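To make the effect of this rule concrete, here is a rough local model of how such a condition could be evaluated. This guide does not specify how the server matches rules, so the sketch assumes that every key in the condition must hold (AND semantics) and that a query carries input_tokens and complexity fields. Those field names, the helper functions, and the default model name are hypothetical.

```python
def rule_matches(condition: dict, query: dict) -> bool:
    """Return True only if every key present in the condition is satisfied (assumed AND semantics)."""
    if "input_tokens_less_than" in condition:
        if query["input_tokens"] >= condition["input_tokens_less_than"]:
            return False
    if "complexity" in condition:
        if query["complexity"] != condition["complexity"]:
            return False
    return True


def choose_model(default_model: str, rule: dict, query: dict) -> str:
    """Pick the override model when the rule matches, otherwise keep the default."""
    if rule_matches(rule["condition"], query):
        return rule["action"]["override_model"]
    return default_model


# Condition and action copied from the rule created above.
rule = {
    "condition": {"input_tokens_less_than": 100, "complexity": "low"},
    "action": {"override_model": "gpt-4o-mini"},
}

short_query = {"input_tokens": 40, "complexity": "low"}    # hypothetical query shape
long_query = {"input_tokens": 400, "complexity": "low"}

print(choose_model("gpt-4o", rule, short_query))  # "gpt-4o" is a placeholder default
print(choose_model("gpt-4o", rule, long_query))
```

Under these assumptions, a query of exactly 100 input tokens is not rerouted, because the condition is a strict less-than.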
# Rule: Use cached responses for repeated questions
curl -X POST http://localhost:8000/api/v1/cost/rules \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"name": "cache-repeated",
"agent_id": "AGENT_ID",
"condition": {
"similarity_to_recent": 0.95
},
"action": {
"use_cached_response": true
},
"description": "Serve cached response for near-duplicate queries"
}'
View Cost Savings
Check how much the cost optimizer has saved you.
curl http://localhost:8000/api/v1/cost/savings \
-H "Authorization: Bearer YOUR_API_KEY"
# Response:
# {
# "period": "last_30_days",
# "total_without_optimization": 142.50,
# "total_with_optimization": 87.30,
# "savings_usd": 55.20,
# "savings_percentage": 38.7,
# "rules_triggered": 1247
# }
Set Up Governance Policies
Add a per-run token limit and an hourly run cap to prevent runaway agents.
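The policy in this step combines two rules, a per-run token limit and a cap on runs per hour. Together they put an upper bound on hourly token usage, which you can turn into a worst-case hourly cost. A minimal sketch of that arithmetic follows; the price per 1,000 tokens is a placeholder (substitute your model's actual rate), and whether the token limit counts input, output, or both is not stated in this guide.

```python
def max_tokens_per_hour(max_tokens_per_run: int, max_runs_per_hour: int) -> int:
    """Upper bound on tokens an agent can consume in one hour under both rules."""
    return max_tokens_per_run * max_runs_per_hour


def max_cost_per_hour_usd(tokens_per_hour: int, usd_per_1k_tokens: float) -> float:
    """Worst-case hourly cost at a flat per-1,000-token price."""
    return tokens_per_hour / 1000 * usd_per_1k_tokens


# Limits from the policy in this step.
tokens = max_tokens_per_hour(max_tokens_per_run=4000, max_runs_per_hour=100)

# Placeholder price, NOT a real model rate.
ASSUMED_USD_PER_1K_TOKENS = 0.01
cost = max_cost_per_hour_usd(tokens, ASSUMED_USD_PER_1K_TOKENS)

print(f"Worst case: {tokens} tokens/hour, about ${cost:.2f}/hour at the assumed price")
```

Comparing this worst case with your daily budget shows which control is likely to trigger first.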
curl -X POST http://localhost:8000/api/v1/governance/policies \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"name": "token-safety",
"agent_id": "AGENT_ID",
"rules": [
{ "type": "token_limit", "max_tokens_per_run": 4000 },
{ "type": "max_runs_per_hour", "limit": 100 }
]
}'
Subscribe to Cost Alerts
Get notified when spending reaches certain thresholds.
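If you want warnings at more than one level (say 50%, 80%, and 95%), one approach is to create one subscription per threshold. The sketch below only builds the request bodies, using the same fields as the request in this step, and does not send them. Whether the API accepts several subscriptions for the same agent is an assumption, and threshold_subscriptions is a hypothetical helper.

```python
import json


def threshold_subscriptions(agent_id: str, webhook_url: str, thresholds: list) -> list:
    """Build one subscription body per threshold, in ascending order without duplicates."""
    bodies = []
    for pct in sorted(set(thresholds)):
        if not 0 < pct <= 100:
            raise ValueError(f"threshold_percent out of range: {pct}")
        bodies.append({
            "event_type": "budget.threshold_reached",
            "config": {"agent_id": agent_id, "threshold_percent": pct},
            "webhook_url": webhook_url,
        })
    return bodies


bodies = threshold_subscriptions(
    agent_id="AGENT_ID",
    webhook_url="https://your-server.com/alerts",
    thresholds=[80, 50, 95],
)

for body in bodies:
    # Each printed line can be used as the -d payload of the curl request in this step.
    print(json.dumps(body))
```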
# Alert at 80% of daily budget
curl -X POST http://localhost:8000/api/v1/events/subscriptions \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"event_type": "budget.threshold_reached",
"config": {
"agent_id": "AGENT_ID",
"threshold_percent": 80
},
"webhook_url": "https://your-server.com/alerts"
}'
Dashboard Monitoring
The dashboard provides visual cost monitoring tools:
- Budgets tab: View all budgets with progress bars showing current usage vs. limits
- Cost Optimizer tab: See rules, savings charts, and model routing statistics
- Analytics tab: Spending trends over time, cost-per-agent breakdown
- Governance tab: Policy violations and health scores
Cost Optimization Recommendations
| Strategy | Expected Savings | Trade-off |
|---|---|---|
| Model downgrade for simple queries | 30-50% | Slightly lower quality on simple tasks |
| Response caching | 20-40% | Stale answers for rapidly changing data |
| Token limits per run | 10-20% | May truncate complex responses |
| Rate limiting | Variable | Slower response times during peaks |
| Off-peak scheduling | 5-15% | Delayed processing for non-urgent tasks |
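When you combine strategies, the ranges in the table are unlikely to add up directly, because each strategy can only reduce the spend that the others leave behind. As a rough sketch, assuming each percentage is a reduction of the remaining spend and that the strategies act independently (a simplification that real workloads may not satisfy), the combined savings can be estimated like this:

```python
def combined_savings(fractions: list) -> float:
    """Estimate total savings when each strategy reduces the spend left by the previous ones."""
    remaining = 1.0
    for fraction in fractions:
        if not 0 <= fraction <= 1:
            raise ValueError(f"savings fraction out of range: {fraction}")
        remaining *= 1 - fraction
    return 1 - remaining


# Low and high ends of the table's ranges for model downgrade plus response caching.
low_estimate = combined_savings([0.30, 0.20])
high_estimate = combined_savings([0.50, 0.40])

print(f"Combined savings estimate: {low_estimate:.0%} to {high_estimate:.0%}")
```

Under these assumptions, the two strategies together save somewhat less than the sum of their individual ranges would suggest.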
Start Conservative
Begin with action_on_exceed: "block" for all budgets. Switch to "alert_only" only after you understand your agent's typical spending patterns. It's better to block an agent than to receive an unexpected bill!
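One possible way to decide that you understand an agent's spending well enough is to look at its recent daily totals. The heuristic below is only an illustration: the function name, the 14-day minimum, and the 80% headroom factor are arbitrary assumptions, not recommendations from this guide or features of the API.

```python
def ready_for_alert_only(daily_spend_usd: list, limit_usd: float,
                         min_days: int = 14, headroom: float = 0.8) -> bool:
    """Return True if there is enough history and peak spend stayed well under the limit."""
    if len(daily_spend_usd) < min_days:
        return False  # not enough history to judge typical spending
    return max(daily_spend_usd) <= limit_usd * headroom


# Hypothetical history: two weeks of daily totals against the $25 daily limit.
history = [3.47, 4.10, 2.95, 5.20, 3.80, 4.40, 3.10,
           3.60, 4.90, 2.70, 5.00, 3.30, 4.20, 3.90]

print(ready_for_alert_only(history, limit_usd=25.00))
```

A single day close to the limit, or too short a history, keeps the result False, which matches the advice to stay with "block" until the pattern is clear.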