Cost Control Setup

Set up budgets, cost optimizer rules, and alerts to keep your AI spending under control. Time: ~15 minutes.

What You Will Set Up

Per-agent budget limits with automatic enforcement
Cost optimizer rules to route to cheaper models when possible
Spending alerts so you know before you hit limits
A complete cost monitoring dashboard

Why Cost Control Matters

The cost of AI API calls adds up quickly, especially with GPT-4-class models. A single misconfigured agent loop can burn through hundreds of dollars before anyone notices. Set up budgets and alerts from day one of any production deployment.

Step-by-Step Guide

Set Up a Daily Budget

Create a daily spending limit for an agent. The action_on_exceed field controls what happens when the limit is reached; with "block", as in this example, the agent is stopped from running.

curl -X POST http://localhost:8000/api/v1/budgets \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "production-daily",
    "agent_id": "AGENT_ID",
    "period": "daily",
    "limit_usd": 25.00,
    "action_on_exceed": "block"
  }'

# action_on_exceed options:
# "block"      - Stops the agent from running (safest)
# "throttle"   - Slows down execution rate
# "alert_only" - Sends alert but allows continued runs
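If you create budgets from a script rather than by hand, it can help to validate the request body before sending it. The following is a minimal sketch, not part of the product's API: the helper name build_budget_payload is hypothetical, and the allowed values are only the ones that appear in this guide (the API may accept more).

```python
import json

# Values taken from the options listed above.
ALLOWED_ACTIONS = {"block", "throttle", "alert_only"}
# Only "daily" and "monthly" appear in this guide; other periods are not assumed here.
ALLOWED_PERIODS = {"daily", "monthly"}


def build_budget_payload(name, agent_id, period, limit_usd, action_on_exceed="block"):
    """Return a JSON string for the body of POST /api/v1/budgets (hypothetical helper)."""
    if period not in ALLOWED_PERIODS:
        raise ValueError(f"unsupported period: {period!r}")
    if action_on_exceed not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported action_on_exceed: {action_on_exceed!r}")
    if limit_usd <= 0:
        raise ValueError("limit_usd must be positive")
    return json.dumps({
        "name": name,
        "agent_id": agent_id,
        "period": period,
        "limit_usd": limit_usd,
        "action_on_exceed": action_on_exceed,
    })


body = build_budget_payload("production-daily", "AGENT_ID", "daily", 25.00)
print(body)
```

The resulting string can be passed to curl with -d, or sent with any HTTP client. Defaulting to "block" matches the conservative starting point this guide recommends.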

Set Up a Monthly Budget

Add a monthly budget as a safety net in addition to the daily limit.

curl -X POST http://localhost:8000/api/v1/budgets \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "production-monthly",
    "agent_id": "AGENT_ID",
    "period": "monthly",
    "limit_usd": 500.00,
    "action_on_exceed": "block"
  }'
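To see how the two limits interact, compare the monthly limit with what the daily limit alone would allow. The sketch below uses the values from this guide and assumes both budgets are enforced independently; the function name is illustrative only.

```python
import math


def days_until_monthly_cap(daily_limit_usd, monthly_limit_usd):
    """Days of spending the full daily limit before the monthly limit is reached."""
    return math.ceil(monthly_limit_usd / daily_limit_usd)


daily, monthly = 25.00, 500.00

# Maximum spend over a 30-day month if only the daily limit existed.
print(daily * 30)                               # 750.0
# With the monthly limit in place, full daily spending is cut off earlier.
print(days_until_monthly_cap(daily, monthly))   # 20
```

With these numbers the monthly budget is the tighter constraint: an agent that spends its full daily limit every day reaches the monthly limit on day 20, well before the month ends.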

Check Current Spending

Check how much of a budget has been used in the current period. Replace BUDGET_ID with the ID of the budget you want to inspect.

curl http://localhost:8000/api/v1/budgets/BUDGET_ID/usage \
  -H "Authorization: Bearer YOUR_API_KEY"

# Response:
# {
#   "budget_name": "production-daily",
#   "period": "daily",
#   "limit_usd": 25.00,
#   "used_usd": 3.47,
#   "remaining_usd": 21.53,
#   "percentage_used": 13.88,
#   "resets_at": "2025-01-16T00:00:00Z"
# }
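The derived fields in the response follow directly from limit_usd and used_usd. As a rough sketch of how a script might consume this response, the hypothetical helper below recomputes them and adds a warning flag; the 80% default is an assumption chosen to match the alert threshold used later in this guide.

```python
import json

# Example response body copied from above.
RESPONSE = """{
  "budget_name": "production-daily",
  "period": "daily",
  "limit_usd": 25.00,
  "used_usd": 3.47,
  "remaining_usd": 21.53,
  "percentage_used": 13.88,
  "resets_at": "2025-01-16T00:00:00Z"
}"""


def summarize_usage(usage, warn_at_percent=80.0):
    """Recompute remaining budget and percentage used from the raw amounts."""
    limit = usage["limit_usd"]
    used = usage["used_usd"]
    percent = round(used / limit * 100, 2)
    return {
        "remaining_usd": round(limit - used, 2),
        "percentage_used": percent,
        "warn": percent >= warn_at_percent,
    }


summary = summarize_usage(json.loads(RESPONSE))
print(summary)
```

For the example response this reproduces remaining_usd of 21.53 and percentage_used of 13.88, with the warning flag unset.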

Create Cost Optimizer Rules

Automatically route simple queries to cheaper models to save money.

# Rule: Use GPT-4o-mini for short, simple queries
curl -X POST http://localhost:8000/api/v1/cost/rules \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "simple-query-savings",
    "agent_id": "AGENT_ID",
    "condition": {
      "input_tokens_less_than": 100,
      "complexity": "low"
    },
    "action": {
      "override_model": "gpt-4o-mini"
    },
    "description": "Route simple queries to cheaper model"
  }'

# Rule: Use cached responses for repeated questions
curl -X POST http://localhost:8000/api/v1/cost/rules \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "cache-repeated",
    "agent_id": "AGENT_ID",
    "condition": {
      "similarity_to_recent": 0.95
    },
    "action": {
      "use_cached_response": true
    },
    "description": "Serve cached response for near-duplicate queries"
  }'
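As a mental model for the first rule, the sketch below shows one way such a condition could be evaluated. It is an illustration only: the rules are evaluated by the service, and this guide does not specify how. The sketch assumes that both condition fields must match, that the comparison is strict (as the field name input_tokens_less_than suggests), and that "gpt-4o" is the agent's default model.

```python
# Rule body copied from the first curl example above.
RULE = {
    "condition": {"input_tokens_less_than": 100, "complexity": "low"},
    "action": {"override_model": "gpt-4o-mini"},
}


def pick_model(default_model, input_tokens, complexity, rule):
    """Return the override model when every condition matches, else the default."""
    condition = rule["condition"]
    matches = (
        input_tokens < condition["input_tokens_less_than"]
        and complexity == condition["complexity"]
    )
    return rule["action"]["override_model"] if matches else default_model


print(pick_model("gpt-4o", 40, "low", RULE))    # gpt-4o-mini
print(pick_model("gpt-4o", 40, "high", RULE))   # gpt-4o
print(pick_model("gpt-4o", 100, "low", RULE))   # gpt-4o
```

Under these assumptions a 100-token query is not rerouted, so pick the token threshold with the boundary in mind.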

View Cost Savings

Check how much the cost optimizer has saved you.

curl http://localhost:8000/api/v1/cost/savings \
  -H "Authorization: Bearer YOUR_API_KEY"

# Response:
# {
#   "period": "last_30_days",
#   "total_without_optimization": 142.50,
#   "total_with_optimization": 87.30,
#   "savings_usd": 55.20,
#   "savings_percentage": 38.7,
#   "rules_triggered": 1247
# }
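The savings figures can be checked from the two totals. The short sketch below recomputes them for the example response; the per-trigger average is an extra derived number, not a field the API returns.

```python
def savings_summary(total_without, total_with, rules_triggered):
    """Recompute absolute savings, percentage savings, and average saving per trigger."""
    saved = round(total_without - total_with, 2)
    percent = round(saved / total_without * 100, 1)
    per_trigger = round(saved / rules_triggered, 3)
    return saved, percent, per_trigger


saved, percent, per_trigger = savings_summary(142.50, 87.30, 1247)
print(saved)        # 55.2
print(percent)      # 38.7
print(per_trigger)  # 0.044
```

In this example each rule trigger saved a little over four cents on average, which is a useful sanity check when deciding whether a rule is worth its quality trade-off.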

Set Up Governance Policies

Add a per-run token limit and an hourly run cap to prevent runaway agents.

curl -X POST http://localhost:8000/api/v1/governance/policies \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "token-safety",
    "agent_id": "AGENT_ID",
    "rules": [
      { "type": "token_limit", "max_tokens_per_run": 4000 },
      { "type": "max_runs_per_hour", "limit": 100 }
    ]
  }'
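Together, the two rules put an upper bound on hourly token usage, which you can translate into a worst-case hourly cost. The sketch below does that arithmetic. The price per 1,000 tokens is a placeholder, not a real model price, and the calculation assumes max_tokens_per_run covers every billed token in a run.

```python
MAX_TOKENS_PER_RUN = 4000        # from the token_limit rule above
MAX_RUNS_PER_HOUR = 100          # from the max_runs_per_hour rule above
PRICE_PER_1K_TOKENS_USD = 0.01   # placeholder; substitute your model's actual price
DAILY_BUDGET_USD = 25.00         # daily limit from earlier in this guide


def worst_case_hourly(max_tokens_per_run, max_runs_per_hour, price_per_1k_usd):
    """Return (tokens per hour, cost per hour in USD) if every run hits both limits."""
    tokens = max_tokens_per_run * max_runs_per_hour
    cost = round(tokens / 1000 * price_per_1k_usd, 2)
    return tokens, cost


tokens, cost = worst_case_hourly(
    MAX_TOKENS_PER_RUN, MAX_RUNS_PER_HOUR, PRICE_PER_1K_TOKENS_USD
)
print(tokens)                    # 400000
print(cost)                      # 4.0
print(DAILY_BUDGET_USD / cost)   # 6.25
```

At the placeholder price, a runaway agent that saturates both limits would use up the 25.00 USD daily budget in a little over six hours, so the policy slows a runaway loop but does not replace the budget.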

Subscribe to Cost Alerts

Get notified when spending reaches certain thresholds.

# Alert at 80% of daily budget
curl -X POST http://localhost:8000/api/v1/events/subscriptions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "event_type": "budget.threshold_reached",
    "config": {
      "agent_id": "AGENT_ID",
      "threshold_percent": 80
    },
    "webhook_url": "https://your-server.com/alerts"
  }'
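The webhook_url must point at a server that accepts the alert. The sketch below is a minimal receiver using only the Python standard library. The payload format is not documented in this guide, so the field names read in format_alert simply mirror the subscription request and are assumptions; adjust them to the payload you actually receive.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def format_alert(event):
    """Build a one-line message from a webhook payload (field names are assumed)."""
    config = event.get("config", {})
    return "[{}] agent {} reached {}% of its budget".format(
        event.get("event_type", "unknown"),
        config.get("agent_id", "?"),
        config.get("threshold_percent", "?"),
    )


class AlertHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly the number of bytes the sender declared.
        length = int(self.headers.get("Content-Length", 0))
        try:
            event = json.loads(self.rfile.read(length) or b"{}")
        except json.JSONDecodeError:
            self.send_response(400)
            self.end_headers()
            return
        print(format_alert(event))
        self.send_response(204)  # acknowledge with no response body
        self.end_headers()


def serve(port=8080):
    """Listen for alert webhooks until interrupted."""
    HTTPServer(("", port), AlertHandler).serve_forever()
```

Call serve() to start listening, and replace the print call with whatever notification channel you use. In production, put the receiver behind HTTPS and verify that requests really come from your deployment.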

Dashboard Monitoring

The dashboard provides visual cost monitoring tools:

Budgets tab: View all budgets with progress bars showing current usage vs. limits
Cost Optimizer tab: See rules, savings charts, and model routing statistics
Analytics tab: Spending trends over time, cost-per-agent breakdown
Governance tab: Policy violations and health scores

Cost Optimization Recommendations

Strategy	Expected Savings	Trade-off
Model downgrade for simple queries	30-50%	Slightly lower quality on simple tasks
Response caching	20-40%	Stale answers for rapidly changing data
Token limits per run	10-20%	May truncate complex responses
Rate limiting	Variable	Slower response times during peaks
Off-peak scheduling	5-15%	Delayed processing for non-urgent tasks
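When you combine strategies, the percentages in the table do not simply add up. The sketch below assumes each strategy reduces whatever spend is left after the previous one, which is a simplification; real savings depend on how much the strategies overlap in your workload.

```python
def combined_savings_percent(savings_percents):
    """Combine savings multiplicatively: each one applies to the remaining spend."""
    remaining = 1.0
    for percent in savings_percents:
        remaining *= 1 - percent / 100
    return round((1 - remaining) * 100, 1)


# Low end of the ranges for model downgrade (30%) and response caching (20%).
print(combined_savings_percent([30, 20]))          # 44.0
# Low end of all four strategies that list a percentage in the table.
print(combined_savings_percent([30, 20, 10, 5]))   # 52.1
```

Under this assumption, a 30% and a 20% saving combine to 44%, not 50%, so treat summed percentages as an upper bound when estimating the effect of several strategies.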

Start Conservative

Begin with action_on_exceed: "block" for all budgets. Switch to "alert_only" only after you understand your agent's typical spending patterns. It's better to block an agent than to receive an unexpected bill!
