Autoscaling

Automatically scale agent workloads based on queue depth, CPU usage, or custom metrics so your agents can absorb traffic spikes.

What is Autoscaling?

Autoscaling automatically adjusts the number of worker instances based on demand. When the queue grows, more workers spin up. When demand drops, workers scale down to save resources.

Create a Scaling Rule

```bash
# Send the rule as JSON. Without an explicit Content-Type, curl's -d
# sends application/x-www-form-urlencoded.
curl -X POST http://localhost:8000/api/v1/scaling/rules \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "runtime-worker-scaling",
    "metric": "queue_depth",
    "target_value": 10,
    "min_instances": 1,
    "max_instances": 10,
    "cooldown_seconds": 300,
    "scale_up_step": 2,
    "scale_down_step": 1
  }'
```
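If you create rules from scripts, it can help to build the JSON body from shell variables and reject impossible bounds before calling the API. The helper `scaling_rule_payload` below is hypothetical. It only uses the field names from the request above and does not know which values the server actually accepts.

```bash
# Hypothetical helper: print a scaling-rule JSON body built from arguments.
# Arguments: name metric target_value min_instances max_instances
#            cooldown_seconds scale_up_step scale_down_step
# Assumes name and metric contain no double quotes or backslashes.
scaling_rule_payload() {
  local name=$1 metric=$2 target=$3 min=$4 max=$5 cooldown=$6 up=$7 down=$8

  # Refuse bounds that could never be satisfied.
  if (( min > max )); then
    echo "min_instances ($min) must not exceed max_instances ($max)" >&2
    return 1
  fi

  printf '{"name": "%s", "metric": "%s", "target_value": %d, "min_instances": %d, "max_instances": %d, "cooldown_seconds": %d, "scale_up_step": %d, "scale_down_step": %d}\n' \
    "$name" "$metric" "$target" "$min" "$max" "$cooldown" "$up" "$down"
}
```

To send it, keep the same curl flags and replace the inline body with `-d "$(scaling_rule_payload runtime-worker-scaling queue_depth 10 1 10 300 2 1)"`.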

Available Metrics

| Metric | Description |
| --- | --- |
| queue_depth | Number of pending runs in the Celery queue |
| cpu_usage | CPU utilization of worker nodes, as a percentage |
| memory_usage | Memory utilization, as a percentage |
| latency_p99 | 99th percentile response latency |
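Because a rule's `metric` field names one of the metrics above, a script can catch typos before sending the request. The function `is_supported_metric` is a hypothetical helper whose list is copied from this table; the server may accept other (custom) metrics, so treat a rejection as a prompt to double-check rather than a hard rule.

```bash
# Hypothetical helper: succeed only if $1 is one of the metrics listed on this page.
is_supported_metric() {
  case "$1" in
    queue_depth|cpu_usage|memory_usage|latency_p99) return 0 ;;
    *) return 1 ;;
  esac
}
```

A typical guard is `is_supported_metric "$METRIC" || { echo "unknown metric: $METRIC" >&2; exit 1; }`.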

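Start with queue_depth as your primary metric: scale up when more than 10 runs are queued and scale down when fewer than 2 remain. A 5-minute cooldown (cooldown_seconds: 300) keeps the scaler from thrashing, that is, repeatedly adding and removing workers while the queue hovers around a threshold.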
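The policy above can be sketched as one evaluation tick, using the values from the example rule. This is an illustration, not the platform's scaler: treating 10 and 2 as fixed scale-up and scale-down triggers, applying the cooldown to both directions, and the names `next_instances` and the uppercase variables are all assumptions. Timestamps are passed in as epoch seconds so the logic is easy to test.

```bash
# Hypothetical sketch of the suggested queue_depth policy.
# Values mirror the example rule and the tip; the real scaler may differ.
SCALE_UP_ABOVE=10     # add workers when more runs than this are queued
SCALE_DOWN_BELOW=2    # remove workers when fewer runs than this are queued
SCALE_UP_STEP=2       # scale_up_step
SCALE_DOWN_STEP=1     # scale_down_step
MIN_INSTANCES=1       # min_instances
MAX_INSTANCES=10      # max_instances
COOLDOWN_SECONDS=300  # cooldown_seconds

# Arguments: current_instances queue_depth last_action_epoch now_epoch
# Prints the instance count to run after this evaluation.
next_instances() {
  local current=$1 depth=$2 last=$3 now=$4
  local next=$current

  # Inside the cooldown window: hold steady to avoid thrashing.
  if (( now - last < COOLDOWN_SECONDS )); then
    echo "$current"
    return 0
  fi

  if (( depth > SCALE_UP_ABOVE )); then
    next=$(( current + SCALE_UP_STEP ))
  elif (( depth < SCALE_DOWN_BELOW )); then
    next=$(( current - SCALE_DOWN_STEP ))
  fi

  # Clamp to the configured bounds.
  if (( next > MAX_INSTANCES )); then next=$MAX_INSTANCES; fi
  if (( next < MIN_INSTANCES )); then next=$MIN_INSTANCES; fi
  echo "$next"
}
```

For example, `next_instances 3 25 0 1000` prints `5`: the last action was 1000 seconds ago, so the cooldown has passed, and 25 queued runs exceed 10, so two workers are added. Called again with a last action only 100 seconds earlier, the same inputs would leave the count at 3.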