An Auto Scaling Group (ASG) doesn't scale because it "feels" busy. It scales because a CloudWatch Alarm crossed a threshold and told it to. Understanding that chain — metric, alarm, policy, action — is the difference between an ASG that reacts correctly under load and one that either scales too late or never triggers at all.
The four pieces that have to exist
An ASG needs four things wired together correctly before any scaling decision can happen: a Launch Template (what to launch — AMI, instance type, security groups), the ASG itself (min/max/desired capacity, which subnets), a CloudWatch metric being collected (most commonly CPUUtilization, but it can be a custom metric), and a Scaling Policy that defines what to do when that metric crosses a threshold.
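As a rough sketch of how the group itself ties to the launch template, the helper below builds the keyword arguments that boto3's `create_auto_scaling_group` call accepts. The group name, template name, and subnet IDs are hypothetical placeholders, and the function only constructs the request; it does not call AWS.

```python
from typing import Dict, List


def build_asg_request(
    name: str,
    launch_template_name: str,
    subnet_ids: List[str],
    min_size: int,
    max_size: int,
    desired: int,
) -> Dict[str, object]:
    """Build kwargs for autoscaling.create_auto_scaling_group (no AWS call made)."""
    # The ASG rejects a desired capacity outside [min, max], so check early.
    if not (min_size <= desired <= max_size):
        raise ValueError("desired capacity must sit between min_size and max_size")
    return {
        "AutoScalingGroupName": name,
        # What to launch: AMI, instance type, security groups live in the template.
        "LaunchTemplate": {
            "LaunchTemplateName": launch_template_name,
            "Version": "$Latest",
        },
        "MinSize": min_size,
        "MaxSize": max_size,
        "DesiredCapacity": desired,
        # Subnets are passed as one comma-separated string.
        "VPCZoneIdentifier": ",".join(subnet_ids),
    }


# Hypothetical names, for illustration only.
request = build_asg_request(
    "web-asg", "web-template", ["subnet-aaa", "subnet-bbb"], 2, 6, 2
)
```

With credentials configured, the dictionary could be passed as `boto3.client("autoscaling").create_auto_scaling_group(**request)`; the metric and the scaling policy are attached separately.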
The two ways to define a scaling policy
Simple/Step Scaling (alarm-driven)
You manually create a CloudWatch Alarm — "if average CPUUtilization across the group exceeds 70% for 2 consecutive 1-minute periods" — and attach a Scaling Policy to it that says "add 2 instances" when that alarm fires. This is explicit and predictable, but you're choosing the threshold and the response size by hand, and it doesn't automatically scale back down when load drops unless you also configure a separate scale-in alarm and policy.
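To make the "2 consecutive 1-minute periods" rule concrete, here is a simplified model of the alarm check and the fixed-size response. It is not CloudWatch's actual evaluation logic (which also handles missing data and other options); the function names and the min/max bounds in the example are assumptions for illustration.

```python
from typing import Sequence


def alarm_state(
    datapoints: Sequence[float], threshold: float = 70.0, periods: int = 2
) -> str:
    """Return "ALARM" only if the most recent `periods` datapoints all exceed the threshold."""
    if len(datapoints) < periods:
        return "INSUFFICIENT_DATA"
    recent = datapoints[-periods:]
    return "ALARM" if all(value > threshold for value in recent) else "OK"


def apply_simple_policy(
    current: int, adjustment: int, min_size: int, max_size: int
) -> int:
    """Apply a fixed "add N instances" adjustment, clamped to the group's bounds."""
    return max(min_size, min(max_size, current + adjustment))


# One-minute average CPU samples, oldest first (made-up values).
samples = [62.0, 74.0, 81.0]
if alarm_state(samples) == "ALARM":
    new_capacity = apply_simple_policy(current=2, adjustment=2, min_size=1, max_size=6)
```

A single spike such as `[75.0, 65.0]` leaves the alarm in `OK`, which is the point of requiring consecutive periods. Scaling back in would need a second alarm with a lower threshold and a negative adjustment.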
Target Tracking Scaling
You instead state a target value, for example "keep average CPUUtilization at 50%", and AWS creates and manages the underlying CloudWatch alarms and instance-count adjustments, scaling both out and in to hold that target. This is the policy type used in the load test below (the same set-a-target idea as Kubernetes' Horizontal Pod Autoscaler): set a target, let AWS handle the alarm math, and observe the resulting instance count.
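A target tracking policy is small enough to show in full. The sketch below builds the arguments for boto3's `put_scaling_policy` using the predefined `ASGAverageCPUUtilization` metric; the group name and policy name are hypothetical, and nothing is sent to AWS.

```python
from typing import Dict


def build_target_tracking_policy(asg_name: str, target_cpu: float = 50.0) -> Dict[str, object]:
    """Build kwargs for autoscaling.put_scaling_policy (no AWS call made)."""
    return {
        "AutoScalingGroupName": asg_name,
        "PolicyName": f"{asg_name}-cpu-target",  # hypothetical naming scheme
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingConfiguration": {
            # Predefined metric: average CPU across the group's instances.
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ASGAverageCPUUtilization",
            },
            # The value the policy tries to hold, in percent.
            "TargetValue": target_cpu,
        },
    }


policy = build_target_tracking_policy("web-asg", target_cpu=50.0)
```

Note what is absent: no alarm definition, no threshold, no step size. With credentials configured, `boto3.client("autoscaling").put_scaling_policy(**policy)` would register it, and the alarms appear in CloudWatch on their own.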
The mechanism, step by step
For target tracking, the actual sequence on a scale-out event is:
1. CloudWatch continuously aggregates CPUUtilization across all instances in the ASG (5-minute granularity with basic monitoring; 1-minute if detailed monitoring is enabled on the instances)
2. CloudWatch evaluates this against the alarms that the target tracking policy created and manages on your behalf
3. When average utilization exceeds the target for the evaluation period, the alarm enters ALARM state
4. The Auto Scaling service receives the alarm state change
5. ASG calculates how many instances to add, based on how far above target the metric is, not just "add 1"
6. New instance(s) launch from the Launch Template
7. New instance(s) pass health checks (ELB or EC2 status checks) and join the load balancer's target group
8. Traffic begins routing to the new instance(s)
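Step 5 is the part that differs most from a flat "add 1" rule. AWS does not publish the exact algorithm, so the function below is only a proportional approximation under the assumption that load spreads evenly across instances: it asks how many instances would bring the observed average back to the target. The function name and bounds are illustrative.

```python
import math


def estimate_desired_capacity(
    current_instances: int,
    observed_avg: float,
    target: float,
    min_size: int,
    max_size: int,
) -> int:
    """Approximate the capacity that would bring the average metric back to target.

    Assumes total load is constant and spreads evenly, so
    desired = ceil(current * observed / target), clamped to the group's bounds.
    """
    if target <= 0:
        raise ValueError("target must be positive")
    raw = math.ceil(current_instances * observed_avg / target)
    return max(min_size, min(max_size, raw))


# Hypothetical reading: 2 instances averaging 95% CPU against a 50% target.
desired = estimate_desired_capacity(2, 95.0, 50.0, min_size=1, max_size=10)
```

Under this model a reading only slightly over target (say 55%) would add a single instance, while a reading near saturation would double the group, which is the proportional behaviour the step describes.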
The load test, with real numbers
Starting state: 2 running instances behind an Application Load Balancer (ALB), with a target tracking policy set to 50% average CPU. A synthetic load generator was pointed at the ALB to push sustained CPU load past the target.
Observed timeline:
- T+0: Load test starts. Average CPU across both instances begins climbing past 50%.
- T+2 min: Average CPU sustained above target for the evaluation period — CloudWatch alarm transitions to ALARM.
- T+2 min: ASG desired capacity increases from 2 to 4 (the policy calculated the gap between current and target load and scaled proportionally, not by a flat +1).
- T+3–4 min: Two new instances launch, pass health checks, and register with the load balancer.
- T+5 min: Average CPU across all 4 instances settles back near the 50% target as load distributes across more capacity.
This 2→4 scale-out, observed live, confirms the policy was reading real-time metrics and reacting proportionally — not just flipping a binary "add one instance" switch.
Cooldown periods: the setting that prevents thrashing
A Cooldown Period is a window (default 300 seconds for simple scaling) during which the ASG suppresses additional scaling activities after one just happened, even if the metric still looks like it warrants another change. Without this, a noisy metric bouncing above and below a threshold could trigger a launch, then a termination, then another launch within minutes — "flapping" — which wastes money on launch/termination cycles and never lets the system stabilize.
Target tracking policies handle this differently, and generally better. They do not rely on the flat cooldown timer; instead, they wait for newly launched instances to finish warming up before counting their metrics, and they scale in more conservatively than they scale out. Because the policy keeps recalculating capacity against the target rather than reacting to single threshold crossings, it tends to avoid flapping.
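The simple-scaling cooldown described above can be modeled as a gate that rejects any scaling request arriving too soon after the previous one. This is a toy model for reasoning about the timing, not the Auto Scaling service's implementation; the class and method names are made up for the sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CooldownGate:
    """Suppress scaling activities that arrive within the cooldown window."""

    cooldown_seconds: int = 300  # default cooldown for simple scaling
    last_activity_at: Optional[float] = None

    def try_scale(self, now: float) -> bool:
        """Return True and record the activity if scaling is allowed at time `now`."""
        in_cooldown = (
            self.last_activity_at is not None
            and now - self.last_activity_at < self.cooldown_seconds
        )
        if in_cooldown:
            return False  # the alarm may still be firing, but it is ignored
        self.last_activity_at = now
        return True


gate = CooldownGate()
# A noisy metric requests scaling at 0 s, 120 s, and 300 s.
decisions = [gate.try_scale(t) for t in (0, 120, 300)]
```

The request at 120 seconds is dropped, so a metric bouncing around the threshold produces at most one activity per window instead of a launch-terminate-launch cycle.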
Why CPU isn't always the right metric
CPUUtilization is the default because it's always available with zero setup, but it's a poor proxy for load on anything I/O-bound: a web app waiting on database queries can have low CPU and still be failing requests. For those cases, a load balancer metric such as request count per target, or a custom CloudWatch metric published from the application itself (queue depth, response latency), makes a far more honest scaling trigger than CPU ever will.
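One way to wire up a non-CPU trigger, sketched under assumptions: the application publishes a per-instance backlog figure with CloudWatch's `put_metric_data`, and a target tracking policy references it through a `CustomizedMetricSpecification`. The namespace `MyApp/Scaling`, the metric name `BacklogPerInstance`, and the group name are hypothetical. The per-instance division is a deliberate choice here, since a target tracking metric should fall as capacity is added. The helpers only build request dictionaries.

```python
from typing import Dict

NAMESPACE = "MyApp/Scaling"          # hypothetical namespace
METRIC_NAME = "BacklogPerInstance"   # hypothetical metric name


def backlog_per_instance(queue_depth: int, running_instances: int) -> float:
    """Queue depth divided by capacity; guards against a zero-instance group."""
    return queue_depth / max(running_instances, 1)


def build_metric_data(asg_name: str, queue_depth: int, running_instances: int) -> Dict[str, object]:
    """Build kwargs for cloudwatch.put_metric_data (no AWS call made)."""
    return {
        "Namespace": NAMESPACE,
        "MetricData": [
            {
                "MetricName": METRIC_NAME,
                "Dimensions": [{"Name": "AutoScalingGroupName", "Value": asg_name}],
                "Value": backlog_per_instance(queue_depth, running_instances),
                "Unit": "Count",
            }
        ],
    }


def build_custom_policy(asg_name: str, target_backlog: float) -> Dict[str, object]:
    """Build kwargs for autoscaling.put_scaling_policy tracking the custom metric."""
    return {
        "AutoScalingGroupName": asg_name,
        "PolicyName": f"{asg_name}-backlog-target",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingConfiguration": {
            "CustomizedMetricSpecification": {
                "MetricName": METRIC_NAME,
                "Namespace": NAMESPACE,
                "Dimensions": [{"Name": "AutoScalingGroupName", "Value": asg_name}],
                "Statistic": "Average",
            },
            "TargetValue": target_backlog,
        },
    }


payload = build_metric_data("web-asg", queue_depth=120, running_instances=4)
```

What counts as an acceptable backlog per instance depends on how long each item takes to process, so the target value has to come from measuring the application rather than from a default.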
The health check detail that catches people out
A newly launched instance starts receiving real traffic only after it passes its configured health check. If that health check hits an application endpoint that takes time to warm up (loading a cache, establishing DB connections), the ASG's Health Check Grace Period must be long enough to cover the warm-up. Otherwise the instance can be marked unhealthy and replaced in a loop before it ever serves a single request.
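The replacement loop is easier to see with numbers. The sketch below is a deliberately simplified model (it ignores health check intervals and unhealthy thresholds): failures during the grace period are ignored, and any failure after it leads to replacement. The function names and the 180-second warm-up are assumptions for illustration.

```python
def should_replace(
    seconds_since_launch: float, health_check_passing: bool, grace_period_seconds: int
) -> bool:
    """Simplified rule: failures inside the grace period are ignored."""
    if seconds_since_launch < grace_period_seconds:
        return False
    return not health_check_passing


def survives_warmup(warmup_seconds: int, grace_period_seconds: int) -> bool:
    """True if an instance that fails checks until warm-up completes is never replaced.

    Checks the first moment after the grace period ends: the instance passes
    health checks only once `warmup_seconds` have elapsed.
    """
    passing = grace_period_seconds >= warmup_seconds
    return not should_replace(grace_period_seconds, passing, grace_period_seconds)


# Hypothetical app that needs 180 s to load its cache and open DB connections.
too_short = survives_warmup(warmup_seconds=180, grace_period_seconds=60)
long_enough = survives_warmup(warmup_seconds=180, grace_period_seconds=240)
```

With a 60-second grace period the instance is replaced before it finishes warming up, and its replacement meets the same fate; 240 seconds gives it room to come up.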
An Auto Scaling Group is only as good as the metric driving it. The launch template and the policy are the easy half; choosing a metric that actually reflects when your application is struggling is the half that takes real observation to get right.