EC2 Auto Scaling: What Actually Triggers a Scale-Out

CloudWatch alarms, cooldown periods, and a load test that scaled 2→4 instances in real time, with the metrics to prove it.

Dev  ·  Senior DevOps & SRE Engineer  ·  ap-south-1

An Auto Scaling Group (ASG) doesn't scale because it "feels" busy. It scales because a CloudWatch Alarm crossed a threshold and told it to. Understanding that chain — metric, alarm, policy, action — is the difference between an ASG that reacts correctly under load and one that either scales too late or never triggers at all.

The four pieces that have to exist

An ASG needs four things wired together correctly before any scaling decision can happen: a Launch Template (what to launch — AMI, instance type, security groups), the ASG itself (min/max/desired capacity, which subnets), a CloudWatch metric being collected (most commonly CPUUtilization, but it can be a custom metric), and a Scaling Policy that defines what to do when that metric crosses a threshold.
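
As a rough sketch of how the first two pieces fit together, the snippet below builds the keyword arguments one might pass to boto3's create_auto_scaling_group. The group name, template name, subnet IDs, and capacity numbers are placeholders rather than values from this post, and nothing here calls AWS.

```python
def asg_request(name, template_name, subnet_ids, min_size, max_size, desired):
    """Build kwargs for boto3's autoscaling.create_auto_scaling_group (sketch)."""
    if not min_size <= desired <= max_size:
        raise ValueError("desired capacity must sit between min and max")
    return {
        "AutoScalingGroupName": name,
        # The Launch Template answers "what to launch".
        "LaunchTemplate": {"LaunchTemplateName": template_name, "Version": "$Latest"},
        # The ASG itself answers "how many, and where".
        "MinSize": min_size,
        "MaxSize": max_size,
        "DesiredCapacity": desired,
        # Subnets are passed as one comma-separated string.
        "VPCZoneIdentifier": ",".join(subnet_ids),
    }


# All names and IDs below are hypothetical placeholders.
request = asg_request("web-asg", "web-template", ["subnet-aaa", "subnet-bbb"], 2, 6, 2)
```

With credentials configured, the dictionary could be passed along as boto3.client("autoscaling").create_auto_scaling_group(**request). The metric and the scaling policy are attached separately.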

The two ways to define a scaling policy

Simple/Step Scaling (alarm-driven)

You manually create a CloudWatch Alarm — "if average CPUUtilization across the group exceeds 70% for 2 consecutive 1-minute periods" — and attach a Scaling Policy to it that says "add 2 instances" when that alarm fires. This is explicit and predictable, but you're choosing the threshold and the response size by hand, and it doesn't automatically scale back down when load drops unless you also configure a separate scale-in alarm and policy.
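
To make the "70% for 2 consecutive periods" rule concrete, here is a simplified model of the alarm check and the fixed-size response. It is only an illustration: real CloudWatch alarms also support "M out of N" datapoints and missing-data handling, which this sketch ignores, and the function names are made up for the example.

```python
def alarm_state(datapoints, threshold=70.0, evaluation_periods=2):
    """Return "ALARM" if the most recent N datapoints all exceed the threshold."""
    recent = datapoints[-evaluation_periods:]
    if len(recent) < evaluation_periods:
        return "INSUFFICIENT_DATA"
    return "ALARM" if all(value > threshold for value in recent) else "OK"


SCALE_OUT_ADJUSTMENT = 2  # the policy's fixed response: "add 2 instances"


def desired_after_alarm(current, datapoints, max_size):
    """Apply the fixed adjustment when the alarm fires, capped at the ASG max."""
    if alarm_state(datapoints) == "ALARM":
        return min(current + SCALE_OUT_ADJUSTMENT, max_size)
    return current
```

Note that the response size is the same whether CPU sits at 71% or 99%, which is the main limitation of this style of policy.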

Target Tracking Scaling

You instead state a target value, such as "keep average CPUUtilization at 50%", and AWS manages the underlying CloudWatch alarms and instance count adjustments automatically, scaling both out and in to hold that target. This is the option used in the load test below. It works much like a Kubernetes Horizontal Pod Autoscaler (HPA): set a target, let AWS handle the alarm math, and observe the resulting instance count.
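
A minimal sketch of what such a policy looks like as a request, assuming boto3's put_scaling_policy and the predefined ASGAverageCPUUtilization metric. The group name and the policy naming scheme are hypothetical, and the snippet only builds the arguments.

```python
def target_tracking_policy(asg_name, target_cpu=50.0):
    """Build kwargs for boto3's autoscaling.put_scaling_policy (sketch)."""
    return {
        "AutoScalingGroupName": asg_name,
        "PolicyName": f"{asg_name}-cpu-{int(target_cpu)}",  # naming is an assumption
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingConfiguration": {
            # Average CPU across the group, as published by EC2 Auto Scaling.
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ASGAverageCPUUtilization",
            },
            "TargetValue": target_cpu,
        },
    }


policy = target_tracking_policy("web-asg")  # "web-asg" is a placeholder name
```

There is no alarm or adjustment size anywhere in the request: the target value is the whole configuration.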

The mechanism, step by step

For target tracking, the actual sequence on a scale-out event is:

1. CloudWatch continuously aggregates CPUUtilization
   across all instances in the ASG (5-minute granularity
   with basic monitoring, 1-minute with detailed
   monitoring enabled)
2. CloudWatch evaluates this against the target tracking
   policy's hidden alarm threshold
3. When average utilization exceeds the target for the
   evaluation period, the alarm enters ALARM state
4. The Auto Scaling service receives the alarm state change
5. ASG calculates how many instances to add, based on how
   far above target the metric is — not just "add 1"
6. New instance(s) launch from the Launch Template
7. New instance(s) pass health checks (ELB or EC2 status
   checks) and join the load balancer's target group
8. Traffic begins routing to the new instance(s)
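
Step 5 is the part that differs most from a fixed "add N" policy. AWS does not publish the exact calculation, so the sketch below uses the ratio rule familiar from the Kubernetes HPA purely as an illustration of proportional sizing. The function name and the min/max bounds are assumptions.

```python
import math


def proportional_desired(current, metric, target, min_size, max_size):
    """Illustrative ratio rule: scale capacity by metric/target, then clamp.

    This is NOT the documented AWS algorithm, only a model of "how far
    above target" translating into "how many instances".
    """
    raw = math.ceil(current * metric / target)
    return max(min_size, min(raw, max_size))
```

Under this model, 2 instances averaging 90% CPU against a 50% target would need ceil(2 × 90 / 50) = 4 instances, while the same group sitting exactly at 50% would stay at 2.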

The load test, with real numbers

Starting state: 2 running instances behind an Application Load Balancer (ALB), target tracking policy set to 50% average CPU. A synthetic load generator was pointed at the ALB to push sustained CPU load past the target.

Observed timeline:

This 2→4 scale-out, observed live, confirms the policy was reading real-time metrics and reacting proportionally — not just flipping a binary "add one instance" switch.

Cooldown periods: the setting that prevents thrashing

A Cooldown Period is a window (default 300 seconds for simple scaling) during which the ASG suppresses additional scaling activities after one just happened, even if the metric still looks like it warrants another change. Without this, a noisy metric bouncing above and below a threshold could trigger a launch, then a termination, then another launch within minutes — "flapping" — which wastes money on launch/termination cycles and never lets the system stabilize.
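
The effect of a cooldown can be modelled in a few lines. This is a simplified sketch: it starts the timer when an activity is requested, whereas the real cooldown begins once the scaling activity completes, and the request list is invented for the example.

```python
def apply_cooldown(requests, cooldown=300):
    """Return the scaling requests that actually run.

    requests: list of (timestamp_seconds, adjustment) tuples.
    Any request arriving within `cooldown` seconds of the last
    executed activity is suppressed.
    """
    executed = []
    last_activity = None
    for timestamp, adjustment in sorted(requests):
        if last_activity is not None and timestamp - last_activity < cooldown:
            continue  # still cooling down, so this request is dropped
        executed.append((timestamp, adjustment))
        last_activity = timestamp
    return executed


# A noisy metric asking for launch, terminate, launch, terminate.
noisy = [(0, +1), (60, -1), (120, +1), (400, -1)]
```

With the default 300 seconds, only the requests at t=0 and t=400 run. With the cooldown set to 0, all four run, which is the flapping pattern described above.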

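Target tracking policies handle this differently and generally better. Instead of a flat cooldown timer, they rely on an instance warmup period, so a newly launched instance's metrics are left out of the group average until it has had time to start up, and they scale in more conservatively than they scale out. Because the policy keeps recalculating capacity against the target rather than reacting to single threshold crossings, flapping is much less likely.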

Why CPU isn't always the right metric

CPUUtilization is the default because it's always available with zero setup, but it's a poor proxy for load on anything I/O-bound: a web app waiting on database queries can have low CPU and still be failing requests. For those cases, request count per target (a predefined ALB metric) or a custom CloudWatch metric published from the application itself, such as queue depth or response latency, makes a far more honest scaling trigger than CPU.
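
Publishing a custom metric from the application is a small amount of code. The sketch below builds a payload for boto3's put_metric_data; the namespace, metric name, and group name are hypothetical, and queue depth is just one possible choice.

```python
def queue_depth_metric(asg_name, depth):
    """Build kwargs for boto3's cloudwatch.put_metric_data (sketch)."""
    return {
        "Namespace": "MyApp/Workers",  # hypothetical namespace
        "MetricData": [
            {
                "MetricName": "QueueDepth",  # hypothetical metric name
                # Tag the datapoint with the group so it can be aggregated per ASG.
                "Dimensions": [{"Name": "AutoScalingGroupName", "Value": asg_name}],
                "Value": float(depth),
                "Unit": "Count",
            }
        ],
    }


payload = queue_depth_metric("web-asg", 42)
```

With credentials configured, this could be sent with boto3.client("cloudwatch").put_metric_data(**payload) on a regular interval, and a scaling policy pointed at the resulting metric.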

The health check detail that catches people out

A newly launched instance only starts receiving real traffic after it passes its configured health check. If that health check hits an application endpoint that takes time to warm up (loading a cache, establishing DB connections), the ASG's Health Check Grace Period needs to be long enough to cover that warm-up. Otherwise the instance can be marked unhealthy and replaced in a loop before it ever serves a single request.
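
One way to keep the grace period honest is to derive it from a measured warm-up time instead of guessing. The sketch below does that and builds arguments for boto3's update_auto_scaling_group. The 25% safety margin and the helper names are assumptions, not a recommendation from AWS.

```python
def grace_period_for(warmup_seconds, margin=0.25):
    """Pad the measured warm-up time by a safety margin (assumed 25%)."""
    return int(warmup_seconds * (1 + margin))


def health_check_update(asg_name, warmup_seconds):
    """Build kwargs for boto3's autoscaling.update_auto_scaling_group (sketch)."""
    return {
        "AutoScalingGroupName": asg_name,
        # Use the load balancer's health check rather than EC2 status checks only.
        "HealthCheckType": "ELB",
        "HealthCheckGracePeriod": grace_period_for(warmup_seconds),
    }
```

For an application that takes about 120 seconds to warm up, this yields a 150-second grace period.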

An Auto Scaling Group is only as good as the metric driving it. The launch template and the policy are the easy half; choosing a metric that actually reflects when your application is struggling is the half that takes real observation to get right.