How Cloud Scaling Works

TechOps Examples

Hey — It's Govardhana MK 👋

Welcome to another technical edition.

Every Tuesday – You’ll receive a free edition with a byte-size use case, remote job opportunities, top news, tools, and articles.

Every Thursday and Saturday – You’ll receive a special edition with a deep dive use case, remote job opportunities, and articles.


🧠 DEEP DIVE USE CASE

How Cloud Scaling Works

Most teams hit their first scaling problem when a micro or small instance starts to choke under load.

There are two ways to deal with that:

Vertical Scaling (Scaling Up)

You upgrade the instance type from t3.micro to t3.small, t3.medium, and eventually to larger classes like m5.large.

This gives you more CPU, memory, and network throughput on the same machine without changing your app setup. It works well until you hit the upper limit of what a single instance can handle.

Horizontal Scaling (Scaling Out)

You launch more instances of the same type, like multiple m5.large, and put them behind a load balancer.

Each instance handles a portion of the load. This helps scale linearly with demand, and is suited for workloads designed to run across many nodes. It needs distributed state handling and better traffic management.
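To make the horizontal path concrete, here's a minimal capacity-planning sketch. The function name, the per-instance throughput figure, and the 70% headroom factor are all illustrative assumptions, not AWS values:

```python
import math

def instances_needed(total_rps: float, per_instance_rps: float, headroom: float = 0.7) -> int:
    """Estimate how many identical instances to run behind a load balancer.

    headroom: fraction of each instance's capacity you are willing to use
    (0.7 keeps 30% spare for traffic spikes). All numbers are illustrative.
    """
    usable = per_instance_rps * headroom
    return max(1, math.ceil(total_rps / usable))

# Example: 4,200 req/s of traffic, assuming each m5.large sustains ~1,000 req/s
print(instances_needed(4200, 1000))  # 6
```

The same math explains why horizontal scaling grows roughly linearly with demand: double the traffic, double the instance count, as long as the workload shards cleanly across nodes.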

Here are the four key cloud scaling patterns you’ll come across in production:

  1. Scheduled Scaling

  2. Target Tracking Scaling

  3. Step Scaling

  4. Predictive Scaling

1. Scheduled Scaling 

This pattern is used when the workload follows a fixed, known schedule. It’s common in enterprise environments where traffic is driven by office hours, or in systems that run batch jobs, reports, or data sync at fixed times.

Implementation is straightforward. In AWS, scheduled scaling can be configured on Auto Scaling Groups using put-scheduled-update-group-action via CLI or with Terraform using aws_autoscaling_schedule.

In real setups, you create two separate scheduled actions:

One for scaling out before the expected load spike.

aws autoscaling put-scheduled-update-group-action \
  --auto-scaling-group-name report-runner-asg \
  --scheduled-action-name morning-scale-out \
  --start-time "2025-06-28T00:15:00Z" \
  --desired-capacity 3

One for scaling in after the load drops.

aws autoscaling put-scheduled-update-group-action \
  --auto-scaling-group-name report-runner-asg \
  --scheduled-action-name evening-scale-in \
  --start-time "2025-06-28T13:30:00Z" \
  --desired-capacity 1
</aws>

Things to Watch:

  • Scheduled actions run in UTC, so always convert from your local timezone.

  • Trigger the scale out at least 10 to 15 minutes early to account for instance boot time.

  • A scheduled action only sets capacity at its start time; any dynamic scaling policies remain active and can adjust capacity afterward, so keep your min and max bounds in sync with the schedule.

  • If the instance launch fails due to AZ capacity or config issues, there is no retry mechanism.
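The UTC pitfall above is the one that bites most often. A small sketch of the conversion, using Python's standard `zoneinfo` module (the timezone `Asia/Kolkata` and the 05:45 local time are just example inputs):

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # stdlib, Python 3.9+

def to_utc_schedule(local_dt: datetime, tz_name: str) -> str:
    """Convert a local trigger time into the UTC ISO-8601 string that
    --start-time expects. tz_name is an IANA zone like 'Asia/Kolkata'."""
    local = local_dt.replace(tzinfo=ZoneInfo(tz_name))
    return local.astimezone(ZoneInfo("UTC")).strftime("%Y-%m-%dT%H:%M:%SZ")

# 05:45 IST on 28 June 2025 is 00:15 UTC — the scale-out time used earlier
print(to_utc_schedule(datetime(2025, 6, 28, 5, 45), "Asia/Kolkata"))
```

Running the conversion once in code (or a quick script in CI) is safer than converting offsets in your head, especially across daylight-saving boundaries.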

2. Target Tracking Scaling

This scaling pattern keeps a specific metric, like average CPU utilization, close to a target value.

The Auto Scaling Group continuously adjusts the desired instance count based on this metric. When the metric crosses the threshold, the group either adds or removes instances to bring it back to target.

To use this in production, typically you need:

  • An Auto Scaling Group with a working Launch Template.

  • A Load Balancer (ALB or NLB) in front of the group to distribute traffic evenly.

  • A CloudWatch metric, like ASGAverageCPUUtilization, which the scaling policy will monitor.

Example: maintaining CPU at 50 percent across all active EC2 instances.

Here’s how you configure it using AWS CLI:

aws autoscaling put-scaling-policy \
  --auto-scaling-group-name app-asg \
  --policy-name cpu-50-target-tracking \
  --policy-type TargetTrackingScaling \
  --target-tracking-configuration '{
    "TargetValue": 50.0,
    "PredefinedMetricSpecification": {
      "PredefinedMetricType": "ASGAverageCPUUtilization"
    },
    "ScaleInCooldown": 120,
    "ScaleOutCooldown": 60
  }'

Things to Watch:

  • Keep a minimum of 2 instances to avoid full scale in during low traffic.

  • Cooldowns should match your average instance warm-up time.

  • If CloudWatch stops publishing metrics due to config issues, scaling will stall.

  • Metrics like CPU are fine for compute heavy apps, but for web traffic, request count or ALB target response time may be better choices.
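Under the hood, target tracking is roughly a proportional controller. This sketch shows the intuition only; the actual service layers alarm evaluation, cooldowns, and scale-in damping on top, and the function name here is my own:

```python
import math

def target_tracking_capacity(current_capacity: int, current_metric: float, target: float) -> int:
    """Rough sketch of the proportional adjustment behind target tracking:
    desired ~= current * (metric / target). Purely illustrative."""
    if current_metric <= 0:
        return current_capacity
    return max(1, math.ceil(current_capacity * current_metric / target))

# 4 instances at 80% average CPU against a 50% target -> scale out
print(target_tracking_capacity(4, 80.0, 50.0))  # 7
# 4 instances at 20% average CPU -> scale in
print(target_tracking_capacity(4, 20.0, 50.0))  # 2
```

Notice the asymmetry in practice: scale-out is aggressive (more instances dilute the metric quickly), while scale-in is deliberately slower to avoid flapping.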

These two patterns cover most day-to-day setups. Next up are Step and Predictive Scaling, which are more complex and seeing increased adoption in real-world deployments.

I am giving away 50% OFF on all annual plans of membership offerings for a limited time.

A membership will unlock access to read these deep dive editions on Thursdays and Saturdays.

Get twice the value at half the price


Paid subscriptions get you:

  • Access to an archive of 175+ use cases
  • Deep Dive use case editions (Thursdays and Saturdays)
  • Access to the private Discord community
  • Invitations to monthly Zoom calls for use case discussions and industry-leader meetups
  • Quarterly 1:1 'Ask Me Anything' power session