Shreyans Sonthalia

AWS Burstable Instances Explained: CPU Credits, Throttling, and Why Your t3 Instance Isn't What You Think

You launched a t3a.medium with "2 vCPUs" but you're not getting 2 CPUs. Here's what you're actually paying for.


The Misconception

You go to the AWS console, launch a t3a.medium, and see this in the spec:

| Spec | Value |
|---|---|
| vCPUs | 2 |
| Memory | 4 GiB |
| Price | ~$0.047/hr |

Most engineers assume they're getting 2 full CPU cores, always available, for $0.047/hr. That's not what's happening.


What the "T" in T3 Means

AWS has several instance families:

| Family | Type | CPU Model | Example |
|---|---|---|---|
| T3/T3a/T4g | Burstable | Shared, credit-based | t3a.medium |
| M5/M6i/M7i | General purpose | Dedicated | m5.large |
| C5/C6i/C7i | Compute optimized | Dedicated | c5.large |

The "T" stands for burstable. When you buy a T-series instance, you're not buying dedicated CPU cores. You're buying a fraction of a CPU with the ability to temporarily use more.

A t3a.medium gives you 20% of each vCPU as a baseline — meaning you can continuously use 0.4 vCPUs (20% x 2). The other 80% is available on-demand, but only if you have CPU credits to spend.

Why is it cheaper?

This is the deal AWS offers: because most workloads don't use 100% CPU all the time, AWS can pack ~5 burstable instances onto the same physical hardware that would serve 1 dedicated instance. You get a discount; AWS gets better hardware utilization.

```
t3a.medium:  2 vCPUs (burstable, 20% baseline)  →  ~$0.047/hr
m5.large:    2 vCPUs (dedicated, 100% always)   →  ~$0.096/hr
```

The m5.large costs about twice as much because those CPUs are reserved for you, always.
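One way to make the prices comparable is cost per *sustained* vCPU, i.e. the compute you can use continuously without burning credits. A rough calculation using the list prices above (illustrative only; prices vary by region):

```python
# Price per sustained vCPU-hour: capacity usable continuously without credits
t3a_medium = 0.047 / (2 * 0.20)  # 2 vCPUs at 20% baseline = 0.4 sustained vCPUs
m5_large = 0.096 / 2             # 2 dedicated vCPUs, always available

print(f"t3a.medium: ${t3a_medium:.4f} per sustained vCPU-hour")  # ~$0.1175
print(f"m5.large:   ${m5_large:.4f} per sustained vCPU-hour")    # ~$0.0480
```

Per guaranteed vCPU, the burstable instance is roughly 2.4x more expensive. The discount only pays off if your workload really does sit idle most of the time.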


How CPU Credits Work

The credit system is how AWS meters your burst usage.

The Basic Math

1 CPU credit = 1 vCPU running at 100% for 1 minute

A t3a.medium:

  • Earns: 24 credits per hour (12 per vCPU x 2 vCPUs)
  • Baseline: 20% per vCPU (this is what 24 credits/hr translates to)
  • Maximum balance: 576 credits (can bank up to 24 hours' worth)
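The arithmetic above fits in a couple of hypothetical helpers (the constants are the t3a.medium figures from this post, not AWS API calls):

```python
def credits_per_hour(baseline_pct: int, vcpus: int) -> float:
    """1 credit = 1 vCPU-minute at 100%, so earn rate = baseline% x 60 min x vCPUs."""
    return baseline_pct * 60 * vcpus / 100

def max_balance(baseline_pct: int, vcpus: int) -> float:
    """An instance can bank at most 24 hours of earned credits."""
    return credits_per_hour(baseline_pct, vcpus) * 24

# t3a.medium: 20% baseline per vCPU, 2 vCPUs
print(credits_per_hour(20, 2))  # 24.0 credits/hour
print(max_balance(20, 2))       # 576.0 credits
```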

A Real-World Example

Say you're running a Kubernetes node with a service that normally uses 0.01 CPU (1% of one core). That's well under the 0.4 baseline:

```
Earning:     24 credits/hour
Spending:    ~0.6 credits/hour  (0.01 CPU ≈ 0.5% utilization)
Net:         +23.4 credits/hour accumulating
Max balance: 576 credits
```

Your credit balance slowly fills up over 24 hours. Life is good.

Now imagine a traffic spike hits and the node needs full CPU:

```
Hour 1:  2.0 vCPUs used (100%)  → spends 120 credits, earns 24
Hour 2:  2.0 vCPUs used (100%)  → spends 120 credits, earns 24
Hour 3:  2.0 vCPUs used (100%)  → spends 120 credits, earns 24
Hour 4:  2.0 vCPUs used (100%)  → spends 120 credits, earns 24
...
After ~6 hours at full CPU:     → 576 credits exhausted (net drain 96/hr)
```
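The time-to-throttle arithmetic generalizes to a one-liner. A hypothetical helper (credits are still earned while you spend, so only the net drain matters):

```python
def hours_until_throttled(balance: float, spend_per_hour: float, earn_per_hour: float) -> float:
    """Hours a credit balance lasts at a given burn rate; credits accrue while spending."""
    net_drain = spend_per_hour - earn_per_hour
    if net_drain <= 0:
        return float("inf")  # earning at least as fast as spending: never throttled
    return balance / net_drain

# t3a.medium at 100% CPU: spends 120 credits/hr (2 vCPUs x 60 min), earns 24/hr
print(hours_until_throttled(576, 120, 24))  # 6.0
```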

The Prepaid Data Plan Analogy

Think of CPU credits like a prepaid mobile data plan:

  • You get 1 GB/day at 4G speed (full CPU)
  • After 1 GB is used up, you're throttled to 2G speed (20% baseline)
  • You can still use the internet, but everything is painfully slow
  • Your quota refills continuously rather than resetting once a day (credits accrue every minute)

What Happens When Credits Hit Zero

This is where things get serious.

```
With credits:    2.0 vCPUs available at full speed
Without credits: 2.0 vCPUs CAPPED at 20% → effectively 0.4 vCPUs
```

The AWS hypervisor literally limits how many CPU cycles your instance can execute. Your instance still shows 2 vCPUs, but each one can only do 20% of the work.

Impact on Kubernetes

On a Kubernetes node throttled to 0.4 vCPUs, everything competes for scraps:

```
kubelet              → needs CPU for heartbeats every 10s
kube-proxy           → needs CPU for network rules
containerd           → container runtime
OS processes         → systemd, journald, etc.
Your application     → the thing you actually care about
```

If the kubelet can't send a heartbeat to the API server within 40 seconds (the default node-monitor-grace-period), the API server marks the node as NodeNotReady and starts evicting pods. Your application goes down — not because it was using too much CPU, but because the underlying node was throttled.


T3 Unlimited Mode

AWS offers a way out: T3 Unlimited mode.

```bash
# Check current mode
aws ec2 describe-instance-credit-specifications \
  --instance-ids <instance-id>

# Enable unlimited mode
aws ec2 modify-instance-credit-specification \
  --instance-credit-specifications InstanceId=<id>,CpuCredits=unlimited
```

With Unlimited mode:

  • Your instance never gets throttled
  • When credits are exhausted, you keep bursting at full speed
  • You pay a small surcharge for "surplus credits" (~$0.05 per vCPU-hour on t3a)

When Unlimited Mode Costs Extra

You only pay extra when:

  1. Your earned credits are exhausted, AND
  2. You're using more than baseline (20%)

If your average usage is below 20%, Unlimited mode costs nothing extra — you earn enough credits to cover the occasional burst.

```
Average 10% usage:  Free — credits cover all bursts
Average 20% usage:  Free — exactly at baseline
Average 50% usage:  Extra cost — 30% surplus x $0.05/vCPU-hr
Average 100% usage: Expensive — just use a dedicated instance
```
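If you want a ballpark for the surcharge before enabling Unlimited, the math takes only a few lines. A sketch assuming the ~$0.05 per vCPU-hour surplus rate mentioned above (check your region's actual pricing):

```python
def monthly_surplus_cost(avg_util: float, baseline: float, vcpus: int,
                         rate_per_vcpu_hour: float = 0.05, hours: int = 730) -> float:
    """Unlimited-mode surcharge: only utilization above baseline is billed, per vCPU."""
    overage = max(avg_util - baseline, 0.0)
    return overage * vcpus * rate_per_vcpu_hour * hours

# t3a.medium (2 vCPUs, 20% baseline) averaging 50% CPU all month:
print(round(monthly_surplus_cost(0.50, 0.20, 2), 2))  # ~21.9 dollars/month
```

At that point the surcharge alone (~$22/month) approaches the ~$36/month price gap between a t3a.medium and an m5.large, which is the "just use a dedicated instance" threshold in practice.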

Credit Balance: How to Check and What to Look For

Via CloudWatch

```bash
aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 \
  --metric-name CPUCreditBalance \
  --dimensions Name=InstanceId,Value=<instance-id> \
  --start-time $(date -u -d '6 hours ago' +%Y-%m-%dT%H:%M:%S) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
  --period 300 --statistics Average
```
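The response is JSON, so pulling out the worst-case balance takes only a few lines. A sketch with sample data inlined (the datapoints are made up; the shape follows what `get-metric-statistics` returns):

```python
import json

# Sample payload in the shape returned by `aws cloudwatch get-metric-statistics`
payload = """
{
  "Label": "CPUCreditBalance",
  "Datapoints": [
    {"Timestamp": "2024-05-01T10:00:00Z", "Average": 412.0, "Unit": "Count"},
    {"Timestamp": "2024-05-01T10:05:00Z", "Average": 388.5, "Unit": "Count"},
    {"Timestamp": "2024-05-01T10:10:00Z", "Average": 41.2, "Unit": "Count"}
  ]
}
"""

response = json.loads(payload)
lowest = min(dp["Average"] for dp in response["Datapoints"])
print(f"Lowest credit balance in window: {lowest}")
if lowest < 50:  # the alert threshold used in this post
    print("WARNING: approaching credit exhaustion")
```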

Key Metrics to Monitor

| Metric | What It Means | Alert When |
|---|---|---|
| CPUCreditBalance | Earned credits remaining | Drops below 50 |
| CPUSurplusCreditBalance | Surplus credits used (Unlimited mode) | Consistently above 0 |
| CPUSurplusCreditsCharged | Surplus credits you're paying for | Unexpected charges |
| CPUCreditUsage | Credits spent in the period | Sustained high usage |

Reading the Credit Balance

```
576 credits  → Full (24 hours of baseline earned)
200 credits  → Healthy — some bursting happening
50 credits   → Warning — approaching exhaustion
0 credits    → Standard mode: THROTTLED / Unlimited mode: paying surplus
```

Instance Comparison: When to Use What

| Scenario | Recommended | Why |
|---|---|---|
| Dev/staging environments | t3a.medium | Low baseline usage, cost-effective |
| Kubernetes worker nodes (production) | m5.large or m6i.large | Predictable performance, no throttling risk |
| CI/CD build agents | t3a.xlarge with Unlimited | Burst during builds, idle otherwise |
| Databases | m5/r5 series | Never throttle a database |
| Batch processing | c5/c6i series | Sustained compute needs dedicated CPU |
| Small always-on workloads | m5.large over t3a.medium | Same vCPU count, guaranteed performance, ~2x the cost |

The Hidden Cost of Burstable

A t3a.medium at $0.047/hr seems cheaper than an m5.large at $0.096/hr. But consider:

  • When a t3a node gets throttled and your pod gets evicted, what's the cost of that downtime?
  • When you spend 3 hours debugging why a pod keeps dying, what's the engineering cost?
  • If you enable Unlimited and burst frequently, the surplus charges can approach dedicated instance pricing anyway

For production Kubernetes nodes, the small extra cost of dedicated instances often pays for itself in reliability and reduced debugging time.


Quick Reference: T3/T3a Instance Family

| Instance | vCPUs | RAM | Baseline/vCPU | Credits/hr | Max Balance | Price/hr (Mumbai) |
|---|---|---|---|---|---|---|
| t3a.micro | 2 | 1 GiB | 10% | 12 | 288 | ~$0.012 |
| t3a.small | 2 | 2 GiB | 20% | 24 | 576 | ~$0.024 |
| t3a.medium | 2 | 4 GiB | 20% | 24 | 576 | ~$0.047 |
| t3a.large | 2 | 8 GiB | 30% | 36 | 864 | ~$0.075 |
| t3a.xlarge | 4 | 16 GiB | 40% | 96 | 2304 | ~$0.150 |

Note: Baseline percentages are per vCPU. A t3a.medium with 20% baseline on 2 vCPUs gives you 0.4 vCPUs of sustained compute.


Key Takeaways

  1. T-series instances are not dedicated compute. The "2 vCPUs" you see is the burst ceiling, not the sustained capacity. Your sustained capacity is the baseline percentage.

  2. CPU credit exhaustion causes throttling, not failure. Your instance doesn't stop — it slows down. This is often worse than a crash because it causes cascading timeouts and hard-to-diagnose performance issues.

  3. Enable Unlimited mode on all production T-series instances. There's no reason to risk throttling in production. The surplus cost is minimal for occasional bursts.

  4. If you consistently need more than baseline, switch to a dedicated instance. T-series instances are designed for workloads that are mostly idle with occasional spikes — not for sustained high CPU usage.

  5. Monitor CPUCreditBalance in CloudWatch. Set up alerts before credits hit zero so you can react proactively.


This post is part of a series on debugging Kubernetes pod terminations. Read the full incident story: Why Your Kubernetes Pod Keeps Getting Killed — And It's Not an OOMKill
