Shreyans Sonthalia

AWS Burstable Instances Explained: CPU Credits, Throttling, and Why Your t3 Instance Isn't What You Think

You launched a t3a.medium with "2 vCPUs" but you're not getting 2 CPUs. Here's what you're actually paying for.


The Misconception

You go to the AWS console, launch a t3a.medium, and see this in the spec:

| Spec | Value |
|---|---|
| vCPUs | 2 |
| Memory | 4 GiB |
| Price | ~$0.047/hr |

Most engineers assume they're getting 2 full CPU cores, always available, for $0.047/hr. That's not what's happening.


What the "T" in T3 Means

AWS has several instance families:

| Family | Type | CPU Model | Example |
|---|---|---|---|
| T3/T3a/T4g | Burstable | Shared, credit-based | t3a.medium |
| M5/M6i/M7i | General purpose | Dedicated | m5.large |
| C5/C6i/C7i | Compute optimized | Dedicated | c5.large |

The "T" stands for burstable. When you buy a T-series instance, you're not buying dedicated CPU cores. You're buying a fraction of a CPU with the ability to temporarily use more.

A t3a.medium gives you 20% of each vCPU as a baseline — meaning you can continuously use 0.4 vCPUs (20% x 2). The other 80% is available on-demand, but only if you have CPU credits to spend.

Why is it cheaper?

This is the deal AWS offers: because most workloads don't use 100% CPU all the time, AWS can pack ~5 burstable instances onto the same physical hardware that would serve 1 dedicated instance. You get a discount; AWS gets better hardware utilization.

```
t3a.medium:  2 vCPUs (burstable, 20% baseline)  →  ~$0.047/hr
m5.large:    2 vCPUs (dedicated, 100% always)   →  ~$0.096/hr
```

The m5.large costs about twice as much because those CPUs are reserved for you, always.
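One way to make the prices comparable is cost per *sustained* vCPU, i.e. the compute you can use continuously without burning credits. A rough calculation using the list prices above (illustrative only; prices vary by region):

```python
# Price per sustained vCPU-hour: capacity usable continuously without credits
t3a_medium = 0.047 / (2 * 0.20)  # 2 vCPUs at 20% baseline = 0.4 sustained vCPUs
m5_large = 0.096 / 2             # 2 dedicated vCPUs, always available

print(f"t3a.medium: ${t3a_medium:.4f} per sustained vCPU-hour")  # ~$0.1175
print(f"m5.large:   ${m5_large:.4f} per sustained vCPU-hour")    # ~$0.0480
```

Per guaranteed vCPU, the burstable instance is roughly 2.4x more expensive. The discount only pays off if your workload really does sit idle most of the time.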


How CPU Credits Work

The credit system is how AWS meters your burst usage.

The Basic Math

1 CPU credit = 1 vCPU running at 100% for 1 minute

A t3a.medium:

  • Earns: 24 credits per hour (12 per vCPU x 2 vCPUs)
  • Baseline: 20% per vCPU (this is what 24 credits/hr translates to)
  • Maximum balance: 576 credits (can bank up to 24 hours' worth)
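The arithmetic above fits in a couple of hypothetical helpers (the constants are the t3a.medium figures from this post, not AWS API calls):

```python
def credits_per_hour(baseline_pct: int, vcpus: int) -> float:
    """1 credit = 1 vCPU-minute at 100%, so earn rate = baseline% x 60 min x vCPUs."""
    return baseline_pct * 60 * vcpus / 100

def max_balance(baseline_pct: int, vcpus: int) -> float:
    """An instance can bank at most 24 hours of earned credits."""
    return credits_per_hour(baseline_pct, vcpus) * 24

# t3a.medium: 20% baseline per vCPU, 2 vCPUs
print(credits_per_hour(20, 2))  # 24.0 credits/hour
print(max_balance(20, 2))       # 576.0 credits
```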

A Real-World Example

Say you're running a Kubernetes node with a service that normally uses 0.01 CPU (1% of one core). That's well under the 0.4 baseline:

```
Earning:     24 credits/hour
Spending:    ~0.6 credits/hour  (0.01 CPU ≈ 0.5% utilization)
Net:         +23.4 credits/hour accumulating
Max balance: 576 credits
```

Your credit balance slowly fills up over 24 hours. Life is good.

Now imagine a traffic spike hits and the node needs full CPU:

```
Hour 1:  2.0 vCPUs used (100%)  → spends 120 credits, earns 24
Hour 2:  2.0 vCPUs used (100%)  → spends 120 credits, earns 24
Hour 3:  2.0 vCPUs used (100%)  → spends 120 credits, earns 24
Hour 4:  2.0 vCPUs used (100%)  → spends 120 credits, earns 24
...
After ~6 hours at full CPU:     → 576 credits exhausted (net drain 96/hr)
```
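The time-to-throttle arithmetic generalizes to a one-liner. A hypothetical helper (credits are still earned while you spend, so only the net drain matters):

```python
def hours_until_throttled(balance: float, spend_per_hour: float, earn_per_hour: float) -> float:
    """Hours a credit balance lasts at a given burn rate; credits accrue while spending."""
    net_drain = spend_per_hour - earn_per_hour
    if net_drain <= 0:
        return float("inf")  # earning at least as fast as spending: never throttled
    return balance / net_drain

# t3a.medium at 100% CPU: spends 120 credits/hr (2 vCPUs x 60 min), earns 24/hr
print(hours_until_throttled(576, 120, 24))  # 6.0
```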

The Prepaid Data Plan Analogy

Think of CPU credits like a prepaid mobile data plan:

  • You get 1 GB/day at 4G speed (full CPU)
  • After 1 GB is used up, you're throttled to 2G speed (20% baseline)
  • You can still use the internet, but everything is painfully slow
  • Your quota refills continuously rather than resetting once a day (credits accrue every minute)

What Happens When Credits Hit Zero

This is where things get serious.

```
With credits:    2.0 vCPUs available at full speed
Without credits: 2.0 vCPUs CAPPED at 20% → effectively 0.4 vCPUs
```

The AWS hypervisor literally limits how many CPU cycles your instance can execute. Your instance still shows 2 vCPUs, but each one can only do 20% of the work.

Impact on Kubernetes

On a Kubernetes node throttled to 0.4 vCPUs, everything competes for scraps:

```
kubelet              → needs CPU for heartbeats every 10s
kube-proxy           → needs CPU for network rules
containerd           → container runtime
OS processes         → systemd, journald, etc.
Your application     → the thing you actually care about
```

If the kubelet can't send a heartbeat to the API server within 40 seconds (the default node-monitor-grace-period), the API server marks the node as NodeNotReady and starts evicting pods. Your application goes down — not because it was using too much CPU, but because the underlying node was throttled.


T3 Unlimited Mode

AWS offers a way out: T3 Unlimited mode.

```bash
# Check current mode
aws ec2 describe-instance-credit-specifications \
  --instance-ids <instance-id>

# Enable unlimited mode
aws ec2 modify-instance-credit-specification \
  --instance-credit-specifications InstanceId=<id>,CpuCredits=unlimited
```

With Unlimited mode:

  • Your instance never gets throttled
  • When credits are exhausted, you keep bursting at full speed
  • You pay a small surcharge for "surplus credits" (~$0.05 per vCPU-hour on t3a)

When Unlimited Mode Costs Extra

You only pay extra when:

  1. Your earned credits are exhausted, AND
  2. You're using more than baseline (20%)

If your average usage is below 20%, Unlimited mode costs nothing extra — you earn enough credits to cover the occasional burst.

```
Average 10% usage:  Free — credits cover all bursts
Average 20% usage:  Free — exactly at baseline
Average 50% usage:  Extra cost — 30% surplus x $0.05/vCPU-hr
Average 100% usage: Expensive — just use a dedicated instance
```
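If you want a ballpark for the surcharge before enabling Unlimited, the math takes only a few lines. A sketch assuming the ~$0.05 per vCPU-hour surplus rate mentioned above (check your region's actual pricing):

```python
def monthly_surplus_cost(avg_util: float, baseline: float, vcpus: int,
                         rate_per_vcpu_hour: float = 0.05, hours: int = 730) -> float:
    """Unlimited-mode surcharge: only utilization above baseline is billed, per vCPU."""
    overage = max(avg_util - baseline, 0.0)
    return overage * vcpus * rate_per_vcpu_hour * hours

# t3a.medium (2 vCPUs, 20% baseline) averaging 50% CPU all month:
print(round(monthly_surplus_cost(0.50, 0.20, 2), 2))  # ~21.9 dollars/month
```

At that point the surcharge alone (~$22/month) approaches the ~$36/month price gap between a t3a.medium and an m5.large, which is the "just use a dedicated instance" threshold in practice.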

Credit Balance: How to Check and What to Look For

Via CloudWatch

```bash
aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 \
  --metric-name CPUCreditBalance \
  --dimensions Name=InstanceId,Value=<instance-id> \
  --start-time $(date -u -d '6 hours ago' +%Y-%m-%dT%H:%M:%S) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
  --period 300 --statistics Average
```
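The response is JSON, so pulling out the worst-case balance takes only a few lines. A sketch with sample data inlined (the datapoints are made up; the shape follows what `get-metric-statistics` returns):

```python
import json

# Sample payload in the shape returned by `aws cloudwatch get-metric-statistics`
payload = """
{
  "Label": "CPUCreditBalance",
  "Datapoints": [
    {"Timestamp": "2024-05-01T10:00:00Z", "Average": 412.0, "Unit": "Count"},
    {"Timestamp": "2024-05-01T10:05:00Z", "Average": 388.5, "Unit": "Count"},
    {"Timestamp": "2024-05-01T10:10:00Z", "Average": 41.2, "Unit": "Count"}
  ]
}
"""

response = json.loads(payload)
lowest = min(dp["Average"] for dp in response["Datapoints"])
print(f"Lowest credit balance in window: {lowest}")
if lowest < 50:  # the alert threshold used in this post
    print("WARNING: approaching credit exhaustion")
```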

Key Metrics to Monitor

| Metric | What It Means | Alert When |
|---|---|---|
| CPUCreditBalance | Earned credits remaining | Drops below 50 |
| CPUSurplusCreditBalance | Surplus credits used (Unlimited mode) | Consistently above 0 |
| CPUSurplusCreditsCharged | Surplus credits you're paying for | Unexpected charges |
| CPUCreditUsage | Credits spent in the period | Sustained high usage |

Reading the Credit Balance

```
576 credits  → Full (24 hours of baseline earned)
200 credits  → Healthy — some bursting happening
50 credits   → Warning — approaching exhaustion
0 credits    → Standard mode: THROTTLED / Unlimited mode: paying surplus
```

Instance Comparison: When to Use What

| Scenario | Recommended | Why |
|---|---|---|
| Dev/staging environments | t3a.medium | Low baseline usage, cost-effective |
| Kubernetes worker nodes (production) | m5.large or m6i.large | Predictable performance, no throttling risk |
| CI/CD build agents | t3a.xlarge with Unlimited | Burst during builds, idle otherwise |
| Databases | m5/r5 series | Never throttle a database |
| Batch processing | c5/c6i series | Sustained compute needs dedicated CPU |
| Small always-on workloads | m5.large over t3a.medium | Same vCPU count, guaranteed performance, ~2x the cost |

The Hidden Cost of Burstable

A t3a.medium at $0.047/hr seems cheaper than an m5.large at $0.096/hr. But consider:

  • When a t3a node gets throttled and your pod gets evicted, what's the cost of that downtime?
  • When you spend 3 hours debugging why a pod keeps dying, what's the engineering cost?
  • If you enable Unlimited and burst frequently, the surplus charges can approach dedicated instance pricing anyway

For production Kubernetes nodes, the small extra cost of dedicated instances often pays for itself in reliability and reduced debugging time.


Quick Reference: T3/T3a Instance Family

| Instance | vCPUs | RAM | Baseline/vCPU | Credits/hr | Max Balance | Price/hr (Mumbai) |
|---|---|---|---|---|---|---|
| t3a.micro | 2 | 1 GiB | 10% | 12 | 288 | ~$0.012 |
| t3a.small | 2 | 2 GiB | 20% | 24 | 576 | ~$0.024 |
| t3a.medium | 2 | 4 GiB | 20% | 24 | 576 | ~$0.047 |
| t3a.large | 2 | 8 GiB | 30% | 36 | 864 | ~$0.075 |
| t3a.xlarge | 4 | 16 GiB | 40% | 96 | 2304 | ~$0.150 |

Note: Baseline percentages are per vCPU. A t3a.medium with 20% baseline on 2 vCPUs gives you 0.4 vCPUs of sustained compute.


Key Takeaways

  1. T-series instances are not dedicated compute. The "2 vCPUs" you see is the burst ceiling, not the sustained capacity. Your sustained capacity is the baseline percentage.

  2. CPU credit exhaustion causes throttling, not failure. Your instance doesn't stop — it slows down. This is often worse than a crash because it causes cascading timeouts and hard-to-diagnose performance issues.

  3. Enable Unlimited mode on all production T-series instances. There's no reason to risk throttling in production. The surplus cost is minimal for occasional bursts.

  4. If you consistently need more than baseline, switch to a dedicated instance. T-series instances are designed for workloads that are mostly idle with occasional spikes — not for sustained high CPU usage.

  5. Monitor CPUCreditBalance in CloudWatch. Set up alerts before credits hit zero so you can react proactively.


This post is part of a series on debugging Kubernetes pod terminations. Read the full incident story: Why Your Kubernetes Pod Keeps Getting Killed — And It's Not an OOMKill
