Here's a frustrating scenario many developers face: your AWS EKS cluster scaled up during peak hours, traffic dropped, but your nodes are still running—and your AWS bill keeps climbing. You check the Cluster Autoscaler logs, everything looks normal, but those underutilized nodes just won't disappear.
The culprit? PodDisruptionBudgets (PDBs) are silently blocking your cluster from scaling down, and it's probably happening right now in your production environment.
The Hidden Scale-Down Killer
PodDisruptionBudgets were designed to protect application availability during voluntary disruptions—like node maintenance or scaling events. But here's what most developers don't realize: a single misconfigured PDB can trap entire nodes in your cluster indefinitely.
The Cluster Autoscaler is conservative by design. Before removing any node, it checks whether pods can be safely evicted according to their PDB rules. If even one PDB says "no evictions allowed," the autoscaler backs off and leaves the node running, even if it's using only 10% of its resources.
The most common mistake is setting `maxUnavailable: 0`, which essentially tells Kubernetes "never evict any pods from this deployment." This blocks scale-down operations completely.
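As an illustration, a budget like the following (the name and selector are hypothetical) reports zero allowed disruptions, so every eviction attempt fails and the node hosting these pods can never be removed:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: never-evict-pdb      # hypothetical name, for illustration only
  namespace: production
spec:
  maxUnavailable: 0          # zero allowed disruptions: evictions always fail
  selector:
    matchLabels:
      app: web-api
```

Kubernetes will happily accept this manifest without warning; the damage only shows up later as nodes that refuse to drain.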
What makes this particularly painful is that developers often apply PDBs to stateless services that don't actually need high availability guarantees. A typical web API that can handle brief interruptions doesn't need the same protection as a database—but the default "better safe than sorry" PDB configuration treats them identically.
Spotting the Problem in Your Cluster
The first step is visibility. Run this command to see all PDBs in your cluster:
```shell
kubectl get pdb --all-namespaces -o wide
```

Look for these red flags:

- `ALLOWED DISRUPTIONS: 0` in the output
- `maxUnavailable: 0`
- `minAvailable` equal to your total replica count
Any of these configurations can block node removal. I've seen production clusters where a single overly protective PDB kept dozens of nodes running unnecessarily, costing thousands in monthly AWS charges.
You can also inspect which pods are running on each node:

```shell
kubectl describe nodes | grep -A 5 "Non-terminated Pods"
```

This shows you which pods are "stuck" on nodes that should be candidates for removal; cross-reference them against your PDB list to see which budgets are doing the blocking.
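To pull the blockers out of a PDB listing mechanically, a small `awk` filter over `kubectl get pdb -A` output works. The sample output below is made up for illustration; in a live cluster you would pipe the real command into the same filter:

```shell
# Sample of what `kubectl get pdb -A` prints; replace with the live command
# in a real cluster. The namespaces and PDB names here are hypothetical.
sample='NAMESPACE    NAME          MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
production   web-api-pdb   75%             N/A               1                     10d
production   cache-pdb     N/A             0                 0                     30d'

# On data rows, field 5 is the ALLOWED DISRUPTIONS column.
# Print namespace/name for every PDB that currently allows zero evictions.
blocked=$(printf '%s\n' "$sample" | awk 'NR > 1 && $5 == 0 {print $1 "/" $2}')
echo "$blocked"
```

Any PDB this prints is a scale-down blocker worth auditing first.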
The Fix: Percentage-Based PDBs That Scale
The solution isn't to remove PDBs entirely—it's to configure them intelligently. Instead of absolute values that become problematic as deployments scale, use percentage-based configurations:
```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-api-pdb
  namespace: production
spec:
  minAvailable: 75%
  selector:
    matchLabels:
      app: web-api
```

This configuration is much more flexible:
- With 4 replicas: allows 1 pod eviction (keeps 3 running)
- With 8 replicas: allows 2 pod evictions (keeps 6 running)
- Scales naturally as your deployment grows
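The arithmetic behind those numbers: Kubernetes rounds a percentage-based `minAvailable` up to a whole pod count, and whatever remains is the disruption budget. A quick shell sketch of that calculation:

```shell
# Sketch of how allowed disruptions fall out of a percentage minAvailable.
# Kubernetes rounds the percentage up (ceiling), so the budget is
# replicas minus that rounded-up floor of pods that must stay running.
allowed_disruptions() {
  local replicas=$1 min_pct=$2
  local keep=$(( (replicas * min_pct + 99) / 100 ))  # integer ceiling
  echo $(( replicas - keep ))
}

allowed_disruptions 4 75   # -> 1 (keeps 3 running)
allowed_disruptions 8 75   # -> 2 (keeps 6 running)
```

Note the flip side: at small replica counts a high percentage can still round up to "all of them" (e.g. 2 replicas at 75% keeps both), so percentages alone don't guarantee headroom.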
For truly stateless services, you might even use `minAvailable: 50%`, or configure based on your specific availability requirements:
```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: stateless-service-pdb
spec:
  maxUnavailable: 25%
  selector:
    matchLabels:
      app: stateless-service
      tier: frontend
```

Beyond PDBs: The Complete Picture
PDBs are often the primary culprit, but they're not the only factor in scaling issues. The Cluster Autoscaler also requires:
Proper resource requests on all pods. The autoscaler bases decisions on declared resource needs, not actual usage. A pod without resource requests is effectively invisible to scaling logic:
```yaml
spec:
  containers:
  - name: app
    resources:
      requests:
        cpu: 100m
        memory: 128Mi
      limits:
        cpu: 200m
        memory: 256Mi
```

Strategic PDB placement. Don't apply blanket PDBs to every deployment. Ask yourself: "Does this service actually need protection from brief interruptions?" Batch jobs, development environments, and stateless APIs often don't.
Consideration for system pods. The autoscaler won't evict kube-system pods by default. If you have custom system workloads, configure appropriate PDBs that allow controlled eviction rather than blanket protection.
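For example, a custom system workload could carry a budget that still permits one eviction at a time; the name and selector below are hypothetical:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: custom-agent-pdb     # hypothetical custom system workload
  namespace: kube-system
spec:
  maxUnavailable: 1          # permits controlled, one-at-a-time eviction
  selector:
    matchLabels:
      app: custom-agent
```

This keeps the workload protected from mass eviction while still letting the autoscaler drain a node when it needs to.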
The Real-World Impact
I've worked with teams spending 40-60% more on their EKS infrastructure than necessary due to PDB-related scaling issues. One team discovered that a single PDB with `maxUnavailable: 0` was keeping 12 nodes running overnight when their application load dropped to near zero.
But the cost isn't just financial. Developers waste hours troubleshooting "broken" autoscaling when the real issue is a five-line YAML configuration that made perfect sense when first written but became problematic as the system evolved.
The key insight: PDBs should reflect your actual availability requirements, not worst-case-scenario fears. Most applications can tolerate brief disruptions better than teams assume.
Why This Matters Right Now
If you're running EKS in production, there's a good chance this is affecting you today. Check your PDBs, look for scale-down blockers, and audit whether your configurations match your actual availability needs.
Start with a simple audit:
1. List all PDBs in your cluster
2. Identify any with `ALLOWED DISRUPTIONS: 0`
3. Review whether those services truly need zero-disruption protection
4. Replace absolute values with percentage-based configurations
Your AWS bill—and your on-call rotation—will thank you.
