
The hidden cost killer in AWS container deployments isn't compute or storage - it's NAT Gateway data processing fees from pulling container images.
I've seen this pattern dozens of times: teams deploy ECS or EKS clusters in private subnets (correctly, for security), configure NAT Gateways for internet access (also correct), then get blindsided by $10K+ monthly bills where 80% comes from data processing charges. The culprit? Every container image pull from Amazon ECR routes through those NAT Gateways at $0.045 per GB.
The Real Cost of "Standard" Container Architecture
Let's break down what actually happens when your containers pull images:
1. Container starts in private subnet
2. Needs to pull image from ECR
3. Routes through NAT Gateway to reach ECR over internet
4. You pay $0.045/GB + $0.045/hour per NAT Gateway
For a typical multi-AZ EKS cluster pulling 40TB of images monthly (not uncommon with frequent deployments and large base images), you're looking at:
- 3 NAT Gateways (multi-AZ HA): $0.045/hour × 24 × 30 × 3 = $97/month
- Data processing: 40TB × $0.045/GB = $1,800/month
- Total NAT costs: ~$1,900/month just for ECR traffic
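To sanity-check these figures against your own traffic, here is a minimal shell sketch of the same arithmetic. The function name `nat_monthly_cost` is my own, and the rates are simply the ones quoted above; treat them as assumptions and substitute current pricing for your region.

```shell
# Monthly NAT Gateway cost: hourly charge per gateway plus per-GB processing.
# Usage: nat_monthly_cost <gateways> <gb_processed> [hours_in_month]
nat_monthly_cost() {
  awk -v n="$1" -v gb="$2" -v hours="${3:-720}" 'BEGIN {
    hourly_rate = 0.045   # USD per gateway-hour (rate quoted in this article)
    per_gb_rate = 0.045   # USD per GB processed (rate quoted in this article)
    printf "%.2f\n", n * hourly_rate * hours + gb * per_gb_rate
  }'
}

# Example: 3 gateways, 40 TB counted as 40,000 GB, 30-day month
# nat_monthly_cost 3 40000    # prints 1897.20
```

The hours default of 720 matches the 30-day month used in the breakdown; pass a third argument to model a different month length.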
The irony is that ECR traffic never needed to touch the internet at all - it's AWS-to-AWS communication that's taking an expensive detour through your NAT Gateway.
VPC Endpoints: The 80% Cost Reduction Solution
VPC Endpoints create a private connection directly to AWS services, bypassing NAT Gateways entirely. For ECR, you need three endpoints:

    # ECR API endpoint (for ECR API calls such as authentication)
    com.amazonaws.us-east-1.ecr.api

    # ECR Docker Registry endpoint (for docker push/pull and image manifests)
    com.amazonaws.us-east-1.ecr.dkr

    # S3 Gateway endpoint (ECR stores image layers in S3)
    com.amazonaws.us-east-1.s3

The cost comparison is dramatic:
| Component | NAT Gateway | VPC Endpoints |
|---|---|---|
| **Hourly** | $0.045/hour (~$33/month per AZ) | $0.01/hour per interface endpoint (~$7.44/month per AZ) |
| **Per GB** | $0.045/GB | $0.01/GB (interface endpoints) |
| **S3 Access** | $0.045/GB | **Free** (Gateway Endpoint) |
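For our 40TB example, with the two interface endpoints deployed in each of the three AZs:
- VPC Endpoint costs: ($7.44 × 2 endpoints × 3 AZs) + (40TB × $0.01/GB) = ~$445/month
- Savings: $1,900 - $445 = $1,455/month (77% reduction)
This is an upper bound on the endpoint side: image layers fetched through the S3 Gateway endpoint incur no per-GB charge.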
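The endpoint side of the comparison can be sketched the same way. This is a rough model under stated assumptions: the function names are hypothetical, the rates are the ones from the table above, and the AZ count is a parameter because interface endpoints are billed per AZ. It charges every GB at the interface-endpoint rate, so it overstates the bill when most bytes are layers served through the free S3 Gateway endpoint.

```shell
# Monthly cost of interface endpoints: hourly charge per endpoint per AZ plus per-GB processing.
# Usage: endpoint_monthly_cost <interface_endpoints> <azs> <gb_processed> [hours_in_month]
endpoint_monthly_cost() {
  awk -v e="$1" -v az="$2" -v gb="$3" -v hours="${4:-744}" 'BEGIN {
    hourly_rate = 0.01    # USD per endpoint per AZ per hour (rate from the table)
    per_gb_rate = 0.01    # USD per GB processed (rate from the table)
    printf "%.2f\n", e * az * hourly_rate * hours + gb * per_gb_rate
  }'
}

# Percentage saved when moving from one monthly cost to another.
# Usage: savings_percent <cost_before> <cost_after>
savings_percent() {
  awk -v before="$1" -v after="$2" 'BEGIN {
    printf "%.0f\n", (before - after) / before * 100
  }'
}

# Examples (744 hours is a 31-day month, which is what $7.44 at $0.01/hour implies):
# endpoint_monthly_cost 2 3 40000   # prints 444.64 (2 endpoints in 3 AZs)
# endpoint_monthly_cost 2 1 40000   # prints 414.88 (2 endpoints in a single AZ)
# savings_percent 1900 445          # prints 77
```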
Implementation Strategy That Actually Works
The key insight most tutorials miss: you can't just create endpoints and expect traffic to magically route through them. Your route tables and security groups need explicit configuration.
Here's the Terraform configuration that sets up ECR VPC Endpoints properly:
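
    # ECR API Interface Endpoint
    resource "aws_vpc_endpoint" "ecr_api" {
      vpc_id              = var.vpc_id
      service_name        = "com.amazonaws.${var.region}.ecr.api"
      vpc_endpoint_type   = "Interface"
      subnet_ids          = var.private_subnet_ids
      security_group_ids  = [aws_security_group.vpc_endpoints.id]
      # Needed so the standard ECR hostnames resolve to the endpoint IPs
      private_dns_enabled = true
    }

The ECR Docker endpoint (ecr.dkr) follows the same pattern. The S3 endpoint differs: it uses vpc_endpoint_type = "Gateway" and takes route_table_ids instead of subnets and security groups.
The Migration Gotchas Nobody Talks About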
DNS Resolution: Interface endpoints get their own endpoint-specific DNS names, but with private DNS enabled the standard ECR hostnames resolve to the endpoint IPs automatically, so your applications should keep using the standard ECR URLs. Verify with:

    # Test from a pod in your private subnet
    # (--command overrides the image's default "aws" entrypoint so sh can start)
    kubectl run debug --image=amazon/aws-cli --rm -it --command -- sh

    # Inside the pod (if nslookup is not installed in the image, try: getent hosts <hostname>):
    nslookup 123456789012.dkr.ecr.us-east-1.amazonaws.com
    # Should return private IPs (10.x.x.x), not public ones

    aws ecr get-login-password --region us-east-1
    # Should work without internet access

Security Group Rules: The most common failure point is restrictive security groups. Your endpoint security group needs:
- Inbound HTTPS (443) from your private subnets
- No specific outbound rules (the endpoint only accepts connections, and security groups are stateful, so replies are allowed automatically)
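Route Table Priority: When a route table has both a default route (0.0.0.0/0) to the NAT Gateway and a gateway endpoint route, the more specific endpoint route (the S3 prefix list) wins. The S3 Gateway endpoint adds its route automatically, but only to the route tables you associate with it, so verify the route exists in every private subnet's route table. Interface endpoints are not routed at all; traffic reaches them through DNS.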
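The DNS and route table checks above are easy to script so they fail loudly after a migration. What follows is a minimal sketch, not a complete validation suite: the function names are my own, `check_ecr_dns` assumes `getent` is available (glibc-based images), and `list_endpoint_routes` assumes the AWS CLI is configured with permission to describe route tables. The route table ID in the usage comments is a placeholder.

```shell
# Classify an IPv4 address as "private" (RFC 1918 ranges) or "public".
classify_ipv4() {
  case "$1" in
    10.*|192.168.*|172.1[6-9].*|172.2[0-9].*|172.3[01].*) echo private ;;
    *) echo public ;;
  esac
}

# Resolve a registry hostname and fail if any returned address is public.
# Usage: check_ecr_dns <hostname>
check_ecr_dns() {
  ips=$(getent ahostsv4 "$1" | awk '{ print $1 }' | sort -u)
  [ -n "$ips" ] || { echo "no IPv4 addresses found for $1" >&2; return 1; }
  for ip in $ips; do
    if [ "$(classify_ipv4 "$ip")" = public ]; then
      echo "$1 resolves to public address $ip" >&2
      return 1
    fi
  done
  echo "$1 resolves only to private addresses"
}

# Print routes whose target is a gateway endpoint (GatewayId starts with vpce-).
# Usage: list_endpoint_routes <route-table-id>...
list_endpoint_routes() {
  aws ec2 describe-route-tables --route-table-ids "$@" \
    --query 'RouteTables[].Routes[].[DestinationPrefixListId, GatewayId]' \
    --output text | grep 'vpce-'
}

# Examples:
# classify_ipv4 10.0.1.5        # prints private
# classify_ipv4 52.94.1.1       # prints public
# check_ecr_dns 123456789012.dkr.ecr.us-east-1.amazonaws.com
# list_endpoint_routes rtb-0123456789abcdef0
```

If `list_endpoint_routes` prints nothing for a private subnet's route table, the S3 Gateway endpoint is not associated with that table and layer downloads from it will still go through the NAT Gateway.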
Why This Matters Beyond Cost
While the cost savings are compelling, the operational benefits often matter more:
- Improved Security: ECR traffic never leaves AWS's backbone network
- Better Performance: Direct connections typically have lower latency than internet routing
- Reduced Complexity: Fewer NAT Gateways mean fewer potential failure points
- Compliance: Many regulated industries require avoiding public internet for internal service communication
Start with an audit: Use AWS Cost Explorer to filter NAT Gateway costs by usage type; data processing shows up under usage types ending in NatGateway-Bytes. If you're processing more than 10TB monthly to AWS services, VPC Endpoints will likely save money immediately. For teams pulling container images frequently, this optimization alone can reduce infrastructure costs by 15-20% while improving security posture.
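The audit can also be run from the command line. The sketch below rests on a few assumptions: the function names are hypothetical, Cost Explorer is enabled on the account, the AWS CLI has permission to call it, and NAT Gateway charges are reported under the "EC2 - Other" service (adjust the filter if your bill groups them differently). The second function estimates a rough break-even volume, assuming the NAT Gateways stay in place for other egress so only the per-GB charge changes.

```shell
# List NAT Gateway usage types with their quantity and cost for a date range.
# Usage: nat_usage_report <start YYYY-MM-DD> <end YYYY-MM-DD>
nat_usage_report() {
  aws ce get-cost-and-usage \
    --time-period "Start=$1,End=$2" \
    --granularity MONTHLY \
    --metrics UnblendedCost UsageQuantity \
    --filter '{"Dimensions":{"Key":"SERVICE","Values":["EC2 - Other"]}}' \
    --group-by Type=DIMENSION,Key=USAGE_TYPE \
    --query "ResultsByTime[].Groups[?contains(Keys[0], 'NatGateway')][].[Keys[0], Metrics.UsageQuantity.Amount, Metrics.UnblendedCost.Amount]" \
    --output text
}

# Monthly GB at which interface endpoints pay for their own hourly charges.
# Usage: breakeven_gb <interface_endpoints> <azs> [hours_in_month]
breakeven_gb() {
  awk -v e="$1" -v az="$2" -v hours="${3:-744}" 'BEGIN {
    fixed = e * az * 0.01 * hours     # endpoint hourly charges (rate quoted in this article)
    saved_per_gb = 0.045 - 0.01       # NAT per-GB rate minus endpoint per-GB rate
    printf "%.0f\n", fixed / saved_per_gb
  }'
}

# Examples:
# nat_usage_report 2024-05-01 2024-06-01
# breakeven_gb 2 3    # prints 1275 (about 1.3 TB per month)
```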
The best part? This is a one-time configuration change that pays dividends every month your containers are running.

