Karpenter Node Autoscaling
Karpenter replaces Cluster Autoscaler for workload nodes. It launches right-sized EC2 instances directly through the AWS API, typically in under 60 seconds, bin-packs pending pods onto them, drains nodes gracefully on Spot interruption, and automatically consolidates underutilized nodes.
Karpenter vs Cluster Autoscaler
| Feature | Karpenter | Cluster Autoscaler |
|---|---|---|
| Provisioning speed | ~30–60s | 2–5 minutes |
| Instance selection | Any instance type, dynamic | Fixed ASG instance type |
| Bin-packing | Yes: sizes instances to fit the batch of pending pods | Limited to each node group's fixed instance type |
| Spot interruption handling | Native, graceful drain | Requires AWS Node Termination Handler |
| Node consolidation | Automatic (WhenUnderutilized) | Limited |
| Multi-architecture | Yes (arm64/amd64) | Via separate ASGs |
NodePool Configuration
kubernetes/karpenter/nodepool-default.yaml
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default
spec:
  template:
    metadata:
      labels:
        role: workloads
    spec:
      nodeClassRef:
        name: default
      requirements:
        # Allow both capacity types; when both are allowed, Karpenter
        # prioritizes Spot and falls back to On-Demand
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand", "spot"]
        # Allow a broad set of instance types for flexibility
        - key: node.kubernetes.io/instance-type
          operator: In
          values:
            - m6i.xlarge
            - m6i.2xlarge
            - m6i.4xlarge
            - m6a.xlarge
            - m6a.2xlarge
            - m6a.4xlarge
            - m7i.xlarge
            - m7i.2xlarge
            - c6i.2xlarge
            - c6i.4xlarge
        - key: topology.kubernetes.io/zone
          operator: In
          values: ["us-east-1a", "us-east-1b", "us-east-1c"]
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
  limits:
    cpu: "200"
    memory: 400Gi
  disruption:
    consolidationPolicy: WhenUnderutilized
    # consolidateAfter is omitted: in v1beta1 it can only be set together
    # with consolidationPolicy: WhenEmpty
    # Rotate nodes every 30 days (security best practice)
    expireAfter: 720h
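To illustrate how workloads land on this NodePool, here is a minimal sketch of a Deployment that selects nodes by the `role: workloads` label from the template above and pins itself to On-Demand capacity. The name `checkout-api` and the image are hypothetical placeholders; `karpenter.sh/capacity-type` is the label Karpenter sets on the nodes it launches.

```yaml
# Hypothetical workload: the name and image are placeholders for illustration
apiVersion: apps/v1
kind: Deployment
metadata:
  name: checkout-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: checkout-api
  template:
    metadata:
      labels:
        app: checkout-api
    spec:
      nodeSelector:
        role: workloads                        # label from the NodePool template
        karpenter.sh/capacity-type: on-demand  # omit to allow Spot as well
      containers:
        - name: app
          image: registry.example.com/checkout-api:1.0.0  # placeholder image
          resources:
            requests:
              cpu: 500m
              memory: 512Mi
```

Karpenter sizes nodes from the pods' resource requests, so requests should be set on every container. Note also how the limits interact: an m6i.xlarge has 4 vCPU and 16 GiB, so a pool made only of that type would reach the 400Gi memory limit at roughly 25 nodes (100 vCPU), well before the 200 vCPU limit.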
EC2NodeClass Configuration
kubernetes/karpenter/nodeclass-default.yaml
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiFamily: AL2
  role: "KarpenterNodeRole-devopsgenie-production"
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: devopsgenie-production
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: devopsgenie-production
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 100Gi
        volumeType: gp3
        iops: 3000
        throughput: 125
        encrypted: true
        deleteOnTermination: true
  # Custom user data. With amiFamily AL2, Karpenter merges this script with
  # the bootstrap.sh call it generates and runs the custom part first, so an
  # explicit bootstrap.sh call like this one is normally needed only with
  # amiFamily: Custom.
  userData: |
    #!/bin/bash
    /etc/eks/bootstrap.sh devopsgenie-production \
      --container-runtime containerd \
      --kubelet-extra-args '--max-pods=110'
  tags:
    Environment: production
    ManagedBy: karpenter
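The `--max-pods=110` kubelet argument can also be expressed declaratively. As a sketch, assuming the `karpenter.sh/v1beta1` API used on this page, the NodePool template accepts a `kubelet` block, which avoids passing kubelet flags through user data:

```yaml
# Fragment of the default NodePool (karpenter.sh/v1beta1), not a complete manifest
spec:
  template:
    spec:
      kubelet:
        maxPods: 110   # same value as --max-pods=110 in the user data
```

With this in place, Karpenter also knows the pod capacity of each node when it simulates scheduling, which it cannot infer from flags hidden inside a user data script.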
Spot Interruption Handling
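Karpenter handles Spot interruptions natively, but it needs an SQS queue to read from. EventBridge forwards EC2 Spot interruption warnings (issued two minutes before the instance is reclaimed) to the queue, and Karpenter cordons and drains the affected node when it receives one. The queue needs a policy that lets EventBridge send messages to it, and Karpenter must be configured with the queue name. The queue and rule are defined in Terraform: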
terraform/environments/production/karpenter-spot.tf
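resource "aws_sqs_queue" "karpenter_interruption" {
  name                      = "karpenter-interruption-devopsgenie-production"
  message_retention_seconds = 300
  sqs_managed_sse_enabled   = true
}

# Without this policy EventBridge cannot deliver events to the queue
resource "aws_sqs_queue_policy" "karpenter_interruption" {
  queue_url = aws_sqs_queue.karpenter_interruption.id
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { Service = "events.amazonaws.com" }
      Action    = "sqs:SendMessage"
      Resource  = aws_sqs_queue.karpenter_interruption.arn
    }]
  })
}

resource "aws_cloudwatch_event_rule" "spot_interruption" {
  name        = "karpenter-spot-interruption"
  description = "Spot instance interruption notification"
  event_pattern = jsonencode({
    source        = ["aws.ec2"]
    "detail-type" = ["EC2 Spot Instance Interruption Warning"]
  })
}

resource "aws_cloudwatch_event_target" "spot_interruption" {
  rule = aws_cloudwatch_event_rule.spot_interruption.name
  arn  = aws_sqs_queue.karpenter_interruption.arn
}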
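Creating the queue is not enough on its own: the Karpenter controller has to be pointed at it. A minimal sketch of the relevant Helm values, assuming a chart version that uses the `settings.*` keys (v0.32 or later) and a values file named `values.yaml`, which is a placeholder:

```yaml
# values.yaml for the Karpenter Helm chart (file name is an assumption)
settings:
  clusterName: devopsgenie-production
  # Must match the name of the SQS queue created in Terraform
  interruptionQueue: karpenter-interruption-devopsgenie-production
```

The controller's IAM role also needs permission to receive and delete messages on this queue; that policy is not shown on this page.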
Verifying Karpenter
# Deploy a test workload whose pending pods force Karpenter to provision nodes
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inflate
spec:
  replicas: 10
  selector:
    matchLabels:
      app: inflate
  template:
    metadata:
      labels:
        app: inflate
    spec:
      containers:
        - name: inflate
          image: public.ecr.aws/eks-distro/kubernetes/pause:3.7
          resources:
            requests:
              cpu: "1"
              memory: 1Gi
EOF
# Watch nodes being provisioned
kubectl get nodes --watch
# Check Karpenter logs
kubectl logs -n karpenter -l app.kubernetes.io/name=karpenter -f --tail=50
# Clean up
kubectl delete deployment inflate
Useful Karpenter Commands
# List NodePools
kubectl get nodepools
# List Karpenter-managed nodes
kubectl get nodes -l karpenter.sh/nodepool
# Consolidation cannot be triggered manually; to make a protected node
# eligible again, remove its do-not-disrupt annotation
kubectl annotate node <node-name> karpenter.sh/do-not-disrupt-
# View node utilization
kubectl top nodes -l karpenter.sh/nodepool=default
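Consolidation and expiry can evict pods at any time, so long-running or non-restartable workloads may need to opt out. As a sketch, assuming the `karpenter.sh/v1beta1` API used on this page, a pod can carry the `karpenter.sh/do-not-disrupt` annotation, and Karpenter will not voluntarily disrupt the node it runs on while the pod is present:

```yaml
# Fragment of a pod template; the surrounding workload is hypothetical
template:
  metadata:
    annotations:
      karpenter.sh/do-not-disrupt: "true"
```

The annotation blocks voluntary disruption only. A Spot interruption still reclaims the node, so such workloads are usually also pinned to On-Demand capacity.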