
Kubernetes & EKS Guide

This guide covers Kubernetes best practices and EKS-specific configuration patterns used throughout DevOpsGenie-managed environments.

Workload Best Practices

Resource Management

Every production workload must declare CPU and memory requests and limits. This enables Kubernetes to schedule pods efficiently and protects the cluster from noisy neighbors.

```yaml
resources:
  requests:
    cpu: 250m        # guaranteed CPU slice
    memory: 256Mi    # guaranteed memory
  limits:
    cpu: 500m        # burstable ceiling
    memory: 512Mi    # OOM kill threshold — set this carefully
```
Memory limits and OOMKill

Exceeding a memory limit triggers an OOMKill. Set limits conservatively high (around 2x the request) and tune them down using actual usage metrics. Run the Vertical Pod Autoscaler (VPA) in recommendation mode to calibrate requests over time.
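As a sketch, a VPA object in recommendation-only mode (`updateMode: "Off"`): this assumes the VPA controller is installed in the cluster, and the target name is illustrative.

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-service-vpa
  namespace: team-platform
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-service        # illustrative target
  updatePolicy:
    updateMode: "Off"       # recommend only; never evict pods or rewrite requests
```

Read the recommendations with `kubectl describe vpa my-service-vpa -n team-platform` and fold them back into the manifest.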

Pod Disruption Budgets

Always define a PodDisruptionBudget for production workloads so that voluntary disruptions such as node drains cannot take down every replica at once:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-service-pdb
  namespace: team-platform
spec:
  minAvailable: 2   # or use maxUnavailable: "25%"
  selector:
    matchLabels:
      app: my-service
```

Topology Spread Constraints

Distribute pods across availability zones, and across nodes, so that an AZ-level or node-level failure cannot take out all replicas at once:

```yaml
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app: my-service
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app: my-service
```

RBAC Design

DevOpsGenie enforces a tiered RBAC model with five standard roles:

| Role | Scope | Permissions |
| --- | --- | --- |
| platform-admin | Cluster-wide | Full access |
| team-lead | Namespace | Read/write workloads, view secrets |
| developer | Namespace | Read workloads, exec into pods, port-forward |
| ci-deployer | Namespace | Create/update Deployments and ConfigMaps only |
| readonly | Namespace | Read-only across all resources |
kubernetes/rbac/developer-role.yaml

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: developer
  namespace: team-payments
rules:
  - apiGroups: ["", "apps", "batch"]
    resources: ["pods", "deployments", "replicasets", "jobs", "configmaps", "services"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["pods/exec", "pods/portforward", "pods/log"]
    verbs: ["create", "get"]
```

Networking

Ingress Configuration

DevOpsGenie uses the AWS Load Balancer Controller to provision ALBs from Kubernetes Ingress resources:

kubernetes/ingress/my-service.yaml

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-service
  namespace: team-platform
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}, {"HTTPS": 443}]'
    alb.ingress.kubernetes.io/ssl-redirect: "443"
    alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:us-east-1:ACCOUNT:certificate/CERT_ID
    alb.ingress.kubernetes.io/healthcheck-path: /healthz/ready
    alb.ingress.kubernetes.io/healthcheck-interval-seconds: "15"
    alb.ingress.kubernetes.io/load-balancer-attributes: idle_timeout.timeout_seconds=60,routing.http2.enabled=true,deletion_protection.enabled=true
spec:
  ingressClassName: alb   # replaces the deprecated kubernetes.io/ingress.class annotation
  rules:
    - host: my-service.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-service
                port:
                  number: 80
```

Note that `ssl-redirect` only takes effect when the `listen-ports` annotation includes both the HTTP and HTTPS listeners, as above.
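For reference, the Ingress above expects a Service in front of the pods; with `target-type: ip` the controller registers pod IPs directly, so a plain ClusterIP Service is sufficient. A minimal sketch (the container port is illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-service
  namespace: team-platform
spec:
  type: ClusterIP
  selector:
    app: my-service
  ports:
    - name: http
      port: 80          # port referenced by the Ingress backend
      targetPort: 8080  # illustrative container port
```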

Network Policies

Enforce microsegmentation with Network Policies. Note that enforcement requires CNI support; on EKS, enable the VPC CNI network policy feature or run a policy engine such as Calico. DevOpsGenie ships with default deny-all policies per namespace:

kubernetes/policies/default-deny.yaml
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: team-payments
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
---
# Allow intra-namespace communication in both directions
# (the default-deny policy above also blocks egress)
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-same-namespace
  namespace: team-payments
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - podSelector: {}
  egress:
    - to:
        - podSelector: {}
---
# Allow ingress from the ALB controller
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-alb-ingress
  namespace: team-payments
spec:
  podSelector:
    matchLabels:
      app: payments-api
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
        - ipBlock:
            cidr: 10.0.0.0/8
```
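One consequence of denying egress by default is that DNS lookups are also blocked, so most workloads need a DNS exception. A common companion policy, sketched here assuming CoreDNS runs in kube-system with the standard `k8s-app: kube-dns` label:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: team-payments
spec:
  podSelector: {}          # applies to every pod in the namespace
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
```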

Cluster Upgrades

EKS cluster upgrades follow the platform's blue/green node strategy. Refer to the Deployment Guide for detailed steps.

Pre-Upgrade Checklist

  • Review AWS release notes for the target Kubernetes version
  • Test all workloads in staging cluster on the target version
  • Confirm all Helm chart versions are compatible
  • Verify ArgoCD, Karpenter, and AWS add-ons support the target version
  • Schedule maintenance window and notify stakeholders
  • Enable extended support if applicable
  • Confirm etcd backup is current
```shell
# Run pre-upgrade validation
devopsgenie cluster upgrade --target-version 1.30 --dry-run
```

Useful Commands

```shell
# Get all pods with their node placement
kubectl get pods -o wide --all-namespaces

# View resource utilization per node
kubectl top nodes

# View resource utilization per pod
kubectl top pods -n team-payments

# Debug pod scheduling failures
kubectl describe pod <pod-name> -n <namespace>
kubectl get events -n <namespace> --sort-by='.lastTimestamp'

# Check Karpenter logs
kubectl logs -n karpenter -l app.kubernetes.io/name=karpenter --tail=100 -f
```