# Kubernetes & EKS Guide
This guide covers Kubernetes best practices and EKS-specific configuration patterns used throughout DevOpsGenie-managed environments.
## Workload Best Practices

### Resource Management
Every production workload must declare CPU and memory requests and limits. This enables Kubernetes to schedule pods efficiently and protects the cluster from noisy neighbors.
```yaml
resources:
  requests:
    cpu: 250m        # guaranteed CPU slice
    memory: 256Mi    # guaranteed memory
  limits:
    cpu: 500m        # burstable ceiling
    memory: 512Mi    # OOM kill threshold; set this carefully
```
Exceeding a memory limit triggers an OOMKill. Start with a generous limit (roughly 2x the request) and tune it down using actual metrics. Use the Vertical Pod Autoscaler (VPA) in recommendation mode to calibrate requests over time.
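A minimal VPA manifest in recommendation-only mode might look like the following. This is a sketch: the target Deployment name `my-service` is an assumption, and it requires the VPA components to be installed in the cluster.

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-service-vpa       # hypothetical name
  namespace: team-platform
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-service         # the workload whose requests you want calibrated
  updatePolicy:
    updateMode: "Off"        # recommendation only; never evicts or mutates pods
```

With `updateMode: "Off"`, recommendations appear in the VPA object's status and can be read back with `kubectl describe vpa my-service-vpa`.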
### Pod Disruption Budgets
Always define a PodDisruptionBudget (PDB) for production workloads so that voluntary disruptions such as node drains cannot take down every replica at once. Note that `minAvailable: 2` only allows an eviction while more than two replicas are healthy, so size it below your replica count:
```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-service-pdb
  namespace: team-platform
spec:
  minAvailable: 2   # or use maxUnavailable: "25%"
  selector:
    matchLabels:
      app: my-service
```
### Topology Spread Constraints
Distribute pods across availability zones to avoid AZ-level correlated failures:
```yaml
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule    # hard requirement: never collapse into one AZ
    labelSelector:
      matchLabels:
        app: my-service
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: ScheduleAnyway   # soft: a hard per-node spread can block scheduling during rollouts
    labelSelector:
      matchLabels:
        app: my-service
```
## RBAC Design

### Recommended Role Hierarchy

DevOpsGenie enforces a tiered RBAC model with five standard roles:
| Role | Scope | Permissions |
|---|---|---|
| platform-admin | Cluster-wide | Full access |
| team-lead | Namespace | Read/write workloads, view secrets |
| developer | Namespace | Read workloads, exec into pods, port-forward |
| ci-deployer | Namespace | Create/update Deployments and ConfigMaps only |
| readonly | Namespace | Read-only across all resources |
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: developer
  namespace: team-payments
rules:
  - apiGroups: ["", "apps", "batch"]
    resources: ["pods", "deployments", "replicasets", "jobs", "configmaps", "services"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["pods/exec", "pods/portforward", "pods/log"]
    verbs: ["create", "get"]
```
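A Role grants nothing until it is bound to a subject. A minimal RoleBinding sketch for the `developer` role above; the group name `team-payments-devs` is an assumption and should be mapped to your actual IdP or `aws-auth` group:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: developer-binding      # hypothetical name
  namespace: team-payments
subjects:
  - kind: Group
    name: team-payments-devs   # hypothetical group from your IdP / aws-auth mapping
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: developer
  apiGroup: rbac.authorization.k8s.io
```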
## Networking

### Ingress Configuration
DevOpsGenie uses the AWS Load Balancer Controller to provision ALBs from Kubernetes Ingress resources:
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-service
  namespace: team-platform
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}, {"HTTPS": 443}]'  # required for ssl-redirect
    alb.ingress.kubernetes.io/ssl-redirect: "443"
    alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:us-east-1:ACCOUNT:certificate/CERT_ID
    alb.ingress.kubernetes.io/healthcheck-path: /healthz/ready
    alb.ingress.kubernetes.io/healthcheck-interval-seconds: "15"
    alb.ingress.kubernetes.io/load-balancer-attributes: idle_timeout.timeout_seconds=60,routing.http2.enabled=true,deletion_protection.enabled=true
spec:
  ingressClassName: alb   # preferred over the deprecated kubernetes.io/ingress.class annotation
  rules:
    - host: my-service.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-service
                port:
                  number: 80
```
### Network Policies
Enforce microsegmentation with Network Policies. DevOpsGenie ships with default deny-all policies per namespace:
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: team-payments
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
---
# Allow intra-namespace communication
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-same-namespace
  namespace: team-payments
spec:
  podSelector: {}
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector: {}
---
# Allow ingress from ALB controller
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-alb-ingress
  namespace: team-payments
spec:
  podSelector:
    matchLabels:
      app: payments-api
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
        - ipBlock:
            cidr: 10.0.0.0/8
```
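Because the default-deny policy above also blocks Egress, pods in the namespace lose DNS resolution unless an egress allowance is added. A sketch of a typical allow-DNS policy; the `kube-system` namespace label matches upstream defaults, but verify where CoreDNS runs in your cluster:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress     # hypothetical name
  namespace: team-payments
spec:
  podSelector: {}            # applies to every pod in the namespace
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP      # DNS falls back to TCP for large responses
          port: 53
```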
## Cluster Upgrades
EKS cluster upgrades follow the platform's blue/green node strategy. Refer to the Deployment Guide for detailed steps.
### Pre-Upgrade Checklist
- Review AWS release notes for the target Kubernetes version
- Test all workloads in staging cluster on the target version
- Confirm all Helm chart versions are compatible
- Verify ArgoCD, Karpenter, and AWS add-ons support the target version
- Schedule maintenance window and notify stakeholders
- Enable extended support if applicable
- Confirm cluster-state backups (e.g., Velero) are current; EKS manages etcd itself, so direct etcd backups are not available
```bash
# Run pre-upgrade validation
devopsgenie cluster upgrade --target-version 1.30 --dry-run
```
## Useful Commands
```bash
# Get all pods with their node placement
kubectl get pods -o wide --all-namespaces

# View resource utilization per node (requires metrics-server)
kubectl top nodes

# View resource utilization per pod
kubectl top pods -n team-payments

# Debug pod scheduling failures
kubectl describe pod <pod-name> -n <namespace>
kubectl get events -n <namespace> --sort-by='.lastTimestamp'

# Check Karpenter logs
kubectl logs -n karpenter -l app.kubernetes.io/name=karpenter --tail=100 -f
```