
Kubernetes & EKS Guide

This guide covers Kubernetes best practices and EKS-specific configuration patterns used throughout DevOpsGenie-managed environments.

Workload Best Practices

Resource Management

Every production workload must declare CPU and memory requests and limits. This enables Kubernetes to schedule pods efficiently and protects the cluster from noisy neighbors.

```yaml
resources:
  requests:
    cpu: 250m        # guaranteed CPU slice
    memory: 256Mi    # guaranteed memory
  limits:
    cpu: 500m        # burstable ceiling
    memory: 512Mi    # OOM kill threshold — set this carefully
```
Memory limits and OOMKill

Exceeding a memory limit triggers an OOMKill. Set limits conservatively high (around 2x the request) and tune them down using actual usage metrics. Run the Vertical Pod Autoscaler (VPA) in recommendation mode to calibrate requests over time.
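As a sketch, a VPA object in recommendation-only mode (`updateMode: "Off"`): this assumes the VPA controller is installed in the cluster, and the target name is illustrative.

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-service-vpa
  namespace: team-platform
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-service        # illustrative target
  updatePolicy:
    updateMode: "Off"       # recommend only; never evict pods or rewrite requests
```

Read the recommendations with `kubectl describe vpa my-service-vpa -n team-platform` and fold them back into the manifest.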

Pod Disruption Budgets

Always define a PodDisruptionBudget for production workloads so that voluntary disruptions such as node drains cannot take down every replica at once:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-service-pdb
  namespace: team-platform
spec:
  minAvailable: 2   # or use maxUnavailable: "25%"
  selector:
    matchLabels:
      app: my-service
```

Topology Spread Constraints

Distribute pods across availability zones, and across nodes, so that an AZ-level or node-level failure cannot take out all replicas at once:

```yaml
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app: my-service
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app: my-service
```

RBAC Design

DevOpsGenie enforces a tiered RBAC model with five standard roles:

| Role | Scope | Permissions |
| --- | --- | --- |
| platform-admin | Cluster-wide | Full access |
| team-lead | Namespace | Read/write workloads, view secrets |
| developer | Namespace | Read workloads, exec into pods, port-forward |
| ci-deployer | Namespace | Create/update Deployments and ConfigMaps only |
| readonly | Namespace | Read-only across all resources |
kubernetes/rbac/developer-role.yaml

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: developer
  namespace: team-payments
rules:
  - apiGroups: ["", "apps", "batch"]
    resources: ["pods", "deployments", "replicasets", "jobs", "configmaps", "services"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["pods/exec", "pods/portforward", "pods/log"]
    verbs: ["create", "get"]
```

Networking

Ingress Configuration

DevOpsGenie uses the AWS Load Balancer Controller to provision ALBs from Kubernetes Ingress resources:

kubernetes/ingress/my-service.yaml

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-service
  namespace: team-platform
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}, {"HTTPS": 443}]'
    alb.ingress.kubernetes.io/ssl-redirect: "443"
    alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:us-east-1:ACCOUNT:certificate/CERT_ID
    alb.ingress.kubernetes.io/healthcheck-path: /healthz/ready
    alb.ingress.kubernetes.io/healthcheck-interval-seconds: "15"
    alb.ingress.kubernetes.io/load-balancer-attributes: idle_timeout.timeout_seconds=60,routing.http2.enabled=true,deletion_protection.enabled=true
spec:
  ingressClassName: alb   # replaces the deprecated kubernetes.io/ingress.class annotation
  rules:
    - host: my-service.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-service
                port:
                  number: 80
```

Note that `ssl-redirect` only takes effect when the `listen-ports` annotation includes both the HTTP and HTTPS listeners, as above.
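For reference, the Ingress above expects a Service in front of the pods; with `target-type: ip` the controller registers pod IPs directly, so a plain ClusterIP Service is sufficient. A minimal sketch (the container port is illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-service
  namespace: team-platform
spec:
  type: ClusterIP
  selector:
    app: my-service
  ports:
    - name: http
      port: 80          # port referenced by the Ingress backend
      targetPort: 8080  # illustrative container port
```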

Network Policies

Enforce microsegmentation with Network Policies. Note that enforcement requires CNI support; on EKS, enable the VPC CNI network policy feature or run a policy engine such as Calico. DevOpsGenie ships with default deny-all policies per namespace:

kubernetes/policies/default-deny.yaml
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: team-payments
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
---
# Allow intra-namespace communication in both directions
# (the default-deny policy above also blocks egress)
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-same-namespace
  namespace: team-payments
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - podSelector: {}
  egress:
    - to:
        - podSelector: {}
---
# Allow ingress from the ALB controller
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-alb-ingress
  namespace: team-payments
spec:
  podSelector:
    matchLabels:
      app: payments-api
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
        - ipBlock:
            cidr: 10.0.0.0/8
```
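One consequence of denying egress by default is that DNS lookups are also blocked, so most workloads need a DNS exception. A common companion policy, sketched here assuming CoreDNS runs in kube-system with the standard `k8s-app: kube-dns` label:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: team-payments
spec:
  podSelector: {}          # applies to every pod in the namespace
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
```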

Cluster Upgrades

EKS cluster upgrades follow the platform's blue/green node strategy. Refer to the Deployment Guide for detailed steps.

Pre-Upgrade Checklist

  • Review AWS release notes for the target Kubernetes version
  • Test all workloads in staging cluster on the target version
  • Confirm all Helm chart versions are compatible
  • Verify ArgoCD, Karpenter, and AWS add-ons support the target version
  • Schedule maintenance window and notify stakeholders
  • Enable extended support if applicable
  • Confirm etcd backup is current
```shell
# Run pre-upgrade validation
devopsgenie cluster upgrade --target-version 1.30 --dry-run
```

Useful Commands

```shell
# Get all pods with their node placement
kubectl get pods -o wide --all-namespaces

# View resource utilization per node
kubectl top nodes

# View resource utilization per pod
kubectl top pods -n team-payments

# Debug pod scheduling failures
kubectl describe pod <pod-name> -n <namespace>
kubectl get events -n <namespace> --sort-by='.lastTimestamp'

# Check Karpenter logs
kubectl logs -n karpenter -l app.kubernetes.io/name=karpenter --tail=100 -f
```