Platform Overview

DevOpsGenie is architected as a layered platform. Each layer builds on the one below, and each layer can be adopted incrementally without needing to buy into the entire stack at once.

Architecture Layers

┌──────────────────────────────────────────────────────────────┐
│ Developer Layer       Self-service portal, service catalog,  │
│                       environment templates, Backstage plugin│
├──────────────────────────────────────────────────────────────┤
│ Delivery Layer        ArgoCD GitOps, Argo Rollouts, Tekton,  │
│                       progressive delivery, policy gates     │
├──────────────────────────────────────────────────────────────┤
│ Observability Layer   Prometheus, Grafana, Loki, OTel,       │
│                       Alertmanager, PagerDuty integration    │
├──────────────────────────────────────────────────────────────┤
│ Security Layer        OPA/Gatekeeper, Falco, cert-manager,   │
│                       IRSA, Secrets Manager, mTLS            │
├──────────────────────────────────────────────────────────────┤
│ Cluster Layer         EKS, Karpenter, VPC-CNI, CoreDNS,      │
│                       ALB controller, Cluster Autoscaler     │
├──────────────────────────────────────────────────────────────┤
│ Infrastructure Layer  Terraform, AWS VPC, IAM, ECR, S3,      │
│                       Route53, ACM, CloudWatch Logs          │
└──────────────────────────────────────────────────────────────┘

Control Plane vs Data Plane

DevOpsGenie separates control plane concerns (cluster lifecycle, policy management, GitOps sync) from data plane concerns (workload scheduling, networking, autoscaling). This separation is intentional:

  • Control plane components run in a dedicated management cluster (or management account) and have broad IAM permissions
  • Data plane components run in workload clusters and operate with least-privilege IRSA roles
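The data-plane side of this split can be sketched with an IRSA-bound ServiceAccount. The namespace, ServiceAccount name, and role ARN below are placeholders, not names from DevOpsGenie itself:

```yaml
# Sketch: a workload ServiceAccount bound to a least-privilege IAM role
# via IRSA. All names and the account ID/ARN are hypothetical.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: payments-api
  namespace: payments
  annotations:
    # The IAM role trusts only this cluster's OIDC provider and this
    # ServiceAccount, and grants only the AWS access the workload needs.
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/payments-api-irsa
```

Pods using this ServiceAccount receive scoped AWS credentials automatically, so no long-lived keys ever land in the workload cluster.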

Key Components

Cluster Management

| Component | Purpose |
| --- | --- |
| Terraform EKS Module | Provisions EKS clusters, managed node groups, and Fargate profiles |
| Karpenter | Node lifecycle and bin-packing autoscaling (preferred over Cluster Autoscaler) |
| AWS VPC-CNI | Native AWS networking with prefix delegation for pod density |
| CoreDNS | Service discovery and DNS caching, tuned for high-traffic environments |
| AWS Load Balancer Controller | Manages ALB/NLB resources from Kubernetes Ingress and Service manifests |
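As an illustration of the Karpenter preference noted above, here is a minimal NodePool sketch, assuming Karpenter's v1 API; the pool name, limits, and EC2NodeClass name are hypothetical:

```yaml
# Sketch: a Karpenter NodePool that bin-packs workloads onto Spot or
# On-Demand capacity and consolidates underutilized nodes.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: general-purpose        # hypothetical pool name
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default           # assumes an EC2NodeClass named "default"
  limits:
    cpu: "1000"                 # cap total provisioned CPU for this pool
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
```

Unlike Cluster Autoscaler, Karpenter provisions right-sized nodes directly from pending pod requirements rather than scaling fixed node groups.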

GitOps & Delivery

| Component | Purpose |
| --- | --- |
| ArgoCD | Declarative GitOps continuous delivery with multi-cluster support |
| Argo Rollouts | Blue/green and canary deployments with automated analysis |
| Argo Image Updater | Automated container image promotion via Git commits |
| Tekton Pipelines | Kubernetes-native CI pipelines, optional (GitHub Actions also supported) |
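A typical ArgoCD Application ties a Git path to a target cluster and namespace. The repository URL, path, and service name below are hypothetical:

```yaml
# Sketch: an ArgoCD Application that keeps a namespace in sync with a
# Git path, pruning drift and self-healing manual changes.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-service              # hypothetical application name
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/deploy-manifests.git  # placeholder repo
    targetRevision: main
    path: my-service/overlays/prod
  destination:
    server: https://kubernetes.default.svc   # in-cluster destination
    namespace: my-service
  syncPolicy:
    automated:
      prune: true       # delete resources removed from Git
      selfHeal: true    # revert out-of-band changes
```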

Observability

| Component | Purpose |
| --- | --- |
| kube-prometheus-stack | Prometheus Operator, Alertmanager, node-exporter, kube-state-metrics |
| Grafana | Dashboards: 12 pre-built, including EKS, service mesh, and cost views |
| Loki + Promtail | Log aggregation and querying (Grafana-native, cost-efficient alternative to ELK) |
| OpenTelemetry Collector | Trace collection, routing, and export to Jaeger or Tempo |
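With the Prometheus Operator in place, workloads are typically scraped via a ServiceMonitor. This is a minimal sketch; the service name, namespaces, and the `release` label (which must match your kube-prometheus-stack Helm release) are assumptions:

```yaml
# Sketch: a ServiceMonitor that tells the Prometheus Operator to scrape
# a workload's metrics endpoint every 30s.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-service              # hypothetical name
  namespace: monitoring
  labels:
    release: kube-prometheus-stack   # must match the Prometheus selector
spec:
  selector:
    matchLabels:
      app: my-service           # matches the target Service's labels
  namespaceSelector:
    matchNames: ["my-service"]
  endpoints:
    - port: metrics             # named port on the Service
      interval: 30s
```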

Security

| Component | Purpose |
| --- | --- |
| OPA Gatekeeper | Policy enforcement using Open Policy Agent and ConstraintTemplates |
| Falco | Runtime threat detection for container workloads |
| cert-manager | Automated TLS certificate management via ACME/Let's Encrypt |
| External Secrets Operator | Sync secrets from AWS Secrets Manager and SSM Parameter Store |
| Kyverno | Supplemental policy engine for mutation and validation (optional) |
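The Secrets Manager sync in the table above can be illustrated with an ExternalSecret. This sketch assumes the v1beta1 API and a ClusterSecretStore named `aws-secrets-manager`; the namespace and secret paths are placeholders:

```yaml
# Sketch: an ExternalSecret that mirrors a Secrets Manager entry into a
# native Kubernetes Secret, refreshed hourly.
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: db-credentials
  namespace: payments           # hypothetical namespace
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-manager   # assumes this ClusterSecretStore exists
    kind: ClusterSecretStore
  target:
    name: db-credentials        # Kubernetes Secret to create/update
  data:
    - secretKey: password
      remoteRef:
        key: prod/payments/db   # placeholder Secrets Manager key
        property: password
```

The workload then consumes `db-credentials` as an ordinary Secret, with no AWS API access of its own.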

Network Architecture

DevOpsGenie deploys into a VPC with three availability zones, each containing:

  • Public subnets — NAT Gateways, ALBs, and bastion hosts
  • Private subnets — EKS worker nodes and pods
  • Isolated subnets — RDS, ElastiCache, and other data-tier resources (optional)

Traffic flow for inbound requests:

Internet → Route53 → ACM-terminated ALB → Ingress Controller → Service → Pod

Inter-cluster traffic uses AWS Transit Gateway or VPC Peering depending on latency requirements.
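The inbound path above maps to an Ingress handled by the AWS Load Balancer Controller. A minimal sketch, with hypothetical host and service names; the controller can discover the matching ACM certificate from the hostname:

```yaml
# Sketch: an Ingress that provisions an internet-facing ALB routing
# HTTPS traffic directly to pod IPs.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-service
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip   # route to pod IPs (VPC-CNI)
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTPS":443}]'
spec:
  ingressClassName: alb
  rules:
    - host: api.example.com     # placeholder hostname (Route53 record)
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-service
                port:
                  number: 80
```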

Multi-Cluster Federation

For organizations running multiple clusters (by environment, region, or team), DevOpsGenie provides:

  • Hub-spoke ArgoCD — a management cluster runs ArgoCD and syncs to all spoke clusters
  • Unified RBAC — SSO (Okta, Azure AD) integrated across all clusters via Dex
  • Cross-cluster observability — federated Prometheus with Thanos for long-term storage and global querying
  • Centralized policy — OPA policies distributed from the management cluster to all spokes
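The hub-spoke pattern is commonly driven by an ArgoCD ApplicationSet with the cluster generator, which stamps out one Application per registered spoke. The repo URL, path layout, and add-on namespace below are hypothetical:

```yaml
# Sketch: an ApplicationSet on the management (hub) cluster that deploys
# per-cluster add-ons to every spoke registered in ArgoCD.
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: platform-addons
  namespace: argocd
spec:
  generators:
    - clusters: {}              # one Application per registered cluster
  template:
    metadata:
      name: '{{name}}-platform-addons'   # templated on the cluster name
    spec:
      project: default
      source:
        repoURL: https://github.com/example/platform-addons.git  # placeholder
        targetRevision: main
        path: 'clusters/{{name}}'        # per-cluster overlay directory
      destination:
        server: '{{server}}'             # the spoke cluster's API server
        namespace: platform
      syncPolicy:
        automated:
          prune: true
```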

Upgrade Strategy

Cluster upgrades follow a rolling blue/green node strategy:

  1. New node group provisioned with target K8s version
  2. Nodes cordoned and workloads drained from old group
  3. Old node group terminated after 100% migration
  4. Control plane upgraded last, during off-peak window
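The drain in step 2 honors PodDisruptionBudgets, so critical services should define one before an upgrade. A minimal sketch with hypothetical names:

```yaml
# Sketch: a PDB that keeps at least 2 replicas of a service running
# while old nodes are cordoned and drained.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-service
  namespace: my-service
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: my-service
```

Without a PDB, a drain may evict all replicas of a service at once, causing avoidable downtime during the node-group swap.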
Tip: Use `devopsgenie cluster upgrade --dry-run` to preview upgrade impact before execution.

Next Steps