Platform Overview
DevOpsGenie is architected as a layered platform. Each layer builds on the one below, and each layer can be adopted incrementally without needing to buy into the entire stack at once.
Architecture Layers
┌────────────────────────────────────────────────────────────────┐
│ Developer Layer Self-service portal, service catalog, │
│ environment templates, Backstage plugin │
├────────────────────────────────────────────────────────────────┤
│ Delivery Layer ArgoCD GitOps, Argo Rollouts, Tekton, │
│ progressive delivery, policy gates │
├────────────────────────────────────────────────────────────────┤
│ Observability Layer Prometheus, Grafana, Loki, OTel, │
│ Alertmanager, PagerDuty integration │
├────────────────────────────────────────────────────────────────┤
│ Security Layer OPA/Gatekeeper, Falco, cert-manager, │
│ IRSA, Secrets Manager, mTLS │
├────────────────────────────────────────────────────────────────┤
│ Cluster Layer EKS, Karpenter, VPC-CNI, CoreDNS, │
│ ALB controller, Cluster Autoscaler │
├────────────────────────────────────────────────────────────────┤
│ Infrastructure Layer Terraform, AWS VPC, IAM, ECR, S3, │
│ Route53, ACM, CloudWatch Logs │
└────────────────────────────────────────────────────────────────┘
Control Plane vs Data Plane
DevOpsGenie separates control plane concerns (cluster lifecycle, policy management, GitOps sync) from data plane concerns (workload scheduling, networking, autoscaling). This separation is intentional:
- Control plane components run in a dedicated management cluster (or management account) and have broad IAM permissions
- Data plane components run in workload clusters and operate with least-privilege IRSA roles
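As a sketch of the data-plane side, a workload service account is bound to its least-privilege IRSA role through the standard `eks.amazonaws.com/role-arn` annotation. The names, namespace, and account ID below are placeholders:

```yaml
# Hypothetical workload service account; the role ARN is an example only.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: payments-api
  namespace: payments
  annotations:
    # Pods using this service account assume this IAM role via IRSA
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/payments-api-irsa
```

Pods referencing this service account receive temporary AWS credentials scoped to that one role, rather than inheriting node-level permissions.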
Key Components
Cluster Management
| Component | Purpose |
|---|---|
| Terraform EKS Module | Provisions EKS clusters, managed node groups, and Fargate profiles |
| Karpenter | Node lifecycle and bin-packing autoscaling (preferred over Cluster Autoscaler) |
| AWS VPC-CNI | Native AWS networking with prefix delegation for pod density |
| CoreDNS | Service discovery and DNS caching, tuned for high-traffic environments |
| AWS Load Balancer Controller | Manages ALB/NLB resources from Kubernetes Ingress and Service manifests |
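To make the Karpenter row concrete, a minimal `NodePool` might look like the sketch below. This assumes the v1beta1 Karpenter API and a pre-existing `EC2NodeClass` named `default`; values are illustrative, not a recommended production profile:

```yaml
# Minimal Karpenter NodePool sketch (v1beta1 API assumed)
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: general-purpose
spec:
  template:
    spec:
      requirements:
        # Allow both spot and on-demand capacity
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand", "spot"]
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
      nodeClassRef:
        name: default   # assumes an EC2NodeClass named "default" exists
  limits:
    cpu: "1000"         # cap total provisioned vCPU for this pool
  disruption:
    # Consolidate underutilized nodes for bin-packing
    consolidationPolicy: WhenUnderutilized
```

The `disruption` block is what gives Karpenter its bin-packing behavior: underutilized nodes are drained and replaced with fewer, better-fitted instances.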
GitOps & Delivery
| Component | Purpose |
|---|---|
| ArgoCD | Declarative GitOps continuous delivery with multi-cluster support |
| Argo Rollouts | Blue/green and canary deployments with automated analysis |
| Argo CD Image Updater | Automated container image promotion via Git commits |
| Tekton Pipelines | Kubernetes-native CI pipelines, optional (GitHub Actions also supported) |
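An ArgoCD `Application` ties a Git path to a destination cluster and namespace. The repository URL, path, and names below are hypothetical:

```yaml
# Example ArgoCD Application; repo URL and paths are placeholders
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: payments-api
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example-org/platform-manifests.git
    targetRevision: main
    path: apps/payments-api/overlays/prod
  destination:
    server: https://kubernetes.default.svc
    namespace: payments
  syncPolicy:
    automated:
      prune: true     # delete resources removed from Git
      selfHeal: true  # revert out-of-band changes to match Git
```

With `automated.selfHeal` enabled, Git remains the single source of truth: manual edits in the cluster are reverted on the next sync.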
Observability
| Component | Purpose |
|---|---|
| kube-prometheus-stack | Prometheus Operator, Alertmanager, node-exporter, kube-state-metrics |
| Grafana | Dashboards — 12 pre-built, including EKS, service mesh, and cost views |
| Loki + Promtail | Log aggregation and querying (Grafana-native, cost-efficient alternative to ELK) |
| OpenTelemetry Collector | Trace collection, routing, and export to Jaeger or Tempo |
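The OpenTelemetry Collector's role in the table above is easiest to see in its pipeline config. This is a minimal sketch assuming traces are exported to a Tempo service at `tempo.observability.svc`; the endpoint is a placeholder:

```yaml
# Minimal OTel Collector pipeline: receive OTLP, batch, export to Tempo
receivers:
  otlp:
    protocols:
      grpc:
      http:
processors:
  batch: {}  # batch spans to reduce export overhead
exporters:
  otlp/tempo:
    endpoint: tempo.observability.svc:4317  # assumed in-cluster Tempo address
    tls:
      insecure: true  # in-cluster traffic; enable TLS in hardened setups
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/tempo]
```

Swapping Tempo for Jaeger is a matter of changing the exporter; the receiver side stays the same for instrumented workloads.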
Security
| Component | Purpose |
|---|---|
| OPA Gatekeeper | Policy enforcement using Open Policy Agent and ConstraintTemplates |
| Falco | Runtime threat detection for container workloads |
| cert-manager | Automated TLS certificate management via ACME/Let's Encrypt |
| External Secrets Operator | Sync secrets from AWS Secrets Manager and SSM Parameter Store |
| Kyverno | Supplemental policy engine for mutation and validation (optional) |
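As an example of the External Secrets Operator row, an `ExternalSecret` pulls a value from AWS Secrets Manager into a native Kubernetes Secret. This sketch assumes a `ClusterSecretStore` named `aws-secrets-manager` and uses placeholder secret paths:

```yaml
# Hypothetical ExternalSecret; store name and key paths are placeholders
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: payments-db
  namespace: payments
spec:
  refreshInterval: 1h            # re-sync from Secrets Manager hourly
  secretStoreRef:
    name: aws-secrets-manager    # assumed pre-configured ClusterSecretStore
    kind: ClusterSecretStore
  target:
    name: payments-db            # Kubernetes Secret created/maintained by ESO
  data:
    - secretKey: password
      remoteRef:
        key: prod/payments/db    # Secrets Manager secret name
        property: password       # JSON field within the secret
```

The operator keeps the Kubernetes Secret in sync, so rotation in Secrets Manager propagates to workloads without redeploying manifests.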
Network Architecture
DevOpsGenie deploys into a VPC spanning three availability zones; each zone contains:
- Public subnets — NAT Gateways, ALBs, and bastion hosts
- Private subnets — EKS worker nodes and pods
- Isolated subnets — RDS, ElastiCache, and other data-tier resources (optional)
Traffic flow for inbound requests:
Internet → Route53 → ACM-terminated ALB → Ingress Controller → Service → Pod
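The ALB hop in that flow is driven by an Ingress managed by the AWS Load Balancer Controller. A sketch, with hypothetical hostnames and a placeholder ACM certificate ARN:

```yaml
# Example Ingress for the ALB controller; host and ARN are placeholders
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: payments-api
  namespace: payments
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip   # route directly to pod IPs (VPC-CNI)
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTPS": 443}]'
    alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:us-east-1:123456789012:certificate/example
spec:
  ingressClassName: alb
  rules:
    - host: payments.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: payments-api
                port:
                  number: 80
```

With `target-type: ip`, the ALB forwards straight to pod IPs assigned by VPC-CNI, skipping an extra NodePort hop.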
Inter-cluster traffic uses AWS Transit Gateway or VPC Peering: peering is simpler for a small number of VPCs, while Transit Gateway scales to many VPCs without a mesh of peering connections.
Multi-Cluster Federation
For organizations running multiple clusters (by environment, region, or team), DevOpsGenie provides:
- Hub-spoke ArgoCD — a management cluster runs ArgoCD and syncs to all spoke clusters
- Unified RBAC — SSO (Okta, Azure AD) integrated across all clusters via Dex
- Cross-cluster observability — federated Prometheus with Thanos for long-term storage and global querying
- Centralized policy — OPA policies distributed from the management cluster to all spokes
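The hub-spoke pattern above is typically expressed with an ArgoCD `ApplicationSet` using the cluster generator, which stamps out one Application per spoke cluster registered in the management cluster. Repository details below are placeholders:

```yaml
# Hub-spoke sketch: one Application per registered spoke cluster
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: baseline-addons
  namespace: argocd
spec:
  generators:
    - clusters: {}   # emits one entry per cluster known to ArgoCD
  template:
    metadata:
      name: 'baseline-{{name}}'   # {{name}} = registered cluster name
    spec:
      project: default
      source:
        repoURL: https://github.com/example-org/platform-manifests.git
        targetRevision: main
        path: baseline             # shared add-ons/policies for every spoke
      destination:
        server: '{{server}}'       # {{server}} = spoke API endpoint
        namespace: platform-baseline
```

Registering a new spoke cluster with ArgoCD is then sufficient to roll the baseline (including distributed OPA policies) out to it.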
Upgrade Strategy
Cluster upgrades follow a blue/green node strategy:
- Control plane upgraded first, during an off-peak window (EKS requires worker nodes to run at or below the control plane version)
- New node group provisioned with the target Kubernetes version
- Nodes in the old group cordoned and workloads drained
- Old node group terminated once all workloads have migrated
Use `devopsgenie cluster upgrade --dry-run` to preview upgrade impact before execution.
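Draining during the node migration respects PodDisruptionBudgets, so workloads should declare one to keep a minimum replica count serving throughout the upgrade. A minimal sketch with placeholder names:

```yaml
# Example PDB so draining never drops below two ready replicas
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: payments-api
  namespace: payments
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: payments-api
```

Without a PDB, `kubectl drain` (and the upgrade automation built on it) may evict all replicas of a service at once.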
Next Steps
- Deployment Guide — production deployment patterns and strategies
- Kubernetes & EKS Guide — detailed EKS cluster configuration
- Security & Access Control — policy enforcement and IAM setup