Platform Overview

DevOpsGenie is architected as a layered platform. Each layer builds on the one below, and each layer can be adopted incrementally without needing to buy into the entire stack at once.

Architecture Layers

┌──────────────────────────────────────────────────────────────┐
│ Developer Layer       Self-service portal, service catalog,  │
│                       environment templates, Backstage plugin│
├──────────────────────────────────────────────────────────────┤
│ Delivery Layer        ArgoCD GitOps, Argo Rollouts, Tekton,  │
│                       progressive delivery, policy gates     │
├──────────────────────────────────────────────────────────────┤
│ Observability Layer   Prometheus, Grafana, Loki, OTel,       │
│                       Alertmanager, PagerDuty integration    │
├──────────────────────────────────────────────────────────────┤
│ Security Layer        OPA/Gatekeeper, Falco, cert-manager,   │
│                       IRSA, Secrets Manager, mTLS            │
├──────────────────────────────────────────────────────────────┤
│ Cluster Layer         EKS, Karpenter, VPC-CNI, CoreDNS,      │
│                       ALB controller, Cluster Autoscaler     │
├──────────────────────────────────────────────────────────────┤
│ Infrastructure Layer  Terraform, AWS VPC, IAM, ECR, S3,      │
│                       Route53, ACM, CloudWatch Logs          │
└──────────────────────────────────────────────────────────────┘

Control Plane vs Data Plane

DevOpsGenie separates control plane concerns (cluster lifecycle, policy management, GitOps sync) from data plane concerns (workload scheduling, networking, autoscaling). This separation is intentional:

  • Control plane components run in a dedicated management cluster (or management account) and have broad IAM permissions
  • Data plane components run in workload clusters and operate with least-privilege IRSA roles
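The data-plane side of this split can be sketched with an IRSA-bound ServiceAccount. The namespace, ServiceAccount name, and role ARN below are placeholders, not names from DevOpsGenie itself:

```yaml
# Sketch: a workload ServiceAccount bound to a least-privilege IAM role
# via IRSA. All names and the account ID/ARN are hypothetical.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: payments-api
  namespace: payments
  annotations:
    # The IAM role trusts only this cluster's OIDC provider and this
    # ServiceAccount, and grants only the AWS access the workload needs.
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/payments-api-irsa
```

Pods using this ServiceAccount receive scoped AWS credentials automatically, so no long-lived keys ever land in the workload cluster.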

Key Components

Cluster Management

| Component | Purpose |
| --- | --- |
| Terraform EKS Module | Provisions EKS clusters, managed node groups, and Fargate profiles |
| Karpenter | Node lifecycle and bin-packing autoscaling (preferred over Cluster Autoscaler) |
| AWS VPC-CNI | Native AWS networking with prefix delegation for pod density |
| CoreDNS | Service discovery and DNS caching, tuned for high-traffic environments |
| AWS Load Balancer Controller | Manages ALB/NLB resources from Kubernetes Ingress and Service manifests |
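As an illustration of the Karpenter preference noted above, here is a minimal NodePool sketch, assuming Karpenter's v1 API; the pool name, limits, and EC2NodeClass name are hypothetical:

```yaml
# Sketch: a Karpenter NodePool that bin-packs workloads onto Spot or
# On-Demand capacity and consolidates underutilized nodes.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: general-purpose        # hypothetical pool name
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default           # assumes an EC2NodeClass named "default"
  limits:
    cpu: "1000"                 # cap total provisioned CPU for this pool
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
```

Unlike Cluster Autoscaler, Karpenter provisions right-sized nodes directly from pending pod requirements rather than scaling fixed node groups.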

GitOps & Delivery

| Component | Purpose |
| --- | --- |
| ArgoCD | Declarative GitOps continuous delivery with multi-cluster support |
| Argo Rollouts | Blue/green and canary deployments with automated analysis |
| Argo Image Updater | Automated container image promotion via Git commits |
| Tekton Pipelines | Kubernetes-native CI pipelines, optional (GitHub Actions also supported) |
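A typical ArgoCD Application ties a Git path to a target cluster and namespace. The repository URL, path, and service name below are hypothetical:

```yaml
# Sketch: an ArgoCD Application that keeps a namespace in sync with a
# Git path, pruning drift and self-healing manual changes.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-service              # hypothetical application name
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/deploy-manifests.git  # placeholder repo
    targetRevision: main
    path: my-service/overlays/prod
  destination:
    server: https://kubernetes.default.svc   # in-cluster destination
    namespace: my-service
  syncPolicy:
    automated:
      prune: true       # delete resources removed from Git
      selfHeal: true    # revert out-of-band changes
```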

Observability

| Component | Purpose |
| --- | --- |
| kube-prometheus-stack | Prometheus Operator, Alertmanager, node-exporter, kube-state-metrics |
| Grafana | Dashboards: 12 pre-built, including EKS, service mesh, and cost views |
| Loki + Promtail | Log aggregation and querying (Grafana-native, cost-efficient alternative to ELK) |
| OpenTelemetry Collector | Trace collection, routing, and export to Jaeger or Tempo |
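With the Prometheus Operator in place, workloads are typically scraped via a ServiceMonitor. This is a minimal sketch; the service name, namespaces, and the `release` label (which must match your kube-prometheus-stack Helm release) are assumptions:

```yaml
# Sketch: a ServiceMonitor that tells the Prometheus Operator to scrape
# a workload's metrics endpoint every 30s.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-service              # hypothetical name
  namespace: monitoring
  labels:
    release: kube-prometheus-stack   # must match the Prometheus selector
spec:
  selector:
    matchLabels:
      app: my-service           # matches the target Service's labels
  namespaceSelector:
    matchNames: ["my-service"]
  endpoints:
    - port: metrics             # named port on the Service
      interval: 30s
```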

Security

| Component | Purpose |
| --- | --- |
| OPA Gatekeeper | Policy enforcement using Open Policy Agent and ConstraintTemplates |
| Falco | Runtime threat detection for container workloads |
| cert-manager | Automated TLS certificate management via ACME/Let's Encrypt |
| External Secrets Operator | Sync secrets from AWS Secrets Manager and SSM Parameter Store |
| Kyverno | Supplemental policy engine for mutation and validation (optional) |
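The Secrets Manager sync in the table above can be illustrated with an ExternalSecret. This sketch assumes the v1beta1 API and a ClusterSecretStore named `aws-secrets-manager`; the namespace and secret paths are placeholders:

```yaml
# Sketch: an ExternalSecret that mirrors a Secrets Manager entry into a
# native Kubernetes Secret, refreshed hourly.
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: db-credentials
  namespace: payments           # hypothetical namespace
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-manager   # assumes this ClusterSecretStore exists
    kind: ClusterSecretStore
  target:
    name: db-credentials        # Kubernetes Secret to create/update
  data:
    - secretKey: password
      remoteRef:
        key: prod/payments/db   # placeholder Secrets Manager key
        property: password
```

The workload then consumes `db-credentials` as an ordinary Secret, with no AWS API access of its own.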

Network Architecture

DevOpsGenie deploys into a VPC with three availability zones, each containing:

  • Public subnets — NAT Gateways, ALBs, and bastion hosts
  • Private subnets — EKS worker nodes and pods
  • Isolated subnets — RDS, ElastiCache, and other data-tier resources (optional)

Traffic flow for inbound requests:

Internet → Route53 → ACM-terminated ALB → Ingress Controller → Service → Pod

Inter-cluster traffic uses AWS Transit Gateway or VPC Peering depending on latency requirements.
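The inbound path above maps to an Ingress handled by the AWS Load Balancer Controller. A minimal sketch, with hypothetical host and service names; the controller can discover the matching ACM certificate from the hostname:

```yaml
# Sketch: an Ingress that provisions an internet-facing ALB routing
# HTTPS traffic directly to pod IPs.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-service
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip   # route to pod IPs (VPC-CNI)
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTPS":443}]'
spec:
  ingressClassName: alb
  rules:
    - host: api.example.com     # placeholder hostname (Route53 record)
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-service
                port:
                  number: 80
```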

Multi-Cluster Federation

For organizations running multiple clusters (by environment, region, or team), DevOpsGenie provides:

  • Hub-spoke ArgoCD — a management cluster runs ArgoCD and syncs to all spoke clusters
  • Unified RBAC — SSO (Okta, Azure AD) integrated across all clusters via Dex
  • Cross-cluster observability — federated Prometheus with Thanos for long-term storage and global querying
  • Centralized policy — OPA policies distributed from the management cluster to all spokes
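The hub-spoke pattern is commonly driven by an ArgoCD ApplicationSet with the cluster generator, which stamps out one Application per registered spoke. The repo URL, path layout, and add-on namespace below are hypothetical:

```yaml
# Sketch: an ApplicationSet on the management (hub) cluster that deploys
# per-cluster add-ons to every spoke registered in ArgoCD.
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: platform-addons
  namespace: argocd
spec:
  generators:
    - clusters: {}              # one Application per registered cluster
  template:
    metadata:
      name: '{{name}}-platform-addons'   # templated on the cluster name
    spec:
      project: default
      source:
        repoURL: https://github.com/example/platform-addons.git  # placeholder
        targetRevision: main
        path: 'clusters/{{name}}'        # per-cluster overlay directory
      destination:
        server: '{{server}}'             # the spoke cluster's API server
        namespace: platform
      syncPolicy:
        automated:
          prune: true
```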

Upgrade Strategy

Cluster upgrades follow a rolling blue/green node strategy:

  1. New node group provisioned with target K8s version
  2. Nodes cordoned and workloads drained from old group
  3. Old node group terminated after 100% migration
  4. Control plane upgraded last, during off-peak window
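The drain in step 2 honors PodDisruptionBudgets, so critical services should define one before an upgrade. A minimal sketch with hypothetical names:

```yaml
# Sketch: a PDB that keeps at least 2 replicas of a service running
# while old nodes are cordoned and drained.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-service
  namespace: my-service
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: my-service
```

Without a PDB, a drain may evict all replicas of a service at once, causing avoidable downtime during the node-group swap.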
Tip: Use `devopsgenie cluster upgrade --dry-run` to preview upgrade impact before execution.

Next Steps