Infrastructure

The infrastructure layer provisions the AKS foundation: cluster, node placement, networking, and baseline resource conventions. Privileged Azure authorization is split into a separate state (infra-access/) so contributors can operate infrastructure without elevated permissions (see ADR-0020).

AKS Automatic

AKS Automatic provides a managed Kubernetes experience with built-in best practices (see ADR-0002):

  • Simplified operations: Auto-scaling, auto-upgrade, auto-repair
  • Built-in best practices: Network policy, pod security, cost optimization
  • Integrated Istio: Managed service mesh without manual installation
  • Deployment Safeguards: Gatekeeper policies for compliance

These capabilities reduce the platform-assembly burden and give contributors a reproducible, secure baseline without manual cluster configuration.

Zone Topology

The cluster spreads workloads across Azure availability zones for high availability:

  • System pool pins to configurable zones (default: zones 1 and 3)
  • Platform pool (Karpenter) dynamically selects from available zones
  • Stateful middleware distributes replicas across AZs for resilience
  • OSDU services remain schedulable across surviving zones during an AZ loss
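
The zone behaviour above can be modelled with a small sketch. It is not how the Kubernetes scheduler works internally; it only illustrates, under a simple round-robin assumption, why replicas spread across zones leave capacity running when one zone is lost. The function names are hypothetical.

```python
from collections import Counter
from itertools import cycle


def spread_replicas(replicas: int, zones: list[str]) -> dict[str, int]:
    """Assign replicas to zones round-robin so counts differ by at most one."""
    if not zones:
        raise ValueError("at least one zone is required")
    counts = Counter({zone: 0 for zone in zones})
    for _, zone in zip(range(replicas), cycle(zones)):
        counts[zone] += 1
    return dict(counts)


def surviving_replicas(placement: dict[str, int], lost_zone: str) -> int:
    """Count replicas still running after one zone becomes unavailable."""
    return sum(n for zone, n in placement.items() if zone != lost_zone)


placement = spread_replicas(3, ["1", "2", "3"])
print(placement)                           # {'1': 1, '2': 1, '3': 1}
print(surviving_replicas(placement, "2"))  # 2
```

With three replicas over three zones, losing any single zone leaves two replicas running.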

Node Pools

System components are pinned to reserved nodes, while platform and OSDU workloads run on auto-provisioned capacity that scales with demand.

System pool: reserved nodes for critical cluster components.

Property     Value
VM Size      Standard_D4lds_v5 (configurable)
Count        2
Taints       CriticalAddonsOnly
Managed By   AKS (VMSS)
Zones        Configurable (default: 1, 2, 3)

Platform pool: middleware and OSDU services on auto-provisioned nodes.

Property     Value
VM Size      D-series (4-8 vCPU), auto-selected
Count        Auto-scaled
Taints       workload=platform:NoSchedule
Managed By   NAP (Karpenter)
Zones        Dynamically selected

General workloads like MinIO and Airflow task pods.

Property     Value
VM Size      Auto-provisioned
Count        Auto-scaled
Taints       None
Managed By   NAP (Karpenter)
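
The taints in these tables decide which pods may land on which pool. The sketch below models that rule in simplified form: a pod is eligible for a pool only if it tolerates every taint on it. The pool label "default" is a hypothetical name for the untainted pool, and the value and effect shown for CriticalAddonsOnly are assumptions, since the table lists only the key.

```python
# Each taint is modelled as a (key, value, effect) tuple.
Taint = tuple[str, str, str]

POOL_TAINTS: dict[str, set[Taint]] = {
    # Value "true" and effect "NoSchedule" are assumptions; the table lists only the key.
    "system": {("CriticalAddonsOnly", "true", "NoSchedule")},
    "platform": {("workload", "platform", "NoSchedule")},
    # "default" is a hypothetical label for the untainted pool.
    "default": set(),
}


def eligible_pools(tolerations: set[Taint]) -> list[str]:
    """Return pools whose taints are all matched by the pod's tolerations."""
    return sorted(
        pool for pool, taints in POOL_TAINTS.items() if taints <= tolerations
    )


platform_pod = {("workload", "platform", "NoSchedule")}
print(eligible_pools(set()))         # ['default']
print(eligible_pools(platform_pod))  # ['default', 'platform']
```

A toleration only permits scheduling onto a tainted pool; it does not force it. Keeping platform pods off the untainted pool would additionally need a node selector or node affinity.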

Why Karpenter (NAP) for platform workloads?

Traditional node pools are tied to a single VM size, so scaling can fail when that size isn't available in a zone. Karpenter avoids this by choosing among multiple VM sizes per zone, provisioning nodes on demand, and consolidating them when capacity sits idle (see ADR-0006).
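
A minimal sketch of the fallback idea follows. It is not Karpenter's actual selection logic, which also weighs factors such as price and pod requirements; it only shows why a list of candidate sizes succeeds where a single fixed size fails. The availability map is illustrative data, not real regional capacity.

```python
from typing import Optional

# Illustrative capacity per VM size (zone names as strings); not real data.
AVAILABILITY: dict[str, set[str]] = {
    "Standard_D4s_v5": {"1", "3"},
    "Standard_D8s_v5": {"1", "2", "3"},
}


def pick_vm_size(
    zone: str, candidates: list[str], availability: dict[str, set[str]]
) -> Optional[str]:
    """Return the first candidate size with capacity in the zone, else None."""
    for size in candidates:
        if zone in availability.get(size, set()):
            return size
    return None


# A fixed pool has one candidate and fails in zone 2.
print(pick_vm_size("2", ["Standard_D4s_v5"], AVAILABILITY))  # None

# A flexible candidate list falls back to the next size.
print(pick_vm_size("2", ["Standard_D4s_v5", "Standard_D8s_v5"], AVAILABILITY))
# Standard_D8s_v5
```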

Network Architecture

Network Plugin:      Azure CNI Overlay
Network Dataplane:   Cilium
Outbound Type:       Managed NAT Gateway
Service CIDR:        10.0.0.0/16
DNS Service IP:      10.0.0.10

  • Azure CNI Overlay: Pod IPs in overlay network, no VNet subnet exhaustion
  • Cilium: eBPF-based network policy enforcement
  • Managed NAT Gateway: Outbound traffic via dedicated NAT
  • Istio mTLS: STRICT mode for east-west traffic (see Security)
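
The Service CIDR and DNS Service IP listed above have to agree with each other. The sketch below checks that relationship with Python's standard ipaddress module. The helper name is hypothetical, and the rule that the DNS IP should not be the first address in the range is an assumption based on common AKS guidance rather than something this page states.

```python
import ipaddress


def check_service_network(service_cidr: str, dns_service_ip: str) -> list[str]:
    """Return a list of problems; an empty list means the pair is consistent."""
    network = ipaddress.ip_network(service_cidr)
    dns_ip = ipaddress.ip_address(dns_service_ip)
    problems = []
    if dns_ip not in network:
        problems.append(f"{dns_ip} is outside the service CIDR {network}")
    elif dns_ip in (network.network_address, network.network_address + 1):
        # Assumption: the first address is left for the cluster's own service.
        problems.append(f"{dns_ip} is the first address in {network}")
    return problems


print(check_service_network("10.0.0.0/16", "10.0.0.10"))  # []
print(ipaddress.ip_network("10.0.0.0/16").num_addresses)  # 65536
```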

Resource Naming and Tagging

Consistent naming and tags support ownership tracking, cost allocation, and multi-environment operations.

Resource names follow the pattern <prefix>-<project>-<environment>. As the table shows, the AKS cluster omits the prefix and namespaces use their own names:

Resource         Pattern          Example
Resource Group   rg-cimpl-{env}   rg-cimpl-dev
AKS Cluster      cimpl-{env}      cimpl-dev
Namespaces       platform, osdu   platform-blue, osdu-blue

All Azure resources include:

Tag            Value
azd-env-name   Environment name
project        cimpl
Contact        Owner email address
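
The naming and tagging conventions can be expressed as a small helper, sketched below. The function names and the example email address are hypothetical; the patterns and tag keys come from the tables above. Namespaces are left out because the tables do not define how their suffix is derived.

```python
PROJECT = "cimpl"
REQUIRED_TAGS = ("azd-env-name", "project", "Contact")


def resource_names(env: str, project: str = PROJECT) -> dict[str, str]:
    """Build the resource group and AKS cluster names for an environment."""
    return {
        "resource_group": f"rg-{project}-{env}",
        "aks_cluster": f"{project}-{env}",
    }


def required_tags(env: str, contact: str) -> dict[str, str]:
    """Build the tag set expected on every Azure resource."""
    return {"azd-env-name": env, "project": PROJECT, "Contact": contact}


def missing_tags(tags: dict[str, str]) -> list[str]:
    """List required tag keys that are absent or empty."""
    return [key for key in REQUIRED_TAGS if not tags.get(key)]


print(resource_names("dev"))
# {'resource_group': 'rg-cimpl-dev', 'aks_cluster': 'cimpl-dev'}
print(missing_tags({"project": "cimpl"}))
# ['azd-env-name', 'Contact']
```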

See also

  • Deployment Model — how infrastructure fits into the layered architecture
  • Platform Services — middleware deployed on top of this infrastructure
  • Security — Azure RBAC, workload identity, and network security details