Kubernetes Monitoring Architecture Overview

This section provides a high-level overview of the monitoring and observability stack for Kubernetes. It highlights the core components, their responsibilities, and how they work together to deliver end-to-end monitoring, alerting, and visualization for clusters and workloads.

Design Philosophy

Our Kubernetes monitoring stack follows these principles:

Modular and Composable → Exporters, Prometheus, Grafana, and Alertmanager each serve a focused role.
Kubernetes-Native → Uses service discovery, ConfigMaps, and the Prometheus Operator for automation.
Automation-First → Deployed and managed via Helm charts, Operators, and GitOps pipelines.
Production-Ready → Runs Prometheus in a high-availability setup with RBAC enabled, and scales with Thanos or Cortex.
Developer-Friendly → Can be run locally with Kind or Minikube for experimentation.

Core Components

1. Metrics Collection (Exporters & Kube-State-Metrics)

kubelet / cAdvisor → Collects container and pod-level resource usage (CPU, memory, disk, network).
kube-state-metrics → Exposes the state of Kubernetes objects (deployments, pods, nodes, jobs).
Node Exporter → Provides node-level OS metrics.
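kube-state-metrics and Node Exporter expose metrics at /metrics in the Prometheus text format; the kubelet serves its own metrics at /metrics and the embedded cAdvisor metrics at /metrics/cadvisor.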
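To make the exposition format concrete, here is a minimal sketch of reading it in Python. The sample payload is hypothetical, and the parser is deliberately simplified: it skips comment lines, ignores optional timestamps, and does not unescape label values. It is an illustration of the format, not a replacement for a real client library.

```python
import re

# Hypothetical payload, shaped like what an exporter returns from its metrics endpoint.
SAMPLE = """\
# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 12345.67
node_cpu_seconds_total{cpu="0",mode="user"} 234.5
node_memory_MemAvailable_bytes 8.0e+09
"""

# metric_name, optional {label set}, then the sample value.
LINE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
# One label pair: name="value" (escaped characters are kept as-is).
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples from exposition text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE_RE.match(line)
        if match is None:
            continue  # ignore lines this simplified parser cannot read
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


for name, labels, value in parse_metrics(SAMPLE):
    print(name, labels, value)
```

Each sample is a metric name, an optional label set, and a float value, which is the shape Prometheus stores per scrape.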

2. Monitoring & Storage (Prometheus / Prometheus Operator)

Prometheus scrapes metrics automatically via Kubernetes service discovery.
It stores the scraped samples in its local time-series database (TSDB).
Prometheus Operator manages Prometheus, Alertmanager, and scrape configurations declaratively with CRDs (ServiceMonitor, PodMonitor, PrometheusRule).
PromQL enables deep querying of cluster and application metrics.
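As an illustration of the declarative approach, the sketch below builds a ServiceMonitor manifest as a plain Python dict and prints it as JSON, which `kubectl apply -f` accepts alongside YAML. The resource name, namespace, `app` label, and port name are hypothetical placeholders, and whether a given Prometheus instance picks the resource up depends on that instance's own selector configuration.

```python
import json


def service_monitor(name, namespace, app_label, port_name, interval="30s"):
    """Build a ServiceMonitor manifest that scrapes Services labeled app=<app_label>."""
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "ServiceMonitor",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Select the Services whose endpoints should be scraped.
            "selector": {"matchLabels": {"app": app_label}},
            # 'port' refers to a named port on the selected Service.
            "endpoints": [
                {"port": port_name, "path": "/metrics", "interval": interval}
            ],
        },
    }


# All argument values here are hypothetical examples.
manifest = service_monitor("example-app", "monitoring", "example-app", "http-metrics")
print(json.dumps(manifest, indent=2))
```

Generating manifests from code like this is only one option; the same resource is more commonly written directly as YAML and shipped through Helm or a GitOps pipeline.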

3. Visualization (Grafana)

Grafana connects to Prometheus and provides pre-built dashboards for Kubernetes clusters, nodes, pods, and workloads.
Developers and SREs can explore, visualize, and share insights from cluster metrics.
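Dashboards ultimately issue PromQL queries against the Prometheus HTTP API. The sketch below shows roughly what such a request and its response handling could look like for an instant query. The base URL and the response payload are hypothetical, no network call is made, and Grafana's actual data source implementation is more involved than this.

```python
from urllib.parse import urlencode


def instant_query_url(base_url, promql):
    """Build the URL for an instant query against the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def vector_values(response):
    """Extract (labels, value) pairs from an instant-vector query response."""
    if response.get("status") != "success":
        raise ValueError("query failed: %s" % response.get("error", "unknown error"))
    # Each result carries a label set and a [timestamp, "value"] pair.
    return [(r["metric"], float(r["value"][1])) for r in response["data"]["result"]]


# CPU usage rate per namespace, from the cAdvisor container metrics.
QUERY = "sum by (namespace) (rate(container_cpu_usage_seconds_total[5m]))"
url = instant_query_url("http://prometheus.example:9090", QUERY)  # hypothetical host
print(url)

# Hypothetical response body, shaped like the API's JSON for a vector result.
RESPONSE = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"namespace": "shop"}, "value": [1700000000, "0.42"]},
            {"metric": {"namespace": "infra"}, "value": [1700000000, "1.5"]},
        ],
    },
}
for labels, value in vector_values(RESPONSE):
    print(labels, value)
```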

4. Alerting (Alertmanager)

Alertmanager receives alerts from Prometheus.
Handles deduplication, grouping, and routing of alerts.
Sends notifications via Slack, Email, PagerDuty, or other integrations.
Alert conditions themselves are defined in PrometheusRule CRDs and evaluated by Prometheus, which keeps rule configuration Kubernetes-native.
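The deduplication, grouping, and routing steps can be pictured with a small Python sketch. This is a conceptual model only, not Alertmanager's implementation: real routing uses a tree of matchers plus timing settings, and the alert payloads, route table, and receiver names below are all hypothetical.

```python
from collections import defaultdict


def group_alerts(alerts, group_by):
    """Deduplicate alerts by their full label set, then bucket them by group_by labels."""
    groups = defaultdict(dict)
    for alert in alerts:
        labels = alert["labels"]
        key = tuple(labels.get(name, "") for name in group_by)
        fingerprint = frozenset(labels.items())  # identical label sets collapse
        groups[key][fingerprint] = alert
    return {key: list(unique.values()) for key, unique in groups.items()}


def pick_receiver(labels, routes, default):
    """Return the receiver of the first route whose matchers all equal the alert's labels."""
    for matchers, receiver in routes:
        if all(labels.get(k) == v for k, v in matchers.items()):
            return receiver
    return default


# Hypothetical alerts; the first two are exact duplicates.
ALERTS = [
    {"labels": {"alertname": "HighCPU", "namespace": "shop", "pod": "api-1", "severity": "critical"}},
    {"labels": {"alertname": "HighCPU", "namespace": "shop", "pod": "api-1", "severity": "critical"}},
    {"labels": {"alertname": "HighCPU", "namespace": "shop", "pod": "api-2", "severity": "critical"}},
    {"labels": {"alertname": "DiskFull", "namespace": "infra", "pod": "db-0", "severity": "warning"}},
]
# Hypothetical route table: (matchers, receiver name).
ROUTES = [({"severity": "critical"}, "pagerduty"), ({"severity": "warning"}, "slack")]

grouped = group_alerts(ALERTS, ["alertname", "namespace"])
for key, members in grouped.items():
    receiver = pick_receiver(members[0]["labels"], ROUTES, default="email")
    print(key, len(members), "alert(s) ->", receiver)
```

Grouping by `alertname` and `namespace` means one notification can cover several affected pods instead of paging once per pod.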

Kubernetes Monitoring Architecture Diagram

flowchart TD

    subgraph Node["Kubernetes Node"]
        subgraph Pods["Pods & Containers"]
            A1["App Pod A"]
            A2["App Pod B"]
        end
        C["kubelet / cAdvisor"]
        N["Node Exporter"]
    end

    KS["kube-state-metrics"]

    A1 --> C
    A2 --> C
    N --> P["Prometheus (via Operator)"]
    C --> P
    KS --> P

    P --> G["Grafana Dashboards"]
    P --> A["Alertmanager"]

    G --> U["User (SRE/DevOps)"]
    A --> U

Data / Control Flow

Pods and nodes run applications and workloads.
kubelet/cAdvisor collect container and pod-level metrics.
kube-state-metrics exposes cluster object states (deployments, pods, jobs, etc.).
Node Exporter collects host-level metrics (CPU, memory, disk).
Prometheus discovers scrape targets through Kubernetes service discovery and scrapes their metrics endpoints.
Its TSDB stores the scraped samples for querying.
Grafana queries Prometheus and renders dashboards.
PrometheusRule resources (CRDs) define alert conditions, which Prometheus evaluates before sending firing alerts to Alertmanager.
Alertmanager routes alerts to notification channels.
Users (SRE/DevOps) observe dashboards and respond to incidents.
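The alert-condition step in the flow above can be sketched as a PrometheusRule manifest, built here as a Python dict and printed as JSON. The rule name, namespace, alert name, threshold duration, and severity label are hypothetical choices for illustration; the expression compares two kube-state-metrics series and is only one possible way to detect an under-replicated deployment.

```python
import json


def prometheus_rule(name, namespace, alert, expr, duration, severity, summary):
    """Build a PrometheusRule manifest containing a single alerting rule."""
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "PrometheusRule",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "groups": [
                {
                    "name": f"{name}.rules",
                    "rules": [
                        {
                            "alert": alert,
                            "expr": expr,          # PromQL condition
                            "for": duration,       # must hold this long before firing
                            "labels": {"severity": severity},
                            "annotations": {"summary": summary},
                        }
                    ],
                }
            ]
        },
    }


# All argument values here are hypothetical examples.
rule = prometheus_rule(
    name="deployment-availability",
    namespace="monitoring",
    alert="DeploymentReplicasUnavailable",
    expr="kube_deployment_status_replicas_available < kube_deployment_spec_replicas",
    duration="10m",
    severity="warning",
    summary="Deployment has fewer available replicas than desired.",
)
print(json.dumps(rule, indent=2))
```

The `severity` label set here is the kind of label a routing configuration in Alertmanager could match on when choosing a notification channel.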