
🚨 What is Alertmanager?

Alertmanager is the alerting component of the Prometheus ecosystem. It is responsible for handling alerts generated by Prometheus servers and sending notifications to external systems.

It provides:

  • Routing → decide where alerts go (Slack, Email, PagerDuty, etc.)
  • Grouping → combine related alerts into a single notification
  • Silencing → temporarily mute alerts during maintenance
  • Deduplication → avoid spamming users with repeated alerts

👉 Prometheus detects the problem, but Alertmanager tells humans (or systems) about it.


🧐 Why Do We Need Alertmanager?

Without Alertmanager:

  • Prometheus can trigger alerts, but it doesn’t know how to notify people.
  • Each alert would generate raw, unorganized messages.

Challenges Alertmanager solves:

  • Too many alerts → group and deduplicate.
  • Wrong people notified → route to the right team.
  • Alert fatigue → silence during maintenance.

👉 It’s the traffic controller for alerts.


🔧 How Alertmanager Works

  1. Prometheus evaluates alert rules (.rules or .yml files).
  2. If a rule fires, Prometheus sends an alert to Alertmanager via HTTP.
  3. Alertmanager:

    • Groups related alerts.
    • Applies routing rules (e.g., critical → PagerDuty, warnings → Slack).
    • Sends notifications.
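  4. Users act on the notification and can silence the alert in Alertmanager if needed. Alertmanager has no acknowledge feature of its own; acknowledging happens in the downstream tool (e.g., PagerDuty).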
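
To make step 2 concrete, here is a rough sketch of how Prometheus could be pointed at Alertmanager. The rules path and the localhost:9093 target are assumptions for illustration; adjust them to your setup.

```yaml
# prometheus.yml (sketch; path and target below are assumed values)
rule_files:
  - "rules/*.yml"              # where the alert rules from step 1 live

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - "localhost:9093" # Alertmanager's default port
```

With this in place, Prometheus pushes every firing alert to the listed Alertmanager over HTTP.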


🔗 Architecture Overview

+-------------------+
| Prometheus Server |  -->  Fires alerts
+---------+---------+
          |
          v
+---------+---------+
| Alertmanager      |
| - Grouping        |
| - Routing         |
| - Silencing       |
| - Deduplication   |
+---------+---------+
   |   |   |   |
   v   v   v   v
 Email Slack PagerDuty Webhook

🔄 Alert Flow: From Prometheus → Alertmanager → User

sequenceDiagram
    participant Prom as Prometheus
    participant AM as Alertmanager
    participant User as User (SRE/DevOps)

    Prom->>AM: Send alert (HTTP POST)
    AM->>AM: Group, Deduplicate, Silence
    AM->>User: Send notification (Slack/Email/PagerDuty)
    User->>AM: Silence alert (optional)

📜 Example Alert Rule in Prometheus

groups:
  - name: node.rules
    rules:
      - alert: HighCPUUsage
        # Fraction of non-idle CPU time per instance, averaged over all cores
        expr: 1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) > 0.9
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"
          description: "CPU usage > 90% for more than 2 minutes."

👉 Once this condition has held for the full 2 minutes set by for: 2m, the alert moves from pending to firing and Prometheus sends it to Alertmanager.


⚙️ Alertmanager Configuration

Alertmanager is configured using a YAML file (alertmanager.yml).

Example Config

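global:
  resolve_timeout: 5m
  # Placeholder: Slack receivers need a webhook URL, set here or per receiver as api_url
  slack_api_url: 'https://hooks.slack.com/services/REPLACE/WITH/YOURS'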

route:
  receiver: 'slack-notifications'
  group_by: ['alertname', 'cluster']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 3h

receivers:
  - name: 'slack-notifications'
    slack_configs:
      - channel: '#alerts'
        send_resolved: true
        text: "🔥 Alert: {{ .CommonAnnotations.summary }}"

inhibit_rules:
  # source_matchers/target_matchers replace the deprecated source_match/target_match
  - source_matchers:
      - severity="critical"
    target_matchers:
      - severity="warning"
    equal: ['alertname', 'cluster']

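🔎 Explanation of Key Fields

  • global → default settings shared by all receivers (resolve timeout, SMTP server, Slack API URL).
  • route → the root of the routing tree; every alert enters here. Its main settings:
    • receiver → default destination when no child route matches.
    • group_by → labels whose values define a group; alerts that share them go out as one notification.
    • group_wait → how long to wait before the first notification for a new group, so related alerts can arrive.
    • group_interval → how long to wait before notifying about alerts added to an existing group.
    • repeat_interval → how long to wait before re-sending a notification for alerts that are still firing.
  • receivers → named destinations (Slack, email, PagerDuty).
  • inhibit_rules → suppress lower-priority alerts while a matching higher-priority alert is firing.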
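
The three timing fields are easy to mix up, so here is the example route again with a hypothetical timeline in the comments. The timestamps are illustrative, not measured; real timings also depend on when alerts arrive.

```yaml
route:
  receiver: 'slack-notifications'
  group_by: ['alertname', 'cluster']
  group_wait: 30s       # t=0s: first alert of a new group arrives
                        # t=30s: first notification for the group is sent
  group_interval: 5m    # t=1m: another alert joins the same group
                        # t=5m30s: next notification, now listing both alerts
  repeat_interval: 3h   # nothing changes afterwards: a reminder goes out
                        # roughly 3h after the last notification
```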

🔔 Notification Integrations

Alertmanager supports many integrations out of the box:

  • 📧 Email
  • 💬 Slack, Microsoft Teams, Discord
  • 📱 PagerDuty, OpsGenie, VictorOps
  • 🔌 Webhook receivers → integrate with any custom system
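
As a minimal sketch of two of these integrations, the fragment below defines an email receiver and a generic webhook receiver. All hostnames, addresses, and receiver names are placeholders, not values from a real setup.

```yaml
global:
  smtp_smarthost: 'smtp.example.com:587'     # placeholder SMTP relay
  smtp_from: 'alertmanager@example.com'      # placeholder sender address

receivers:
  - name: 'team-email'                       # hypothetical receiver name
    email_configs:
      - to: 'oncall@example.com'
  - name: 'custom-webhook'                   # hypothetical receiver name
    webhook_configs:
      - url: 'http://example.internal:5001/alerts'
        send_resolved: true                  # also notify when alerts resolve
```

The webhook receiver POSTs a JSON payload describing the alert group to the given URL, which is what makes it a convenient bridge to custom systems.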

🛠️ Features of Alertmanager

✅ Grouping

  • Combine alerts into a single message.
  • Example: instead of 100 pod alerts, one grouped “PodCrashLoopBackOff” alert.
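
A possible way to get that single grouped message is to leave per-pod labels out of group_by. This is only a sketch; the namespace label is an assumption about how your alerts are labelled.

```yaml
route:
  receiver: 'slack-notifications'
  # One group (and one notification) per alert name and namespace,
  # no matter how many pods are affected. 'namespace' is an assumed label.
  group_by: ['alertname', 'namespace']
```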

✅ Routing

  • Send alerts to different teams.
  • Example: Database alerts → DBA team, Node alerts → Infra team.
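
Routing can also key on severity, as in the earlier "critical → PagerDuty, warnings → Slack" example. The sketch below assumes a severity label on every alert; the receiver name pagerduty-oncall and the routing key are placeholders.

```yaml
route:
  receiver: 'slack-notifications'      # fallback when no child route matches
  routes:
    - matchers:
        - severity="critical"
      receiver: 'pagerduty-oncall'     # hypothetical receiver name
    - matchers:
        - severity="warning"
      receiver: 'slack-notifications'

receivers:
  - name: 'pagerduty-oncall'
    pagerduty_configs:
      - routing_key: 'REPLACE_WITH_INTEGRATION_KEY'   # placeholder
  - name: 'slack-notifications'
    slack_configs:
      - channel: '#alerts'
```

Child routes are checked in order and the first match wins unless continue: true is set on it.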

✅ Deduplication

  • An alert that keeps firing produces one notification, not one per rule evaluation; it is re-sent only after repeat_interval or when its group changes.

✅ Silences

  • Mute alerts temporarily (e.g., during maintenance).
  • Configured via API/UI/CLI.

✅ Inhibition

  • Suppress less severe alerts when a higher severity alert is active.
  • Example: Hide “disk usage warning” if “disk full critical” is active.
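
The disk example could be expressed roughly like this. The alert names DiskFull and DiskUsageHigh and the instance/mountpoint labels are hypothetical; substitute whatever your rules actually emit.

```yaml
inhibit_rules:
  - source_matchers:                 # the alert that does the suppressing
      - alertname="DiskFull"
      - severity="critical"
    target_matchers:                 # the alert that gets suppressed
      - alertname="DiskUsageHigh"
      - severity="warning"
    # Only inhibit when both alerts refer to the same disk on the same host
    equal: ['instance', 'mountpoint']
```

The equal list matters: without it, a full disk on one host would hide usage warnings on every other host.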

🖥️ Alertmanager UI

Alertmanager provides a simple web UI (default port :9093) where you can:

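  • View active alerts
  • Create, edit, and expire silences
  • Filter alerts by label or receiver to check where they were routed
  • Inspect the loaded configuration and cluster status

The UI keeps no alert history; it only shows alerts that are currently active.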

🛡️ Security Best Practices

  • ❌ Don’t expose Alertmanager directly to the internet.
  • ✅ Put it behind a reverse proxy (Nginx/Traefik).
  • ✅ Use authentication if exposed.
  • ✅ Secure communication between Prometheus and Alertmanager with TLS.
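
On the Prometheus side, the TLS point might look like the sketch below. It assumes Alertmanager (or the proxy in front of it) already serves HTTPS; the hostname and CA file path are placeholders.

```yaml
# prometheus.yml (sketch; hostname and file path are assumed values)
alerting:
  alertmanagers:
    - scheme: https
      tls_config:
        ca_file: /etc/prometheus/tls/ca.crt   # CA that signed the server certificate
      static_configs:
        - targets:
            - 'alertmanager.example.internal:9093'
```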

πŸ” Key Strengths of Alertmanager

  • Deep Prometheus integration β†’ native in the ecosystem.
  • Powerful routing β†’ fine-grained alert delivery.
  • Extensible β†’ webhooks for custom workflows.
  • Silences & inhibition β†’ reduce noise & alert fatigue.
  • Open-source & widely adopted β†’ large community.

⚠️ Limitations & Watch Outs

  • ❌ Limited UI (mostly config-driven).
  • ❌ No built-in escalation policies (PagerDuty is better for escalation chains).
  • ❌ A single instance is a single point of failure → HA requires running multiple instances that share state over a gossip protocol.
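
For the HA case, a common pattern is to have Prometheus send to every Alertmanager instance directly rather than through a load balancer, and let the cluster deduplicate. A minimal sketch, with placeholder hostnames:

```yaml
# prometheus.yml (sketch; hostnames are assumed values)
# Each Alertmanager instance is started with --cluster.peer flags pointing
# at the other instances so they can gossip silences and notification state.
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - 'alertmanager-0.example.internal:9093'
            - 'alertmanager-1.example.internal:9093'
```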
  • ❌ Alert storming still possible if rules aren’t well-tuned.

📦 Alertmanager in the Observability Stack

flowchart TD

    subgraph Metrics
        P[Prometheus]
    end

    subgraph Alerting
        A[Alertmanager]
    end

    subgraph Notifications
        E[Email]
        S[Slack]
        PD[PagerDuty]
        W[Webhook]
    end

    P --> A
    A --> E
    A --> S
    A --> PD
    A --> W

👉 Prometheus detects, Alertmanager notifies.


🧾 Alertmanager Cheat Sheet

✅ Core Concepts

Term       | Meaning
-----------+--------------------------------------------------------------
Alert      | Condition defined in Prometheus that triggers a notification
Receiver   | Where alerts are sent (Slack, Email, etc.)
Route      | Rules that decide which receiver gets the alert
Silence    | Temporary mute for alerts
Inhibition | Suppression of lower-severity alerts when higher-severity ones fire
Grouping   | Bundling multiple alerts into one notification

📜 Example Silence Command

amtool silence add alertname=HighCPUUsage --duration=2h --comment="Maintenance window" --alertmanager.url=http://localhost:9093

📊 Example Routing Rule

route:
  receiver: 'team-A'              # default when no child route matches
  routes:
    # matchers replaces the deprecated match field
    - matchers:
        - team="database"
      receiver: 'dba-team'
    - matchers:
        - team="infra"
      receiver: 'infra-team'

🎯 Final Takeaway

Alertmanager:

  • Is the alert distribution hub for Prometheus.
  • Provides routing, grouping, silencing, and inhibition.
  • Supports many integrations (Slack, PagerDuty, Email).
  • Is essential for production-grade monitoring.

👉 Think of Prometheus as the doctor detecting the illness, and Alertmanager as the nurse paging the right specialist.