Prometheus - Alertmanager routing with AlertmanagerConfig

This post describes alert routing for Alertmanager using the declarative AlertmanagerConfig resource provided by the Prometheus Operator in Kubernetes.

Scenario

Our alerts in the Kubernetes cluster are forwarded to Mattermost via Alertmanager. This is the scenario I will cover in this post:

  • All alerts go to a main Mattermost channel (aws-alerts)
  • Each customer deployment is in a separate Kubernetes namespace
  • Each customer can have multiple modules, each represented by a Kubernetes Deployment.
  • Alerts that are triggered from a customer module should be forwarded to a customer/module-specific Mattermost channel with the naming scheme: aws-<customer-namespace>-<module>

For example, we have the namespace customer with the two modules backend and frontend. This results in the following channels:

  • aws-customer-backend
  • aws-customer-frontend

An alert with the labels namespace=customer and module=frontend will end up in the following channels:

flowchart LR
    al[Alertmanager]
    al --> aa[aws-alerts]
    al --> acf[aws-customer-frontend]

Routing

Static routing rules

A basic routing configuration for Alertmanager is already deployed, with a default receiver for the Mattermost main channel aws-alerts. Notice the matchLabels entry alertmanager-config: mattermost below; it is important so that our AlertmanagerConfig resources get picked up:

alertmanagerConfigSelector:
  matchLabels:
    alertmanager-config: mattermost
config:
  route:
    receiver: mattermost_aws_alerts
  receivers:
    - name: mattermost_aws_alerts
      slack_configs:
        - api_url: xxx
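
The same selector can also be set directly on the Alertmanager custom resource if it is not managed through a chart. A minimal sketch (the resource name is a placeholder):

apiVersion: monitoring.coreos.com/v1
kind: Alertmanager
metadata:
  name: main
spec:
  alertmanagerConfigSelector:
    matchLabels:
      alertmanager-config: mattermost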

Dynamic routing with AlertmanagerConfig

Our customer software is deployed with a Helm chart, so the AlertmanagerConfig is added to the chart to automatically configure routing rules for each new customer. The Prometheus Operator merges every AlertmanagerConfig into the existing routing configuration.

apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: {{ $moduleName }}
  labels:
    alertmanager-config: mattermost
spec:
  route:
    receiver: silence
    routes:
      {{- range $key, $val := $.Values.alertmanager.matchers }}
      - matchers: {{- tpl ($val | toYaml) $ | nindent 8 }}
        receiver: {{ tpl $.Values.alertmanager.receiver $ }}
      {{- end }}
  receivers:
    - name: silence
    - name: {{ tpl $.Values.alertmanager.receiver $ }}
      slackConfigs:
        - apiURL:
            key: url
            name: mattermost-webhook-{{ $.Release.Name }}
          channel: aws-{{ $.Release.Name | lower }}-{{ $moduleName }}
          sendResolved: {{ $.Values.alertmanager.mattermostMessage.sendResolved }}
          color: |-
            {{- tpl ($.Values.alertmanager.mattermostMessage.color) $ | nindent 12 }}
          title: |-
            {{- tpl ($.Values.alertmanager.mattermostMessage.title) $ | nindent 12 }}
          pretext: |-
            {{- tpl ($.Values.alertmanager.mattermostMessage.pretext) $ | nindent 12 }}
          text: |-
            {{- tpl $.Values.alertmanager.mattermostMessage.text $ | nindent 12 }}

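The template references a $moduleName variable that is defined outside of the snippet. A minimal sketch of how the enclosing loop could set it, assuming a hypothetical modules map in the values:

{{- range $moduleName, $module := $.Values.modules }}
---
apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: {{ $moduleName }}
  # ... rest of the template from above ...
{{- end }}
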
This template is very configurable, so the matchers and the message format can be adjusted through the values when requirements change; otherwise, a new chart version would have to be released every time. The values.yaml contains two matchers:

alertmanager:
  enabled: true
  alertnameWithServiceLabelRegex: "(LoggingError|TargetDown|SlowRequestsByUri|SlowRequests)"
  alertnameWithPodLabelRegex: "(PodRestart)"
  receiver: 'mattermost-{{ $.Release.Name }}-{{ $.moduleName }}'
  matchers:
    # includes all alertnames that come with a service label, e.g. service: proxora-test-pxs-benefits
    serviceMatcher:
      - matchType: =~
        name: alertname
        value: '{{ $.Values.alertmanager.alertnameWithServiceLabelRegex }}'
      - matchType: =
        name: service
        value: '{{ template "pxs.resourceName" dict "module" $.moduleName "context" $ }}'
    # includes all alertnames that come with a pod label, e.g. pod: demo-pxs-mail-98f457dff-pvxgg, uses the regex demo-pxs-mail.*
    podMatcher:
      - matchType: =~
        name: alertname
        value: '{{ $.Values.alertmanager.alertnameWithPodLabelRegex }}'
      - matchType: =~
        name: pod
        value: '{{ template "pxs.resourceName" dict "module" $.moduleName "context" $ }}.*'
  mattermostWebhookUrl: https://
  # configures notifications via Mattermost
  mattermostMessage:
    color: |-
      {{` {{ if eq .Status "firing" }}danger{{ else }}good{{ end }} `}}
    title: |-
      {{` [{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}] Prometheus Event Notification `}}
    sendResolved: true
    pretext: |-
      {{`{{ .CommonAnnotations.summary }}`}}
    text: |-
      {{`{{ range .Alerts }}
      ...

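For the release customer with the module backend, the rendered route of the AlertmanagerConfig would look roughly like this (a sketch, assuming the pxs.resourceName helper resolves to customer-backend; Helm ranges over map keys in sorted order, so podMatcher renders before serviceMatcher):

spec:
  route:
    receiver: silence
    routes:
      - matchers:
          - matchType: =~
            name: alertname
            value: (PodRestart)
          - matchType: =~
            name: pod
            value: customer-backend.*
        receiver: mattermost-customer-backend
      - matchers:
          - matchType: =~
            name: alertname
            value: (LoggingError|TargetDown|SlowRequestsByUri|SlowRequests)
          - matchType: =
            name: service
            value: customer-backend
        receiver: mattermost-customer-backend
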
Finally, this is the resulting merged Alertmanager configuration. Notice that the receiver names are prefixed with the Kubernetes namespace in which the AlertmanagerConfig was deployed and with the name of the AlertmanagerConfig itself (backend in this case). In this example, the namespace is customer and the module is backend:

route:
  receiver: mattermost_aws_alerts
  group_by:
  - job
  continue: false
  routes:
  - receiver: customer-backend-silence
    matchers:
    - namespace="customer"
    continue: true
    routes:
    - receiver: customer-backend-mattermost-customer-backend # <namespace>-<alertmanagerConfigName> as prefix
      matchers:
      - alertname=~"(PodRestart)"
      - pod=~"customer-backend.*"
      continue: false
    - receiver: customer-backend-mattermost-customer-backend
      matchers:
      - alertname=~"(LoggingError|TargetDown|SlowRequestsByUri|SlowRequests)"
      - service="customer-backend"
      continue: false
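
Whether a given label set ends up at the intended receiver can be checked locally with amtool against the merged configuration; a sketch (the config file path and the pod name are placeholders):

amtool config routes test --config.file=alertmanager.yaml namespace=customer alertname=PodRestart pod=customer-backend-98f457dff-pvxgg

This prints the receiver(s) the label set is routed to, which should be customer-backend-mattermost-customer-backend for this example.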

This gives us routing into customer-specific channels, configured automatically by our Helm chart! Another option would be a Kubernetes operator that creates the AlertmanagerConfig from metadata of the deployed chart; that would further decouple the generated AlertmanagerConfig from the chart and provide even more flexibility.