Prometheus - Alertmanager routing with AlertmanagerConfig
This post describes routing for the Alertmanager with the declarative AlertmanagerConfig resource provided by the Prometheus-Operator in Kubernetes.
Scenario
Our alerts in the Kubernetes cluster are forwarded to Mattermost via the Alertmanager. This is the scenario I will cover in this post:
- All alerts go to a main Mattermost channel (aws-alerts)
- Each customer deployment is in a separate Kubernetes namespace
- Each customer can have multiple modules, each represented by its own Kubernetes Deployment.
- Alerts that are triggered from a customer module should be forwarded to a customer/module-specific Mattermost channel with the naming schema aws-<customer-namespace>-<module>
For example, we have the namespace customer with the two modules backend and frontend. This will result in the following channels:
- aws-customer-backend
- aws-customer-frontend
An alert with the labels namespace=customer and module=frontend will end up in the channels aws-alerts and aws-customer-frontend.
Routing
Static routing rules
A basic routing configuration for the Alertmanager is already deployed with a default receiver for the Mattermost main channel aws-alerts. Notice the matchLabel alertmanager-config: mattermost in the alertmanagerConfigSelector; it is important so that our AlertmanagerConfig resources get picked up:
alertmanagerConfigSelector:
  matchLabels:
    alertmanager-config: mattermost
config:
  route:
    receiver: mattermost_aws_alerts
  receivers:
    - name: mattermost_aws_alerts
      slack_configs:            # Mattermost accepts Slack-compatible incoming webhooks
        - api_url: xxx
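For reference, in the operator's Alertmanager custom resource the selector sits under spec.alertmanagerConfigSelector; a minimal sketch (name, namespace and replica count are placeholders):

apiVersion: monitoring.coreos.com/v1
kind: Alertmanager
metadata:
  name: main            # placeholder name
  namespace: monitoring # placeholder namespace
spec:
  replicas: 1
  # only AlertmanagerConfig resources carrying this label are merged into the configuration
  alertmanagerConfigSelector:
    matchLabels:
      alertmanager-config: mattermost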
Dynamic routing with AlertmanagerConfig
Our customer software is deployed with a Helm chart. Therefore, the AlertmanagerConfig is added to the chart to automatically configure the routing rules for a new customer. Each AlertmanagerConfig is merged into the existing routing configuration.
apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: {{ $moduleName }}
  labels:
    alertmanager-config: mattermost
spec:
  route:
    receiver: silence
    routes:
      {{- range $key,$val := $.Values.alertmanager.matchers }}
      - matchers: {{- tpl ($val | toYaml) $ | nindent 8 }}
        receiver: {{ tpl $.Values.alertmanager.receiver $ }}
      {{- end }}
  receivers:
    - name: silence
    - name: {{ tpl $.Values.alertmanager.receiver $ }}
      slackConfigs:
        - apiURL:
            key: url
            name: mattermost-webhook-{{ $.Release.Name }}
          channel: aws-{{ $.Release.Name | lower }}-{{ $moduleName }}
          sendResolved: {{ $.Values.alertmanager.mattermostMessage.sendResolved }}
          color: |-
            {{- tpl ($.Values.alertmanager.mattermostMessage.color) $ | nindent 12 }}
          title: |-
            {{- tpl ($.Values.alertmanager.mattermostMessage.title) $ | nindent 12 }}
          pretext: |-
            {{- tpl ($.Values.alertmanager.mattermostMessage.pretext) $ | nindent 12 }}
          text: |-
            {{- tpl $.Values.alertmanager.mattermostMessage.text $ | nindent 12 }}
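The variable $moduleName is not defined in the snippet itself; it has to be bound where the template is rendered, for example while looping over the modules of a release. A hypothetical wrapper (the modules values key and its structure are assumptions) could look like this:

{{- /* hypothetical loop that renders one AlertmanagerConfig per module */}}
{{- range $moduleName, $module := .Values.modules }}
---
apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: {{ $moduleName }}
  # ... rest of the manifest shown above ...
{{- end }}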
The template is highly configurable so that the matchers and the message format can be adjusted via values. This is useful when things change, because otherwise a new chart version would have to be deployed. The values.yaml contains two matchers:
alertmanager:
  enabled: true
  alertnameWithServiceLabelRegex: "(LoggingError|TargetDown|SlowRequestsByUri|SlowRequests)"
  alertnameWithPodLabelRegex: "(PodRestart)"
  receiver: 'mattermost-{{ $.Release.Name }}-{{ $.moduleName }}'
  matchers:
    # includes all alertnames that come with a service label, e.g. service: proxora-test-pxs-benefits
    serviceMatcher:
      - matchType: =~
        name: alertname
        value: '{{ $.Values.alertmanager.alertnameWithServiceLabelRegex }}'
      - matchType: =
        name: service
        value: '{{ template "pxs.resourceName" dict "module" $.moduleName "context" $ }}'
    # includes all alertnames that come with a pod label, e.g. pod: demo-pxs-mail-98f457dff-pvxgg; uses the regex demo-pxs-mail.*
    podMatcher:
      - matchType: =~
        name: alertname
        value: '{{ $.Values.alertmanager.alertnameWithPodLabelRegex }}'
      - matchType: =~
        name: pod
        value: '{{ template "pxs.resourceName" dict "module" $.moduleName "context" $ }}.*'
  mattermostWebhookUrl: https://
  # configures notifications via Mattermost
  mattermostMessage:
    color: |-
      {{` {{ if eq .Status "firing" }}danger{{ else }}good{{ end }} `}}
    title: |-
      {{` [{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}] Prometheus Event Notification `}}
    sendResolved: true
    pretext: |-
      {{`{{ .CommonAnnotations.summary }}`}}
    text: |-
      {{`{{ range .Alerts }}
      ...
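With these values, the rendered AlertmanagerConfig for a release named customer and the module backend would look roughly like the following sketch (it assumes that the pxs.resourceName helper resolves to customer-backend; the message formatting fields are omitted):

# sketch of the rendered AlertmanagerConfig for release "customer" and module "backend"
apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: backend
  namespace: customer
  labels:
    alertmanager-config: mattermost
spec:
  route:
    receiver: silence
    routes:
      - matchers:
          - matchType: =~
            name: alertname
            value: "(PodRestart)"
          - matchType: =~
            name: pod
            value: "customer-backend.*"
        receiver: mattermost-customer-backend
      - matchers:
          - matchType: =~
            name: alertname
            value: "(LoggingError|TargetDown|SlowRequestsByUri|SlowRequests)"
          - matchType: =
            name: service
            value: "customer-backend"
        receiver: mattermost-customer-backend
  receivers:
    - name: silence
    - name: mattermost-customer-backend
      slackConfigs:
        - apiURL:
            key: url
            name: mattermost-webhook-customer
          channel: aws-customer-backend
          sendResolved: true
          # color/title/pretext/text omitted

The silence receiver has no notification configuration, so alerts from this namespace that match none of the module matchers are not forwarded by this sub-route.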
Finally, this is the merged Alertmanager config. Notice that the receiver names are prefixed with the Kubernetes namespace in which the AlertmanagerConfig was deployed and with the name of the AlertmanagerConfig itself (backend in this case). The operator also automatically adds a namespace matcher so that the routes only apply to alerts from that namespace. In the example, the namespace is customer and the module is backend:
route:
  receiver: mattermost_aws_alerts
  group_by:
    - job
  continue: false
  routes:
    - receiver: customer-backend-silence
      matchers:
        - namespace="customer"
      continue: true
      routes:
        - receiver: customer-backend-mattermost-customer-backend # <namespace>-<alertmanagerConfigName> as prefix
          matchers:
            - alertname=~"(PodRestart)"
            - pod=~"customer-backend.*"
          continue: false
        - receiver: customer-backend-mattermost-customer-backend
          matchers:
            - alertname=~"(LoggingErrorCount|TargetDown|SlowRequestsByUri|SlowRequests)"
            - service="customer-backend"
          continue: false
This gives us routing into customer-specific channels, configured automatically by our Helm chart! Another option would be a Kubernetes operator that creates the AlertmanagerConfig from metadata of the deployed chart; this would further decouple the generated AlertmanagerConfig from the chart and provide more flexibility.