Monitoring: Alertmanager Deadmanswatch

This post describes how to set up a dead man’s switch for Prometheus / Alertmanager in a Kubernetes cluster. A primary goal of a monitoring and alerting system is to generate alerts as soon as possible when problems occur, so that administrators can react promptly and the impact on users stays limited. But what happens if the monitoring system itself is impaired? If the Kubernetes cluster where Prometheus is installed suffers an outage, it is highly probable that no alerts are generated....

January 10, 2023