Alerting Integrations
Last updated
Last updated
The alerting integration needs to be properly configured. The default installation provides the required ConfigMaps, and in case of errors please refer to the production installation guide.
The Seldon Enterprise Platform alerting integration is built with a flexible architecture and allows users to monitor and define SLAs, SLOs, and SLIs for their models and the operation of the platform. Prometheus metrics exposed by Enterprise Platform and the models form the basis of SLIs and defined alerts form the SLOs.
In this example, we'll demonstrate how to push a test alert manually through the API to provide intuition on what would happen if SLO is breached.
Using the API to fire a test alert
You can use the API to fire a test alert. If you have configured Alertmanager correctly this will then show up in the Enterprise Platform frontend.
You can make an authorized curl request as below, getting a token using the API auth guide.
curl http://<ENTERPRISE_PLATFORM_IP>/seldon-deploy/api/v1alpha1/alerting/test -X POST -H "Authorization: Bearer $TOKEN"
Alert shows in notifications drawer
The test alert will send a notification to the Enterprise Platform frontend within seconds, along with informing any other receivers (Pagerduty/Opsgenie/Slack/Email) that you may have configured.
Alert shows on alerts page
Click on View All Firing Alerts
in the alerts tray and, you will see the test alert along with any other currently firing alerts. This allows you to diagnose and fix any issues you may have missed when away from the Enterprise Platform UI.
Alert removed from alerts page
The test alert will resolve after 1 minute and will no longer be visible on the alerts page once refreshed.
Resolution notification
Once the alert is resolved, after some time the frontend will be notified about the resolution. If you use the default configuration this will be after 5 minutes, but otherwise depends on Alertmanager's resolve_timeout
.
This test alert only relies on Alertmanager, but real alerts will send resolution notifications as soon as Prometheus reports the alert as resolved.