Installing kube-prometheus-stack in the same Kubernetes cluster that hosts the Seldon Enterprise Platform.
kube-prometheus, which is built around the Prometheus Operator, is a popular open-source project that combines several tools and components into a complete monitoring and alerting stack for Kubernetes clusters.
Note: Always install Prometheus within the same Kubernetes cluster as the Seldon Enterprise Platform.
The Seldon Enterprise Platform, along with any deployed models, automatically exposes metrics to Prometheus. By default, certain alerting rules are pre-configured and an Alertmanager instance is included.
You can install kube-prometheus to monitor Seldon components, and ensure that the appropriate ServiceMonitors are in place for Seldon deployments. The analytics component is configured with the Prometheus integration. The monitoring for Seldon Enterprise Platform is based on the Prometheus Operator and the related PodMonitor and PrometheusRule resources.
Monitoring the model deployments in Seldon Enterprise Platform involves:
Download the seldon-deploy-install.tar file, which contains the required installation resources. For example, to download the installation resources for version 2.4.0 of Seldon Enterprise Platform, run the following:
Create a YAML file that specifies the initial configuration, for example prometheus-values.yaml. Use your preferred text editor to create and save the file with the following content:
Note: Make sure to include metric-labels-allowlist: pods=[*] in the Helm values file. If you are using your own Prometheus Operator installation, ensure that the pod labels, particularly app.kubernetes.io/managed-by=seldon-core, are part of the collected metrics. These labels are essential for calculating deployment usage rules.
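As a starting point, the values file could look like the sketch below. Only the metric-labels-allowlist: pods=[*] setting comes from the note above. Its placement under kube-state-metrics.extraArgs and the fullnameOverride: seldon-monitoring entry are assumptions, inferred from the Bitnami-style chart output and from the seldon-monitoring-operator deployment name used later in this guide, so check them against the values reference of the chart you install.

```shell
# Write a minimal prometheus-values.yaml into the current directory.
# Sketch only: the key layout is an assumption, not the official file.
cat > prometheus-values.yaml <<'EOF'
# Assumed: prefixes resource names with "seldon-monitoring" so that the
# operator deployment is named seldon-monitoring-operator.
fullnameOverride: seldon-monitoring
kube-state-metrics:
  extraArgs:
    # From the note above: export all pod labels as metric labels.
    metric-labels-allowlist: pods=[*]
EOF
```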
Change to the directory that contains the prometheus-values.yaml file and run the following command to install version 9.5.12 of kube-prometheus:
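The install step might look like the following sketch. The chart version (9.5.12) and the values file name come from this guide. The release name, the seldon-monitoring namespace, and the Bitnami repository URL are assumptions, so adjust them to your environment. The command is wrapped in a function so that nothing runs until you call it.

```shell
# Sketch only: release name, namespace, and repository URL are assumptions.
install_kube_prometheus() {
  # "upgrade --install" installs the release if it does not exist yet.
  helm upgrade --install seldon-monitoring kube-prometheus \
    --repo https://charts.bitnami.com/bitnami \
    --version 9.5.12 \
    --namespace seldon-monitoring --create-namespace \
    --values prometheus-values.yaml
}
```

Run install_kube_prometheus from the directory that contains prometheus-values.yaml.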
When the installation is complete, you should see this:
WARNING: There are "resources" sections in the chart not set. Using "resourcesPreset" is not recommended for production. For production installations, please set the following values according to your workload needs:
- alertmanager.resources
- blackboxExporter.resources
- operator.resources
- prometheus.resources
- prometheus.thanos.resources
+info https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/
Check the status of the installation.
kubectl rollout status -n seldon-monitoring deployment/seldon-monitoring-operator
When the rollout is complete, you should see output similar to this:
Waiting for deployment "seldon-monitoring-operator" rollout to finish: 0 of 1 updated replicas are available...
deployment "seldon-monitoring-operator" successfully rolled out
Configuring monitoring for Seldon Enterprise Platform
To configure monitoring, create dedicated PodMonitor and PrometheusRule resources. Copy the installation resource files from the seldon-deploy-install/reference-configuration/metrics/ directory to the current directory, and then apply them to the cluster.
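One way to script the copy-and-apply step is sketched below. The source directory comes from this guide. The seldon-monitoring namespace is an assumption, as is the idea that every file in the directory is a manifest you want to apply, so review the directory contents first.

```shell
METRICS_DIR="seldon-deploy-install/reference-configuration/metrics"
MONITORING_NAMESPACE="seldon-monitoring"  # assumption: namespace used for monitoring

apply_seldon_monitors() {
  # Copy the reference PodMonitor and PrometheusRule manifests locally.
  cp "$METRICS_DIR"/* . || return 1
  # Apply each copied manifest to the monitoring namespace.
  for f in "$METRICS_DIR"/*; do
    kubectl apply -n "$MONITORING_NAMESPACE" -f "$(basename "$f")" || return 1
  done
}
```

Run apply_seldon_monitors from the directory that contains the extracted seldon-deploy-install folder.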
When the configuration is complete, you should see this:
podmonitor.monitoring.coreos.com/seldon-core created
podmonitor.monitoring.coreos.com/seldon-drift-detector created
podmonitor.monitoring.coreos.com/seldon-deploy created
podmonitor.monitoring.coreos.com/metrics-server created
prometheusrule.monitoring.coreos.com/seldon-deployment-usage-rules created
You can access Prometheus from outside the cluster by running the following commands:
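A port-forward along the following lines makes Prometheus reachable at http://127.0.0.1:9090/, the address used later in this guide. The namespace and the service name seldon-monitoring-prometheus are assumptions. List the actual services with kubectl get svc -n seldon-monitoring if yours differ.

```shell
port_forward_prometheus() {
  # Forward local port 9090 to port 9090 of the Prometheus service.
  # Assumption: the service is named seldon-monitoring-prometheus.
  kubectl port-forward -n seldon-monitoring svc/seldon-monitoring-prometheus 9090:9090
}
```

Calling port_forward_prometheus blocks the terminal while the forward is active. Stop it with Ctrl+C.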
Change to the directory that contains the install-values.yaml file and then upgrade the Seldon Enterprise Platform installation in the namespace seldon-system.
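The upgrade could be sketched as follows. The namespace (seldon-system) and the values file (install-values.yaml) come from this guide. The release name and chart path are hypothetical defaults. Set SELDON_RELEASE and SELDON_CHART to match your existing installation before running it.

```shell
# Hypothetical defaults: replace with your actual release name and chart path.
SELDON_RELEASE="${SELDON_RELEASE:-seldon-deploy}"
SELDON_CHART="${SELDON_CHART:-seldon-deploy-install/helm-charts/seldon-deploy}"

upgrade_seldon_platform() {
  # Upgrade the existing release with the updated values file.
  helm upgrade "$SELDON_RELEASE" "$SELDON_CHART" \
    --namespace seldon-system \
    --values install-values.yaml
}
```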
Open your browser and navigate to http://$ISTIO_INGRESS/seldon-deploy/ to access Seldon Enterprise Platform, where $ISTIO_INGRESS is the IP address of Seldon Enterprise Platform.
Next
You can now check the status of Seldon components in Prometheus:
Open your browser and navigate to http://127.0.0.1:9090/ to access the Prometheus UI from outside the cluster.
Go to Status and select Targets.
The status of all the endpoints and the scrape details are displayed.