Monitoring and Alerting

The monitoring and alerting features of Seldon Enterprise Platform provide robust tools for tracking the performance and health of machine learning models in production.

Monitoring

  • Real-time metrics: collects and displays real-time metrics from deployed models, such as response times, error rates, and resource usage.

  • Model performance tracking: monitors key performance indicators (KPIs) such as accuracy, drift, and model degradation over time.

  • Custom metrics: lets you define and track custom metrics specific to your models and use cases.

  • Visualization: provides dashboards and visualizations to observe the status and performance of models.

Alerting

  • Proactive notifications: sends alerts when specific thresholds or conditions are met, such as a sudden drop in model accuracy or an increase in response latency.

  • Integration with Alertmanager: uses Alertmanager to manage and route alerts to appropriate channels, such as email, Slack, or other communication tools.

  • Service Level Objectives (SLOs): alerts are triggered by SLO breaches, so that critical issues in model performance or infrastructure are promptly addressed.

  • Automated response: supports automated responses to alerts, such as scaling resources or triggering workflows to retrain a model.

Together, these features help ensure that models running in production perform as expected and that issues are quickly identified and addressed, maintaining the reliability and effectiveness of machine learning deployments.

For a hands-on experience, you can explore the alerting functionality through the alerting demo after installing the monitoring and alerting components of Seldon Enterprise Platform.
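Alertmanager routes each alert by matching the alert's labels against its route tree. The reference alertmanager.yaml used in the Alerting setup sends alerts labelled with severity warning or critical and type user or infra to a deploy-webhook receiver. Every other alert falls through to default-receiver. The POSIX shell function below is a rough illustration of that matching logic and nothing more. The name route_receiver is hypothetical, and Alertmanager evaluates the real matchers itself. Alertmanager regex matchers are fully anchored, and the case patterns below imitate that behaviour.

```shell
#!/bin/sh
# Hypothetical helper that mimics the child route in the reference
# alertmanager.yaml: prints the receiver that an alert with the given
# severity ($1) and type ($2) labels would be routed to.
route_receiver() {
  severity="$1"
  alert_type="$2"
  # Alertmanager regex matchers are anchored, so "warning|critical" must
  # match the whole label value; case patterns are anchored the same way.
  case "$severity" in
    warning|critical) ;;
    *) echo "default-receiver"; return 0 ;;
  esac
  case "$alert_type" in
    user|infra) echo "deploy-webhook" ;;
    *) echo "default-receiver" ;;
  esac
}

# Example: a critical infrastructure alert reaches the Seldon webhook.
route_receiver critical infra
```

To check a real configuration file instead of this sketch, use Alertmanager's own tooling, for example `amtool config routes test`.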

Monitoring

Install kube-prometheus in the same Kubernetes cluster that hosts Seldon Enterprise Platform.

kube-prometheus is a popular open-source project that provides a complete monitoring and alerting solution for Kubernetes clusters. It is built around the Prometheus Operator and combines several tools and components into a monitoring stack for Kubernetes environments.

Note: Always install Prometheus in the same Kubernetes cluster as Seldon Enterprise Platform.

Seldon Enterprise Platform, along with any deployed models, automatically exposes metrics to Prometheus. By default, certain alerting rules are preconfigured, and an Alertmanager instance is included.

You install kube-prometheus to monitor Seldon components and then make sure that the appropriate PodMonitor and ServiceMonitor resources are in place for Seldon deployments. The analytics component is configured with the Prometheus integration. Monitoring for Seldon Enterprise Platform is based on the Prometheus Operator and its PodMonitor and PrometheusRule resources.

Monitoring the model deployments in Seldon Enterprise Platform involves:

  1. Installing kube-prometheus

  2. Configuring monitoring for Seldon Enterprise Platform

Prerequisites

  1. Install Seldon Enterprise Platform.

  2. Install an Ingress Controller.

  3. Install Docker.

Then prepare the installation resources:

  1. Download the seldon-deploy-install.tar.gz file that contains the required installation resources. For example, to download the installation resources for version 2.4.0 of Seldon Enterprise Platform, run the following:

      TAG=2.4.0 && \
       docker create --name=tmp-sd-container seldonio/seldon-deploy-server:${TAG} && \
       docker cp tmp-sd-container:/seldon-deploy-dist/seldon-deploy-install.tar.gz . && \
       docker rm -v tmp-sd-container

  2. Extract the contents of the seldon-deploy-install.tar.gz file.

      tar -xzf seldon-deploy-install.tar.gz

  3. Create a namespace for the monitoring components of Seldon Enterprise Platform.

      kubectl create ns seldon-monitoring || echo "Namespace seldon-monitoring already exists"

Installing kube-prometheus

  • Create a YAML file to specify the initial configuration, for example prometheus-values.yaml. Use your preferred text editor to create and save the file with the following content:

      fullnameOverride: seldon-monitoring
      kube-state-metrics:
        extraArgs:
          metric-labels-allowlist: pods=[*]

    Note: Make sure to include metric-labels-allowlist: pods=[*] in the Helm values file. If you are using your own Prometheus Operator installation, ensure that the pod labels, particularly app.kubernetes.io/managed-by=seldon-core, are part of the collected metrics. These labels are essential for calculating deployment usage rules.

  • Change to the directory that contains the prometheus-values.yaml file and run the following command to install version 9.5.12 of the kube-prometheus Helm chart:

      helm upgrade --install prometheus kube-prometheus \
       --version 9.5.12 \
       --namespace seldon-monitoring \
       --values prometheus-values.yaml \
       --repo https://charts.bitnami.com/bitnami

    When the installation is complete, the output includes a warning similar to this:

      WARNING: There are "resources" sections in the chart not set. Using "resourcesPreset" is not recommended for production. For production installations, please set the following values according to your workload needs:
        - alertmanager.resources
        - blackboxExporter.resources
        - operator.resources
        - prometheus.resources
        - prometheus.thanos.resources
      +info https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/

  • Check the status of the installation.

      kubectl rollout status -n seldon-monitoring deployment/seldon-monitoring-operator

    When the installation is complete, you should see this:

      Waiting for deployment "seldon-monitoring-operator" rollout to finish: 0 of 1 updated replicas are available...
      deployment "seldon-monitoring-operator" successfully rolled out

  • You can access Prometheus from outside the cluster by running the following commands:

      echo "Prometheus URL: http://127.0.0.1:9090/"
      kubectl port-forward --namespace seldon-monitoring svc/seldon-monitoring-prometheus 9090:9090

  • You can access Alertmanager from outside the cluster by running the following commands:

      echo "Alertmanager URL: http://127.0.0.1:9093/"
      kubectl port-forward --namespace seldon-monitoring svc/seldon-monitoring-alertmanager 9093:9093

Configuring monitoring for Seldon Enterprise Platform

  1. To configure monitoring, create dedicated PodMonitor and PrometheusRule resources. Copy the installation resource files from the seldon-deploy-install/reference-configuration/metrics/ directory to the current directory.

      cp seldon-deploy-install/reference-configuration/metrics/seldon-monitor.yaml seldon-monitor.yaml
      cp seldon-deploy-install/reference-configuration/metrics/drift-monitor.yaml drift-monitor.yaml
      cp seldon-deploy-install/reference-configuration/metrics/deploy-monitor.yaml deploy-monitor.yaml
      cp seldon-deploy-install/reference-configuration/metrics/metrics-server-monitor.yaml metrics-server-monitor.yaml
      cp seldon-deploy-install/reference-configuration/metrics/deployment-usage-rules.yaml deployment-usage-rules.yaml

  2. Apply the configurations to the Kubernetes cluster that is running Seldon Enterprise Platform.

      kubectl apply -n seldon-monitoring -f seldon-monitor.yaml
      kubectl apply -n seldon-monitoring -f drift-monitor.yaml
      kubectl apply -n seldon-monitoring -f deploy-monitor.yaml
      kubectl apply -n seldon-monitoring -f metrics-server-monitor.yaml
      kubectl apply -n seldon-monitoring -f deployment-usage-rules.yaml

    When the configuration is complete, you should see this:

      podmonitor.monitoring.coreos.com/seldon-core created
      podmonitor.monitoring.coreos.com/seldon-drift-detector created
      podmonitor.monitoring.coreos.com/seldon-deploy created
      podmonitor.monitoring.coreos.com/metrics-server created
      prometheusrule.monitoring.coreos.com/seldon-deployment-usage-rules created

  3. Add the following to your install-values.yaml file.

      prometheus:
        seldon:
          namespaceMetricName: namespace
          activeModelsNamespaceMetricName: exported_namespace
          serviceMetricName: service
          url: http://seldon-monitoring-prometheus.seldon-monitoring:9090/api/v1/
      env:
        ALERTMANAGER_URL: http://seldon-monitoring-alertmanager.seldon-monitoring:9093/api/v1/alerts

  4. Configure metrics collection by creating the following PodMonitor and ServiceMonitor resources from the Seldon Core v2.8.5 repository.

      PODMONITOR_RESOURCE_LOCATION=https://raw.githubusercontent.com/SeldonIO/seldon-core/v2.8.5/prometheus/monitors

      kubectl apply -f ${PODMONITOR_RESOURCE_LOCATION}/agent-podmonitor.yaml
      kubectl apply -f ${PODMONITOR_RESOURCE_LOCATION}/envoy-servicemonitor.yaml
      kubectl apply -f ${PODMONITOR_RESOURCE_LOCATION}/pipelinegateway-podmonitor.yaml
      kubectl apply -f ${PODMONITOR_RESOURCE_LOCATION}/server-podmonitor.yaml

    When the resources are created, you should see this:

      podmonitor.monitoring.coreos.com/agent created
      servicemonitor.monitoring.coreos.com/envoy created
      podmonitor.monitoring.coreos.com/pipelinegateway created
      podmonitor.monitoring.coreos.com/server created

  5. Change to the directory that contains the install-values.yaml file and upgrade the Seldon Enterprise Platform installation in the seldon-system namespace.

      helm upgrade seldon-enterprise seldon-charts/seldon-deploy --namespace seldon-system -f install-values.yaml --version 2.4.0 --install

  6. Check the status of the seldon-enterprise-seldon-deploy deployment.

      kubectl rollout status deployment/seldon-enterprise-seldon-deploy -n seldon-system

    When the installation is complete, you should see this:

      deployment "seldon-enterprise-seldon-deploy" successfully rolled out

  7. Access Seldon Enterprise Platform, either through Istio or with port-forwarding.

    Through Istio:

      1. Find the IP address of the Seldon Enterprise Platform instance running with Istio:

          ISTIO_INGRESS=$(kubectl get svc -n istio-system istio-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
          ISTIO_INGRESS+=$(kubectl get svc -n istio-system istio-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')

          echo "Seldon Enterprise Platform: http://$ISTIO_INGRESS/seldon-deploy/"

      2. Open your browser and navigate to http://$ISTIO_INGRESS/seldon-deploy/ to access Seldon Enterprise Platform, where $ISTIO_INGRESS is the IP address or hostname of the Istio ingress gateway.

    With port-forwarding:

      1. Get the Pod that is running Seldon Enterprise Platform in the cluster and save its name as $POD_NAME.

          export POD_NAME=$(kubectl get pods --namespace seldon-system -l "app.kubernetes.io/name=seldon-deploy,app.kubernetes.io/instance=seldon-enterprise" -o jsonpath="{.items[0].metadata.name}")

      2. Use port-forwarding to access your application locally.

          kubectl port-forward $POD_NAME 8000:8000 --namespace seldon-system

      3. Open your browser and navigate to http://127.0.0.1:8000/seldon-deploy/ to access Seldon Enterprise Platform.

You can now check the status of Seldon components in Prometheus:

  1. With the Prometheus port-forward running, open your browser and navigate to http://127.0.0.1:9090/ to access the Prometheus UI from outside the cluster.

  2. Go to Status and select Targets.

The status of all the endpoints and their scrape details are displayed.
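The note in Installing kube-prometheus says that prometheus-values.yaml must set metric-labels-allowlist: pods=[*]. The minimal sketch below writes the same values as that step using a heredoc instead of a text editor. It then refuses to continue if the allowlist line is missing. It assumes a POSIX shell, and the guard function check_allowlist is hypothetical.

```shell
#!/bin/sh
# Write the Helm values used for the kube-prometheus installation.
# The quoted 'EOF' keeps the shell from touching "[*]".
cat > prometheus-values.yaml <<'EOF'
fullnameOverride: seldon-monitoring
kube-state-metrics:
  extraArgs:
    metric-labels-allowlist: pods=[*]
EOF

# Hypothetical guard: fail if the values file does not expose pod labels
# to kube-state-metrics (needed for the deployment usage rules).
check_allowlist() {
  if grep -qF 'metric-labels-allowlist: pods=[*]' "$1"; then
    echo "ok: $1 allows pod labels"
  else
    echo "error: $1 is missing metric-labels-allowlist: pods=[*]" >&2
    return 1
  fi
}

check_allowlist prometheus-values.yaml
```

You could then chain the guard in front of the installation, for example `check_allowlist prometheus-values.yaml && helm upgrade --install prometheus kube-prometheus ...`. This way the chart is never installed without the pod labels.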

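The Istio access step builds ISTIO_INGRESS by concatenating the load balancer's IP and hostname. It relies on only one of the two being set. If neither is populated yet, for example while the load balancer is still being provisioned, the printed URL becomes http:///seldon-deploy/. The minimal sketch below is a more explicit alternative. It assumes you pass in the two kubectl jsonpath results, prefers the IP, falls back to the hostname, and reports an error when both are empty. The function name ingress_url is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: build the Seldon Enterprise Platform URL from the
# ingress IP ($1) and hostname ($2) reported by the istio-ingressgateway.
ingress_url() {
  address="${1:-$2}"   # prefer the IP, fall back to the hostname
  if [ -z "$address" ]; then
    echo "error: ingress gateway has no IP or hostname yet" >&2
    return 1
  fi
  echo "http://${address}/seldon-deploy/"
}

# With a live cluster (commented out so the sketch runs anywhere):
# IP=$(kubectl get svc -n istio-system istio-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
# HOST=$(kubectl get svc -n istio-system istio-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
# ingress_url "$IP" "$HOST"
ingress_url "" "lb.example.com"
```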
Alerting

Alerting relies on kube-prometheus installed in the same Kubernetes cluster that hosts Seldon Enterprise Platform.

As described in Monitoring, Seldon Enterprise Platform and any deployed models automatically expose metrics to Prometheus. Certain alerting rules are preconfigured by default, and an Alertmanager instance is included.

You can configure Alertmanager to send alerts by email or Slack, and you can also integrate it with an incident response tool. To receive alerts when using Seldon Enterprise Platform, you need to:

  1. Configure alerts

  2. Integrate with an incident response tool

Prerequisites

  1. Install Seldon Enterprise Platform.

  2. Install an Ingress Controller.

  3. Install kube-prometheus.

Configuring alerts in Seldon Enterprise Platform

  1. To configure the default alerting rules, copy the installation resource files from the seldon-deploy-install/reference-configuration/metrics/ directory to the current directory. To configure custom alerts, see the Custom alerts section.

      cp seldon-deploy-install/reference-configuration/metrics/user-alerts.yaml user-alerts.yaml
      cp seldon-deploy-install/reference-configuration/metrics/infra-alerts.yaml infra-alerts.yaml
      cp seldon-deploy-install/reference-configuration/metrics/drift-alerts.yaml drift-alerts.yaml

  2. Apply the configurations to the Kubernetes cluster that is running Seldon Enterprise Platform.

      kubectl apply -n seldon-monitoring -f infra-alerts.yaml
      kubectl apply -n seldon-monitoring -f user-alerts.yaml
      kubectl apply -n seldon-monitoring -f drift-alerts.yaml

    When the configuration is complete, you should see this:

      prometheusrule.monitoring.coreos.com/deploy-infra-alerts created
      prometheusrule.monitoring.coreos.com/deploy-user-alerts created
      prometheusrule.monitoring.coreos.com/seldon-drift-alerts created

  3. Create a YAML file to specify the initial configuration, for example alertmanager.yaml. Use your preferred text editor to create and save the file with the following content:

      kind: Secret
      apiVersion: v1
      metadata:
        name: alertmanager-seldon-monitoring-alertmanager
      stringData:
        alertmanager.yaml: |
          receivers:
            - name: default-receiver
            - name: deploy-webhook
              webhook_configs:
                - url: "http://seldon-deploy.seldon-system:80/seldon-deploy/api/v1alpha1/webhooks/firing-alert"
          route:
            group_wait: 10s
            group_by: ['alertname']
            group_interval: 5m
            receiver: default-receiver
            repeat_interval: 3h
            routes:
              - receiver: deploy-webhook
                matchers:
                  - severity =~ "warning|critical"
                  - type =~ "user|infra"

    For more information about configuring alerts when authentication is enabled, see the Authentication alerts section.

  4. Apply the Alertmanager configuration in the Kubernetes cluster that is running Seldon Enterprise Platform:

      kubectl delete secret -n seldon-monitoring alertmanager-seldon-monitoring-alertmanager || echo "Does not yet exist"
      kubectl apply -f alertmanager.yaml -n seldon-monitoring

    When the configuration is applied, you should see this:

      secret "alertmanager-seldon-monitoring-alertmanager" deleted
      secret/alertmanager-seldon-monitoring-alertmanager created

  5. You can access Alertmanager from outside the cluster by running the following commands:

      echo "Alertmanager URL: http://127.0.0.1:9093/"
      kubectl port-forward --namespace seldon-monitoring svc/seldon-monitoring-alertmanager 9093:9093

  6. Access Seldon Enterprise Platform, either through Istio or with port-forwarding.

    Through Istio:

      1. Find the IP address of the Seldon Enterprise Platform instance running with Istio:

          ISTIO_INGRESS=$(kubectl get svc -n istio-system istio-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
          ISTIO_INGRESS+=$(kubectl get svc -n istio-system istio-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')

          echo "Seldon Enterprise Platform: http://$ISTIO_INGRESS/seldon-deploy/"

      2. Open your browser and navigate to http://$ISTIO_INGRESS/seldon-deploy/ to access Seldon Enterprise Platform, where $ISTIO_INGRESS is the IP address or hostname of the Istio ingress gateway.

    With port-forwarding:

      1. Get the Pod that is running Seldon Enterprise Platform in the cluster and save its name as $POD_NAME.

          export POD_NAME=$(kubectl get pods --namespace seldon-system -l "app.kubernetes.io/name=seldon-deploy,app.kubernetes.io/instance=seldon-enterprise" -o jsonpath="{.items[0].metadata.name}")

      2. Use port-forwarding to access your application locally.

          kubectl port-forward $POD_NAME 8000:8000 --namespace seldon-system

      3. Open your browser and navigate to http://127.0.0.1:8000/seldon-deploy/ to access Seldon Enterprise Platform.

You can now check the alerts that you configured in Alertmanager:

  1. With the Alertmanager port-forward running, open your browser and navigate to http://127.0.0.1:9093/ to access the Alertmanager UI from outside the cluster.

  2. Go to Alerts and check whether any of the alert rules listed in Prometheus have been triggered.

Any triggered alerts are displayed there.

Custom alerts

You can also define your own custom alerting rules in Prometheus.

  1. Create a file called custom-alert.yaml that contains your new rules. You can find some examples in the user-alerts.yaml file located in the seldon-deploy-install/reference-configuration/metrics/ directory.

  2. Apply the alerts using:

      kubectl create -f custom-alert.yaml

Authentication alerts

  • If you are using App Level Authentication, you need to add http_config to the webhook_configs section of the deploy-webhook receiver in alertmanager.yaml. This requires a client that has been configured to access the Seldon Enterprise Platform API. The token_url value may vary depending on your OIDC provider.

      webhook_configs:
        - url: "http://seldon-deploy.seldon-system:80/seldon-deploy/api/v1alpha1/webhooks/firing-alert"
          http_config:
            oauth2:
              client_id: "${OIDC_CLIENT_ID}"
              client_secret: "${OIDC_CLIENT_SECRET}"
              scopes: [openid]
              token_url: "${OIDC_HOST}/auth/realms/${OIDC_REALM}/protocol/openid-connect/token"
              # Note: only needed if using a self-signed certificate on your OIDC provider
              tls_config:
                insecure_skip_verify: true

  • If you are using a self-signed certificate on your OIDC provider, you need to set insecure_skip_verify in the tls_config of the oauth2 block. Alternatively, you can mount your CA certificate onto the Alertmanager instance and validate the server certificate using ca_file. For more information, see the Prometheus documentation.

Integrating into an incident response tool

You can integrate the alerts that you configured in Seldon Enterprise Platform with alert notification tools such as PagerDuty or Opsgenie.

Additional resources

  • PagerDuty Documentation

  • Opsgenie Documentation

  • Prometheus documentation
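In the Custom alerts step, custom-alert.yaml must contain a PrometheusRule resource. The sketch below writes one with a heredoc so that the kubectl create -f custom-alert.yaml command from that step can create it. Everything in the rule is an assumption for illustration and does not come from the Seldon reference files. That includes the resource name seldon-custom-alerts, the alert SeldonTargetDown, and its expression. The expression uses Prometheus's built-in up metric to flag scrape targets in seldon-system that have been down for five minutes. The severity and type labels are chosen so that the route in alertmanager.yaml forwards the alert to the deploy-webhook receiver. Compare the rule with user-alerts.yaml before relying on it.

```shell
#!/bin/sh
# Write a hypothetical custom alerting rule for the Prometheus Operator.
# The quoted 'EOF' keeps the shell from expanding $labels.
cat > custom-alert.yaml <<'EOF'
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: seldon-custom-alerts        # hypothetical name
  namespace: seldon-monitoring
spec:
  groups:
    - name: seldon-custom.rules
      rules:
        - alert: SeldonTargetDown   # hypothetical alert name
          # "up" is 1 when the last scrape of a target succeeded, 0 otherwise.
          expr: up{namespace="seldon-system"} == 0
          for: 5m
          labels:
            severity: warning       # matched by severity =~ "warning|critical"
            type: user              # matched by type =~ "user|infra"
          annotations:
            summary: "Scrape target {{ $labels.pod }} in seldon-system is down"
EOF
echo "wrote custom-alert.yaml"
```

The file sets metadata.namespace, so the rule lands in seldon-monitoring alongside the default rules. The sketch also assumes that the Prometheus instance's rule selector picks up PrometheusRule resources without extra labels. If yours does not, add the labels it expects.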
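The Authentication alerts snippet contains placeholders such as ${OIDC_CLIENT_ID}. Alertmanager does not expand environment variables in its configuration file, so these values have to be filled in before the Secret is applied. The minimal sketch below assumes you keep the values in shell variables, and the example values are hypothetical. It renders the webhook_configs block with an unquoted heredoc so that the shell performs the substitution. You would then merge the result into the deploy-webhook receiver in alertmanager.yaml.

```shell
#!/bin/sh
# Hypothetical example values; replace them with your OIDC provider's settings.
OIDC_HOST="https://oidc.example.com"
OIDC_REALM="example-realm"
OIDC_CLIENT_ID="example-client"
OIDC_CLIENT_SECRET="example-secret"

# Unquoted EOF: the shell expands ${...} while writing the file.
# Add the tls_config block from the self-signed certificate note if needed.
cat > webhook-config.yaml <<EOF
webhook_configs:
  - url: "http://seldon-deploy.seldon-system:80/seldon-deploy/api/v1alpha1/webhooks/firing-alert"
    http_config:
      oauth2:
        client_id: "${OIDC_CLIENT_ID}"
        client_secret: "${OIDC_CLIENT_SECRET}"
        scopes: [openid]
        token_url: "${OIDC_HOST}/auth/realms/${OIDC_REALM}/protocol/openid-connect/token"
EOF
cat webhook-config.yaml
```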