Upgrading to 1.6.0

License

Licenses issued for Seldon Deploy 1.5.x and earlier are not compatible with 1.6.x. Contact Seldon to obtain a license for version 1.6.0.

Prometheus Operator

Seldon Deploy 1.6.0 switches the default Prometheus installation method from the seldon-core-analytics Helm charts to the Bitnami Helm charts. If you are still running a Prometheus instance installed by seldon-core-analytics, take note of the following changes:

  1. Active models metrics namespace label name change

  2. Prometheus URL change

  3. kube-state-metrics metric change

1. Active models metrics namespace label name change

Deploy uses Prometheus namespace labels to build the queries that retrieve and filter model and deployment metrics (such as CPU and memory limits and requests, and active model usage). These metrics are displayed in the Usage Monitor and Resource Monitor dashboards.

Two namespace labels are used: one for the Deploy server namespace and one for the active model namespace. In Deploy 1.6.0 with the default Bitnami configuration, these labels are namespace and exported_namespace respectively. With the old default seldon-core-analytics configuration, they were kubernetes_namespace and namespace respectively. For backward compatibility, the Deploy Helm chart lets you specify these label names:

  • Current defaults:

    prometheus:
      seldon:
        namespaceMetricName: "namespace"
        activeModelsNamespaceMetricName: "exported_namespace"
  • Changes required for backward compatibility with seldon-core-analytics Prometheus:

    prometheus:
      seldon:
        namespaceMetricName: "kubernetes_namespace"
        activeModelsNamespaceMetricName: "namespace"

2. Prometheus URL change

The default Prometheus URL in the Helm chart now points to the Bitnami Prometheus default endpoint. From Deploy 1.6.0 onwards, users of older Prometheus installations must set the Prometheus URL explicitly:

  • Current defaults:

    prometheus:
      seldon:
        url: "http://seldon-monitoring-prometheus.seldon-system:9090/api/v1/"
      knative:
        url: "http://seldon-monitoring-prometheus.seldon-system:9090/api/v1/"
    env:
      ALERTMANAGER_URL: http://seldon-monitoring-alertmanager.seldon-system:9093/api/v1/alerts
  • Changes required for backward compatibility with seldon-core-analytics Prometheus:

    prometheus:
      seldon:
        url: "http://seldon-core-analytics-prometheus-seldon.seldon-system/api/v1/"
      knative:
        url: "http://seldon-core-analytics-prometheus-seldon.seldon-system/api/v1/"
    env:
      ALERTMANAGER_URL: http://seldon-core-analytics-prometheus-alertmanager.seldon-system:80/api/v1/alerts
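
If you are staying on a seldon-core-analytics Prometheus, the label and URL overrides from sections 1 and 2 can be kept together in a single values file. The sketch below writes such a file using only the backward-compatibility values listed in this guide. The file name prometheus-compat-values.yaml is a hypothetical choice, and the file would then be passed with -f to whatever helm upgrade command you normally use.

```shell
# Hypothetical file name; any path works.
COMPAT_VALUES="${COMPAT_VALUES:-prometheus-compat-values.yaml}"

# Values copied from the backward-compatibility examples in sections 1 and 2.
cat > "${COMPAT_VALUES}" <<'EOF'
prometheus:
  seldon:
    namespaceMetricName: "kubernetes_namespace"
    activeModelsNamespaceMetricName: "namespace"
    url: "http://seldon-core-analytics-prometheus-seldon.seldon-system/api/v1/"
  knative:
    url: "http://seldon-core-analytics-prometheus-seldon.seldon-system/api/v1/"
env:
  ALERTMANAGER_URL: http://seldon-core-analytics-prometheus-alertmanager.seldon-system:80/api/v1/alerts
EOF

echo "Wrote ${COMPAT_VALUES}"
```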

3. kube-state-metrics metric change

The Bitnami Prometheus chart installs a much later version of the kube-state-metrics agent by default. This is a breaking change for the CPU and memory requests and limits metrics, because the metrics used previously are no longer present. From version 1.6, Seldon Deploy uses the following metrics instead (old metric -> new metric):

  • kube_pod_container_resource_requests_cpu_cores -> kube_pod_container_resource_requests{resource="cpu",unit="core"}

  • kube_pod_container_resource_limits_cpu_cores -> kube_pod_container_resource_limits{resource="cpu",unit="core"}

  • kube_pod_container_resource_limits_memory_bytes -> kube_pod_container_resource_limits{resource="memory",unit="byte"}

  • kube_pod_container_resource_requests_memory_bytes -> kube_pod_container_resource_requests{resource="memory",unit="byte"}

If your Prometheus instance does not expose these metrics, this is a breaking change (Seldon Core Analytics should still be compatible): you may not be able to view data on the CPU limits, CPU requests, Memory limits, and Memory requests pages of the Usage Monitor dashboard.

If these dashboards are required, we recommend updating your Prometheus installation.
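
Before upgrading, it may help to check whether your Prometheus already serves the new-style metrics. The sketch below queries the standard Prometheus HTTP endpoint /api/v1/query for each selector and reports whether any series match. It assumes curl and jq are installed and that PROM_URL points at your Prometheus API root (the default shown is the Bitnami URL from this guide). The function names are illustrative, not part of Seldon Deploy.

```shell
# Assumption: Prometheus API root; override for other installations.
PROM_URL="${PROM_URL:-http://seldon-monitoring-prometheus.seldon-system:9090/api/v1/}"

# Wrap a selector in count() so the response stays small.
build_query() {
  printf 'count(%s)' "$1"
}

# Exit status 0 if Prometheus returns at least one series for the selector.
# An unreachable Prometheus is also reported as non-zero.
metric_present() {
  curl --silent --fail --get "${PROM_URL%/}/query" \
    --data-urlencode "query=$(build_query "$1")" |
    jq -e '.data.result | length > 0' > /dev/null
}

# Report the status of the four selectors Deploy 1.6 relies on.
check_ksm_metrics() {
  for selector in \
    'kube_pod_container_resource_requests{resource="cpu",unit="core"}' \
    'kube_pod_container_resource_limits{resource="cpu",unit="core"}' \
    'kube_pod_container_resource_limits{resource="memory",unit="byte"}' \
    'kube_pod_container_resource_requests{resource="memory",unit="byte"}'; do
    if metric_present "${selector}"; then
      echo "present: ${selector}"
    else
      echo "missing: ${selector}"
    fi
  done
}
```

Run check_ksm_metrics from a machine or pod that can reach Prometheus. Any selector reported as missing means the corresponding Usage Monitor page is likely to be empty after the upgrade.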

Upgrading on OpenShift

Installation of Seldon Deploy 1.6 has been tested on OpenShift 4.10. The full documentation for the installation process is available here. This section briefly covers the differences from the previous OpenShift installation that must be taken into account during the upgrade process.

New Network Policies

Two new NetworkPolicy resources, seldon-detectors and seldon-detectors-serving, must be created as discussed in the Add NetworkPolicy Resources section. Following the provided documentation, create the networkpolicy-detectors.yaml manifest file and apply it to all your model namespaces:

oc apply -f networkpolicy-detectors.yaml -n <model-namespace>
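
When there are several model namespaces, the apply command can be wrapped in a small loop. The helper below is a sketch: the function name apply_to_namespaces and the namespace names in the example are hypothetical, and the OC variable exists only so the CLI can be swapped (for example with kubectl).

```shell
# Apply one manifest to every namespace passed as an argument.
# OC defaults to the oc CLI; stops at the first failure.
apply_to_namespaces() {
  manifest="$1"
  shift
  for ns in "$@"; do
    "${OC:-oc}" apply -f "${manifest}" -n "${ns}" || return 1
  done
}

# Example (placeholder namespace names):
# apply_to_namespaces networkpolicy-detectors.yaml team-a-models team-b-models
```

The same helper can be reused for the other per-namespace manifests in this section.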

Cluster Log Forwarder

The section on Installing ClusterLogForwarder has been updated with pointers on how to restrict log forwarding to specific namespaces. Forwarding container logs only from the namespaces that host Seldon models limits disk usage on the Elasticsearch instance.

New Monitoring Resources

The section on OpenShift Monitoring has been updated to include new PodMonitor resources that need to be created. Following the provided documentation:

  • create the deploy-podmonitor.yaml manifest file and apply it to the Seldon Deploy namespace:

    oc apply -f deploy-podmonitor.yaml -n seldon-system
  • update the seldon-podmonitor.yaml manifest file to include the seldon-podmonitor-metrics-server resource and apply it again to all your model namespaces:

    oc apply -f seldon-podmonitor.yaml -n <model-namespace>
  • update the model-usage-prometheus-rules.yaml manifest file to include the seldon-podmonitor-metrics-server resource and apply it again to all your model namespaces:

    oc apply -f model-usage-prometheus-rules.yaml -n <model-namespace>

New Alerting Subsection

The new subsection Prometheus Rules for Model Usage has been added here. Follow it to configure alerting in your cluster.

Seldon Core and Deploy

  • Follow the Seldon Core configuration for Seldon Core v1 and make the following YAML change to use the RClone storage initializer:

    - name: RELATED_IMAGE_STORAGE_INITIALIZER
      value: "seldonio/rclone-storage-initializer:1.13.1"
  • Change the following values in the values-openshift.yaml file:

    image: seldonio/seldon-deploy-server:1.6.0
    env:
      ALERTMANAGER_URL: https://alertmanager-main.openshift-monitoring:9094/api/v1/alerts

Obtain the new Seldon Deploy Helm charts as described here and run the helm upgrade ... command as described in the documentation to upgrade Seldon Deploy.

'Content-Type: application/json' --data-raw "{\"source\": {\"index\": \"${OLD_INDEX}\"}, \"dest\": {\"index\": \"${NEW_INDEX}\"}}"

  1. Delete the old index to avoid duplicates in the Requests Dashboard

    curl --request DELETE "${ES_ADDR}/${OLD_INDEX}"
  • A similar set of steps is required for reference data, with a few key differences:

    • In step (c), note that the old index pattern for reference data did not include the endpoint. As with the inference logs, this information can be found in the deployment spec. The patterns are:

      • Old index pattern: reference-log-<serving engine>-<deployment namespace>-<deployment name>

      • New index pattern: reference-log-<serving engine>-<deployment namespace>-<deployment name>-<deployment endpoint>-<deployment node>

    • Between step (d) and (e), run the following to add the Ce-Modelid field to the mapping:

      MAPPINGS=$(echo "$MAPPINGS" | jq ".\"properties\" += {\"Ce-Modelid\": {\"type\": \"keyword\"}}")
    • After step (f), add the model id to the Ce-Modelid field in the new index.

      export MODEL_ID=income-container
      # Set Ce-Modelid on every document in the new index
      curl --request POST "${ES_ADDR}/${NEW_INDEX}/_update_by_query" \
         --header 'Content-Type: application/json' \
         --data-raw "{\"script\": {\"source\": \"ctx._source['Ce-Modelid'] = params.modelId\", \"lang\": \"painless\", \"params\": {\"modelId\": \"${MODEL_ID}\"}}, \"query\": {\"match_all\": {}}}"
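
The old and new reference-data index names follow the two patterns listed earlier, so they can be assembled with a pair of small helpers rather than typed by hand. This is a sketch only: the function names are hypothetical, and the argument values in the example are placeholders to be replaced with the values from your deployment spec.

```shell
# Old pattern: reference-log-<serving engine>-<deployment namespace>-<deployment name>
old_reference_index() {
  printf 'reference-log-%s-%s-%s\n' "$1" "$2" "$3"
}

# New pattern: the old pattern plus <deployment endpoint> and <deployment node>
new_reference_index() {
  printf 'reference-log-%s-%s-%s-%s-%s\n' "$1" "$2" "$3" "$4" "$5"
}

# Example (placeholder values):
# export OLD_INDEX="$(old_reference_index seldon my-namespace income)"
# export NEW_INDEX="$(new_reference_index seldon my-namespace income default income-container)"
```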

Helm values

If you have customised your Deploy installation, please be aware that the following Helm values have changed since v1.4.0:

  • requestLogger.image
      Previous value: seldonio/seldon-request-logger:1.11.2
      Current value: seldonio/metronome:1.0.1

  • alibidetect.image
      Previous value: seldonio/alibi-detect-server:1.11.2
      Current value: seldonio/alibi-detect-server:1.13.1

  • batchjobs.processor.image
      Previous value: seldonio/seldon-core-s2i-python37:1.11.2
      Current value: seldonio/seldon-core-s2i-python37:1.13.1

  • batchjobs.storageInitializer.image
      Previous value: seldonio/rclone-storage-initializer:1.11.2
      Current value: seldonio/rclone-storage-initializer:1.13.1

  • seldon.v2ExplainForm
      Previous value: "http://{{ .ModelName }}-{{ .Predictor }}-explainer.{{ .Namespace }}:9000/v2/models/{{ .GraphModelName }}/explain"
      Current value: "http://{{ .ModelName }}-{{ .Predictor }}-explainer.{{ .Namespace }}:9000/explain"

  • kfserving.enabled
      Previous value: true
      Current value: false
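
If you maintain a custom values file, a quick text search can show whether it still pins any of the previous image values. The helper below is a rough sketch with a hypothetical function name. It only looks for the four previous image strings and does not check seldon.v2ExplainForm or kfserving.enabled, which need a manual review.

```shell
# Print (with line numbers) the lines of a values file that still contain
# one of the previous image values. Exit status is non-zero if none are found.
find_previous_images() {
  grep -nF \
    -e 'seldonio/seldon-request-logger:1.11.2' \
    -e 'seldonio/alibi-detect-server:1.11.2' \
    -e 'seldonio/seldon-core-s2i-python37:1.11.2' \
    -e 'seldonio/rclone-storage-initializer:1.11.2' \
    "$1"
}

# Example (placeholder file name):
# find_previous_images my-values.yaml
```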
