OpenShift Environment
Installation for the OpenShift environment
Introduction
This document walks through the installation procedure for Seldon Enterprise Platform v2.1.0 onto Red Hat OpenShift (RHOCP) v4.13.
The prerequisites for the installation are aligned with the usual requirements for installing and configuring operators on the Red Hat OpenShift platform:
Access to the OpenShift Container Platform web console.
An account with the cluster-admin role.
Being logged in to the OpenShift Container Platform cluster as an administrator.
Preparation
Creating the Seldon Namespaces
Seldon Enterprise Platform expects a number of namespaces to be present, each associated with given labels.
The first namespace to create is where the Seldon Enterprise Platform controller pod will run. This is the main orchestrator of the Seldon technology stack, and expects to run in the seldon-system namespace.
oc create namespace seldon-logs
oc create namespace seldon-system

Next, create a namespace within which models can be deployed. In this documentation this is going to be called seldon, however it can be configured to any name of your choosing. Create the namespace, and then add the seldon.restricted label so Seldon Enterprise Platform has access to it.
oc create namespace seldon
oc label namespace seldon seldon.restricted=false --overwrite=true

Dependencies
OpenShift Service Mesh
The first step within the installation process is to add OpenShift Service Mesh. This is required for the networking of all other pieces within the Seldon Enterprise Platform stack, as well as the ingress/egress for model endpoints.
Adding Operators
The initial action is to add the relevant operators required for the service mesh stack. Log into the RHOCP console and navigate to the OperatorHub, within the Operators tab.
Search for and install the following operators, with the default options:
Red Hat OpenShift distributed tracing platform (provided by Red Hat) (1.39.0-3*)
Kiali Operator (provided by Red Hat) (1.57.3*)
Red Hat OpenShift Service Mesh (provided by Red Hat) (2.3.0*)
Configuring the ServiceMeshControlPlane
Next, configure the ServiceMeshControlPlane, ensuring that a control plane of version v2.0 is installed and that it is created within the istio-system namespace, as per the OpenShift documentation.
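A minimal control plane sketch (the name basic is the operator default; the version should match the one referenced above):

```yaml
apiVersion: maistra.io/v2
kind: ServiceMeshControlPlane
metadata:
  name: basic
  namespace: istio-system
spec:
  version: v2.0
```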
Add Namespaces to ServiceMeshMemberRoll
Navigate to the Red Hat OpenShift Service Mesh operator under your installed operators. Select the Istio Service Mesh Member Roll tab and create a new ServiceMeshMemberRoll in istio-system namespace with the following namespaces added to the member roll:
seldon
seldon-logs
seldon-kafka
seldon-system
Note: this is easiest done using the YAML editor.
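A minimal member roll sketch (the resource must be named default):

```yaml
apiVersion: maistra.io/v1
kind: ServiceMeshMemberRoll
metadata:
  name: default
  namespace: istio-system
spec:
  members:
    - seldon
    - seldon-logs
    - seldon-kafka
    - seldon-system
```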
Create Seldon’s Istio Gateway
Seldon then requires an Istio gateway to allow traffic to and from the Seldon Enterprise Platform controller pod, as well as to enable advanced routing features like canary and shadow deployments. Create the following YAML manifest, called istio-gateway.yaml in this example:
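A minimal sketch of such a gateway; the name seldon-gateway is an assumption and must match the gateway referenced in your Seldon Enterprise Platform Helm values:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: seldon-gateway
  namespace: istio-system
spec:
  selector:
    istio: ingressgateway
  servers:
    - port:
        number: 80
        name: http
        protocol: HTTP
      hosts:
        - "*"
```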
Apply it with
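```bash
oc apply -f istio-gateway.yaml
```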
Create SSL Secure Route
OpenShift clusters usually come with Let's Encrypt certificates enabled for the default ingress domain. One can create a Route with tls.termination: edge in order to re-use these certificates. Create the following YAML manifest called seldon-route.yaml:
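A sketch of such a Route, assuming the ingress gateway service is named istio-ingressgateway and listens on the http2 port:

```yaml
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: seldon-route
  namespace: istio-system
spec:
  host: seldon.<Ingress_Domain>
  to:
    kind: Service
    name: istio-ingressgateway
  port:
    targetPort: http2
  tls:
    termination: edge
    insecureEdgeTerminationPolicy: Redirect
```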
To display your default ingress domain, run the following command:
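```bash
oc get ingresses.config/cluster -o jsonpath='{.spec.domain}'
```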
After using it in place of <Ingress_Domain> in the above manifest, your Istio ingress will be exposed under that host.
Apply the above manifest with
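```bash
oc apply -f seldon-route.yaml
```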
OpenShift Serverless
Seldon Enterprise Platform uses OpenShift Serverless, in the form of Knative Serving and Knative Eventing, to power many of the advanced monitoring components associated with your deployments; namely outlier and drift detection. Without this component these features will fail to function within the platform.
Serverless Operator
Once more, navigate to the Operator Hub and install the official Red Hat OpenShift Serverless (1.26.0*) operator. Install using the default options.
Install Knative Eventing
Using the Serverless Operator, and as per the OpenShift documentation, install an instance of Knative Eventing.
Install Knative Serving
Using the Serverless Operator, and as per the OpenShift documentation, install an instance of Knative Serving.
Logs Namespace and Broker
The outlier and drift detectors run as Knative-served pods and are activated by events from the Seldon logging stack. The Seldon logging stack is installed into the seldon-logs namespace, with a Knative Eventing Broker configured within it.
If you did not create this namespace yet, create it now:
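```bash
oc create namespace seldon-logs
```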
Create the Knative Eventing Broker, eventing-broker.yaml
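The broker is named default, matching the endpoint http://broker-ingress.knative-eventing.svc.cluster.local/seldon-logs/default used later in this document:

```yaml
apiVersion: eventing.knative.dev/v1
kind: Broker
metadata:
  name: default
  namespace: seldon-logs
```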
and apply it with
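```bash
oc apply -f eventing-broker.yaml
```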
Add NetworkPolicy Resources
Finally, we need to add a couple of NetworkPolicy resources to ensure that traffic can flow from Knative Eventing and Serving to the different Seldon namespaces:
Seldon Logs Namespace
To allow traffic from Knative Eventing into the Seldon Logs namespace we will create a NetworkPolicy resource. Create networkpolicy-seldon-logs.yaml:
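A sketch of such a policy; the policy name is an assumption, and the selector relies on the kubernetes.io/metadata.name namespace label:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: seldon-logs
  namespace: seldon-logs
spec:
  podSelector: {}
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: knative-eventing
```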
and apply it with
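```bash
oc apply -f networkpolicy-seldon-logs.yaml
```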
Model Namespaces
For each model namespace, we will need to create a couple of NetworkPolicy resources to ensure that traffic from both Knative Eventing and Serving can reach the model namespace. For this, first create a file named networkpolicy-detectors.yaml with the following resources:
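A sketch of the two policies; names and selectors are assumptions (no namespace is set so the file can be applied per namespace):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-knative-eventing
spec:
  podSelector: {}
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: knative-eventing
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-knative-serving
spec:
  podSelector: {}
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: knative-serving
```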
You can then apply these to each of your model namespaces. Apply them to the seldon namespace now:
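```bash
oc apply -f networkpolicy-detectors.yaml -n seldon
```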
Elasticsearch
Elasticsearch is responsible for storing all requests and responses sent to the machine learning models hosted within Seldon Enterprise Platform. Requests and responses are forwarded to Elasticsearch by the Seldon request logging component, which also runs within the seldon-logs namespace.
Elasticsearch also stores the container logs of all running models and monitoring components hosted within Seldon Enterprise Platform. These are forwarded to Elasticsearch by Fluentd.
Installing the ECK Operator
The first step to configure Elasticsearch is to add the Elasticsearch (ECK) Operator (2.5.0*) from within the Operator Hub. This operator should be installed with default options, with access to all namespaces.
Create the Elasticsearch Cluster
Navigate to the Elasticsearch (ECK) Operator under your installed operators. Select the Elasticsearch Cluster tab and create a new cluster in the seldon-logs namespace called elasticsearch-seldon, using version 8.7.x:
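A minimal single-node sketch; adjust node counts and storage for production use:

```yaml
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: elasticsearch-seldon
  namespace: seldon-logs
spec:
  version: 8.7.0
  nodeSets:
    - name: default
      count: 1
      config:
        node.store.allow_mmap: false
```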
Add NetworkPolicy Resource
You need to add the below NetworkPolicy resource in the seldon-logs namespace to ensure that traffic can flow between the seldon-logs namespace and openshift-operators, where the ECK operator is running. For this, first create a file named networkpolicy-seldon-elastic.yaml with the following resource:
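A sketch of the policy (the name is an assumption), followed by the apply command:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: seldon-elastic
  namespace: seldon-logs
spec:
  podSelector: {}
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: openshift-operators
```

```bash
oc apply -f networkpolicy-seldon-elastic.yaml
```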
Granting Access to Elasticsearch
In order for Seldon Enterprise Platform to access the Elasticsearch cluster, two secrets need to be created: one in the seldon-logs namespace to allow access for the request logger component, and one in the seldon-system namespace where the Seldon Enterprise Platform pod will be installed.
Grab the Elasticsearch password and assign it to a variable for later use:
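ECK stores the password for the elastic user in a secret named after the cluster:

```bash
ELASTIC_PASSWORD=$(oc get secret elasticsearch-seldon-es-elastic-user -n seldon-logs \
  -o go-template='{{.data.elastic | base64decode}}')
```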
Create the secret in the seldon-logs namespace:
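The secret name elastic-credentials and its keys are assumptions and must match your Helm values:

```bash
oc create secret generic elastic-credentials -n seldon-logs \
  --from-literal=username=elastic \
  --from-literal=password="$ELASTIC_PASSWORD"
```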
Create the secret in the seldon-system namespace:
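```bash
oc create secret generic elastic-credentials -n seldon-system \
  --from-literal=username=elastic \
  --from-literal=password="$ELASTIC_PASSWORD"
```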
Container Logs Forwarding
To enable container logs visibility in Seldon Enterprise Platform we use OpenShift Logging.
Installing OpenShift Logging Operator
Follow the OpenShift documentation to install the Red Hat OpenShift Logging (5.5.5*) operator.
Installing ClusterLogging Component
Navigate to the Red Hat OpenShift Logging operator under your installed operators. Select the Cluster Logging tab and create an instance containing at minimum the fluentd logs collection:
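A minimal sketch of such an instance:

```yaml
apiVersion: logging.openshift.io/v1
kind: ClusterLogging
metadata:
  name: instance
  namespace: openshift-logging
spec:
  managementState: Managed
  collection:
    logs:
      type: fluentd
      fluentd: {}
```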
Installing ClusterLogForwarder Component
First we need to create the seldon-elasticsearch secret that Fluentd will use to authenticate with our instance of Elasticsearch.
Fetch the elastic user's password:
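```bash
ELASTIC_PASSWORD=$(oc get secret elasticsearch-seldon-es-elastic-user -n seldon-logs \
  -o go-template='{{.data.elastic | base64decode}}')
```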
Fetch the required ca-bundle certificates
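ECK publishes the CA in the cluster's http-certs-public secret:

```bash
oc extract secret/elasticsearch-seldon-es-http-certs-public -n seldon-logs \
  --keys=ca.crt --to=.
```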
Create the secret in the openshift-logging namespace:
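The key names follow the convention expected by ClusterLogForwarder output secrets:

```bash
oc create secret generic seldon-elasticsearch -n openshift-logging \
  --from-literal=username=elastic \
  --from-literal=password="$ELASTIC_PASSWORD" \
  --from-file=ca-bundle.crt=ca.crt
```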
Navigate to the Red Hat OpenShift Logging operator under your installed operators. Select the Cluster Log Forwarder tab and create an instance forwarding logs to our Elastic instance.
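A sketch of such a forwarder, assuming the ECK HTTP service name elasticsearch-seldon-es-http and forwarding application logs only:

```yaml
apiVersion: logging.openshift.io/v1
kind: ClusterLogForwarder
metadata:
  name: instance
  namespace: openshift-logging
spec:
  outputs:
    - name: seldon-elasticsearch
      type: elasticsearch
      url: https://elasticsearch-seldon-es-http.seldon-logs:9200
      secret:
        name: seldon-elasticsearch
  pipelines:
    - name: container-logs
      inputRefs:
        - application
      outputRefs:
        - seldon-elasticsearch
```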
For further details refer to OpenShift documentation.
Finally, we need to add NetworkPolicy to allow traffic from openshift-logging into seldon-logs namespace. Create networkpolicy-seldoncontainerlogs.yaml with following resource:
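A sketch of the policy; the name is an assumption:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: seldon-containerlogs
  namespace: seldon-logs
spec:
  podSelector: {}
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: openshift-logging
```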
and apply it with
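```bash
oc apply -f networkpolicy-seldoncontainerlogs.yaml
```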
OpenShift Monitoring
OpenShift provides an out of the box monitoring stack consisting of Prometheus and Thanos, alongside the Prometheus AlertManager. This stack is configured to monitor the standard OpenShift workloads, but can be extended to collect the metrics which Seldon Enterprise Platform produces. This is done through adding a PodMonitor component to any of the namespaces where Seldon models are expected to be running.
Configuring Cluster Monitoring Stack
First, check that the OpenShift cluster has the correct configuration applied in order to monitor the standard workloads it expects to. This can be done by following the relevant OpenShift documentation.
Once cluster wide monitoring has been set up, the next configuration to add is that for user defined workloads - such as Seldon Enterprise Platform. The steps are very similar to cluster monitoring configuration, and can be completed by following the user defined workload monitoring documentation available here.
By following the OpenShift documentation you should now have these two ConfigMaps created with enableUserWorkload enabled:
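For reference, the relevant ConfigMaps look like this (the second one may carry additional user-workload settings):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    enableUserWorkload: true
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: user-workload-monitoring-config
  namespace: openshift-user-workload-monitoring
data:
  config.yaml: |
    # user-defined workload monitoring settings go here
```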
and that Prometheus pods in openshift-user-workload-monitoring namespace are running
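```bash
oc get pods -n openshift-user-workload-monitoring
```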
Providing Seldon Enterprise Platform Access to Prometheus
Seldon Enterprise Platform requires a token in order to access Prometheus. This can be configured by following the steps documented here.
Obtain the authentication token and save it to a text file:
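A sketch using a dedicated service account; the account name and token duration are assumptions:

```bash
oc create serviceaccount seldon-monitoring -n seldon-system
oc adm policy add-cluster-role-to-user cluster-monitoring-view \
  -z seldon-monitoring -n seldon-system
oc create token seldon-monitoring -n seldon-system --duration=8760h > token.txt
```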
Finally, apply the token as a secret within the seldon-system namespace. This is the secret with which Seldon will authenticate itself against the Prometheus instance:
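The secret name and key are assumptions; they must match the prometheus settings in your Helm values:

```bash
oc create secret generic prometheus-token -n seldon-system \
  --from-file=token=token.txt
```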
Adding Network Access Policies
The next step in configuring the monitoring services is to add a NetworkPolicy that will allow ingress from the openshift-user-workload-monitoring namespace to any namespace containing Seldon-specific deployments.
Create networkpolicy-monitoring.yaml file:
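A sketch of the policy (no namespace is set in the manifest so it can be applied per namespace):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-openshift-user-workload-monitoring
spec:
  podSelector: {}
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: openshift-user-workload-monitoring
```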
Apply this policy to seldon-system namespace:
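```bash
oc apply -f networkpolicy-monitoring.yaml -n seldon-system
```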
You must also apply this policy to all namespaces hosting Seldon models. Apply it to the seldon namespace now:
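```bash
oc apply -f networkpolicy-monitoring.yaml -n seldon
```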
Adding the Seldon Enterprise Platform PodMonitor
First, we add a PodMonitor for the metrics exposed directly on the Seldon Enterprise Platform pod. Create the deploy-podmonitor.yaml file:
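A sketch of such a PodMonitor; the pod labels and port name are assumptions and must match the Seldon Enterprise Platform deployment:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: seldon-deploy
  namespace: seldon-system
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: seldon-deploy
  podMetricsEndpoints:
    - port: metrics
      path: /metrics
```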
And apply it with
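```bash
oc apply -f deploy-podmonitor.yaml
```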
For any namespace in which Seldon Enterprise Platform models are going to run, a couple of PodMonitor resources need to be created within that namespace. Create the file seldon-podmonitor.yaml:
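A sketch for Seldon Core v1 model pods, assuming the executor exposes metrics on a port named metrics under /prometheus:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: seldon-podmonitor
spec:
  selector:
    matchLabels:
      app.kubernetes.io/managed-by: seldon-core
  podMetricsEndpoints:
    - port: metrics
      path: /prometheus
```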
and apply it to all namespaces hosting Seldon models. Apply it to the seldon namespace now:
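```bash
oc apply -f seldon-podmonitor.yaml -n seldon
```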
Adding the PrometheusRules for Model Usage
For any namespace in which Seldon Enterprise Platform models are going to run, a PrometheusRule resource needs to be created within that namespace.
Create the file deployment-usage-rules.yaml, whose content you can find in the appendix at the end of this document. You must apply it to all namespaces that will host SeldonDeployment models.
Apply it to the seldon namespace now:
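```bash
oc apply -f deployment-usage-rules.yaml -n seldon
```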
Adding the PrometheusRules for Alerting
The alerting functionality can be configured through PrometheusRules resources.
Create the files user-alerts.yaml, infra-alerts.yaml and drift-alerts.yaml, whose contents you can find in the appendix at the end of this document.
Apply them to the seldon-system namespace:
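```bash
oc apply -f user-alerts.yaml -f infra-alerts.yaml -f drift-alerts.yaml -n seldon-system
```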
Once OpenShift reconciles the relevant configuration changes, you can verify in the Admin UI -> Observe -> Alerting -> Alerting rules that the TestAlertNoActionRequired rule was created (you may need to disable the Platform filter to find it).
Configuring Seldon Enterprise Platform as a receiver of Alertmanager
The OpenShift documentation explains how to configure alert receivers. This can be done either:
using the OpenShift Container Platform web console
using the CLI to modify the main alertmanager-main secret in the openshift-monitoring namespace
creating an AlertmanagerConfig custom resource (alpha preview of an OpenShift feature)
Use the following configuration as an example:
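A sketch of a webhook receiver fragment for the Alertmanager configuration; the URL assumes Seldon Enterprise Platform is reachable as the seldon-deploy service in seldon-system, and the webhook path should be checked against your product version:

```yaml
receivers:
  - name: seldon-deploy
    webhook_configs:
      - url: http://seldon-deploy.seldon-system:80/seldon-deploy/api/v1alpha1/webhooks/firing-alert
        send_resolved: true
```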
And if you have an OIDC provider configured:
PostgreSQL for Model Catalogue
The Model Catalog acts as a registry for all models deployed onto the Seldon platform, where additional metadata can be added to allow for faster deployment, easier model re-use and provenance of metadata across your experimentation, deployment and monitoring tools. The Model Catalog persists this metadata within an instance of PostgreSQL.
The PostgreSQL documentation page contains extensive information on how to configure a connection to a managed PostgreSQL solution. Here we will give an example using the built-in PostgreSQL application template provided by RHOCP.
Creating built-in PostgreSQL instance
To create the PostgreSQL instance, instantiate the built-in template:
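Using the built-in postgresql-persistent template; the user, password and database name are example values:

```bash
oc new-app --template=postgresql-persistent -n seldon-system \
  -p POSTGRESQL_USER=seldon \
  -p POSTGRESQL_PASSWORD=<password> \
  -p POSTGRESQL_DATABASE=metadata
```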
Once the template is instantiated, the following Openshift/Kubernetes resources will be created to support the Model Catalog:
DeploymentConfig
ReplicationController
PostgreSQL pod
Service
PersistentVolumeClaim
Adding Secrets
Seldon Enterprise Platform needs to be able to authenticate to the PostgreSQL instance; therefore a secret called metadata-postgres is created using the below command.
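A sketch of the command; the key names are assumptions that must match what your Seldon Enterprise Platform version expects, and the host assumes the template's default postgresql service:

```bash
oc create secret generic metadata-postgres -n seldon-system \
  --from-literal=user=seldon \
  --from-literal=password=<password> \
  --from-literal=host=postgresql.seldon-system.svc.cluster.local \
  --from-literal=port=5432 \
  --from-literal=dbname=metadata \
  --from-literal=sslmode=disable
```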
Argo CD
Seldon Enterprise Platform leverages GitOps to ensure an up-to-date declarative representation of model deployments. GitOps enables changes in deployments to be tracked, and deployments to be rolled back to previous states, via commits to a Git repository. The Git repository stores the SeldonDeployments which describe how to create the machine learning models on the Kubernetes cluster.
Red Hat OpenShift provides a GitOps operator, which is built on top of ArgoCD and provides an easy to install and maintain component for enabling GitOps workflows. This installation will leverage the OpenShift GitOps Operator to enable Seldon’s own GitOps functionality.
Prepare Seldon Namespace for GitOps
Each namespace in which Seldon models are meant to be deployed using GitOps needs to be specially prepared. Here we provide an example for a namespace called seldon-gitops:
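For example (the git-repo annotation value is a placeholder for your own repository):

```bash
oc create namespace seldon-gitops
oc label namespace seldon-gitops seldon.gitops=enabled --overwrite=true
oc annotate namespace seldon-gitops \
  git-repo="https://github.com/<your-org>/<your-repo>" --overwrite=true
```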
The above configures the seldon-gitops namespace to be recognized as gitops-enabled by Seldon Enterprise Platform. Assuming that we install ArgoCD instance into the seldon-argocd namespace we need to allow seldon-gitops namespace to be managed by it:
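```bash
oc label namespace seldon-gitops argocd.argoproj.io/managed-by=seldon-argocd --overwrite=true
```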
In addition, for every new namespace we need to:
add the namespace to the ServiceMeshMemberRoll (see the mesh configuration section)
apply the NetworkPolicy, PodMonitor and PrometheusRule resources (see the Serverless and Monitoring configuration sections)
Installing the OpenShift GitOps Operator
The first step to configure GitOps is to add the Red Hat OpenShift GitOps (1.7.0*) operator from within the Operator Hub. This operator should be installed with default options. Please follow OpenShift documentation here.
The OpenShift GitOps Operator automatically creates an ArgoCD instance in the openshift-gitops namespace. You can use this ArgoCD instance or create a new one as we describe in the next section.
Creating ArgoCD Instance
For the purposes of this documentation we will use a new ArgoCD instance. First create a new project/namespace:
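```bash
oc new-project seldon-argocd
```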
Then, create a new Argo CD instance dedicated to Seldon following the OpenShift documentation. We recommend making the following changes to the Argo CD instance using the YAML editor:
Explanation:
spec.server.route.tls.termination: this can be set to re-use the SSL certificates, as we did when setting seldon-route in the istio-system namespace
spec.server.rbac.policy: the default value there reads system:cluster-admins, which in certain configurations does not provide the expected admin access
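A sketch of the relevant parts of the ArgoCD custom resource (note that in the ArgoCD CR the RBAC policy lives under spec.rbac):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: ArgoCD
metadata:
  name: argocd
  namespace: seldon-argocd
spec:
  server:
    route:
      enabled: true
      tls:
        termination: edge
        insecureEdgeTerminationPolicy: Redirect
  rbac:
    policy: |
      g, cluster-admins, role:admin
```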
Your ArgoCD instance will now be available under its own Route.
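The exact host depends on your instance name; it can be looked up with:

```bash
oc get routes -n seldon-argocd
```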
Configuring Git Repository (Seldon Enterprise Platform)
To configure our Git credentials in Seldon Enterprise Platform, we will follow these steps:
Create a Kubernetes secret containing our credentials, either as an SSH key or a User / Password combination. This secret can have any arbitrary name, but must live in the same namespace as Seldon Enterprise Platform.
If the private key is present under $GIT_SSH_PATH, you can create the credentials secret as:
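A sketch of the command; the secret and key names are assumptions that must match what your Seldon Enterprise Platform version expects:

```bash
oc create secret generic git-creds -n seldon-system \
  --from-file=id_rsa="$GIT_SSH_PATH" \
  --from-file=known_hosts="$HOME/.ssh/known_hosts" \
  --from-literal=passphrase=""
```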
The passphrase field can be left empty if the SSH key doesn't have a passphrase.
You can create the credentials secret using a User / Password combination (or User / Personal Access Token) as:
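For example:

```bash
oc create secret generic git-creds -n seldon-system \
  --from-literal=username=<git-user> \
  --from-literal=password=<git-password-or-token>
```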
Make sure that Seldon Enterprise Platform's configuration points to our newly created secret. In particular, verify the gitops section of the values of the Seldon Enterprise Platform Helm chart. Here, the gitops.argocd.enabled flag needs to be set to true, and the gitops.git.secret field needs to point to the right secret name. The Helm installation of Seldon Enterprise Platform is described further down in this document, and the Helm values provided there already have GitOps enabled.
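The relevant fragment of the Helm values then looks like:

```yaml
gitops:
  argocd:
    enabled: true
  git:
    secret: git-creds
```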
Configuring Git Repository (ArgoCD)
There are multiple ways in which the git repository can be configured in ArgoCD. One of the easiest is to use the ArgoCD UI logged in as an admin user.
Here, we provide an example of configuring the repository using the declarative approach, assuming user/password authentication over HTTPS:
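ArgoCD recognises repository credentials declared as a labelled secret in its own namespace:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: seldon-gitops-repo
  namespace: seldon-argocd
  labels:
    argocd.argoproj.io/secret-type: repository
stringData:
  type: git
  url: https://github.com/<your-org>/<your-repo>
  username: <git-user>
  password: <git-password-or-token>
```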
For more examples refer to ArgoCD documentation.
ArgoCD Project
There are multiple ways in which AppProject can be created: OpenShift UI, ArgoCD UI or declaratively.
Create the following AppProject:
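A sketch, assuming the repository above and the seldon-gitops namespace as the only destination:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: seldon
  namespace: seldon-argocd
spec:
  sourceRepos:
    - https://github.com/<your-org>/<your-repo>
  destinations:
    - namespace: seldon-gitops
      server: https://kubernetes.default.svc
```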
ArgoCD Application
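An Application ties the Git repository to the gitops-enabled namespace. The following is a minimal sketch; the name, path and sync policy are assumptions to adjust to your repository layout:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: seldon-gitops
  namespace: seldon-argocd
spec:
  project: seldon
  source:
    repoURL: https://github.com/<your-org>/<your-repo>
    targetRevision: HEAD
    path: seldon-gitops
  destination:
    server: https://kubernetes.default.svc
    namespace: seldon-gitops
  syncPolicy:
    automated: {}
```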
Kafka
Install Kafka Operator
The first step to install Kafka is to install an operator that can manage the Kafka cluster.
Add the Red Hat Integration - AMQ Streams (2.2.0-4*) operator from within the Operator Hub. This operator should be installed with default options. AMQ Streams is based on the Strimzi Operator and you can read more about it in the Red Hat documentation here.
Alternatively, add the Strimzi (0.32.0*) operator, also with default options. This is the community Strimzi Operator.
Create Kafka Cluster
Once we have the Strimzi (Strimzi provided by Strimzi, or AMQ Streams provided by Red Hat) operator up and running, we need to create the Kafka cluster.
Create the seldon-kafka namespace for our Kafka cluster:
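```bash
oc create namespace seldon-kafka
```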
Select the seldon-kafka project and navigate to your Kafka operator under your installed operators. Select the Kafka tab and create the Kafka cluster. The following is a minimal required configuration:
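A sketch of a minimal cluster; the cluster name seldon yields the bootstrap address seldon-kafka-bootstrap.seldon-kafka:9092, and the Kafka version should match your operator:

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: seldon
  namespace: seldon-kafka
spec:
  kafka:
    replicas: 3
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
    config:
      offsets.topic.replication.factor: 3
      transaction.state.log.replication.factor: 3
      transaction.state.log.min.isr: 2
      default.replication.factor: 3
      min.insync.replicas: 2
    storage:
      type: ephemeral
  zookeeper:
    replicas: 3
    storage:
      type: ephemeral
  entityOperator:
    topicOperator: {}
    userOperator: {}
```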
Seldon Core V1
Seldon Core is used to serve machine learning models over REST and gRPC endpoints, using a variety of advanced deployment strategies (canaries, shadows, A/B, multi-armed bandits).
Seldon Core (v1.16.0) is available as an operator within the Operator Hub and can therefore be readily installed onto OpenShift.
Once the operator has been installed there are a number of configuration changes required to ensure smooth interaction with the wider environment of tools. This can be achieved by editing the operator’s ClusterServiceVersion in the YAML tab of newly installed operator.
The configuration parameters to edit are the deployment environment variables:
ISTIO_ENABLED set to true
EXECUTOR_REQUEST_LOGGER_DEFAULT_ENDPOINT set to http://broker-ingress.knative-eventing.svc.cluster.local/seldon-logs/default
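In the CSV's manager deployment spec these entries look like:

```yaml
- name: ISTIO_ENABLED
  value: "true"
- name: EXECUTOR_REQUEST_LOGGER_DEFAULT_ENDPOINT
  value: http://broker-ingress.knative-eventing.svc.cluster.local/seldon-logs/default
```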
Seldon Core v2
Seldon Core v2 can be installed using published Helm charts. To add Helm charts run:
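```bash
helm repo add seldon-charts https://seldonio.github.io/helm-charts
helm repo update
```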
The Seldon Core v2 installation consists of a few different components, each with its own corresponding Helm chart:

seldon-core-v2-crds: CRDs defining the Core v2 resources. Install into: default or seldon-system.
seldon-core-v2-setup: Core v2 configuration chart (operator, templates, RBAC). Install into: seldon-system for a cluster-wide installation, or each model namespace for namespaced installations, e.g. seldon.
seldon-core-v2-runtime: Seldon Runtime, defining the core components required in each model namespace. Install into: each model namespace, e.g. seldon.
seldon-core-v2-servers: pre-configured Seldon Core v2 servers to host your models (optional). Install into: each model namespace, e.g. seldon.
Installation Modes
Seldon Core v2 supports both cluster-wide and namespaced installations:
In cluster-wide mode, we recommend installing the seldon-core-v2-setup Helm chart into the seldon-system namespace. The operator will then reconcile Core v2 resources like SeldonRuntime, Server, Model, and Pipeline in all namespaces.
In namespaced mode, you must install the seldon-core-v2-setup Helm chart into each model namespace. Each operator will then reconcile Core v2 resources only in the namespace it is installed in.
CRDs
Install Seldon Core v2 CRDs with:
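```bash
helm install seldon-core-v2-crds seldon-charts/seldon-core-v2-crds -n seldon-system
```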
Operator
The Seldon Core v2 operator (seldon-core-v2-setup Helm chart) can be installed either in cluster-wide or namespaced mode.
Prepare the required namespaces with:
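```bash
oc create namespace seldon-system  # skip if it already exists
oc create namespace seldon         # model namespace
```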
Create the components-values.yaml file that we will use to configure the installation. The values below are meant as a starting point and should be edited where necessary:
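A heavily trimmed sketch; the available values depend on the chart version, so consult the chart's values.yaml. The kafka bootstrap address assumes the cluster created earlier:

```yaml
controller:
  clusterwide: true  # set to false for namespaced installations
kafka:
  bootstrap: seldon-kafka-bootstrap.seldon-kafka:9092
```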
Conduct the Helm installation with:
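```bash
# Cluster-wide mode:
helm install seldon-core-v2-setup seldon-charts/seldon-core-v2-setup \
  -n seldon-system -f components-values.yaml

# Namespaced mode (repeat for each model namespace):
helm install seldon-core-v2-setup seldon-charts/seldon-core-v2-setup \
  -n seldon -f components-values.yaml
```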
Seldon Runtime
Conduct Helm installation of Seldon Runtime for Seldon Core v2 with:
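```bash
helm install seldon-core-v2-runtime seldon-charts/seldon-core-v2-runtime -n seldon
```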
Servers
In order to run models, you will need to provision one or more servers. As a convenience for getting started, you can install the pre-configured Seldon Core v2 Servers. To do this, create the servers-values.yaml file that we will use to configure the installation (below are just the default values; adjust them to your needs):
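A sketch of the assumed defaults:

```yaml
mlserver:
  replicas: 1
triton:
  replicas: 1
```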
and conduct Helm installation with:
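```bash
helm install seldon-core-v2-servers seldon-charts/seldon-core-v2-servers \
  -n seldon -f servers-values.yaml
```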
Validation
You should see a pod like the following running in the seldon-system namespace:
And also pods like the following running in the seldon namespace:
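```bash
oc get pods -n seldon-system
oc get pods -n seldon
```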
Adding new namespaces
To install Seldon Core v2 in additional namespaces:

For cluster-wide installations:
Create the new namespace
Install the runtime and servers into the new namespace

For namespaced installations:
Create the new namespace
Install the seldon-core-v2-setup Helm chart, runtime and servers into the new namespace
Metrics Monitoring
To configure metrics collection for the Seldon Core v2 components, create the following PodMonitor resources in the seldon namespace:
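The exact manifests depend on your Core v2 version; the following sketch only illustrates the shape of one such PodMonitor, with the labels and port name as assumptions to verify against the Seldon Core v2 metrics documentation:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: seldon-server-metrics
  namespace: seldon
spec:
  selector:
    matchLabels:
      app.kubernetes.io/managed-by: seldon-core
  podMetricsEndpoints:
    - port: metrics
```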
Seldon Mesh
The Seldon Core v2 inference API is exposed via Envoy on the seldon-mesh service in the seldon namespace.
To expose this service via Istio you need to create the following VirtualService. Create the seldon-mesh-vs.yaml file:
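A sketch, assuming the seldon-gateway created earlier and the default port 80 on the seldon-mesh service:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: seldon-mesh
  namespace: seldon
spec:
  hosts:
    - "*"
  gateways:
    - istio-system/seldon-gateway
  http:
    - route:
        - destination:
            host: seldon-mesh.seldon.svc.cluster.local
            port:
              number: 80
```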
and apply it with
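```bash
oc apply -f seldon-mesh-vs.yaml
```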
Seldon Enterprise Platform
Installation
Download the seldon-deploy-install.tar file that contains the required installation resources. For example, to download the installation resources for version 2.4.0 of Seldon Enterprise Platform run the following:

Extract the contents of the seldon-deploy-install.tar file:
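```bash
tar -xf seldon-deploy-install.tar
```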
Seldon Enterprise Platform relies on Helm charts to perform the installation. There is a master configuration file which contains all of the relevant Helm values for the given installation; the appendix of this document contains the recommended Helm values for installing Seldon Enterprise Platform on OpenShift. Save these values as values-openshift.yaml and then run the following helm command to install Seldon Enterprise Platform:
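A sketch of the command; the chart path inside the extracted archive is an assumption and may differ between versions:

```bash
helm upgrade --install seldon-deploy ./seldon-deploy-install/sd-setup/helm-charts/seldon-deploy \
  -f values-openshift.yaml -n seldon-system
```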
Obtaining ingress URL
Once the Seldon Enterprise Platform pods have come up, the UI can be accessed by running the following command and entering the resultant URI into the browser:
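For example, assuming the seldon-route created earlier and the default /seldon-deploy/ UI path:

```bash
echo "https://$(oc get route seldon-route -n istio-system -o jsonpath='{.spec.host}')/seldon-deploy/"
```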
Appendix
Adding new namespace for Seldon Enterprise Platform
To add a new namespace called my-new-namespace for Seldon Enterprise Platform to use:

Add the namespace to the ServiceMeshMemberRoll as described in the OpenShift Service Mesh section.

If the new namespace is meant to be gitops-enabled (recommended), follow the steps described in the Argo CD section:
add the seldon.gitops=enabled label
add the git-repo annotation
add the argocd.argoproj.io/managed-by label
update the AppProject with the new namespace entry
create a new Application resource
If you explicitly specified namespaces in the ClusterLogForwarder config, you need to add the new namespace to the list.
Install Seldon Core v2 in the new namespace. Note that, when following the instructions, you will need to replace seldon with the new namespace name (e.g. my-new-namespace).
Validating Installation
This subsection describes basic validation steps for the Seldon installation.
Validating Ingress
Verify that the Istio Gateway and Route for Seldon are created with:
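```bash
oc get gateway -n istio-system
oc get route -n istio-system seldon-route
```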
Validating Serverless
Verify that the Broker exists and is in the READY state with:
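```bash
oc get broker -n seldon-logs
```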
Validating NetworkPolicy resources
Verify that the following NetworkPolicy resources exist in the seldon-system and seldon-logs namespaces, as well as in every namespace with your Seldon models:
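```bash
oc get networkpolicy -n seldon-system
oc get networkpolicy -n seldon-logs
oc get networkpolicy -n seldon
```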
Validating Monitoring Resources
Verify that the following PodMonitor and PrometheusRule resources exist in the seldon-system namespace, as well as in every model namespace:
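```bash
oc get podmonitor,prometheusrule -n seldon-system
oc get podmonitor,prometheusrule -n seldon
```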
Validating Kafka
Verify that the following pods are present in the seldon-kafka namespace:
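```bash
oc get pods -n seldon-kafka
```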
Validating Seldon Core v2
Verify that the following pods are present in your model namespace, e.g. seldon:
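```bash
oc get pods -n seldon
```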
Seldon Enterprise Platform Helm Values
This set of Helm values of Seldon Enterprise Platform is designed to work properly on the OpenShift 4.13 platform with all dependencies installed and configured as described in this document.
Please only set Namespace Authorization using labels (rbac.nsLabelsAuth.enabled: true entry) in your Helm values file if you are not going to use OPA Policy Authorization.
Prometheus Rules for Model Usage
Save file as deployment-usage-rules.yaml:
Prometheus Rules for Alerting
Save file as user-alerts.yaml:
Save file as infra-alerts.yaml:
Save file as drift-alerts.yaml:
Troubleshooting
Core v2 Pipelines
If you see an error from the producer in the Pipeline gateway complaining about not enough in-sync replicas, then the replication factor Seldon is using is less than the cluster setting for min.insync.replicas, which for a default AWS MSK cluster is 2. Ensure this value is equal to that of the cluster. It can be set in the seldon-charts/seldon-core-v2-setup Helm chart with kafka.topics.replicationFactor.
Prometheus Metrics
If you do not see any metrics in Seldon Enterprise Platform, first check that all NetworkPolicy, PodMonitor and PrometheusRule resources are configured correctly. If you still do not see any metrics, verify that the JWT token given to Seldon Enterprise Platform is correct.
To verify the token:
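A sketch, querying the Thanos querier route with the token created earlier:

```bash
TOKEN=$(cat token.txt)
THANOS_HOST=$(oc get route thanos-querier -n openshift-monitoring -o jsonpath='{.spec.host}')
curl -k -H "Authorization: Bearer $TOKEN" "https://$THANOS_HOST/api/v1/query?query=up"
```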
Elasticsearch
To verify Elasticsearch credentials and if Seldon indices are being populated:
Get the token from the Elastic secret (the user is "elastic"):
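```bash
ELASTIC_PASSWORD=$(oc get secret elasticsearch-seldon-es-elastic-user -n seldon-logs \
  -o go-template='{{.data.elastic | base64decode}}')
```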
Verify that the secret matches in both the seldon-logs and seldon-system namespaces:
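Assuming the elastic-credentials secrets created earlier:

```bash
oc get secret elastic-credentials -n seldon-logs \
  -o go-template='{{.data.password | base64decode}}'
oc get secret elastic-credentials -n seldon-system \
  -o go-template='{{.data.password | base64decode}}'
```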
Verify the credentials and indices:
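A sketch using a local port-forward to the ECK HTTP service:

```bash
oc port-forward svc/elasticsearch-seldon-es-http -n seldon-logs 9200:9200 &
curl -k -u "elastic:$ELASTIC_PASSWORD" "https://localhost:9200/_cat/indices?v"
```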