# OpenShift Environment

### Introduction

This document walks through the installation procedure for Seldon Enterprise Platform v2.1.0 onto Red Hat OpenShift (RHOCP) v4.13.

{% hint style="info" %}
**Note**: These instructions have been tested on OpenShift 4.13 and work on OpenShift versions 4.10 to 4.13 only.
{% endhint %}

The prerequisites for the installation match the usual requirements for installing and configuring operators on the Red Hat OpenShift platform:

* Access to the OpenShift Container Platform web console.
* An account with the `cluster-admin` role.
* Being logged in to the OpenShift Container Platform cluster as an administrator.

{% hint style="info" %}
**Note**: Whenever an asterisk (\*) appears next to an operator version, it indicates the version available in Operator Hub at the time of writing. When following this guide, the available or default operator versions may differ. These versions are noted here for information purposes only; consult the upstream OpenShift documentation when in doubt.
{% endhint %}

### Preparation

#### Creating the Seldon Namespaces

Seldon Enterprise Platform expects a number of namespaces to be present, some of them with specific labels.

The first namespaces to create are `seldon-system`, where the Seldon Enterprise Platform controller pod runs, and `seldon-logs`, which hosts the logging stack. The controller is the main orchestrator of the Seldon technology stack and expects to run in the `seldon-system` namespace.

```bash
oc create namespace seldon-logs
oc create namespace seldon-system
```

Next, create a namespace in which models can be deployed. This documentation calls it `seldon`, but you can choose any name. Create the namespace, and then add the `seldon.restricted` label so that Seldon Enterprise Platform has access to it.

```bash
oc create namespace seldon
oc label namespace seldon seldon.restricted=false --overwrite=true
```

### Dependencies

#### OpenShift Service Mesh

The first step within the installation process is to add [OpenShift Service Mesh](https://docs.openshift.com/container-platform/4.13/service_mesh/v2x/ossm-about.html). This is required for the networking of all other pieces within the Seldon Enterprise Platform stack, as well as the ingress/egress for model endpoints.

{% hint style="info" %}
**Note**: Istio (OpenShift Service Mesh) is an external component outside of the main Seldon stack. Therefore, it is the cluster administrator's responsibility to administrate and manage the Istio installation used by Seldon.
{% endhint %}

**Adding Operators**

First, add the operators required by OpenShift Service Mesh. Log into the RHOCP console and navigate to `OperatorHub`, within the `Operators` tab.

Search for and install the [following operators](https://docs.openshift.com/container-platform/4.13/service_mesh/v2x/installing-ossm.html), with the default options:

1. Red Hat OpenShift distributed tracing platform (provided by Red Hat) (1.39.0-3\*)
2. Kiali Operator (provided by Red Hat) (1.57.3\*)
3. Red Hat OpenShift Service Mesh (provided by Red Hat) (2.3.0\*)

**Configuring the ServiceMeshControlPlane**

Next, configure the `ServiceMeshControlPlane`. Make sure that control plane version v2.0 is installed and that the control plane is created in the `istio-system` namespace, as described in the [OpenShift documentation](https://docs.openshift.com/container-platform/4.13/service_mesh/v2x/ossm-create-smcp.html#ossm-deploy-cluster-wide-control-plane-cli_ossm-create-smcp).

**Add Namespaces to ServiceMeshMemberRoll**

Navigate to the Red Hat OpenShift Service Mesh operator under your installed operators. Select the Istio Service Mesh Member Roll tab and create a new `ServiceMeshMemberRoll` in `istio-system` namespace with the following namespaces added to the member roll:

* `seldon`
* `seldon-logs`
* `seldon-kafka`
* `seldon-system`

Note: This is easiest to do using the YAML editor.
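If you prefer the CLI to the web console, the member roll can also be written as a manifest. The following is a minimal sketch, assuming OpenShift Service Mesh 2.x, where the `ServiceMeshMemberRoll` in the control plane namespace is expected to be named `default`; the file name `servicemeshmemberroll.yaml` is only an example:

```bash
# Write the ServiceMeshMemberRoll manifest to a file (the file name is an example).
# OpenShift Service Mesh 2.x expects the member roll to be named "default"
# and to live in the control plane namespace (istio-system here).
cat > servicemeshmemberroll.yaml <<'EOF'
apiVersion: maistra.io/v1
kind: ServiceMeshMemberRoll
metadata:
  name: default
  namespace: istio-system
spec:
  members:
    - seldon
    - seldon-logs
    - seldon-kafka
    - seldon-system
EOF
```

Apply it with `oc apply -f servicemeshmemberroll.yaml`. If a member roll already exists, add the namespaces to its `spec.members` list instead of creating a second one.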

**Create Seldon’s Istio Gateway**

Seldon also requires an Istio gateway to allow traffic to and from the Seldon Enterprise Platform controller pod, and to enable advanced routing features such as canary and shadow deployments. Create the following YAML manifest, called `istio-gateway.yaml` in this example:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: seldon-gateway
  namespace: istio-system
spec:
  selector:
    istio: ingressgateway
  servers:
    - port:
        number: 80
        name: http
        protocol: HTTP
      hosts:
        - "*"
```

Apply it with

```bash
oc apply -f istio-gateway.yaml
```

**Create SSL Secure Route**

OpenShift clusters usually come with Let's Encrypt certificates enabled for the default ingress domain. You can create a `Route` with `tls.termination: edge` to reuse these certificates. Create the following YAML manifest called `seldon-route.yaml`:

```yaml
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: seldon-route
  namespace: istio-system
spec:
  host: seldon.<Ingress_Domain>
  port:
    targetPort: http2
  tls:
    insecureEdgeTerminationPolicy: Redirect
    termination: edge
  to:
    kind: Service
    name: istio-ingressgateway
    weight: 100
```

To display your default ingress domain, run the following command:

```bash
oc get ingresses.config/cluster -o jsonpath='{.spec.domain}'
```

After you substitute this value for `<Ingress_Domain>` in the manifest above, your Istio ingress will be exposed at:

```bash
INGRESS_DOMAIN=$(oc get ingresses.config/cluster -o jsonpath={.spec.domain})
echo https://seldon.$INGRESS_DOMAIN/
```

{% hint style="info" %}
**Note**: This setup terminates the SSL on edge and provides non-SSL in-cluster traffic.
{% endhint %}

Apply the above manifest with

```bash
oc apply -f seldon-route.yaml
```

#### OpenShift Serverless

Seldon Enterprise Platform uses OpenShift Serverless, in the form of Knative Serving and Knative Eventing, to power several of the advanced monitoring components associated with your deployments, namely outlier and drift detection. Without this component, these features do not work within the platform.

{% hint style="info" %}
**Note**: Knative is an external component outside of the main Seldon stack. Therefore, it is the cluster administrator's responsibility to administrate and manage the Knative installation used by Seldon.
{% endhint %}

**Serverless Operator**

Once more, navigate to the Operator Hub and install the official `Red Hat OpenShift Serverless` (1.26.0\*) operator. Install using the default options.

**Install Knative Eventing**

Using the Serverless Operator, and as per the [OpenShift documentation](https://docs.openshift.com/serverless/1.30/install/installing-knative-eventing.html), install an instance of Knative Eventing.

**Install Knative Serving**

Using the Serverless Operator, and as per the [OpenShift documentation](https://docs.openshift.com/serverless/1.30/install/installing-knative-serving.html), install an instance of Knative Serving.
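For reference, the instances created through the operator are small custom resources. The sketch below writes minimal `KnativeEventing` and `KnativeServing` manifests, assuming the `operator.knative.dev/v1beta1` API version and the `knative-eventing` and `knative-serving` namespaces used in the OpenShift Serverless documentation (the NetworkPolicies later in this guide also assume these namespace names). Check the documentation for your Serverless version before applying:

```bash
# Minimal Knative Eventing and Serving instances in a single file
# (the file name knative.yaml is only an example).
cat > knative.yaml <<'EOF'
apiVersion: operator.knative.dev/v1beta1
kind: KnativeEventing
metadata:
  name: knative-eventing
  namespace: knative-eventing
---
apiVersion: operator.knative.dev/v1beta1
kind: KnativeServing
metadata:
  name: knative-serving
  namespace: knative-serving
EOF
```

Create the two namespaces first if they do not exist (`oc create namespace knative-eventing` and `oc create namespace knative-serving`), then run `oc apply -f knative.yaml`.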

**Logs Namespace and Broker**

The outlier and drift detectors run as Knative Serving services and are triggered by Knative events that originate from the Seldon logging stack. The Seldon logging stack is installed into the `seldon-logs` namespace, with a Knative Eventing Broker configured within it.

If you have not created this namespace yet, create it now:

```bash
oc create namespace seldon-logs
```

Create the Knative Eventing Broker manifest, `eventing-broker.yaml`:

```yaml
apiVersion: eventing.knative.dev/v1
kind: Broker
metadata:
  name: default
  namespace: seldon-logs
```

and apply it with

```bash
oc apply -f eventing-broker.yaml
```

**Add NetworkPolicy Resources**

Finally, add a couple of `NetworkPolicy` resources to ensure that traffic can flow from Knative Eventing and Serving into the Seldon namespaces:

**Seldon Logs Namespace**

To allow traffic from Knative Eventing into the Seldon Logs namespace we will create a `NetworkPolicy` resource. Create `networkpolicy-seldon-logs.yaml`:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: seldon-request-logs
  namespace: seldon-logs
spec:
  podSelector: {}
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: knative-eventing
```

and apply it with

```bash
kubectl apply -f networkpolicy-seldon-logs.yaml
```

**Model Namespaces**

For **each model namespace**, we will need to create a couple of `NetworkPolicy` resources to ensure that traffic from both Knative Eventing and Serving can go into our model namespace. For this, first create a file named `networkpolicy-detectors.yaml` with the following resources:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: seldon-detectors
spec:
  podSelector: {}
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: knative-eventing
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: seldon-detectors-serving
spec:
  podSelector: {}
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: knative-serving
```

You can then apply these to your model namespace as:

```bash
oc apply -f networkpolicy-detectors.yaml -n <model-namespace>
```

Then apply it to the `seldon` namespace:

```bash
oc apply -f networkpolicy-detectors.yaml -n seldon
```

#### Elasticsearch

Elasticsearch is responsible for storing all requests and responses sent to the machine learning models hosted within Seldon Enterprise Platform. Requests and responses are forwarded to Elasticsearch by the Seldon request logging component, which also runs within the `seldon-logs` namespace.

Elasticsearch also stores the container logs of all running models and monitoring components hosted within Seldon Enterprise Platform. These are forwarded to Elasticsearch by Fluentd.

{% hint style="info" %}
Elasticsearch is an external component outside the main Seldon stack. Therefore, it is the cluster administrator's responsibility to administrate and manage the Elasticsearch instance used by Seldon.
{% endhint %}

**Installing the ECK Operator**

The first step to configure Elasticsearch is to add the `Elasticsearch (ECK) Operator` (2.5.0\*) from within the Operator Hub. This operator should be installed with default options, with access to all namespaces.

**Create the Elasticsearch Cluster**

Navigate to the `Elasticsearch (ECK) Operator` operator under your installed operators. Select the `Elasticsearch Cluster` tab and create a new cluster called `elasticsearch-seldon` in the `seldon-logs` namespace, using a `7.17.x` version:

```yaml
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: elasticsearch-seldon
  namespace: seldon-logs
spec:
  version: 7.17.18
  nodeSets:
    - name: default
      config:
        node.roles:
          - master
          - data
        node.attr.attr_name: attr_value
        node.store.allow_mmap: false
      podTemplate:
        metadata:
          labels:
            foo: bar
        spec:
          containers:
            - name: elasticsearch
              resources:
                requests:
                  memory: 4Gi
                  cpu: 1
                limits:
                  memory: 4Gi
                  cpu: 2
      count: 3
 # ...
```

{% hint style="info" %}
**Note**: Currently, Seldon guarantees compatibility with Elasticsearch 7.X. Compatibility with Elasticsearch 8.X is not guaranteed.
{% endhint %}
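Because the next steps read secrets that ECK generates, it helps to wait until the cluster is healthy. A sketch, assuming an `oc` client recent enough to support `--for=jsonpath=` (based on kubectl 1.23 or later) and ECK's `status.health` field; the function name `wait_for_elasticsearch` is hypothetical:

```bash
# Wait until ECK reports the cluster health as green, then show the summary
# (health, nodes, version, phase). The first argument overrides the timeout.
wait_for_elasticsearch() {
  oc wait elasticsearch/elasticsearch-seldon -n seldon-logs \
    --for=jsonpath='{.status.health}'=green --timeout="${1:-10m}" || return 1
  oc get elasticsearch elasticsearch-seldon -n seldon-logs
}
```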

**Add NetworkPolicy Resource**

Add the following `NetworkPolicy` resource to the `seldon-logs` namespace to ensure that traffic can flow between the `seldon-logs` namespace and `openshift-operators`, where the ECK operator runs. First, create a file named `networkpolicy-seldon-elastic.yaml` with the following resource:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: seldon-elastic-cluster
  namespace: seldon-logs
spec:
  podSelector: {}
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: openshift-operators
```

```bash
oc apply -f networkpolicy-seldon-elastic.yaml -n seldon-logs
```

**Granting Access to Elasticsearch**

For Seldon Enterprise Platform to access the Elasticsearch cluster, two secrets are required: one in the `seldon-logs` namespace, which gives the request logger component access to the cluster, and one in the `seldon-system` namespace, where the Seldon Enterprise Platform pod will be installed.

Grab the Elasticsearch password and assign it to a variable for later use.

```bash
ELASTIC_PASSWORD=$(kubectl get secret elasticsearch-seldon-es-elastic-user -n seldon-logs -o go-template='{{.data.elastic | base64decode}}')
```

Create the secret in the seldon-logs namespace:

```bash
kubectl create secret generic elastic-credentials -n seldon-logs \
  --from-literal=username="elastic" \
  --from-literal=password="${ELASTIC_PASSWORD}" \
  --dry-run=client -o yaml | kubectl apply -f -
```

Create the secret in the seldon-system namespace:

```bash
kubectl create secret generic elastic-credentials -n seldon-system \
  --from-literal=username="elastic" \
  --from-literal=password="${ELASTIC_PASSWORD}" \
  --dry-run=client -o yaml | kubectl apply -f -
```

#### Container Logs Forwarding

To enable container logs visibility in Seldon Enterprise Platform we use [OpenShift Logging](https://docs.openshift.com/container-platform/4.13/logging/cluster-logging-deploying.html).

**Installing OpenShift Logging Operator**

Follow OpenShift [documentation](https://docs.openshift.com/container-platform/4.13/logging/cluster-logging-deploying.html#cluster-logging-deploy-console_cluster-logging-deploying) to install `Red Hat OpenShift Logging` (5.5.5\*) operator.

**Installing ClusterLogging Component**

{% hint style="info" %}
Note: As we will be forwarding logs to the Elastic instance `elasticsearch-seldon` in `seldon-logs` namespace you can disable the internal Elasticsearch `logStore` and Kibana `visualization` components from the `ClusterLogging` custom resource (CR). Therefore installation of the `OpenShift Elasticsearch Operator` is not required.
{% endhint %}

Navigate to the `Red Hat OpenShift Logging` operator under your installed operators. Select the `Cluster Logging` tab and create an `instance` containing at minimum the `fluentd` logs collection:

```yaml
apiVersion: logging.openshift.io/v1
kind: ClusterLogging
metadata:
  name: instance
  namespace: openshift-logging
spec:
  collection:
    logs:
      type: fluentd
  managementState: Managed
```

**Installing ClusterLogForwarder Component**

First, create the `seldon-elasticsearch` secret that Fluentd will use to authenticate with our Elasticsearch instance.

Fetch `elastic` password

```bash
ELASTIC_PASSWORD=$(kubectl get secret elasticsearch-seldon-es-elastic-user -n seldon-logs -o go-template='{{.data.elastic | base64decode}}')
```

Fetch the required `ca-bundle` certificates

```bash
kubectl get secret -n seldon-logs elasticsearch-seldon-es-http-certs-public -o go-template='{{index .data "tls.crt" | base64decode }}' > ca-bundle.crt
```

Create the secret in the openshift-logging namespace:

```bash
kubectl create secret generic seldon-elasticsearch -n openshift-logging \
  --from-literal=username="elastic" \
  --from-literal=password="${ELASTIC_PASSWORD}" \
  --from-file=./ca-bundle.crt \
  --dry-run=client -o yaml | kubectl apply -f -
```

Navigate to the `Red Hat OpenShift Logging` operator under your installed operators. Select the `Cluster Log Forwarder` tab and create an `instance` forwarding logs to our Elastic instance.

```yaml
apiVersion: logging.openshift.io/v1
kind: ClusterLogForwarder
metadata:
  name: instance
  namespace: openshift-logging
spec:
  inputs:
  - name: seldon-pods
    application:
      selector:
        matchLabels:
          app.kubernetes.io/managed-by: seldon-core
  outputs:
  - name: seldon-elasticsearch
    elasticsearch:
      version: 7
    secret:
      name: seldon-elasticsearch
    type: elasticsearch
    url: https://elasticsearch-seldon-es-http.seldon-logs:9200
  pipelines:
  - name: seldon-container-logs
    inputRefs:
    - seldon-pods
    outputRefs:
    - seldon-elasticsearch
```

For further details refer to OpenShift [documentation](https://docs.openshift.com/container-platform/4.13/logging/log_collection_forwarding/log-forwarding.html).

Finally, add a `NetworkPolicy` to allow traffic from `openshift-logging` into the `seldon-logs` namespace. Create `networkpolicy-seldoncontainerlogs.yaml` with the following resource:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: seldon-container-logs
  namespace: seldon-logs
spec:
  podSelector: {}
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: openshift-logging
```

and apply it with

```bash
kubectl apply -f networkpolicy-seldoncontainerlogs.yaml
```

#### OpenShift Monitoring

OpenShift provides an out of the box monitoring stack consisting of Prometheus and Thanos, alongside the Prometheus AlertManager. This stack is configured to monitor the standard OpenShift workloads, but can be extended to collect the metrics which Seldon Enterprise Platform produces. This is done through adding a PodMonitor component to any of the namespaces where Seldon models are expected to be running.

**Configuring Cluster Monitoring Stack**

First, check that the OpenShift cluster has the correct configuration applied in order to monitor the standard workloads it expects to. This can be done by following the relevant [OpenShift documentation](https://docs.openshift.com/container-platform/4.13/monitoring/configuring-the-monitoring-stack.html#creating-cluster-monitoring-configmap_configuring-the-monitoring-stack).

Once cluster wide monitoring has been set up, the next configuration to add is that for user defined workloads - such as Seldon Enterprise Platform. The steps are very similar to cluster monitoring configuration, and can be completed by following the user defined workload monitoring documentation available [here](https://docs.openshift.com/container-platform/4.13/monitoring/configuring-the-monitoring-stack.html#creating-user-defined-workload-monitoring-configmap_configuring-the-monitoring-stack).

By following the OpenShift documentation, you should now have these two ConfigMaps, with `enableUserWorkload` set to `true` in `cluster-monitoring-config`:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    enableUserWorkload: true

---

apiVersion: v1
kind: ConfigMap
metadata:
  name: user-workload-monitoring-config
  namespace: openshift-user-workload-monitoring
data:
  config.yaml: |
```

You should also see the Prometheus pods running in the `openshift-user-workload-monitoring` namespace:

```bash
$ kubectl get pods -n openshift-user-workload-monitoring
NAME                                  READY   STATUS    RESTARTS   AGE
prometheus-operator-f5b597fc4-7wzql   2/2     Running   0          17m
prometheus-user-workload-0            5/5     Running   0          17m
prometheus-user-workload-1            5/5     Running   0          17m
thanos-ruler-user-workload-0          3/3     Running   0          17m
thanos-ruler-user-workload-1          3/3     Running   0          17m
```

**Providing Seldon Enterprise Platform Access to Prometheus**

Seldon Enterprise Platform requires a token in order to access Prometheus. This can be configured by following the steps documented [here](https://docs.openshift.com/container-platform/4.13/monitoring/enabling-monitoring-for-user-defined-projects.html#accessing-metrics-from-outside-cluster_enabling-monitoring-for-user-defined-projects).

Obtain the authentication token and save it to a text file:

```bash
SECRET=$(oc get secret -n openshift-user-workload-monitoring | grep prometheus-user-workload-token | head -n 1 | awk '{print $1}')
TOKEN=$(oc get secret "$SECRET" -n openshift-user-workload-monitoring -o jsonpath='{.data.token}' | base64 -d)
echo -n "$TOKEN" > jwt-seldon.txt
```

Finally, apply the token as a secret within the `seldon-system` namespace. This is the secret with which Seldon will authenticate itself against the Prometheus instance:

```bash
oc create secret generic jwt-seldon -n seldon-system --from-file=./jwt-seldon.txt -o yaml --dry-run=client | kubectl apply -f -
```

{% hint style="info" %}
**Note**: Depending on the permissions of the token used above different metrics and alerts will be available for Seldon Enterprise Platform to use and display.
{% endhint %}
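To check which service account the token belongs to, and therefore which permissions it carries, you can decode the JWT payload locally. A minimal sketch, assuming `jwt-seldon.txt` holds a standard three-part JWT; the function name `jwt_claims` is hypothetical, and it only decodes the claims without verifying the signature:

```bash
# Decode and print the claims (middle segment) of a JWT stored in a file.
# This does NOT verify the token's signature; it is only for inspection.
jwt_claims() {
  local payload
  # Take the payload segment and convert base64url to standard base64
  payload=$(cut -d. -f2 < "$1" | tr '_-' '/+')
  # JWTs strip base64 padding; restore it so base64 -d accepts the input
  case $(( ${#payload} % 4 )) in
    2) payload="${payload}==" ;;
    3) payload="${payload}=" ;;
  esac
  printf '%s' "$payload" | base64 -d
  echo
}
```

Running `jwt_claims jwt-seldon.txt` prints the claims as JSON; the `sub` claim names the service account whose RBAC permissions determine which metrics and alerts are visible.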

**Adding Network Access Policies**

The next step in configuring the monitoring services is to add a `NetworkPolicy` that allows ingress from the `openshift-user-workload-monitoring` namespace into every namespace containing Seldon-specific deployments.

Create `networkpolicy-monitoring.yaml` file:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: user-workload-monitoring
spec:
  podSelector: {}
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: openshift-user-workload-monitoring
```

Apply this policy to `seldon-system` namespace:

```bash
oc apply -f networkpolicy-monitoring.yaml -n seldon-system
```

You also must apply this policy to all namespaces hosting Seldon models:

```bash
oc apply -f networkpolicy-monitoring.yaml -n <model-namespace>
```

Apply it to the `seldon` namespace now:

```bash
oc apply -f networkpolicy-monitoring.yaml -n seldon
```

**Adding the Seldon Enterprise Platform PodMonitor**

First, add a `PodMonitor` for the metrics exposed directly by the Seldon Enterprise Platform pod. Create the `deploy-podmonitor.yaml` file:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: seldon-deploy-monitor
  namespace: seldon-system
  labels:
    app: seldon-deploy
spec:
  namespaceSelector:
    matchNames:
    - seldon-system
  selector:
    matchLabels:
      app.kubernetes.io/name: seldon-deploy
  podMetricsEndpoints:
  - port: metrics
    path: /metrics
```

And apply it with

```bash
oc apply -f deploy-podmonitor.yaml -n seldon-system
```

Every namespace in which models deployed with Seldon Enterprise Platform will run needs a set of `PodMonitor` resources. Create the file `seldon-podmonitor.yaml`:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: seldon-podmonitor
spec:
  selector:
    matchLabels:
      app.kubernetes.io/managed-by: seldon-core
  podMetricsEndpoints:
  - port: metrics
    path: /prometheus
---
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: seldon-podmonitor-metrics-server
spec:
  selector:
    matchLabels:
      seldon.io/metrics: "true"
  podMetricsEndpoints:
  - path: /v1/metrics
    port: user-port
---
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: seldon-drift-detector
spec:
  selector:
    matchLabels:
      seldon.io/drift: 'true'
  podMetricsEndpoints:
  - path: /v1/metrics
    port: user-port
```

and apply it to all namespaces hosting Seldon models:

```bash
oc apply -f seldon-podmonitor.yaml -n <model-namespace>
```

Apply it to `seldon` namespace now

```bash
oc apply -f seldon-podmonitor.yaml -n seldon
```

**Adding the PrometheusRules for Model Usage**

Every namespace in which models deployed with Seldon Enterprise Platform will run also needs a `PrometheusRule` resource.

Create the file `deployment-usage-rules.yaml`, whose contents you can find in the appendix at the end of this document. You must apply it to all namespaces that will host SeldonDeployment models:

```bash
oc apply -f deployment-usage-rules.yaml -n <model-namespace>
```

Apply it to `seldon` namespace now

```bash
oc apply -f deployment-usage-rules.yaml -n seldon
```

**Adding the PrometheusRules for Alerting**

The alerting functionality can be configured through `PrometheusRules` resources.

Create the files `user-alerts.yaml`, `infra-alerts.yaml` and `drift-alerts.yaml`, whose contents you can find in the appendix at the end of this document.

Apply them to `seldon-system` namespace:

```bash
oc apply -n seldon-system -f user-alerts.yaml
oc apply -n seldon-system -f infra-alerts.yaml
oc apply -n seldon-system -f drift-alerts.yaml
```

Once OpenShift reconciles the relevant configuration changes you can verify in the Admin UI -> `Observe` -> `Alerting` -> `Alerting rules` that `TestAlertNoActionRequired` rule was created (you may need to disable `Platform` filter to find it).

**Configuring Seldon Enterprise Platform as receiver of Alertmanager**

The OpenShift [documentation](https://docs.openshift.com/container-platform/4.13/monitoring/managing-alerts.html#sending-notifications-to-external-systems_managing-alerts) explains how to configure alert receivers. This can be done either:

* using the OpenShift Container Platform web console
* using the CLI to modify the `alertmanager-main` secret in the `openshift-monitoring` namespace
* creating an `AlertmanagerConfig` custom resource (an alpha preview feature in OpenShift)

Use the following configuration as an example:

```yaml
receivers:
- name: default-receiver
- name: deploy-webhook
  webhook_configs:
  - url: "http://seldon-deploy.seldon-system:80/seldon-deploy/api/v1alpha1/webhooks/firing-alert"
route:
  group_wait: 10s
  group_by: ['alertname']
  group_interval: 5m
  receiver: default-receiver
  repeat_interval: 3h
  routes:
  - receiver: deploy-webhook
    matchers:
    - severity =~ "warning|critical"
    - type =~ "user|infra"
```

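If you have an OIDC provider configured, add OAuth2 client credentials to the webhook, replacing the `{{ ... }}` placeholders with your own values: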

```yaml
webhook_configs:
- url: "http://seldon-deploy.seldon-system:80/seldon-deploy/api/v1alpha1/webhooks/firing-alert"
  http_config:
    oauth2:
      client_id: "{{ keycloak_api_clientid }}"
      client_secret: "{{ keycloak_api_secret }}"
      scopes: [openid]
      token_url: "{{ external_protocol }}://{{ external_address }}/auth/realms/deploy-realm/protocol/openid-connect/token"
```

#### PostgreSQL for Model Catalogue

The Model Catalog acts as a registry for all models deployed onto the Seldon platform, where additional metadata can be added to allow for faster deployment, easier model re-use and provenance of metadata across your experimentation, deployment and monitoring tools. The Model Catalog persists this metadata within an instance of PostgreSQL.

{% hint style="info" %}
**Note**: PostgreSQL is an external component outside of the main Seldon stack. Therefore, it is the cluster administrator's responsibility to administrate and manage the PostgreSQL instance used by Seldon.
{% endhint %}

The [PostgreSQL documentation](/seldon-enterprise-platform/production-environment/postgresql/managed-postgresql.md) page contains detailed information on how to configure a connection to a managed PostgreSQL solution. Here we give an example that uses the built-in PostgreSQL application template provided by RHOCP.

**Creating built-in PostgreSQL instance**

{% hint style="info" %}
**Note**: These instructions help you quickly spin up a PostgreSQL instance. However, we don't recommend using it in production; treat it as development-only.
{% endhint %}
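Even for a development instance, you may prefer not to reuse the hardcoded `postgresql` password from the commands that follow. As an optional variation, generate a random password once and pass the same value to both `-p POSTGRESQL_PASSWORD=...` and `--from-literal=password=...`. A minimal sketch using only `/dev/urandom` and coreutils; the variable name `POSTGRES_PASSWORD` is arbitrary:

```bash
# Generate a 32-character hex password from 16 random bytes and export it
# so that later commands in the same shell can reference it.
POSTGRES_PASSWORD=$(head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n')
export POSTGRES_PASSWORD
```

Then use `-p POSTGRESQL_PASSWORD="${POSTGRES_PASSWORD}"` in `oc new-app` and `--from-literal=password="${POSTGRES_PASSWORD}"` in the `metadata-postgres` secret, so that both sides agree.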

To create the PostgreSQL instance, run:

```bash
oc new-app https://raw.githubusercontent.com/openshift/library/master/official/postgresql/templates/postgresql-persistent.json -n seldon-system \
    -p POSTGRESQL_USER=postgresql \
    -p POSTGRESQL_PASSWORD=postgresql \
    -p POSTGRESQL_DATABASE=metadata \
    -p DATABASE_SERVICE_NAME=metadata
```

Once the `template` is instantiated, the following OpenShift/Kubernetes resources are created to support the Model Catalog:

* `DeploymentConfig`
* `ReplicationController`
* a PostgreSQL pod
* `Service`
* `PersistentVolumeClaim`

**Adding Secrets**

Seldon Enterprise Platform needs to be able to authenticate to the PostgreSQL instance, and therefore a secret is created called `metadata-postgres` using the below command.

```bash
oc create secret generic -n seldon-system metadata-postgres \
  --from-literal=user=postgresql \
  --from-literal=password=postgresql \
  --from-literal=host=metadata.seldon-system.svc.cluster.local \
  --from-literal=port=5432 \
  --from-literal=dbname=metadata \
  --from-literal=sslmode=prefer
```

#### Argo CD

Seldon Enterprise Platform leverages GitOps to ensure an up-to-date declarative representation of model deployments. GitOps enables changes in deployments to be tracked, and deployments to be rolled back to previous states, via commits to a Git repository. The Git repository stores the SeldonDeployments which describe how to create the machine learning models on the Kubernetes cluster.

Red Hat OpenShift provides a GitOps operator, which is built on top of ArgoCD and provides an easy to install and maintain component for enabling GitOps workflows. This installation will leverage the OpenShift GitOps Operator to enable Seldon’s own GitOps functionality.

**Prepare Seldon Namespace for GitOps**

Each namespace in which Seldon models are meant to be deployed using GitOps needs to be specially prepared. Here we provide an example for a namespace called `seldon-gitops`:

```bash
oc create ns seldon-gitops

oc label ns seldon-gitops seldon.restricted=false --overwrite=true
oc label ns seldon-gitops seldon.gitops=enabled --overwrite=true
oc annotate ns seldon-gitops git-repo="https://github.com/<your-organization>/<private-repo>" --overwrite=true
```

The above configures the `seldon-gitops` namespace to be recognized as GitOps-enabled by Seldon Enterprise Platform. Assuming that the Argo CD instance is installed into the `seldon-argocd` namespace, allow the `seldon-gitops` namespace to be managed by it:

```bash
oc label namespace seldon-gitops argocd.argoproj.io/managed-by=seldon-argocd
```

In addition, for every new namespace you need to:

* add the namespace to the `ServiceMeshMemberRoll` (see [mesh configuration](#openshift-service-mesh))
* apply the `NetworkPolicy`, `PodMonitor` and `PrometheusRule` resources (see the [Serverless](#openshift-serverless) and [Monitoring](#openshift-monitoring) configuration sections):

```bash
oc apply -f networkpolicy-monitoring.yaml -n seldon-gitops
oc apply -f networkpolicy-detectors.yaml -n seldon-gitops
oc apply -f seldon-podmonitor.yaml -n seldon-gitops
oc apply -f deployment-usage-rules.yaml -n seldon-gitops
```
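The steps above can be combined into a single helper for onboarding further GitOps namespaces. This is a sketch under these assumptions: the function name `prepare_gitops_namespace` is hypothetical, the four manifest files from earlier sections are in the current directory, and the member roll is the one named `default` in `istio-system`. The JSON patch appends the namespace without checking whether it is already listed:

```bash
# Prepare one GitOps-enabled model namespace.
#   $1: namespace   $2: Git repository URL   $3: Argo CD namespace (default: seldon-argocd)
prepare_gitops_namespace() {
  local ns="$1" repo="$2" argocd_ns="${3:-seldon-argocd}"
  oc create namespace "$ns" || return 1
  oc label namespace "$ns" seldon.restricted=false seldon.gitops=enabled --overwrite=true
  oc label namespace "$ns" argocd.argoproj.io/managed-by="$argocd_ns" --overwrite=true
  oc annotate namespace "$ns" git-repo="$repo" --overwrite=true
  # Append the namespace to the mesh member roll (JSON patch "add" at the end of the list)
  oc patch servicemeshmemberroll default -n istio-system --type=json \
    -p "[{\"op\":\"add\",\"path\":\"/spec/members/-\",\"value\":\"$ns\"}]"
  # Per-namespace policies, monitors and rules from the earlier sections
  for f in networkpolicy-monitoring.yaml networkpolicy-detectors.yaml \
           seldon-podmonitor.yaml deployment-usage-rules.yaml; do
    oc apply -f "$f" -n "$ns"
  done
}
```

For example: `prepare_gitops_namespace seldon-gitops "https://github.com/<your-organization>/<private-repo>"`.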

**Installing the OpenShift GitOps Operator**

The first step to configure GitOps is to add the `Red Hat OpenShift GitOps` (1.7.0\*) operator from within the Operator Hub. This operator should be installed with default options. Follow the OpenShift documentation [here](https://docs.openshift.com/gitops/1.9/installing_gitops/installing-openshift-gitops.html).

The OpenShift GitOps Operator automatically creates an ArgoCD instance in the `openshift-gitops` namespace. You can use this ArgoCD instance or create a new one as we describe in the next section.

**Creating ArgoCD Instance**

For the purposes of this documentation, we use a new Argo CD instance. First, create a new project/namespace:

```bash
oc create namespace seldon-argocd
```

Then, create a new Argo CD instance dedicated to Seldon by following the OpenShift [documentation](https://docs.openshift.com/gitops/1.9/argocd_instance/setting-up-argocd-instance.html). We recommend making the following changes to the Argo CD instance using the YAML editor:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: ArgoCD
metadata:
  name: seldon-argocd        # Change here
  namespace: seldon-argocd   # Change here
spec:
  server:
    route:
      enabled: true
      tls:                      # Change here
        termination: reencrypt  # Change here
  dex:
    openShiftOAuth: true
  rbac:
    policy: g, cluster-admins, role:admin  # Change here
    scopes: '[groups]'
  # ...
```
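If you save the manifest above to a file and apply it with `oc apply -f` instead of using the console YAML editor, you can block until the operator has finished creating the instance. The following is a sketch that assumes the Argo CD operator reports readiness through `.status.phase` (with the value `Available`) and that your `oc` client supports `oc wait --for=jsonpath=...`; `wait_for_argocd` is a hypothetical helper name.

```bash
# Hypothetical helper: wait until the ArgoCD custom resource reports
# phase "Available". Defaults match the instance name/namespace above.
wait_for_argocd() {
  local name="${1:-seldon-argocd}" ns="${2:-seldon-argocd}"
  oc wait "argocd/${name}" -n "${ns}" \
    --for=jsonpath='{.status.phase}'=Available \
    --timeout=300s
}
```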

Explanation:

* `spec.server.route.tls.termination`: this can be set to re-use the SSL certificates, as we did when setting up `seldon-route` in the `istio-system` namespace
* `spec.rbac.policy`: the default value `system:cluster-admins` does not provide the expected admin access in certain configurations

{% hint style="info" %}
**Note**: the above definition enables the [Dex OpenShift OAuth Connector](https://docs.openshift.com/gitops/1.9/accesscontrol_usermanagement/configuring-sso-on-argo-cd-using-dex.html#gitops-creating-a-new-client-in-dex_configuring-sso-for-argo-cd-using-dex), which allows you to log into ArgoCD using OpenShift OAuth. OpenShift admin users (members of the `cluster-admins` group) will have `admin` privileges in the ArgoCD UI.
{% endhint %}

Your ArgoCD instance will now be available at the URL printed by:

```bash
INGRESS_DOMAIN=$(oc get ingresses.config/cluster -o jsonpath={.spec.domain})
echo https://seldon-argocd-server-seldon-argocd.$INGRESS_DOMAIN/
```
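Rather than composing the hostname by hand, you can read it from the Route the operator creates. This sketch assumes the Route is named `<instance name>-server` (i.e. `seldon-argocd-server`), which matches the hostname pattern above; `argocd_url` is a hypothetical helper.

```bash
# Hypothetical helper: print the ArgoCD UI URL from the operator-created Route.
argocd_url() {
  local name="${1:-seldon-argocd}" ns="${2:-seldon-argocd}" host
  host="$(oc get route "${name}-server" -n "${ns}" -o jsonpath='{.spec.host}')" || return 1
  # An empty host means the Route exists but has not been admitted yet.
  [ -n "${host}" ] || return 1
  echo "https://${host}/"
}
```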

**Configuring Git Repository (Seldon Enterprise Platform)**

To configure our Git credentials in Seldon Enterprise Platform, we will follow these steps:

1. Create a Kubernetes secret containing our credentials, either as an SSH key or a user/password combination. This secret can have any arbitrary name, but must live in the same namespace as Seldon Enterprise Platform.

{% tabs %}
{% tab title="SSH" %}
If the private key is present under `$GIT_SSH_PATH`, you can create the credentials secret as:

```
kubectl create secret generic git-creds -n seldon-system \
  --from-file=id_rsa=${GIT_SSH_PATH} \
  --from-file=known_hosts=${GIT_KNOWN_HOSTS_PATH} \
  --from-literal=passphrase="${GIT_SSHKEY_PASSPHRASE}" \
  --from-literal=username="${GIT_USER}" \
  --from-literal=email="${GIT_EMAIL}"  \
  --dry-run=client -o yaml | kubectl apply -f -
```

The `passphrase` field can be left empty if the SSH key doesn't have a passphrase.
{% endtab %}

{% tab title="User / Password" %}
You can create the credentials secret using a User / Password combination (or User / Personal Access Token) as:

```
kubectl create secret generic git-creds -n seldon-system \
  --from-literal=username="${GIT_USER}" \
  --from-literal=token="${GIT_TOKEN}" \
  --from-literal=email="${GIT_EMAIL}"  \
  --dry-run=client -o yaml | kubectl apply -f -
```

{% endtab %}
{% endtabs %}

2. Make sure that the Seldon Enterprise Platform configuration points to the newly created secret. In particular, check the `gitops` section of the Seldon Enterprise Platform Helm chart values: the `gitops.argocd.enabled` flag must be set to `true`, and the `gitops.git.secret` field must point to the right secret name. The Helm installation of Seldon Enterprise Platform is described in the [Seldon Enterprise Platform section](#seldon-enterprise-platform) further down in this document, and the Helm values provided there already have GitOps enabled.

```yaml
gitops:
  git:
    secret: git-creds
  argocd:
    enabled: true
```
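The secret-creation commands in step 1 can end up storing empty values (or failing with a confusing error) if one of the environment variables they reference is unset. A minimal sketch of a pre-flight check is shown below; `require_env` is a hypothetical helper that relies on bash indirect expansion.

```bash
# Hypothetical helper: fail if any of the named environment variables
# is unset or empty, listing every missing one on stderr.
require_env() {
  local missing=0 var
  for var in "$@"; do
    # ${!var:-} expands the variable whose *name* is stored in $var.
    if [ -z "${!var:-}" ]; then
      echo "missing required environment variable: ${var}" >&2
      missing=1
    fi
  done
  return "${missing}"
}
```

For example, run `require_env GIT_USER GIT_TOKEN GIT_EMAIL` before the User / Password command, or `require_env GIT_SSH_PATH GIT_KNOWN_HOSTS_PATH GIT_USER GIT_EMAIL` before the SSH command. `GIT_SSHKEY_PASSPHRASE` is deliberately left out, since it may legitimately be empty.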

**Configuring Git Repository (ArgoCD)**

There are multiple ways in which a Git repository can be configured in ArgoCD. One of the easiest is to use the ArgoCD UI while logged in as an admin user.

Here, we provide an example of configuring the repository declaratively, assuming user/password authentication over HTTPS:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: seldon-gitops-repository
  namespace: seldon-argocd
  labels:
    argocd.argoproj.io/secret-type: repository
stringData:
  type: git
  url: https://github.com/<your-organization>/<private-repo>
  password: my-password
  username: my-username
```
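If you prefer not to keep credentials in a YAML file, an imperative sketch that should produce an equivalent Secret is shown below. The secret name, namespace and label mirror the manifest above; `GIT_REPO_URL`, `GIT_USER` and `GIT_TOKEN` are assumed environment variables and `create_argocd_repo_secret` is a hypothetical helper name.

```bash
# Hypothetical helper: create (or update) the ArgoCD repository secret
# from environment variables instead of a YAML file.
create_argocd_repo_secret() {
  local ns="${1:-seldon-argocd}"
  oc create secret generic seldon-gitops-repository -n "${ns}" \
    --from-literal=type=git \
    --from-literal=url="${GIT_REPO_URL}" \
    --from-literal=username="${GIT_USER}" \
    --from-literal=password="${GIT_TOKEN}" \
    --dry-run=client -o yaml | oc apply -f - || return 1
  # ArgoCD recognizes repository secrets by this label.
  oc label secret seldon-gitops-repository -n "${ns}" \
    argocd.argoproj.io/secret-type=repository --overwrite
}
```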

For more examples refer to ArgoCD [documentation](https://argo-cd.readthedocs.io/en/stable/operator-manual/declarative-setup/#repositories).

**ArgoCD Project**

There are multiple ways in which an `AppProject` can be created: through the OpenShift UI, the ArgoCD UI, or declaratively.

Create the following `AppProject`:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: seldon
  namespace: seldon-argocd
spec:
  description: Seldon Enterprise Platform Project
  sourceRepos:
  - https://github.com/<your-organization>/<private-repo>
  destinations:
  - namespace: seldon-gitops
    server: https://kubernetes.default.svc
  clusterResourceWhitelist:
  - group: '*'
    kind: '*'
  roles:
  - name: seldon-admin
    policies:
    - p, proj:seldon:seldon-admin, applications, get, seldon/*, allow
    - p, proj:seldon:seldon-admin, applications, create, seldon/*, allow
    - p, proj:seldon:seldon-admin, applications, update, seldon/*, allow
    - p, proj:seldon:seldon-admin, applications, delete, seldon/*, allow
    - p, proj:seldon:seldon-admin, applications, sync, seldon/*, allow
```
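Every additional gitops-enabled namespace also needs a matching entry under `spec.destinations` of this `AppProject`. One way to add it without editing the whole resource is a JSON patch that appends to the list. The sketch below assumes the `seldon` project in the `seldon-argocd` namespace as defined above, and `add_appproject_destination` is a hypothetical helper. Running it twice for the same namespace appends a duplicate entry.

```bash
# Hypothetical helper: append a destination for one namespace to an
# AppProject ("/spec/destinations/-" means "add at the end of the list").
add_appproject_destination() {
  local ns="$1" project="${2:-seldon}" argocd_ns="${3:-seldon-argocd}"
  [ -n "${ns}" ] || { echo "usage: add_appproject_destination <namespace>" >&2; return 2; }
  oc patch appproject "${project}" -n "${argocd_ns}" --type=json \
    -p "[{\"op\":\"add\",\"path\":\"/spec/destinations/-\",\"value\":{\"namespace\":\"${ns}\",\"server\":\"https://kubernetes.default.svc\"}}]"
}
```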

**ArgoCD Application**

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: seldon-gitops-seldon-gitops
  namespace: seldon-argocd
spec:
  project: seldon
  destination:
    namespace: seldon-gitops
    server: https://kubernetes.default.svc
  source:
    directory:
      recurse: true
    path: seldon-gitops
    repoURL: https://github.com/<your-organization>/<private-repo>
  syncPolicy:
    automated: {}
```
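For further namespaces you can generate the same `Application` from a template. The sketch below assumes, as in the example above, that the repository keeps each namespace's resources in a top-level directory named after the namespace and that the application follows the `seldon-gitops-<namespace>` naming convention. `render_gitops_application` is a hypothetical helper, and the namespace must also be listed in the `AppProject` destinations.

```bash
# Hypothetical helper: print an ArgoCD Application manifest for one namespace.
render_gitops_application() {
  local ns="$1" repo_url="$2"
  if [ -z "${ns}" ] || [ -z "${repo_url}" ]; then
    echo "usage: render_gitops_application <namespace> <repo-url>" >&2
    return 2
  fi
  cat <<EOF
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: seldon-gitops-${ns}
  namespace: seldon-argocd
spec:
  project: seldon
  destination:
    namespace: ${ns}
    server: https://kubernetes.default.svc
  source:
    directory:
      recurse: true
    path: ${ns}
    repoURL: ${repo_url}
  syncPolicy:
    automated: {}
EOF
}
```

For example: `render_gitops_application my-new-namespace https://github.com/<your-organization>/<private-repo> | oc apply -f -`.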

{% hint style="info" %}
**Note**: If your ArgoCD application does not follow the `seldon-gitops-${namespace}` naming convention, you can label the namespace accordingly:

```bash
kubectl label namespace $namespace argocdapp=${ARGO_APP_NAME} --overwrite=true
```

{% endhint %}

#### Kafka

{% hint style="info" %}
**Note**: Kafka is an external component outside of the main Seldon stack. Therefore, it is the cluster administrator's responsibility to administer and manage the Kafka installation used by Seldon.
{% endhint %}

**Install Kafka Operator**

The first step in installing Kafka is to install an operator that can manage a Kafka cluster.

{% tabs %}
{% tab title="AMQ Streams" %}
Add the `Red Hat Integration - AMQ Streams` (2.2.0-4\*) operator from within the Operator Hub. This operator should be installed with default options. AMQ Streams is based on the [Strimzi Operator](https://github.com/strimzi/strimzi-kafka-operator) and you can read more about it in the Red Hat documentation [here](https://access.redhat.com/documentation/en-us/red_hat_amq_streams/2.2).
{% endtab %}

{% tab title="Strimzi Kafka" %}
Add `Strimzi` (0.32.0\*) operator. This operator should be installed with default options. This is the community [Strimzi Operator](https://github.com/strimzi/strimzi-kafka-operator).
{% endtab %}
{% endtabs %}

**Create Kafka Cluster**

Once the Strimzi operator (either the community Strimzi operator or Red Hat's AMQ Streams) is up and running, we need to create a Kafka cluster.

Create the `seldon-kafka` namespace for our Kafka cluster:

```bash
oc create namespace seldon-kafka
```

{% hint style="info" %}
Make sure that the `seldon-kafka` namespace is added to the `ServiceMeshMemberRoll` as described in the [OpenShift Service Mesh section](#openshift-service-mesh).
{% endhint %}

Select the `seldon-kafka` project and navigate to your Kafka operator under your installed operators. Select the `Kafka` tab and create the Kafka cluster. The following is the minimal required configuration:

```yaml
kind: Kafka
apiVersion: kafka.strimzi.io/v1beta2
metadata:
  name: seldon               # Change here
  namespace: seldon-kafka   # Change here
spec:
  kafka:
    version: 3.4.0
    replicas: 3
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
      - name: tls
        port: 9093
        type: internal
        tls: true
    config:
      auto.create.topics.enable: true             # Change here
      offsets.topic.replication.factor: 3
      transaction.state.log.replication.factor: 3
      transaction.state.log.min.isr: 2
      default.replication.factor: 3
      min.insync.replicas: 1                      # Change here
      inter.broker.protocol.version: '3.4'
    storage:
      type: ephemeral
  zookeeper:
    replicas: 3
    storage:
      type: ephemeral
  entityOperator:
    topicOperator: {}
    userOperator: {}
```
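After creating the cluster, you can wait for the operator to report it as ready and then confirm the bootstrap address that the Seldon components use later in this document (`seldon-kafka-bootstrap.seldon-kafka.svc:9092`). This is a sketch: it assumes the cluster name and namespace above, and that your Strimzi / AMQ Streams version reports listener `name` and `bootstrapServers` in the `Kafka` status. Both helper names are hypothetical.

```bash
# Hypothetical helper: wait until the operator marks the Kafka cluster Ready.
wait_for_kafka() {
  local name="${1:-seldon}" ns="${2:-seldon-kafka}"
  oc wait "kafka/${name}" -n "${ns}" --for=condition=Ready --timeout=300s
}

# Hypothetical helper: print the bootstrap address of the "plain" listener.
kafka_plain_bootstrap() {
  local name="${1:-seldon}" ns="${2:-seldon-kafka}"
  oc get "kafka/${name}" -n "${ns}" \
    -o jsonpath='{.status.listeners[?(@.name=="plain")].bootstrapServers}'
}
```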

#### Seldon Core V1

Seldon Core is used to serve machine learning models over REST and gRPC endpoints, using a variety of advanced deployment strategies (canaries, shadows, A/B, multi-armed bandits).

Seldon Core (v1.16.0) is available as an operator within the Operator Hub and can therefore be readily installed onto OpenShift.

Once the operator has been installed, a number of configuration changes are required to ensure smooth interaction with the wider environment of tools. These can be made by editing the operator's `ClusterServiceVersion` in the YAML tab of the newly installed operator.

The configuration parameters to edit are the deployment environment variables:

* `ISTIO_ENABLED` set to `true`
* `EXECUTOR_REQUEST_LOGGER_DEFAULT_ENDPOINT` set to `http://broker-ingress.knative-eventing.svc.cluster.local/seldon-logs/default`
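Changes made directly to the operator's Deployment are typically reverted by OLM, which is why the variables are set on the `ClusterServiceVersion`. To confirm that they reached the running controller, you can list the Deployment's environment. The sketch below assumes the Deployment is called `seldon-controller-manager` and lives in `openshift-operators` (the usual namespace for an operator installed for all namespaces); adjust both if your installation differs. `show_core_v1_env` is a hypothetical helper.

```bash
# Hypothetical helper: show the two Seldon Core v1 settings on the
# operator Deployment (returns non-zero if neither is set).
show_core_v1_env() {
  local ns="${1:-openshift-operators}"
  oc set env deployment/seldon-controller-manager -n "${ns}" --list \
    | grep -E '^(ISTIO_ENABLED|EXECUTOR_REQUEST_LOGGER_DEFAULT_ENDPOINT)='
}
```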

#### Seldon Core v2

Seldon Core v2 can be installed using the published Helm charts. To add the Helm chart repository, run:

```bash
helm repo add seldon-charts https://seldonio.github.io/helm-charts
helm repo update seldon-charts
```

The Seldon Core v2 installation consists of a few different components, each of these having its own corresponding Helm chart.

| Helm Chart             | Description                                                             | Recommended Namespace                                                                                              |
| ---------------------- | ----------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------ |
| seldon-core-v2-crds    | CRDs defining Core v2 resources                                         | `default` or `seldon-system`                                                                                       |
| seldon-core-v2-setup   | Core v2 configuration chart (operator, templates, RBAC)                 | `seldon-system` for cluster-wide installation, or each model namespace for namespaced installations, e.g. `seldon` |
| seldon-core-v2-runtime | Seldon Runtime defines core components required in each model namespace | each model namespace, e.g. `seldon`                                                                                |
| seldon-core-v2-servers | Seldon Core v2 pre-configured servers to host your models (optional)    | each model namespace, e.g. `seldon`                                                                                |

**Installation Modes**

Seldon Core v2 supports both cluster-wide and namespaced installations:

* In cluster-wide mode, we recommend installing the `seldon-core-v2-setup` Helm Chart into the `seldon-system` namespace. The operator will then reconcile Core v2 resources like `SeldonRuntime`, `Server`, `Model`, and `Pipeline` in **all** namespaces.
* In namespaced mode, you must install the `seldon-core-v2-setup` Helm chart into each model namespace. Each operator will then reconcile Core v2 resources only in the namespace it is installed in itself.

{% hint style="info" %}
Cluster-wide installation of Seldon Core v2 is only available from version 2.6.0 onwards. Installing the Seldon Core v2 operator into the `seldon-system` namespace (i.e. the same namespace as Core v1) is only supported from version 2.7.0.

For a namespaced installation, we use `seldon` as an example namespace throughout this page to install Core v2; repeat these steps for each namespace you want to use Core v2 in. For a cluster-wide installation, the `seldon` namespace is an example namespace for the Seldon Runtime and Servers only.
{% endhint %}

**CRDs**

Install Seldon Core v2 CRDs with:

```bash
helm upgrade seldon-core-v2-crds seldon-charts/seldon-core-v2-crds \
    --version 2.8.5 \
    --namespace default \
    --install
```
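To check that the CRDs were registered, you can list the CRDs in the `mlops.seldon.io` API group used by Core v2 resources such as `Model`, `Pipeline` and `Server`. `list_core_v2_crds` is a hypothetical helper name.

```bash
# Hypothetical helper: list installed CRDs belonging to the mlops.seldon.io group.
list_core_v2_crds() {
  kubectl get crds -o name | grep '\.mlops\.seldon\.io$'
}
```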

**Operator**

The Seldon Core v2 operator (`seldon-core-v2-setup` Helm chart) can be installed either in cluster-wide or namespaced mode.

{% tabs %}
{% tab title="Cluster-wide Installation" %}

1. Prepare the required namespaces with:

```
kubectl create ns seldon || echo "Namespace seldon already exists"
kubectl create ns seldon-system || echo "Namespace seldon-system already exists"
```

2. Create `components-values.yaml` file that we will use to configure the installation. The values below are meant as a starting point and should be edited where necessary:

   ```
   controller:
     clusterwide: true
     securityContext:
       runAsNonRoot: true
       runAsUser: null
       runAsGroup: null
       fsGroup: null

   envoy:
     service:
       type: ClusterIP
     securityContext:
       runAsNonRoot: true
       runAsUser: null
       runAsGroup: null
       fsGroup: null

   kafka:
     bootstrap: "seldon-kafka-bootstrap.seldon-kafka.svc:9092"
     topics:
       replicationFactor: 3
       numPartitions: 4

   opentelemetry:
     enable: false

   scheduler:
     service:
       type: ClusterIP
     securityContext:
       runAsNonRoot: true
       runAsUser: null
       runAsGroup: null
       fsGroup: null

   dataflow:
     securityContext:
       runAsNonRoot: true
       runAsUser: null
       runAsGroup: null
       fsGroup: null

   modelgateway:
     securityContext:
       runAsNonRoot: true
       runAsUser: null
       runAsGroup: null
       fsGroup: null

   hodometer:
     securityContext:
       runAsNonRoot: true
       runAsUser: null
       runAsGroup: null
       fsGroup: null

   pipelinegateway:
     securityContext:
       runAsNonRoot: true
       runAsUser: null
       runAsGroup: null
       fsGroup: null

   serverConfig:
     securityContext:
       runAsNonRoot: true
       runAsUser: null
       runAsGroup: null
       fsGroup: null

   serviceGRPCPrefix: "http2-"

   ```

3. Conduct the Helm installation with:

```
helm upgrade seldon-core-v2-components seldon-charts/seldon-core-v2-setup \
    --version 2.8.5 \
    -f components-values.yaml \
    --namespace seldon-system \
    --install
```

{% endtab %}

{% tab title="Namespaced Installation" %}

1. Prepare the required namespaces with:

```
kubectl create ns seldon || echo "Namespace seldon already exists"
```

2. Create `components-values.yaml` file that we will use to configure the installation. The values below are meant as a starting point and should be edited where necessary:

   ```
   controller:
     clusterwide: false
     securityContext:
       runAsNonRoot: true
       runAsUser: null
       runAsGroup: null
       fsGroup: null
       
   envoy:
     service:
       type: ClusterIP
     securityContext:
       runAsNonRoot: true
       runAsUser: null
       runAsGroup: null
       fsGroup: null

   kafka:
     bootstrap: "seldon-kafka-bootstrap.seldon-kafka.svc:9092"
     topics:
       replicationFactor: 3
       numPartitions: 4

   opentelemetry:
     enable: false

   scheduler:
     service:
       type: ClusterIP
     securityContext:
       runAsNonRoot: true
       runAsUser: null
       runAsGroup: null
       fsGroup: null

   dataflow:
     securityContext:
       runAsNonRoot: true
       runAsUser: null
       runAsGroup: null
       fsGroup: null

   modelgateway:
     securityContext:
       runAsNonRoot: true
       runAsUser: null
       runAsGroup: null
       fsGroup: null

   hodometer:
     securityContext:
       runAsNonRoot: true
       runAsUser: null
       runAsGroup: null
       fsGroup: null

   pipelinegateway:
     securityContext:
       runAsNonRoot: true
       runAsUser: null
       runAsGroup: null
       fsGroup: null

   serverConfig:
     securityContext:
       runAsNonRoot: true
       runAsUser: null
       runAsGroup: null
       fsGroup: null

   serviceGRPCPrefix: "http2-"

   ```

3. Conduct the Helm installation with:

```
helm upgrade seldon-core-v2-components seldon-charts/seldon-core-v2-setup \
    --version 2.8.5 \
    -f components-values.yaml \
    --namespace seldon \
    --install
```

{% endtab %}
{% endtabs %}
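Whichever mode you chose, you can wait for the operator Deployment to finish rolling out before moving on. This sketch assumes the Deployment is named `seldon-v2-controller-manager`, as in the validation output further down; pass `seldon-system` for a cluster-wide installation or the model namespace (e.g. `seldon`) for a namespaced one. `wait_for_core_v2_operator` is a hypothetical helper.

```bash
# Hypothetical helper: wait for the Core v2 operator Deployment to roll out.
wait_for_core_v2_operator() {
  local ns="${1:-seldon-system}"
  kubectl rollout status deployment/seldon-v2-controller-manager \
    -n "${ns}" --timeout=300s
}
```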

{% hint style="info" %}
Visit our [Kafka Integration page](/seldon-enterprise-platform/production-environment/kafka.md) for more information on configuring an integration with managed Kafka solutions.
{% endhint %}

**Seldon Runtime**

Conduct Helm installation of Seldon Runtime for Seldon Core v2 with:

```bash
helm upgrade seldon-core-v2-runtime seldon-charts/seldon-core-v2-runtime \
    --version 2.8.5 \
    --namespace seldon \
    --install
```

**Servers**

To run models, you will need to provision one or more servers. As a convenience for getting started, you can install the pre-configured Seldon Core v2 Servers. First, create a `servers-values.yaml` file to configure the installation (the values below are the defaults; adjust them to your needs):

```yaml
mlserver:
  replicas: 1

triton:
  replicas: 1
```

and conduct Helm installation with:

```bash
helm upgrade seldon-core-v2-servers seldon-charts/seldon-core-v2-servers \
    --version 2.8.5 \
    -f servers-values.yaml \
    --namespace seldon \
    --install
```
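The `Server` resources created by this chart can take a little while to become ready. A sketch of waiting for all of them in a namespace is shown below; it assumes `Server` resources expose a `Ready` condition, and `wait_for_servers` is a hypothetical helper.

```bash
# Hypothetical helper: wait until every Server resource in a namespace is Ready.
wait_for_servers() {
  local ns="${1:-seldon}"
  kubectl wait --for=condition=Ready --timeout=300s server --all -n "${ns}"
}
```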

**Validation**

You should see a pod like the following running in the `seldon-system` namespace:

```bash
$ oc get pods -n seldon-system
seldon-v2-controller-manager-7db857ccc-s9xxn   1/1     Running   0          5h25m
```

And also pods like the following running in the `seldon` namespace:

```bash
$ oc get pods -n seldon
mlserver-0                                   3/3     Running   0          7h49m
seldon-dataflow-engine-54bc74bd87-rhgs9      1/1     Running   1          7h49m
seldon-envoy-75b44947bd-q9hxm                1/1     Running   0          7h49m
seldon-hodometer-6d9dbf689c-lg8mw            1/1     Running   0          7h49m
seldon-modelgateway-7b9ddfc644-q2knl         1/1     Running   0          7h49m
seldon-pipelinegateway-7f6f4ffd6-fzhpk       1/1     Running   0          7h49m
seldon-scheduler-0                           1/1     Running   0          7h49m
triton-0                                     3/3     Running   0          7h49m
```

{% hint style="info" %}
In case of namespaced installation, the `seldon-v2-controller-manager` pod (with a hash suffix) will be found in `seldon` namespace.
{% endhint %}

**Adding new namespaces**

{% hint style="info" %}
You do not need to install the CRDs again, as they are cluster-wide resources.
{% endhint %}

To install Seldon Core v2 in additional namespaces, follow these steps:

{% tabs %}
{% tab title="Cluster-wide Installation" %}

1. Create the new namespace.
2. Install the runtime and servers into the new namespace.

{% endtab %}

{% tab title="Namespaced Installation" %}

1. Create the new namespace.
2. Install the `seldon-core-v2-setup` Helm chart, the runtime and the servers into the new namespace.

{% endtab %}
{% endtabs %}
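For the cluster-wide mode, the two steps above can be scripted as follows. This is a sketch: it reuses the chart names and version from this page, assumes `servers-values.yaml` is in the current directory, and `install_core_v2_namespace` is a hypothetical helper. For a namespaced installation you would additionally install the `seldon-core-v2-setup` chart into the namespace, as shown in the Namespaced Installation tab of the Operator step.

```bash
# Hypothetical helper (cluster-wide mode): create a namespace and install
# the Seldon Runtime and the pre-configured Servers into it.
install_core_v2_namespace() {
  local ns="$1" version="${2:-2.8.5}"
  [ -n "${ns}" ] || { echo "usage: install_core_v2_namespace <namespace> [version]" >&2; return 2; }
  kubectl create ns "${ns}" || echo "Namespace ${ns} already exists"
  helm upgrade seldon-core-v2-runtime seldon-charts/seldon-core-v2-runtime \
    --version "${version}" --namespace "${ns}" --install || return 1
  helm upgrade seldon-core-v2-servers seldon-charts/seldon-core-v2-servers \
    --version "${version}" -f servers-values.yaml --namespace "${ns}" --install
}
```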

**Metrics Monitoring**

To configure metrics collection for the Seldon Core v2 components, create the following `PodMonitor` and `ServiceMonitor` resources in the `seldon` namespace:

```bash
PODMONITOR_RESOURCE_LOCATION=https://raw.githubusercontent.com/SeldonIO/seldon-core/v2.8.5/prometheus/monitors

kubectl apply -n seldon -f ${PODMONITOR_RESOURCE_LOCATION}/agent-podmonitor.yaml
kubectl apply -n seldon -f ${PODMONITOR_RESOURCE_LOCATION}/envoy-servicemonitor.yaml
kubectl apply -n seldon -f ${PODMONITOR_RESOURCE_LOCATION}/pipelinegateway-podmonitor.yaml
kubectl apply -n seldon -f ${PODMONITOR_RESOURCE_LOCATION}/server-podmonitor.yaml
```
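Since these monitors have to be created in every model namespace, a loop over the four manifests may be convenient. The sketch below uses the same URLs as above; the namespace and the Core v2 release tag are parameters, and `apply_core_v2_monitors` is a hypothetical helper.

```bash
# Hypothetical helper: apply the Core v2 PodMonitor/ServiceMonitor manifests
# from the seldon-core repository to one namespace.
apply_core_v2_monitors() {
  local ns="${1:-seldon}" version="${2:-v2.8.5}" file
  local base="https://raw.githubusercontent.com/SeldonIO/seldon-core/${version}/prometheus/monitors"
  for file in agent-podmonitor.yaml envoy-servicemonitor.yaml \
              pipelinegateway-podmonitor.yaml server-podmonitor.yaml; do
    kubectl apply -n "${ns}" -f "${base}/${file}" || return 1
  done
}
```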

**Seldon Mesh**

The Seldon Core v2 inference API is exposed via Envoy on the `seldon-mesh` service in the `seldon` namespace:

```bash
$ oc get svc seldon-mesh -n seldon
NAME          TYPE           CLUSTER-IP   EXTERNAL-IP     PORT(S)                       AGE
seldon-mesh   LoadBalancer   10.24.3.84   <none>          80:31979/TCP,9003:31803/TCP   15h
```

To expose this service via Istio, create the following VirtualService in a `seldon-mesh-vs.yaml` file:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: seldon-mesh
  namespace: seldon
spec:
  gateways:
  - istio-system/seldon-gateway
  hosts:
  - '*'
  http:
  - match:
    - headers:
        namespace:
          exact: seldon
        inference:
          exact: seldon-mesh
    name: data-plane-seldon
    route:
    - destination:
        host: seldon-mesh.seldon.svc.cluster.local
        port:
          number: 80
```

and apply it with

```bash
kubectl apply -f seldon-mesh-vs.yaml
```

{% hint style="info" %}
**Note**:

To send HTTP requests to Seldon Mesh, you need to set two headers, `-H "namespace:<namespace>" -H "inference:seldon-mesh"`, to reach Seldon Mesh in a given namespace. For example, to reach the `iris` pipeline in the `seldon` namespace:

```bash
INGRESS_DOMAIN=$(oc get ingresses.config/cluster -o jsonpath={.spec.domain})

curl -k https://seldon.$INGRESS_DOMAIN/v2/pipelines/iris/infer \
  -H "Content-Type: application/json" \
  -H "namespace:seldon" \
  -H "inference:seldon-mesh" \
  -d '{"inputs": [{"name": "predict", "shape": [1, 4], "datatype": "FP32", "data": [[1, 2, 3, 4]]}]}'
```

{% endhint %}

#### Seldon Enterprise Platform

**Installation**

1. Download the `seldon-deploy-install.tar.gz` file that contains the required installation resources. For example, to download the installation resources for version `2.4.0` of Seldon Enterprise Platform, run the following:

   ```
   TAG=2.4.0 && \
    docker create --name=tmp-sd-container seldonio/seldon-deploy-server:${TAG} && \
    docker cp tmp-sd-container:/seldon-deploy-dist/seldon-deploy-install.tar.gz . && \
    docker rm -v tmp-sd-container
   ```
2. Extract the contents of the `seldon-deploy-install.tar.gz` file.

   ```
   tar -xzf seldon-deploy-install.tar.gz
   ```

Seldon Enterprise Platform relies on Helm charts to perform the installation. A single configuration file contains all of the relevant Helm values for a given installation; the recommended Helm values for installing Seldon Enterprise Platform on OpenShift are listed in the appendix of this document. Save these values as `values-openshift.yaml` and then run the following `helm` command to install Seldon Enterprise Platform:

```bash
helm upgrade seldon-deploy ./seldon-deploy-install/helm-charts/seldon-deploy/ \
    -f values-openshift.yaml \
    --namespace=seldon-system \
    --install
```
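To check that the release was deployed and wait for the server to come up, a sketch such as the following can be used. It assumes the chart creates a Deployment named `seldon-deploy` (matching the release name); `wait_for_seldon_deploy` is a hypothetical helper.

```bash
# Hypothetical helper: confirm the Helm release exists, then wait for the
# (assumed) seldon-deploy Deployment to finish rolling out.
wait_for_seldon_deploy() {
  local ns="${1:-seldon-system}"
  helm status seldon-deploy -n "${ns}" >/dev/null || return 1
  oc rollout status deployment/seldon-deploy -n "${ns}" --timeout=300s
}
```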

**Obtaining ingress URL**

Once the Seldon Enterprise Platform pods are up, you can access the UI by running the following command and entering the resulting URL into your browser:

```bash
INGRESS_DOMAIN=$(oc get ingresses.config/cluster -o jsonpath={.spec.domain})
echo https://seldon.$INGRESS_DOMAIN/seldon-deploy/
```

#### Appendix

**Adding new namespace for Seldon Enterprise Platform**

1. To add a new namespace called `my-new-namespace` for Seldon Enterprise Platform to use, run:

   ```
   oc create namespace my-new-namespace
   oc label namespace my-new-namespace seldon.restricted=false --overwrite=true

   oc apply -f networkpolicy-monitoring.yaml -n my-new-namespace
   oc apply -f networkpolicy-detectors.yaml -n my-new-namespace
   oc apply -f seldon-podmonitor.yaml -n my-new-namespace
   oc apply -f deployment-usage-rules.yaml -n my-new-namespace
   ```
2. Add the namespace to the `ServiceMeshMemberRoll` as described in the [OpenShift Service Mesh section](#openshift-service-mesh).
3. If the new namespace is meant to be gitops-enabled (recommended), follow the steps described in the [Argo CD](#argo-cd) section:
   * add `seldon.gitops=enabled` label
   * add `git-repo` annotation
   * add `argocd.argoproj.io/managed-by` label
   * update `AppProject` with new namespace entry
   * create new `Application` resource
4. If you explicitly listed namespaces in the [ClusterLogForwarder](#installing-clusterlogforwarder-component) config, you need to add the new namespace to the list.
5. Install [Seldon Core v2 in the new namespace](#seldon-core-v2). Note that, when following the instructions, you will need to replace `seldon` with the new namespace name (e.g. `my-new-namespace`).
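Step 2 can also be done from the command line. OpenShift Service Mesh expects the `ServiceMeshMemberRoll` to be named `default` and to live in the control plane namespace, which is `istio-system` in this document. The sketch below appends the namespace with a JSON patch; it assumes `spec.members` already exists (the patch fails otherwise), and `add_mesh_member` is a hypothetical helper.

```bash
# Hypothetical helper: append a namespace to spec.members of the
# "default" ServiceMeshMemberRoll in the control plane namespace.
add_mesh_member() {
  local ns="$1" cp_ns="${2:-istio-system}"
  [ -n "${ns}" ] || { echo "usage: add_mesh_member <namespace> [control-plane-ns]" >&2; return 2; }
  oc patch servicemeshmemberroll default -n "${cp_ns}" --type=json \
    -p "[{\"op\":\"add\",\"path\":\"/spec/members/-\",\"value\":\"${ns}\"}]"
}
```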

**Validating Installation**

This subsection describes basic validation steps for the Seldon installation.

**Validating Ingress**

Verify that the Istio `Gateway` and `Route` for Seldon are created with:

```bash
$ oc get gateway -n istio-system seldon-gateway
NAME             AGE
seldon-gateway   7d20h

$ oc get route -n istio-system seldon-route
NAME           HOST/PORT                PATH   SERVICES               PORT    TERMINATION     WILDCARD
seldon-route   seldon.$INGRESS_DOMAIN          istio-ingressgateway   http2   edge/Redirect   None
```

**Validating Serverless**

Verify that the `Broker` exists and is in the `READY` state with:

```bash
$ oc get broker -n seldon-logs
NAME      URL                                                                            AGE     READY   REASON
default   http://broker-ingress.knative-eventing.svc.cluster.local/seldon-logs/default   7d20h   True
```

**Validating NetworkPolicy resources**

Verify that the following `NetworkPolicy` resources exist in the `seldon-system` and `seldon-logs` namespaces:

```bash
$ oc get networkpolicy -n seldon-system
NAME                       POD-SELECTOR                   AGE
user-workload-monitoring   <none>                         7d19h

$ oc get networkpolicy -n seldon-logs
NAME                       POD-SELECTOR                   AGE
seldon-container-logs      <none>                         7d19h
seldon-request-logs        <none>                         7d20h
seldon-elastic-cluster     <none>                         7d20h
```

Verify that the following `NetworkPolicy` resources exist in every namespace with your Seldon models:

```bash
$ oc get networkpolicy -n <model namespace>
NAME                       POD-SELECTOR                   AGE
seldon-detectors           <none>                         7d20h
seldon-detectors-serving   <none>                         7d20h
user-workload-monitoring   <none>                         7d19h
```
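When you have many model namespaces, a small loop can report which of the expected objects are missing. This is a sketch with a hypothetical `check_resources` helper; it only checks that objects with the given names exist, not that their contents are correct.

```bash
# Hypothetical helper: check_resources <namespace> <kind> <name>...
# Prints one line per missing object and returns non-zero if any is missing.
check_resources() {
  local ns="$1" kind="$2" name missing=0
  shift 2
  for name in "$@"; do
    if ! oc get "${kind}" "${name}" -n "${ns}" >/dev/null 2>&1; then
      echo "missing ${kind}/${name} in ${ns}"
      missing=1
    fi
  done
  return "${missing}"
}
```

For example, `check_resources seldon networkpolicy seldon-detectors seldon-detectors-serving user-workload-monitoring`. The same helper works for the monitoring resources below, e.g. `check_resources seldon podmonitor seldon-podmonitor seldon-drift-detector seldon-podmonitor-metrics-server`.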

**Validating Monitoring Resources**

Verify that the following `PodMonitor` and `PrometheusRule` resources exist in the `seldon-system` namespace:

```bash
$ oc get podmonitor,prometheusrules -n seldon-system
NAME                                                     AGE
podmonitor.monitoring.coreos.com/seldon-deploy-monitor   6d

NAME                                                       AGE
prometheusrule.monitoring.coreos.com/deploy-infra-alerts   2d21h
prometheusrule.monitoring.coreos.com/deploy-user-alerts    2d21h
prometheusrule.monitoring.coreos.com/seldon-drift-alerts   2d21h
```

Verify that the following `PodMonitor` and `PrometheusRule` resources exist in every model namespace:

```bash
$ oc get podmonitor,prometheusrules -n <model namespace>
NAME                                                                AGE
podmonitor.monitoring.coreos.com/seldon-podmonitor                  7d20h
podmonitor.monitoring.coreos.com/seldon-drift-detector              7d20h
podmonitor.monitoring.coreos.com/seldon-podmonitor-metrics-server   7d20h

NAME                                                            AGE
prometheusrule.monitoring.coreos.com/seldon-deployment-usage-rules   7d20h
```

**Validating Kafka**

Verify that the following pods are present in the `seldon-kafka` namespace:

```bash
$ oc get pods -n seldon-kafka
NAME                                      READY   STATUS    RESTARTS   AGE
seldon-entity-operator-5f5cc6f7ff-db9gb   3/3     Running   0          10m
seldon-kafka-0                            1/1     Running   0          11m
seldon-kafka-1                            1/1     Running   0          11m
seldon-kafka-2                            1/1     Running   0          11m
seldon-zookeeper-0                        1/1     Running   0          11m
seldon-zookeeper-1                        1/1     Running   0          11m
seldon-zookeeper-2                        1/1     Running   0          11m
```

**Validating Seldon Core v2**

Verify that the following pods are present in your model namespace, e.g. `seldon` (the `seldon-v2-controller-manager` pod is only present here for a namespaced installation):

```bash
$ oc get pods -n seldon
NAME                                            READY   STATUS    RESTARTS   AGE
mlserver-0                                      3/3     Running   0          12h
seldon-v2-controller-manager-5697d9f8bc-qwrmd   1/1     Running   0          12h
seldon-dataflow-engine-54bc74bd87-rhgs9         1/1     Running   0          12h
seldon-envoy-75b44947bd-q9hxm                   1/1     Running   0          12h
seldon-hodometer-6d9dbf689c-lg8mw               1/1     Running   0          12h
seldon-modelgateway-7b9ddfc644-q2knl            1/1     Running   0          12h
seldon-pipelinegateway-7f6f4ffd6-fzhpk          1/1     Running   0          12h
seldon-scheduler-0                              1/1     Running   0          12h
triton-0                                        3/3     Running   0          12h
```

**Seldon Enterprise Platform Helm Values**

This set of Helm values of Seldon Enterprise Platform is designed to work properly on the OpenShift 4.13 platform with all dependencies installed and configured as described in this document.

{% hint style="warning" %}
Please only set Namespace Authorization using labels (`rbac.nsLabelsAuth.enabled: true` entry) in your Helm values file if you are not going to use OPA Policy Authorization.
{% endhint %}

{% hint style="info" %}
**Note**:

* Contact your Seldon account manager or sales representative to access the image.
  {% endhint %}

```yaml
image:
  image: seldonio/<PRIVATE>:2.4.0

applicationLogs:
  elasticIndexPattern: "app-write"
  elasticNamespaceField: "kubernetes.namespace_name"
  elasticContainerNameField: "kubernetes.container_name"
  elasticPodNameField: "kubernetes.pod_name"

defaultUserID: ''

enableAppAuth: false
enableAppAnalytics: false

external:
  protocol: http

env:
  USERID_CLAIM_KEY: name
  ALERTMANAGER_URL: https://alertmanager-main.openshift-monitoring:9094/api/v1/alerts

gitops:
  argocd:
    enabled: true
    namespace: seldon-argocd

metadata:
  pg:
    enabled: true
    secret: "metadata-postgres"

prometheus:
  knative:
    url: http://prometheus-system-np.knative-monitoring.svc.cluster.local:8080/api/v1/
  seldon:
    # see https://github.com/openshift/cluster-monitoring-operator/issues/768
    namespaceMetricName: namespace
    serviceMetricName: exported_service

    url: https://thanos-querier.openshift-monitoring.svc:9091/api/v1/
    resourceMetricsUrl: https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/

    jwtSecretKey: jwt-seldon.txt
    jwtSecretName: jwt-seldon

elasticsearch:
  url: https://elasticsearch-seldon-es-http.seldon-logs:9200
  basicAuth: true
  secret:
    name: "elastic-credentials"
    userKey: "username"
    passwordKey: "password"

rbac:
  nsLabelsAuth:
    enabled: true

requestLogger:
  create: true
  elasticsearch:
    host: elasticsearch-seldon-es-http.seldon-logs
  env:
    MAX_PAYLOAD_BYTES: "300000" # 300KB, increase if needed
  kafka_consumer:
    enabled: true
    bootstrap_servers: "seldon-kafka-bootstrap.seldon-kafka.svc.cluster.local:9092"

seldon:
  enabled: true
  curlForm: |
    DOMAIN=$(oc get route -n istio-system seldon-route -o jsonpath={.spec.host})<br>
    curl -k -H "{{ .TokenHeader }}: {{ .Token }} " -H "Content-Type: application/json" https://$DOMAIN/seldon/{{ .Namespace }}/{{ .ModelName }}/api/v0.1/predictions -d '{{ .Payload }}'
  tensorFlowCurlForm: |
    DOMAIN=$(oc get route -n istio-system seldon-route -o jsonpath={.spec.host})<br>
    curl -k -H "{{ .TokenHeader }}: {{ .Token }} " -H "Content-Type: application/json" https://$DOMAIN/seldon/{{ .Namespace }}/{{ .ModelName }}/v1/models/:predict -d '{{ .Payload }}'
  kfservingV2CurlForm: |
    DOMAIN=$(oc get route -n istio-system seldon-route -o jsonpath={.spec.host})<br>
    curl -k -H "{{ .TokenHeader }}: {{ .Token }} " -H "Content-Type: application/json" https://$DOMAIN/seldon/{{ .Namespace }}/{{ .ModelName }}/v2/models/{{ .GraphModelName }}/infer -d '{{ .Payload }}'

seldonCoreV2:
  enabled: true
```

**Prometheus Rules for Model Usage**

Save file as `deployment-usage-rules.yaml`:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: seldon-deployment-usage-rules
  labels:
    prometheus: k8s
    role: record-rules
spec:
  groups:
    - name: deployment-usage.rules
      interval: 3m
      rules:
        - record: deployment_container_count
          expr: sum by (namespace, name) (kube_pod_container_info * on (namespace, pod) group_left(name) label_replace(kube_pod_labels{label_app_kubernetes_io_managed_by="seldon-core"}, "name", "$1", "label_seldon_deployment_id", "(.+)"))
          labels:
            type: "SeldonDeployment"
        - record: deployment_memory_usage_bytes
          expr: label_replace(sum by (namespace, label_seldon_deployment_id) (container_memory_usage_bytes{container=""} * on (namespace, pod) group_left(label_seldon_deployment_id) kube_pod_labels{label_app_kubernetes_io_managed_by="seldon-core"}), "name", "$1", "label_seldon_deployment_id", "(.+)")
          labels:
            type: "SeldonDeployment"
        - record: deployment_cpu_usage_seconds_total
          expr: label_replace(sum by (namespace, label_seldon_deployment_id) (rate(container_cpu_usage_seconds_total{container=""}[2m]) * on (namespace, pod) group_left(label_seldon_deployment_id) kube_pod_labels{label_app_kubernetes_io_managed_by="seldon-core"}), "name", "$1", "label_seldon_deployment_id", "(.+)")
          labels:
            type: "SeldonDeployment"
        - record: deployment_cpu_requests
          expr: sum by (namespace, name) (kube_pod_container_resource_requests{resource="cpu", unit="core"} * on (namespace, pod) group_left(name) label_replace(kube_pod_labels{label_app_kubernetes_io_managed_by="seldon-core"}, "name", "$1", "label_seldon_deployment_id", "(.+)"))
          labels:
            type: "SeldonDeployment"
        - record: deployment_cpu_limits
          expr: sum by (namespace, name) (kube_pod_container_resource_limits{resource="cpu", unit="core"} * on (namespace, pod) group_left(name) label_replace(kube_pod_labels{label_app_kubernetes_io_managed_by="seldon-core"}, "name", "$1", "label_seldon_deployment_id", "(.+)"))
          labels:
            type: "SeldonDeployment"
        - record: deployment_memory_requests_bytes
          expr: sum by (namespace, name) (kube_pod_container_resource_requests{resource="memory", unit="byte"} * on (namespace, pod) group_left(name) label_replace(kube_pod_labels{label_app_kubernetes_io_managed_by="seldon-core"}, "name", "$1", "label_seldon_deployment_id", "(.+)"))
          labels:
            type: "SeldonDeployment"
        - record: deployment_memory_limits_bytes
          expr: sum by (namespace, name) (kube_pod_container_resource_limits{resource="memory", unit="byte"} * on (namespace, pod) group_left(name) label_replace(kube_pod_labels{label_app_kubernetes_io_managed_by="seldon-core"}, "name", "$1", "label_seldon_deployment_id", "(.+)"))
          labels:
            type: "SeldonDeployment"
```

**Prometheus Rules for Alerting**

Save file as `user-alerts.yaml`:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    prometheus: k8s
    role: alert-rules
  name: deploy-user-alerts
spec:
  groups:
  - name: deploy-user-alerts.rules
    rules:
    - alert: ModelHighErrorRate
      # This matches a regex for 5XX errors, and calculates the percentage over 5 minutes.
      # It requires a percentage higher than 30 for more than 100 requests to fire.
      expr: (sum(rate(seldon_api_executor_client_requests_seconds_count{code=~"5[0-9]{2}"}[5m])) by (seldon_deployment_id)
        /
        sum(rate(seldon_api_executor_client_requests_seconds_count[5m])) by (seldon_deployment_id) * 100.0) > 30
        and
        sum(increase(seldon_api_executor_client_requests_seconds_count[5m])) by (seldon_deployment_id) > 100
      for: 1m
      annotations:
        title: 'High error rate on deployed model.'
        description: 'Model {{ $labels.seldon_deployment_id }} has an internal error rate of greater than 30% for more than 100 requests total.'
      labels:
        severity: 'critical'
        type: 'user'
    - alert: TestAlertNoActionRequired
      expr: increase(deploy_alerting_trigger_test_alert[3m]) > 1
      for: 1m
      annotations:
        title: 'Test alert, safe to ignore.'
        description: 'This is a test alert, used to verify the alerting system is working correctly - it will resolve itself and no action is required.'
      labels:
        severity: 'warning'
        type: 'user'
```
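The threshold logic of `ModelHighErrorRate` can be illustrated with plain request counts. The following is a hypothetical bash helper, not part of Seldon or Prometheus. It assumes both counts cover the same 5-minute window, where the ratio of the two `rate()` values is approximately the ratio of the underlying request counts, and it ignores the `for: 1m` clause.

```bash
# Hypothetical helper that mirrors the ModelHighErrorRate condition.
# Arguments are request counts observed over the same 5-minute window:
#   $1 = number of requests with a 5XX status code
#   $2 = total number of requests
# Prints "met" when the alert expression would be true, otherwise "not met".
error_rate_condition() {
  local errors=$1 total=$2
  # errors/total * 100 > 30 is rewritten as errors*100 > 30*total
  # so that bash integer arithmetic never has to divide.
  if (( errors * 100 > 30 * total && total > 100 )); then
    echo "met"
  else
    echo "not met"
  fi
}
```

For instance, `error_rate_condition 40 120` (about 33% errors) prints `met`, while `error_rate_condition 40 90` prints `not met` because no more than 100 requests were seen. Once the condition is met, Prometheus still waits for the `for: 1m` period before firing. `DeployApiHighErrorRate` below uses the same arithmetic, grouped by `handler`.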

Save file as `infra-alerts.yaml`:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    prometheus: k8s
    role: alert-rules
  name: deploy-infra-alerts
spec:
  groups:
  - name: deploy-infra-alerts.rules
    rules:
    - alert: MetasyncerNotSyncing
      expr: sum(increase(deploy_metasyncer_processed_sync_total[11m])) < 1
      for: 11m
      # The sync window is 10 minutes: we measure the increase over 11m and require the failure condition to hold for 11m to prevent edge cases. Measuring over 20m with for 1m is not an option, as it would alert at initial startup.
      annotations:
        title: 'The runtime metasyncer has not synced for two cycles.'
        description: 'Deployment information from Kubernetes and model metadata information from Postgres has not been synchronised for more than 20 minutes. Functionality depending on this, like project-based authorisation and the metadata API, might be affected.'
      labels:
        severity: 'critical'
        type: 'infra'
    - alert: DeployIsDown
      expr: absent(up{container="seldon-deploy"})
      for: 1m
      annotations:
        title: 'Enterprise Platform server is down.'
        description: 'Enterprise Platform is not running, check the pods in Kubernetes for status.'
      labels:
        severity: 'critical'
        type: 'infra'
    - alert: DeployApiHighErrorRate
      # This matches a regex for 5XX errors, and calculates the percentage over 5 minutes.
      # It requires a percentage higher than 30 for more than 100 requests to fire.
      expr: (sum(rate(http_request_duration_seconds_count{code=~"5[0-9]{2}"}[5m])) by (handler)
        /
        sum(rate(http_request_duration_seconds_count[5m])) by (handler) * 100.0) > 30
        and
        sum(increase(http_request_duration_seconds_count[5m])) by (handler) > 100
      for: 1m
      annotations:
        title: 'High error rate on Enterprise Platform API.'
        description: 'The Enterprise Platform API for handler {{ $labels.handler }} has an internal error rate of greater than 30% for more than 100 requests total.'
      labels:
        severity: 'critical'
        type: 'infra'
    - alert: DeployMetadataMigrationsFailed
      expr: increase(deploy_metadata_sql_migrations_total{status="failure"}[5m]) > 1
      for: 5m
      annotations:
        title: 'Enterprise Platform Metadata SQL Migrations Failed'
        description: 'Enterprise Platform has failed to perform SQL migrations on the metadata database {{ $value }} times in the last 5 minutes.'
      labels:
        severity: 'warning'
        type: 'infra'
    - alert: DeployOPADynamicPolicyUpdateFailed
      expr: increase(deploy_opa_policies_updates_total{status="failure"}[5m]) > 0
      for: 5m
      annotations:
        title: 'Enterprise Platform OPA Dynamic Policy Update Failed'
        description: 'Enterprise Platform has failed updating the OPA policies from the dynamic policy config {{ $value }} times in the last 5 minutes.'
      labels:
        severity: 'warning'
        type: 'infra'
```
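The timing described in the `MetasyncerNotSyncing` comment can be worked out as follows. Assume the last successful sync happened at $t_0$ and ignore the rule evaluation interval: the 11-minute `increase()` window stops containing that sync at roughly $t_0 + 11$ min, after which the expression must stay true for the whole `for: 11m` period before the alert fires.

$$
t_{\text{fire}} \approx t_0 + \underbrace{11\ \text{min}}_{\text{increase window}} + \underbrace{11\ \text{min}}_{\text{for}} = t_0 + 22\ \text{min} > t_0 + 2 \times 10\ \text{min}
$$

So the alert fires only after two 10-minute sync cycles have been missed, consistent with its title and with the "more than 20 minutes" in its description.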

Save file as `drift-alerts.yaml`:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: seldon-drift-alerts
spec:
  groups:
  - name: seldon-dd-alerts.rules
    rules:
    - alert: DriftDetectedV1
      expr: increase(seldon_metric_drift_counter_total[1m]) > 1
      annotations:
        title: 'Drift is occurring in {{ $labels.deployment_name }} deployment.'
        description: 'Drift is happening within deployment {{ $labels.deployment_name }} in the {{ $labels.seldon_deployment_namespace }} namespace (Seldon Core v1).'
      labels:
        severity: 'warning'
        type: 'user'
    - alert: DriftDetectedV2
      expr: increase(seldon_model_drift_count[1m]) > 1
      annotations:
        title: 'Drift is occurring in {{ $labels.model_name }} detector.'
        description: 'Drift is happening within detector {{ $labels.model_name }} in the {{ $labels.namespace }} namespace (Seldon Core v2).'
      labels:
        severity: 'warning'
        type: 'user'
```

#### Troubleshooting

**Core v2 Pipelines**

If the producer in the Pipeline gateway reports an error about not enough in-sync replicas, the replication factor Seldon uses for its Kafka topics is lower than the cluster's `min.insync.replicas` setting, which defaults to 2 on a default AWS MSK cluster. Ensure the replication factor is equal to the cluster's `min.insync.replicas` value. You can set it in the `seldon-charts/seldon-core-v2-setup` Helm chart with `kafka.topics.replicationFactor`.
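The rule behind this error can be sketched as a small bash check. This is a hypothetical helper, not part of Seldon or Kafka tooling, and it assumes the gateway's producer writes with `acks=all`, which is the setting under which Kafka enforces `min.insync.replicas`.

```bash
# Hypothetical check: can producers write to a topic with this replication factor?
# Assumption: producers use acks=all, so Kafka rejects writes when fewer than
# min.insync.replicas replicas are in sync. A topic can never have more
# in-sync replicas than its replication factor.
# Usage: check_replication <replication_factor> <min_insync_replicas>
check_replication() {
  local replication_factor=$1 min_insync=$2
  if (( replication_factor >= min_insync )); then
    echo "ok"
  else
    echo "NOT_ENOUGH_REPLICAS: set kafka.topics.replicationFactor to ${min_insync}"
  fi
}
```

For example, `check_replication 1 2` reports the mismatch you would hit against a default AWS MSK cluster, while `check_replication 2 2` prints `ok`.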

**Prometheus Metrics**

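If you do not see any metrics in Seldon Enterprise Platform, first check that all `NetworkPolicy`, `PodMonitor`, and `PrometheusRule` resources are configured correctly. If metrics are still missing, verify that the JWT token given to Seldon Enterprise Platform (the `jwt-seldon` secret, key `jwt-seldon.txt`, configured in the values file) is correct.

To verify the token, start a temporary pod and query the Thanos Querier and Alertmanager endpoints with it: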

```bash
# Start a temporary pod in the seldon-system namespace
oc run -it --rm ubuntu --image=ubuntu:latest --restart=Never -n seldon-system -- bash

# Inside the pod: install curl and jq non-interactively
apt-get update && apt-get install -y curl jq

# Replace ... with the JWT token given to Seldon Enterprise Platform
token=...

# Both requests should print "success"
curl -s -H "Authorization: Bearer $token" -k "https://thanos-querier.openshift-monitoring.svc:9091/api/v1/targets" | jq .status
curl -s -H "Authorization: Bearer $token" -k "https://alertmanager-main.openshift-monitoring:9094/api/v1/alerts" | jq .status
```
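To check which service account a token belongs to, or whether it has expired, you can also decode its claims locally. The following is a hypothetical bash helper (not part of Seldon or `oc`), assuming a standard three-part JWT (`header.payload.signature`). It only decodes the payload and does not verify the signature.

```bash
# Hypothetical helper: print the decoded payload (claims) of a JWT.
# The signature is NOT verified; this is for inspection only.
jwt_payload() {
  local payload
  # A JWT is header.payload.signature; take the second dot-separated field.
  payload=$(printf '%s' "$1" | cut -d '.' -f 2)
  # base64url uses '_' and '-' where standard base64 uses '/' and '+'.
  payload=$(printf '%s' "$payload" | tr '_-' '/+')
  # base64url drops the '=' padding; restore it before decoding.
  case $(( ${#payload} % 4 )) in
    2) payload="${payload}==" ;;
    3) payload="${payload}=" ;;
  esac
  printf '%s' "$payload" | base64 -d
}
```

Inside the debug pod above, `jwt_payload "$token" | jq .` prints the claims, so you can compare fields such as `sub` and `exp` (where present) against what you expect.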

**Elasticsearch**

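To verify the Elasticsearch credentials and check whether Seldon indices are being populated, first port-forward the Elasticsearch service. Keep this command running in a separate terminal: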

```bash
oc port-forward -n seldon-logs svc/elasticsearch-seldon-es-http 9200
```

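Get the password of the built-in `elastic` user from the secret generated for the Elasticsearch cluster, and store it in `token` for the checks below:

```bash
token=$(oc get secret elasticsearch-seldon-es-elastic-user -n seldon-logs -o go-template='{{.data.elastic | base64decode}}')
```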

Verify that the `elastic-credentials` secret used by Seldon Enterprise Platform contains the same credentials:

```bash
oc get secret -n seldon-logs elastic-credentials -o json | jq '.data | map_values(@base64d)'
```

Then verify the credentials and list the indices through the port-forward:

```bash
curl -k -u elastic:$token https://localhost:9200
curl -k -u elastic:$token https://localhost:9200/_cat/indices
```

