Seldon Core 2 provides a highly configurable deployment framework that allows you to fine-tune various components using Helm configuration options. These options offer control over deployment behavior, resource management, logging, autoscaling, and model lifecycle policies to optimize the performance and scalability of machine learning deployments.
This section details the key Helm configuration parameters for Envoy, Autoscaling, Servers, the Model Control Plane, and Logging, so that you can customize deployment workflows and improve operational efficiency.
- Envoy: Manage pre-stop behaviors and configure access logging to track request-level interactions.
- Autoscaling (Experimental): Fine-tune dynamic scaling policies for efficient resource allocation based on real-time inference workloads.
- Servers: Define grace periods for controlled shutdowns and optimize model control plane parameters for efficient model loading, unloading, and error handling.
- Logging: Define log levels for the different components of the system.
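The keys listed in the tables below are Helm values for the chart named in the Chart column (the components chart); dotted keys map to nested YAML in a values file passed with -f / --values. As a minimal sketch using the Envoy keys described in the next section, simply restating their defaults (the file name is illustrative):

# envoy-values.yaml (illustrative name), passed to the components chart
envoy:
  preStopSleepPeriodSeconds: 30
  terminationGracePeriodSeconds: 120
  enableAccesslog: true
  accesslogPath: /tmp/envoy-accesslog.txt
  includeSuccessfulRequests: false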
Envoy
Prestop
| Key | Chart | Description | Default |
| --- | --- | --- | --- |
| envoy.preStopSleepPeriodSeconds | components | Sleep after calling prestop command. | 30 |
| envoy.terminationGracePeriodSeconds | components | Grace period to wait for prestop to finish for Envoy pods. | 120 |
Access Log
| Key | Chart | Description | Default |
| --- | --- | --- | --- |
| envoy.enableAccesslog | components | Whether to enable logging of requests. | true |
| envoy.accesslogPath | components | Path on disk to store the log file. Only used when enableAccesslog is set. | /tmp/envoy-accesslog.txt |
| envoy.includeSuccessfulRequests | components | Whether to include successful requests. If set to false, only failed requests are logged. Only used when enableAccesslog is set. | false |
Autoscaling
Native autoscaling (experimental)
| Key | Chart | Description | Default |
| --- | --- | --- | --- |
| autoscaling.autoscalingModelEnabled | components | Enable native autoscaling for Models. This is orthogonal to external autoscaling services, e.g. HPA. | false |
| autoscaling.autoscalingServerEnabled | components | Enable native autoscaling for Servers. This is orthogonal to external autoscaling services, e.g. HPA. | true |
| agent.scalingStatsPeriodSeconds | components | Sampling period, in seconds, for the metrics used for autoscaling. | 20 |
| agent.modelInferenceLagThreshold | components | Queue lag threshold that triggers scaling up of a model replica. | 30 |
| agent.modelInactiveSecondsThreshold | components | Period, in seconds, with no activity after which a model replica is scaled down. | 600 |
| autoscaling.serverPackingEnabled | components | Whether packing of models onto fewer servers is enabled. | false |
| autoscaling.serverPackingPercentage | components | Percentage of events where packing is allowed; higher values represent more aggressive packing. Only used when serverPackingEnabled is set. Range is 0.0 to 1.0. | 0.0 |
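For example, turning on native model autoscaling together with server packing could look like the following values fragment; the numbers are illustrative, not recommendations:

autoscaling:
  autoscalingModelEnabled: true      # native autoscaling for Models
  autoscalingServerEnabled: true     # native autoscaling for Servers (default)
  serverPackingEnabled: true
  serverPackingPercentage: 0.5       # allow packing on half of eligible events
agent:
  scalingStatsPeriodSeconds: 20
  modelInferenceLagThreshold: 30
  modelInactiveSecondsThreshold: 300 # scale down idle model replicas after 5 minutes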
Server
Prestop
| Key | Chart | Description | Default |
| --- | --- | --- | --- |
| serverConfig.terminationGracePeriodSeconds | components | Grace period to wait for prestop process to finish for this particular Server pod. | 120 |
Model Control Plane
| Key | Chart | Description | Default |
| --- | --- | --- | --- |
| agent.overcommitPercentage | components | Percentage of memory overcommit allowed. Range is 0 to 100. | 10 |
| agent.maxLoadElapsedTimeMinutes | components | Maximum time, in minutes, allowed for one model load command for a model on a particular server replica. Lower values expose errors faster. | 120 |
| agent.maxLoadRetryCount | components | Maximum number of retries for an unsuccessful load command for a model on a particular server replica. Lower values allow control plane commands to fail faster. | 5 |
| agent.maxUnloadElapsedTimeMinutes | components | Maximum time, in minutes, allowed for one model unload command for a model on a particular server replica. Lower values expose errors faster. | 15 |
| agent.maxUnloadRetryCount | components | Maximum number of retries for an unsuccessful unload command for a model on a particular server replica. Lower values allow control plane commands to fail faster. | 5 |
| agent.unloadGracePeriodSeconds | components | Grace period guarding against a race condition: it gives Envoy time to apply the cluster change that removes a route before the model replica unload command proceeds. | 2 |
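As a sketch, the model control plane settings (together with the Server prestop grace period from the previous table) combine in a values file as follows; the numbers are illustrative:

serverConfig:
  terminationGracePeriodSeconds: 120 # prestop grace period for Server pods
agent:
  overcommitPercentage: 10
  maxLoadElapsedTimeMinutes: 60      # surface load errors sooner than the default 120
  maxLoadRetryCount: 5
  maxUnloadElapsedTimeMinutes: 15
  maxUnloadRetryCount: 5
  unloadGracePeriodSeconds: 2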
Logging
Component Log Level
| Key | Chart | Description | Default |
| --- | --- | --- | --- |
| logging.logLevel | components | Component-wide setting for the logging level, used when individual component levels are not set. Options are: debug, info, error. | |

Note: the Kafka client library log level is set from the log level passed to the component, which may differ from the level expected by librdkafka (a syslog level); in this case we attempt to map the value to the closest match.
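For example, to run all components at debug level (a values fragment using only the key from the table above):

logging:
  logLevel: debug   # one of: debug, info, error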
Server Config
Note: This section is for advanced usage where you want to define new types of inference servers.
Server configurations define how to create an inference server. By default one is provided for Seldon MLServer and one for NVIDIA Triton Inference Server. Both of these servers support the V2 inference protocol, which is a requirement for all inference servers. A ServerConfig defines the Kubernetes ReplicaSet for the server, including the Seldon Agent reverse proxy and an Rclone server for downloading artifacts for the server. The Kustomize ServerConfig for MLServer is shown below:
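An abbreviated sketch of its shape is given here; the container images are indicative and most container detail is elided, so refer to the manifest shipped with the charts for the authoritative version:

apiVersion: mlops.seldon.io/v1alpha1
kind: ServerConfig
metadata:
  name: mlserver
spec:
  podSpec:
    terminationGracePeriodSeconds: 120
    containers:
    - name: rclone                  # artifact download server
      image: rclone/rclone          # tag elided
    - name: agent                   # Seldon Agent reverse proxy
      image: seldonio/seldon-agent  # tag elided
    - name: mlserver                # the V2-protocol inference server itself
      image: seldonio/mlserver      # tag elided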
Server Runtime
The SeldonRuntime resource is used to create an instance of Seldon installed in a particular namespace.
type SeldonRuntimeSpec struct {
    SeldonConfig string              `json:"seldonConfig"`
    Overrides    []*OverrideSpec     `json:"overrides,omitempty"`
    Config       SeldonConfiguration `json:"config,omitempty"`
    // +Optional
    // If set then when the referenced SeldonConfig changes we will NOT update the SeldonRuntime immediately.
    // Explicit changes to the SeldonRuntime itself will force a reconcile though
    DisableAutoUpdate bool `json:"disableAutoUpdate,omitempty"`
}

type OverrideSpec struct {
    Name        string         `json:"name"`
    Disable     bool           `json:"disable,omitempty"`
    Replicas    *int32         `json:"replicas,omitempty"`
    ServiceType v1.ServiceType `json:"serviceType,omitempty"`
    PodSpec     *PodSpec       `json:"podSpec,omitempty"`
}
The specification above contains overrides for the chosen SeldonConfig. To override the PodSpec for a given component, the overrides field needs to specify the component name and the PodSpec needs to specify the container name, along with fields to override.
For instance, the following overrides the resource limits for cpu and memory in the hodometer component in the seldon-mesh namespace, while using the values specified in the referenced SeldonConfig (e.g. default) for everything else.
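A sketch of such an override follows; the runtime name and the resource limits are illustrative values:

apiVersion: mlops.seldon.io/v1alpha1
kind: SeldonRuntime
metadata:
  name: seldon
  namespace: seldon-mesh
spec:
  seldonConfig: default
  overrides:
  - name: hodometer
    podSpec:
      containers:
      - name: hodometer
        resources:
          limits:
            cpu: 200m      # illustrative limit
            memory: 500Mi  # illustrative limit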
For a minimal installation you need only define the SeldonConfig to use as a base, for example to install in the seldon-mesh namespace with the SeldonConfig named default:
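For example (the runtime name is illustrative):

apiVersion: mlops.seldon.io/v1alpha1
kind: SeldonRuntime
metadata:
  name: seldon
  namespace: seldon-mesh
spec:
  seldonConfig: default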
The helm chart seldon-core-v2-runtime allows easy creation of this resource and associated default Servers for an installation of Seldon in a particular namespace.
SeldonConfig Update Propagation
When a SeldonConfig resource changes, any SeldonRuntime resources that reference it are also updated immediately. If this behaviour is not desired, you can set spec.disableAutoUpdate in the SeldonRuntime resource so that it is not updated immediately, but only when the SeldonRuntime itself or any of its owned resources changes.
Seldon Config
Note: This section is for advanced usage where you want to define how Seldon is installed in each namespace.
The SeldonConfig resource defines the core installation components installed by Seldon. If you wish to install Seldon, you can use the SeldonRuntime resource, which allows easy overriding of some parts defined in this specification. In general, we advise core DevOps teams to use the default SeldonConfig or customize it for their usage. Individual installations of Seldon can then use the SeldonRuntime with a few overrides for any special customisation needed in that namespace.
The specification contains core PodSpecs for each core component and a section for general configuration, including the ConfigMaps that are created for the Agent (rclone defaults), Kafka, and Tracing (OpenTelemetry).
Some of these values can be overridden on a per-namespace basis via the SeldonRuntime resource. Labels and annotations can also be set at the component level; these will be merged with the labels and annotations from the SeldonConfig resource in which they are defined and added to the component's corresponding Deployment or StatefulSet.
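As a rough sketch of the shape this takes (the component name, label, and config sub-sections shown are assumptions for illustration; consult the SeldonConfig bundled with the charts for the authoritative layout):

apiVersion: mlops.seldon.io/v1alpha1
kind: SeldonConfig
metadata:
  name: default
spec:
  components:                        # core PodSpec per core component
  - name: hodometer
    labels:
      example.org/team: mlops        # merged into the component's Deployment or StatefulSet
    podSpec:
      containers:
      - name: hodometer
        image: seldonio/hodometer    # illustrative image reference
  config:                            # general configuration: agent (rclone), Kafka, tracing
    agentConfig: {}
    kafkaConfig: {}
    tracingConfig: {}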
Pipeline
Pipelines allow you to connect flows of inference data transformed by Model components. A directed acyclic graph (DAG) of steps can be defined to join Models together. Each Model needs to be capable of receiving a V2 inference request and responding with a V2 inference response. An example Pipeline is shown below:
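The manifest below is a sketch reconstructed from the description that follows; the Pipeline name and the tfsimpleN.outputs.<tensor> reference form used in tensorMap are assumptions:

apiVersion: mlops.seldon.io/v1alpha1
kind: Pipeline
metadata:
  name: tfsimples
spec:
  steps:
    - name: tfsimple1
    - name: tfsimple2
    - name: tfsimple3
      inputs:
      - tfsimple1
      - tfsimple2
      tensorMap:
        tfsimple1.outputs.OUTPUT0: INPUT0
        tfsimple2.outputs.OUTPUT1: INPUT1
  output:
    steps:
    - tfsimple3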
The steps list shows three models: tfsimple1, tfsimple2, and tfsimple3. These three models each take two integer tensors called INPUT0 and INPUT1. The models produce two outputs, OUTPUT0 (the sum of the inputs) and OUTPUT1 (the second input subtracted from the first).
tfsimple1 and tfsimple2 take as their input the input to the Pipeline: the default assumption when no explicit inputs are defined. tfsimple3 takes one V2 tensor input from each of the outputs of tfsimple1 and tfsimple2. As the outputs of tfsimple1 and tfsimple2 have tensors named OUTPUT0 and OUTPUT1, their names need to be changed to match the expected input tensors; this is done with a tensorMap entry providing the renaming. This is only required if your models cannot be directly chained together.
The output of the Pipeline is the output from the tfsimple3 model.
Detailed Specification
The full GoLang specification for a Pipeline is shown below:
type PipelineSpec struct {
    // External inputs to this pipeline, optional
    Input *PipelineInput `json:"input,omitempty"`
    // The steps of this inference graph pipeline
    Steps []PipelineStep `json:"steps"`
    // Synchronous output from this pipeline, optional
    Output *PipelineOutput `json:"output,omitempty"`
}

// +kubebuilder:validation:Enum=inner;outer;any
type JoinType string

const (
    // data must be available from all inputs
    JoinTypeInner JoinType = "inner"
    // data will include any data from any inputs at end of window
    JoinTypeOuter JoinType = "outer"
    // first data input that arrives will be forwarded
    JoinTypeAny JoinType = "any"
)

type PipelineStep struct {
    // Name of the step
    Name string `json:"name"`
    // Previous step to receive data from
    Inputs []string `json:"inputs,omitempty"`
    // msecs to wait for messages from multiple inputs to arrive before joining the inputs
    JoinWindowMs *uint32 `json:"joinWindowMs,omitempty"`
    // Map of tensor name conversions to use e.g. output1 -> input1
    TensorMap map[string]string `json:"tensorMap,omitempty"`
    // Triggers required to activate step
    Triggers []string `json:"triggers,omitempty"`
    // +kubebuilder:default=inner
    InputsJoinType   *JoinType `json:"inputsJoinType,omitempty"`
    TriggersJoinType *JoinType `json:"triggersJoinType,omitempty"`
    // Batch size of request required before data will be sent to this step
    Batch *PipelineBatch `json:"batch,omitempty"`
}

type PipelineBatch struct {
    Size     *uint32 `json:"size,omitempty"`
    WindowMs *uint32 `json:"windowMs,omitempty"`
    Rolling  bool    `json:"rolling,omitempty"`
}

type PipelineInput struct {
    // Previous external pipeline steps to receive data from
    ExternalInputs []string `json:"externalInputs,omitempty"`
    // Triggers required to activate inputs
    ExternalTriggers []string `json:"externalTriggers,omitempty"`
    // msecs to wait for messages from multiple inputs to arrive before joining the inputs
    JoinWindowMs *uint32 `json:"joinWindowMs,omitempty"`
    // +kubebuilder:default=inner
    JoinType *JoinType `json:"joinType,omitempty"`
    // +kubebuilder:default=inner
    TriggersJoinType *JoinType `json:"triggersJoinType,omitempty"`
    // Map of tensor name conversions to use e.g. output1 -> input1
    TensorMap map[string]string `json:"tensorMap,omitempty"`
}

type PipelineOutput struct {
    // Previous step to receive data from
    Steps []string `json:"steps,omitempty"`
    // msecs to wait for messages from multiple inputs to arrive before joining the inputs
    JoinWindowMs uint32 `json:"joinWindowMs,omitempty"`
    // +kubebuilder:default=inner
    StepsJoin *JoinType `json:"stepsJoin,omitempty"`
    // Map of tensor name conversions to use e.g. output1 -> input1
    TensorMap map[string]string `json:"tensorMap,omitempty"`
}