Note: The default installation will provide two initial servers: one MLServer and one Triton. You only need to define additional servers for advanced use cases.
A Server defines an inference server onto which models will be placed for inference. By default, two server StatefulSets are deployed on installation: one MLServer and one Triton. An example Server definition is shown below:
The main requirement is a reference to a ServerConfig resource, in this case `mlserver`.
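A minimal definition along these lines might look as follows. This is a sketch assuming the `mlops.seldon.io/v1alpha1` API group used by Seldon Core v2; the metadata name is illustrative:

```yaml
apiVersion: mlops.seldon.io/v1alpha1
kind: Server
metadata:
  name: mlserver
spec:
  # Reference to the ServerConfig resource this server is built from
  serverConfig: mlserver
  replicas: 1
```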
Detailed Specs
```go
type ServerSpec struct {
	// Server definition
	ServerConfig string `json:"serverConfig"`
	// The extra capabilities this server will advertise
	// These are added to the capabilities exposed by the referenced ServerConfig
	ExtraCapabilities []string `json:"extraCapabilities,omitempty"`
	// The capabilities this server will advertise
	// This will override any from the referenced ServerConfig
	Capabilities []string `json:"capabilities,omitempty"`
	// Image overrides
	ImageOverrides *ContainerOverrideSpec `json:"imageOverrides,omitempty"`
	// PodSpec overrides
	// Slices such as containers would be appended not overridden
	PodSpec *PodSpec `json:"podSpec,omitempty"`
	// Scaling spec
	ScalingSpec `json:",inline"`
	// +Optional
	// If set then when the referenced ServerConfig changes we will NOT update the Server immediately.
	// Explicit changes to the Server itself will force a reconcile though
	DisableAutoUpdate bool `json:"disableAutoUpdate,omitempty"`
}

type ContainerOverrideSpec struct {
	// The Agent overrides
	Agent *v1.Container `json:"agent,omitempty"`
	// The RClone server overrides
	RClone *v1.Container `json:"rclone,omitempty"`
}

type ServerDefn struct {
	// Server config name to match
	// Required
	Config string `json:"config"`
}
```
Custom Servers
One can easily use a custom image with the existing ServerConfigs. For example, the following defines an MLServer server with a custom image:
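As a sketch, the image for the server container can be swapped via the `podSpec` override (the metadata name and image tag below are hypothetical; container entries in the override are merged with those from the referenced ServerConfig):

```yaml
apiVersion: mlops.seldon.io/v1alpha1
kind: Server
metadata:
  name: mlserver-custom
spec:
  serverConfig: mlserver
  podSpec:
    containers:
    - name: mlserver
      # Hypothetical custom image built on top of MLServer
      image: example-registry/custom-mlserver:0.1.0
```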
One can also create a Server definition that adds a persistent volume to the server. This can be used to allow models to be loaded directly from the persistent volume.
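A sketch of such a definition is shown below, again using the `podSpec` override. The PVC name, mount path, and capability string are assumptions for illustration; since container entries are merged with the referenced ServerConfig, the volume mount here is attached to the existing rclone container:

```yaml
apiVersion: mlops.seldon.io/v1alpha1
kind: Server
metadata:
  name: mlserver-pvc
spec:
  serverConfig: mlserver
  # Advertise an extra capability so models can target this server
  extraCapabilities:
  - pvc
  podSpec:
    volumes:
    - name: models-pvc
      persistentVolumeClaim:
        claimName: ml-models-pvc  # hypothetical pre-existing PVC
    containers:
    - name: rclone
      volumeMounts:
      - name: models-pvc
        mountPath: /var/models
```

Models can then reference artifacts under the mounted path instead of being downloaded from remote storage.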