# Models

Models provide the atomic building blocks of Seldon. They represent machine learning models,\
drift detectors, outlier detectors, explainers, feature transformations, and more complex routing\
models such as multi-armed bandits.

* Seldon can handle a wide range of [inference artifacts](/seldon-core-2/user-guide/models/inference-artifacts.md).
* Artifacts can be stored on any of the 40 or more supported cloud storage technologies, or in a\
  local (mounted) folder, as discussed [here](/seldon-core-2/user-guide/models/rclone.md).

## Kubernetes Example

A Kubernetes YAML example for an SKLearn model for iris classification is shown below:

```yaml
# samples/models/sklearn-iris-gs.yaml
apiVersion: mlops.seldon.io/v1alpha1
kind: Model
metadata:
  name: iris
spec:
  storageUri: "gs://seldon-models/scv2/samples/mlserver_1.5.0/iris-sklearn"
  requirements:
  - sklearn
  memory: 100Ki
```

Its Kubernetes `spec` has two core requirements:

* A `storageUri` specifying the location of the artifact. This can be any rclone URI specification.
* A `requirements` list of tags that must be matched by any Server that is to run\
  this artifact type. By default, a Seldon installation provides a set of Servers that cover a\
  range of artifact types.
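
The tag matching described above can be pictured with a small sketch. This is illustrative only and is not Seldon's scheduler code: the server names and capability tags are hypothetical, and the rule that every requirement tag must appear among a Server's capabilities is an assumption about how "matched" is applied.

```python
# Hypothetical servers and capability tags, for illustration only.
SERVERS = {
    "mlserver": {"sklearn", "xgboost", "mlflow"},
    "triton": {"tensorflow", "onnx", "pytorch"},
}


def eligible_servers(requirements, servers):
    """Return the names of servers whose capability tags cover every requirement.

    Assumption: a Model is eligible for a Server only if all of its
    requirement tags are present in that Server's capabilities.
    """
    required = set(requirements)
    return sorted(name for name, caps in servers.items() if required <= caps)


# The iris Model above declares a single requirement tag.
iris_requirements = ["sklearn"]
print(eligible_servers(iris_requirements, SERVERS))
```

Under these assumptions, a Model whose tags are not all covered by any single Server would have no eligible Server.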

## gRPC Example

You can also load models directly over the scheduler gRPC service. An example using the grpcurl tool is shown below:

```bash
# Load the iris model through the scheduler's LoadModel RPC.
# The JSON payload is single-quoted, so it needs no line-continuation backslashes.
grpcurl -d '{"model":{
              "meta":{"name":"iris"},
              "modelSpec":{"uri":"gs://seldon-models/mlserver/iris",
                           "requirements":["sklearn"],
                           "memoryBytes":500},
              "deploymentSpec":{"replicas":1}}}' \
         -plaintext \
         -import-path ../../apis \
         -proto apis/mlops/scheduler/scheduler.proto  0.0.0.0:9004 seldon.mlops.scheduler.Scheduler/LoadModel
```

The Protocol Buffers definitions for the scheduler are outlined [here](/seldon-core-2/resources/apis/scheduler.md).
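
Hand-writing JSON inside a shell command is easy to get wrong, so one option is to build the request body programmatically. The sketch below only assembles the JSON using the field names from the grpcurl example above; the helper name `load_model_request` is hypothetical and it does not call the scheduler.

```python
import json


def load_model_request(name, uri, requirements, memory_bytes, replicas=1):
    """Build a LoadModel request body mirroring the grpcurl example.

    Hypothetical helper: the field names are copied from the example
    payload, not derived from the proto definitions.
    """
    return {
        "model": {
            "meta": {"name": name},
            "modelSpec": {
                "uri": uri,
                "requirements": list(requirements),
                "memoryBytes": memory_bytes,
            },
            "deploymentSpec": {"replicas": replicas},
        }
    }


payload = json.dumps(
    load_model_request(
        name="iris",
        uri="gs://seldon-models/mlserver/iris",
        requirements=["sklearn"],
        memory_bytes=500,
    )
)
print(payload)
```

The printed string can then be passed as the argument to grpcurl's `-d` flag.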

## Multi-model Serving with Overcommit

Multi-model serving is an architecture pattern where one ML inference server hosts multiple models\
at the same time. It is a feature provided out of the box by NVIDIA Triton and Seldon MLServer.\
Multi-model serving reduces infrastructure hardware requirements (e.g. expensive GPUs), which enables\
the deployment of a large number of models while making it efficient to operate the system at scale.

Seldon Core 2 leverages multi-model serving by design and it is the default option for deploying\
models. The system will find an appropriate server to load the model onto based on requirements that\
the user defines in the `Model` deployment definition.

Moreover, in many cases demand patterns allow for further overcommit of resources. Seldon Core 2\
can register more models than the provisioned memory infrastructure can serve at once,\
and it swaps models dynamically, evicting the least used, without adding significant latency overhead\
to the inference workload.
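
To make the overcommit idea concrete, here is a toy simulation. It is a sketch under stated assumptions, not Seldon's implementation: the class `OvercommitServer` is hypothetical, memory is tracked as plain byte counts, and "least used" is interpreted as least recently used.

```python
from collections import OrderedDict


class OvercommitServer:
    """Toy server that registers more models than fit in memory at once."""

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        self.registered = {}         # model name -> memory footprint in bytes
        self.loaded = OrderedDict()  # in-memory models, least recently used first

    def register(self, name, memory_bytes):
        """Register a model without loading it into memory."""
        if memory_bytes > self.capacity_bytes:
            raise ValueError(f"{name} can never fit on this server")
        self.registered[name] = memory_bytes

    def infer(self, name):
        """Make sure the model is in memory; return the names evicted to do so."""
        evicted = []
        if name in self.loaded:
            # Already in memory: just mark it as most recently used.
            self.loaded.move_to_end(name)
            return evicted
        needed = self.registered[name]
        # Evict least recently used models until the new one fits.
        while self.loaded and sum(self.loaded.values()) + needed > self.capacity_bytes:
            victim, _ = self.loaded.popitem(last=False)
            evicted.append(victim)
        self.loaded[name] = needed
        return evicted


server = OvercommitServer(capacity_bytes=200)
for model_name in ("a", "b", "c"):
    server.register(model_name, 100)  # 300 bytes registered on a 200-byte server

server.infer("a")
server.infer("b")
server.infer("a")             # "a" becomes the most recently used model
evicted = server.infer("c")   # memory is full, so a model must be swapped out
print(evicted, list(server.loaded))
```

In this run the request for `c` swaps out `b`, because `a` was used more recently.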

See [Multi-model serving](/seldon-core-2/user-guide/models/mms.md) for more information.

## Autoscaling of Models

See [here](broken://pages/WncUW6j5rFYoCiFxkpRW) for a discussion of autoscaling of models.

## Scheduling of Models onto Servers

See [here](/seldon-core-2/user-guide/models/scheduling.md) for details on how Core 2 schedules Models onto Servers.


