For the complete documentation index, see llms.txt. This page is also available as Markdown.

API Reference

This page links to the key reference docs for configuring and using MLServer.

MLServer Settings

Server-wide configuration (e.g., HTTP/GRPC ports) loaded from a settings.json in the working directory. Settings can also be provided via environment variables prefixed with MLSERVER_ (e.g., MLSERVER_GRPC_PORT).

  • Scope: server-wide (independent from model-specific settings)

  • Sources: settings.json or env vars MLSERVER_*

Read the full reference →

Model Settings

Each model has its own configuration (metadata, parallelism, etc.). Typically provided via a model-settings.json next to the model artifacts. Alternatively, use env vars prefixed with MLSERVER_MODEL_ (e.g., MLSERVER_MODEL_IMPLEMENTATION). If no model-settings.json is found, MLServer will try to load a default model from these env vars. Note: these env vars are shared across models unless overridden by model-settings.json.

  • Scope: per-model

  • Sources: model-settings.json or env vars MLSERVER_MODEL_*

Read the full reference →

MLServer CLI

The mlserver CLI helps with common model lifecycle tasks (build images, init projects, start serving, etc.). For a quick overview:

mlserver --help
  • Commands include: build, dockerfile, infer (deprecated), init, start

  • Each command lists its options, arguments, and examples

Read the full CLI reference →

Python API

Build custom runtimes and integrate with MLServer using Python:

  • MLModel: base class for custom inference runtimes

  • Types: request/response schemas and enums (Pydantic)

  • Codecs: payload conversions between protocol types and Python types

  • Metrics: emit and configure metrics

Browse the Python API →

Last updated