Python API

MLServer exposes a Python framework to build custom inference runtimes, define request and response types, plug in codecs for payload conversion, and emit metrics. This page provides a high-level overview and links to the API docs.

  • MLModel

    • Base class to implement custom inference runtimes.

    • Core lifecycle: load(), predict(), unload().

    • Helpers for encoding/decoding requests and responses.

    • Access to model metadata and settings.

    • Extend this class to implement your own model logic (see the sketch after this list).

  • Types

    • Data structures and enums for the V2 inference protocol (the Open Inference Protocol).

    • Includes Pydantic models like InferenceRequest, InferenceResponse, RequestInput, ResponseOutput.

    • The API docs list each model's fields (types and defaults) along with their JSON Schemas.

  • Codecs

    • Encode/decode payloads between Open Inference Protocol types and Python types.

    • Base classes: InputCodec, for individual inputs and outputs, and RequestCodec, for whole requests and responses.

    • Built-ins include NumpyCodec, Base64Codec, StringCodec, and others.

  • Metrics

    • Emit and configure metrics within MLServer.

    • Use mlserver.log() to record custom metrics; the API docs also cover server lifecycle hooks and utilities.
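
As a concrete illustration, here is a minimal sketch of a custom runtime that ties these pieces together. It assumes numpy is installed; the class name DoublingModel, the tensor names, and the doubling logic are purely illustrative, not part of MLServer's API.

```python
import numpy as np

from mlserver import MLModel
from mlserver.codecs import NumpyCodec
from mlserver.types import InferenceRequest, InferenceResponse


class DoublingModel(MLModel):
    async def load(self) -> bool:
        # Load model artifacts here; this toy model has none.
        self.ready = True
        return self.ready

    async def predict(self, payload: InferenceRequest) -> InferenceResponse:
        # Decode the first input tensor into a numpy array.
        data = NumpyCodec.decode_input(payload.inputs[0])
        result = data * 2
        # Encode the numpy result back into a V2 output tensor.
        return InferenceResponse(
            model_name=self.name,
            outputs=[NumpyCodec.encode_output(name="output-0", payload=result)],
        )
```

MLServer discovers the class via the implementation field of the model's model-settings.json and drives load() and predict() for you.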

When creating a custom runtime, start by subclassing MLModel, use the structures from Types for requests and responses, pick or implement the appropriate Codecs, and optionally emit Metrics from your model code, as the sketch above shows.
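
To see the types and codecs from the caller's side, the following sketch round-trips a numpy array through the V2 request structures. The input name and the metric name in the final comment are illustrative.

```python
import numpy as np

from mlserver.codecs import NumpyCodec
from mlserver.types import InferenceRequest

# Encode a numpy array into a V2 RequestInput and build a request.
arr = np.array([[1.0, 2.0, 3.0]])
request = InferenceRequest(
    inputs=[NumpyCodec.encode_input(name="input-0", payload=arr)]
)

# The same codec decodes the wire representation back into numpy.
decoded = NumpyCodec.decode_input(request.inputs[0])
assert (decoded == arr).all()

# Inside MLModel.predict(), a custom metric could be emitted with, e.g.:
#   mlserver.log(batch_size=decoded.shape[0])
```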
