Python API
MLServer exposes a Python framework to build custom inference runtimes, define request/response types, plug in codecs for payload conversion, and emit metrics. This page provides a high-level overview and links to the full API docs.
MLModel
Base class for implementing custom inference runtimes.
Core lifecycle: load(), predict(), unload().
Helpers for encoding and decoding requests and responses.
Access to model metadata and settings.
Extend this class to implement your own model logic.
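For illustration, a minimal custom runtime could look like the sketch below. The DoubleModel class and its doubling logic are placeholder stand-ins for real model code, not part of MLServer.

```python
import numpy as np

from mlserver import MLModel
from mlserver.codecs import NumpyCodec
from mlserver.types import InferenceRequest, InferenceResponse


class DoubleModel(MLModel):
    """Toy runtime that doubles its NumPy input (hypothetical example)."""

    async def load(self) -> bool:
        # Load model artifacts here (e.g. from self.settings.parameters.uri).
        self.ready = True
        return self.ready

    async def predict(self, payload: InferenceRequest) -> InferenceResponse:
        # Decode the first input tensor into a NumPy array.
        data = self.decode(payload.inputs[0], default_codec=NumpyCodec)
        result = np.asarray(data) * 2  # stand-in for real model logic
        return InferenceResponse(
            model_name=self.name,
            outputs=[NumpyCodec.encode_output(name="output", payload=result)],
        )
```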
Types
Data structures and enums for the V2 inference protocol.
Includes Pydantic models such as InferenceRequest, InferenceResponse, RequestInput, and ResponseOutput.
See each model's fields (types and defaults) and JSON Schemas in the docs.
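For example, a request can be built directly from these Pydantic models; the input name, shape, and data below are arbitrary.

```python
from mlserver.types import InferenceRequest, RequestInput

# Build a V2 inference request by hand.
request = InferenceRequest(
    inputs=[
        RequestInput(
            name="input-0",
            shape=[1, 3],
            datatype="FP32",
            data=[1.0, 2.0, 3.0],
        )
    ]
)

# Serialise to the protocol's JSON form
# (use model_dump_json() instead on Pydantic v2).
print(request.json())
```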
Codecs
Encode/decode payloads between Open Inference Protocol types and Python types.
Base classes: InputCodec (for inputs/outputs) and RequestCodec (for requests/responses).
Built-ins include NumpyCodec, Base64Codec, StringCodec, and others.
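A short sketch of round-tripping values through two of the built-in codecs; the tensor names and values are arbitrary.

```python
import numpy as np

from mlserver.codecs import NumpyCodec, StringCodec

# Round-trip a NumPy array through the V2 RequestInput representation.
arr = np.array([[1.0, 2.0], [3.0, 4.0]], dtype=np.float32)
request_input = NumpyCodec.encode_input(name="x", payload=arr)
decoded = NumpyCodec.decode_input(request_input)
assert np.array_equal(decoded, arr)

# StringCodec does the same for lists of strings, here on the output side.
output = StringCodec.encode_output(name="labels", payload=["cat", "dog"])
print(StringCodec.decode_output(output))  # ['cat', 'dog']
```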
Metrics
Emit and configure custom metrics within MLServer.
Use log() to record custom metrics; see the docs for server lifecycle hooks and related utilities.
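As a sketch, assuming the register() and log() helpers exposed at the top level of the mlserver package, a runtime could record a custom metric as below; the metric name is a made-up example.

```python
import mlserver
from mlserver import MLModel
from mlserver.types import InferenceRequest, InferenceResponse


class MeteredModel(MLModel):
    async def load(self) -> bool:
        # Register the custom metric once, at load time.
        mlserver.register("my_model_batch_size", "Number of inputs per request")
        self.ready = True
        return self.ready

    async def predict(self, payload: InferenceRequest) -> InferenceResponse:
        # Record an observation for the metric on every request.
        mlserver.log(my_model_batch_size=len(payload.inputs))
        return InferenceResponse(model_name=self.name, outputs=[])
```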
