Codecs are used to encapsulate the logic required to encode / decode payloads following the Open Inference Protocol into high-level Python types. You can read more about the high-level concepts behind codecs, as well as how to use them, in the relevant section of the docs.
All the codecs within MLServer extend from either the {class}`InputCodec <mlserver.codecs.base.InputCodec>` or the {class}`RequestCodec <mlserver.codecs.base.RequestCodec>` base classes. These define the interfaces to deal with inputs (and outputs) and requests (and responses) respectively.
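As an illustration, a custom input-level codec could extend {class}`InputCodec <mlserver.codecs.base.InputCodec>` and override its encode / decode class methods. The sketch below is not one of the built-in codecs: the `UpperStringCodec` name, its `ContentType` value and the upper-casing logic are made up for the example, and it assumes that a `RequestInput`'s `data` field iterates like a list.

```python
from typing import Any, List

from mlserver.codecs.base import InputCodec
from mlserver.types import RequestInput


class UpperStringCodec(InputCodec):
    """Hypothetical codec that moves lists of strings in / out of BYTES tensors."""

    ContentType = "upper_str"  # made-up content type, for illustration only

    @classmethod
    def can_encode(cls, payload: Any) -> bool:
        # Only lists of strings can be encoded by this codec
        return isinstance(payload, list) and all(isinstance(v, str) for v in payload)

    @classmethod
    def encode_input(cls, name: str, payload: List[str], **kwargs) -> RequestInput:
        # Upper-case each string and store it as a BYTES tensor
        data = [v.upper().encode() for v in payload]
        return RequestInput(name=name, shape=[len(data), 1], datatype="BYTES", data=data)

    @classmethod
    def decode_input(cls, request_input: RequestInput) -> List[str]:
        # Turn the raw BYTES tensor back into a list of Python strings
        return [
            v.decode() if isinstance(v, bytes) else str(v)
            for v in request_input.data
        ]

    # encode_output / decode_output would mirror the two methods above,
    # producing / consuming ResponseOutput instead of RequestInput.
```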
The `mlserver` package includes a set of built-in codecs to cover common conversions. You can learn more about these in the relevant section of the docs.
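For example, assuming the built-in NumPy codecs exposed under `mlserver.codecs`, you could convert between tensors and Open Inference Protocol payloads at both the input level and the request level. A minimal sketch:

```python
import numpy as np

from mlserver.codecs import NumpyCodec, NumpyRequestCodec

arr = np.array([[1, 2], [3, 4]])

# Input level: encode a single tensor into a `RequestInput`, and back
request_input = NumpyCodec.encode_input(name="input-0", payload=arr)
decoded_arr = NumpyCodec.decode_input(request_input)

# Request level: wrap the same tensor into a full `InferenceRequest`
inference_request = NumpyRequestCodec.encode_request(arr)
```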
The MLServer package exposes a set of methods that let you register and track custom metrics. These can be used within your own custom inference runtimes. To learn more about how to expose custom metrics, check out the metrics usage guide.
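For instance, a custom runtime could register a metric once at load time and then log values for it on every request. This is a minimal sketch based on the `mlserver.register` / `mlserver.log` helpers described in the metrics usage guide; the metric name and the empty response are made up for the example.

```python
import mlserver
from mlserver import MLModel
from mlserver.types import InferenceRequest, InferenceResponse


class MonitoredRuntime(MLModel):
    async def load(self) -> bool:
        # Register the custom metric once, when the model is loaded
        mlserver.register("my_custom_metric", "Example metric, for illustration only")
        self.ready = True
        return self.ready

    async def predict(self, payload: InferenceRequest) -> InferenceResponse:
        # Track a new value for the metric on every inference request
        mlserver.log(my_custom_metric=1)
        return InferenceResponse(model_name=self.name, outputs=[])
```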
The {class}`MLModel <mlserver.MLModel>` class is the base class for all custom inference runtimes. It exposes the main interface that MLServer will use to interact with ML models.
The bulk of its public interface consists of the {func}`load() <mlserver.MLModel.load>`, {func}`unload() <mlserver.MLModel.unload>` and {func}`predict() <mlserver.MLModel.predict>` methods. However, it also contains helpers for encoding / decoding requests and responses, as well as properties to access the most common bits of the model's metadata.
When writing custom runtimes, you should extend this class to implement your own `load()` and `predict()` logic, as in the sketch below.
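Putting it together, a custom runtime could look like the following. The echo logic is a placeholder for real inference, the `IdentityRuntime` name is made up, and the sketch assumes the `decode_request` helper on `MLModel` and the `encode_response` helper on request codecs mentioned above:

```python
import numpy as np

from mlserver import MLModel
from mlserver.codecs import NumpyRequestCodec
from mlserver.types import InferenceRequest, InferenceResponse


class IdentityRuntime(MLModel):
    """Hypothetical runtime that echoes its input tensor back."""

    async def load(self) -> bool:
        # A real runtime would load its model artifacts here,
        # e.g. from the URI configured in the model's settings
        self.ready = True
        return self.ready

    async def predict(self, payload: InferenceRequest) -> InferenceResponse:
        # Decode the incoming request into a NumPy array using the
        # inherited decode_request helper
        input_data = self.decode_request(payload, default_codec=NumpyRequestCodec)
        output = np.asarray(input_data)  # placeholder for actual inference
        # Encode the result back into an Open Inference Protocol response
        return NumpyRequestCodec.encode_response(model_name=self.name, payload=output)
```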