MLServer can be configured through a `settings.json` file in the root folder from where MLServer is started. Note that these are server-wide settings (e.g. gRPC or HTTP port), which are separate from the individual model settings. Alternatively, this configuration can also be passed through environment variables prefixed with `MLSERVER_` (e.g. `MLSERVER_GRPC_PORT`).
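As an illustration, a minimal `settings.json` could look like the sketch below. The field names shown (`debug`, `http_port`, `grpc_port`) are assumptions based on the settings mentioned above, not an exhaustive reference.

```json
{
  "debug": false,
  "http_port": 8080,
  "grpc_port": 8081
}
```

Following the same convention, the equivalent could be supplied through environment variables such as `MLSERVER_HTTP_PORT=8080` and `MLSERVER_GRPC_PORT=8081`.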
Codecs are used to encapsulate the logic required to encode / decode payloads following the Open Inference Protocol into high-level Python types. You can read more about the high-level concepts behind codecs, as well as how to use them, in the relevant section of the docs.
All the codecs within MLServer extend from either the {class}`InputCodec <mlserver.codecs.base.InputCodec>` or the {class}`RequestCodec <mlserver.codecs.base.RequestCodec>` base classes. These define the interfaces to deal with inputs (and outputs) and requests (and responses) respectively.
The `mlserver` package includes a set of built-in codecs to cover common conversions. You can learn more about these in the relevant section of the docs.
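As a hedged sketch of how a built-in codec can be used, the snippet below encodes a NumPy array into a request input and decodes it back. It assumes the `NumpyCodec` class and its `encode_input` / `decode_input` classmethods are available under `mlserver.codecs`, as described in the codecs documentation.

```python
import numpy as np

from mlserver.codecs import NumpyCodec

# Encode a NumPy array into an Open Inference Protocol `RequestInput`
# (assumes NumpyCodec exposes `encode_input` / `decode_input` classmethods)
payload = np.array([[1, 2], [3, 4]])
request_input = NumpyCodec.encode_input(name="my-input", payload=payload)

# Decode the `RequestInput` back into a NumPy array
decoded = NumpyCodec.decode_input(request_input)
assert (decoded == payload).all()
```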
The MLServer package exposes a set of methods that let you register and track custom metrics. These can be used within your own custom inference runtimes. To learn more about how to expose custom metrics, check out the metrics usage guide.
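As a rough sketch, and assuming the `mlserver.register` and `mlserver.log` helpers described in the metrics usage guide, a custom runtime could register and update a metric along these lines (the runtime and metric names below are purely illustrative):

```python
import mlserver

from mlserver import MLModel
from mlserver.types import InferenceRequest, InferenceResponse


class MonitoredRuntime(MLModel):
    async def load(self) -> bool:
        # Register the custom metric once, when the model is loaded
        # (assumes `mlserver.register(name, description)` from the metrics guide)
        mlserver.register("my_payload_size", "Number of input tensors per request")
        return True

    async def predict(self, payload: InferenceRequest) -> InferenceResponse:
        # Record a value for the custom metric on every inference call
        # (assumes `mlserver.log(metric_name=value)` from the metrics guide)
        mlserver.log(my_payload_size=len(payload.inputs))
        return InferenceResponse(model_name=self.name, outputs=[])
```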
In MLServer, each loaded model can be configured separately. This configuration includes model information (e.g. metadata about the accepted inputs), as well as model-specific settings (e.g. the number of parallel workers to run inference).

This configuration will usually be provided through a `model-settings.json` file which sits next to the model artifacts. However, it's also possible to provide it through environment variables prefixed with `MLSERVER_MODEL_` (e.g. `MLSERVER_MODEL_IMPLEMENTATION`). Note that, in the latter case, these environment variables will be shared across all loaded models (unless they get overridden by a `model-settings.json` file). Additionally, if no `model-settings.json` file is found, MLServer will also try to load a "default" model from these environment variables.
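For illustration, a minimal `model-settings.json` might look like the sketch below. The `implementation` and `parameters.uri` fields shown here are assumptions about commonly used fields rather than a complete reference, and the runtime class and artifact path are placeholders.

```json
{
  "name": "my-model",
  "implementation": "mlserver_sklearn.SKLearnModel",
  "parameters": {
    "uri": "./model.joblib"
  }
}
```

Alternatively, the same implementation could be selected for a "default" model through an environment variable such as `MLSERVER_MODEL_IMPLEMENTATION=mlserver_sklearn.SKLearnModel`.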
The `MLModel` class is the base class for all custom inference runtimes. It exposes the main interface that MLServer will use to interact with ML models.

The bulk of its public interface consists of the {func}`load() <mlserver.MLModel.load>`, {func}`unload() <mlserver.MLModel.unload>` and {func}`predict() <mlserver.MLModel.predict>` methods. However, it also contains helpers for encoding / decoding requests and responses, as well as properties to access the most common bits of the model's metadata.

When writing custom runtimes, this class should be extended to implement your own load and predict logic.
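The snippet below is a minimal sketch of such a custom runtime. The dummy "model", and the use of the built-in `NumpyRequestCodec` with `decode_request` / `encode_response`, are illustrative assumptions rather than a prescribed pattern.

```python
from mlserver import MLModel
from mlserver.codecs import NumpyRequestCodec
from mlserver.types import InferenceRequest, InferenceResponse


class MyCustomRuntime(MLModel):
    async def load(self) -> bool:
        # Load your model artifact here (e.g. deserialise weights from disk).
        # As a placeholder, this dummy "model" just sums its inputs.
        self._model = lambda x: x.sum(axis=-1, keepdims=True)
        return True

    async def predict(self, payload: InferenceRequest) -> InferenceResponse:
        # Decode the incoming request into a NumPy array, run inference,
        # and encode the result back into an Open Inference Protocol response
        # (assumes the built-in NumpyRequestCodec request codec).
        inputs = NumpyRequestCodec.decode_request(payload)
        outputs = self._model(inputs)
        return NumpyRequestCodec.encode_response(self.name, outputs)
```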