Changelog
MLServer supports Pydantic V2.
MLServer supports streaming data to and from your models.
Streaming support is available for both the REST and gRPC servers:
For the REST server, support is limited to server streaming. This means that the client sends a single request to the server, and the server responds with a stream of data.
For the gRPC server, both client and server streaming are supported. This means that the client sends a stream of data to the server, and the server responds with a stream of data.
See our docs and examples for more details.
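As a rough illustration, a custom runtime can take part in streaming by overriding the streaming predict method added in this release. This is a minimal sketch, assuming the `predict_stream` signature and `StringCodec` usage from the current MLServer docs; the `StreamingEcho` runtime is hypothetical:

```python
from typing import AsyncIterator

from mlserver import MLModel
from mlserver.codecs import StringCodec
from mlserver.types import InferenceRequest, InferenceResponse


class StreamingEcho(MLModel):
    async def predict_stream(
        self, payloads: AsyncIterator[InferenceRequest]
    ) -> AsyncIterator[InferenceResponse]:
        # Consume the (possibly streaming) request payloads and emit a
        # stream of responses, one chunk per incoming request.
        async for _request in payloads:
            output = StringCodec.encode_output(name="echo", payload=["chunk"])
            yield InferenceResponse(model_name=self.name, outputs=[output])
```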
fix(ci): fix typo in CI name in https://github.com/SeldonIO/MLServer/pull/1623
Update CHANGELOG in https://github.com/SeldonIO/MLServer/pull/1624
Re-generate License Info in https://github.com/SeldonIO/MLServer/pull/1634
Fix mlserver_huggingface settings device type in https://github.com/SeldonIO/MLServer/pull/1486
fix: Adjust HF tests post-merge of PR in https://github.com/SeldonIO/MLServer/pull/1635
Update README.md w licensing clarification in https://github.com/SeldonIO/MLServer/pull/1636
Re-generate License Info in https://github.com/SeldonIO/MLServer/pull/1642
fix(ci): optimise disk space for GH workers in https://github.com/SeldonIO/MLServer/pull/1644
build: Update maintainers in https://github.com/SeldonIO/MLServer/pull/1659
fix: Missing f-string directives in https://github.com/SeldonIO/MLServer/pull/1677
build: Add Catboost runtime to Dependabot in https://github.com/SeldonIO/MLServer/pull/1689
Fix JSON input shapes in https://github.com/SeldonIO/MLServer/pull/1679
build(deps): bump alibi-detect from 0.11.5 to 0.12.0 in https://github.com/SeldonIO/MLServer/pull/1702
build(deps): bump alibi from 0.9.5 to 0.9.6 in https://github.com/SeldonIO/MLServer/pull/1704
Docs correction - Updated README.md in mlflow to match column names order in https://github.com/SeldonIO/MLServer/pull/1703
fix(runtimes): Remove unused Pydantic dependencies in https://github.com/SeldonIO/MLServer/pull/1725
test: Detect generate failures in https://github.com/SeldonIO/MLServer/pull/1729
build: Add granularity in types generation in https://github.com/SeldonIO/MLServer/pull/1749
Migrate to Pydantic v2 in https://github.com/SeldonIO/MLServer/pull/1748
Re-generate License Info in https://github.com/SeldonIO/MLServer/pull/1753
Revert "build(deps): bump uvicorn from 0.28.0 to 0.29.0" in https://github.com/SeldonIO/MLServer/pull/1758
refactor(pydantic): Remaining migrations for deprecated functions in https://github.com/SeldonIO/MLServer/pull/1757
Fixed openapi dataplane.yaml in https://github.com/SeldonIO/MLServer/pull/1752
fix(pandas): Use Pydantic v2 compatible type in https://github.com/SeldonIO/MLServer/pull/1760
Fix Pandas codec decoding from numpy arrays in https://github.com/SeldonIO/MLServer/pull/1751
build: Bump versions for Read the Docs in https://github.com/SeldonIO/MLServer/pull/1761
docs: Remove quotes around local TOC in https://github.com/SeldonIO/MLServer/pull/1764
Spawn worker in custom environment in https://github.com/SeldonIO/MLServer/pull/1739
Re-generate License Info in https://github.com/SeldonIO/MLServer/pull/1767
Basic contributing guide on contributing and opening a PR in https://github.com/SeldonIO/MLServer/pull/1773
Inference streaming support in https://github.com/SeldonIO/MLServer/pull/1750
Re-generate License Info in https://github.com/SeldonIO/MLServer/pull/1779
build: Lock GitHub runners' OS in https://github.com/SeldonIO/MLServer/pull/1765
Removed text-model from benchmarking in https://github.com/SeldonIO/MLServer/pull/1790
Bumped mlflow to 2.13.1 and gunicorn to 22.0.0 in https://github.com/SeldonIO/MLServer/pull/1791
Build(deps): Update to poetry version 1.8.3 in docker build in https://github.com/SeldonIO/MLServer/pull/1792
Bumped werkzeug to 3.0.3 in https://github.com/SeldonIO/MLServer/pull/1793
Docs streaming in https://github.com/SeldonIO/MLServer/pull/1789
Bump uvicorn 0.30.1 in https://github.com/SeldonIO/MLServer/pull/1795
Fixes for all-runtimes in https://github.com/SeldonIO/MLServer/pull/1794
Fix BaseSettings import for pydantic v2 in https://github.com/SeldonIO/MLServer/pull/1798
Bumped preflight version to 1.9.7 in https://github.com/SeldonIO/MLServer/pull/1797
build: Install dependencies only in Tox environments in https://github.com/SeldonIO/MLServer/pull/1785
Bumped to 1.6.0.dev2 in https://github.com/SeldonIO/MLServer/pull/1803
Fix CI/CD macos-huggingface in https://github.com/SeldonIO/MLServer/pull/1805
Fixed macos kafka CI in https://github.com/SeldonIO/MLServer/pull/1807
Update poetry lock in https://github.com/SeldonIO/MLServer/pull/1808
Re-generate License Info in https://github.com/SeldonIO/MLServer/pull/1813
Fix/macos all runtimes in https://github.com/SeldonIO/MLServer/pull/1823
fix: Update stale reviewer in licenses.yml workflow in https://github.com/SeldonIO/MLServer/pull/1824
ci: Merge changes from master to release branch in https://github.com/SeldonIO/MLServer/pull/1825
Full Changelog: https://github.com/SeldonIO/MLServer/compare/1.5.0...1.6.0
We have removed support for Python 3.8; check https://github.com/SeldonIO/MLServer/pull/1603 for more info. Docker images for MLServer already use Python 3.10.
Full Changelog: https://github.com/SeldonIO/MLServer/compare/1.4.0...1.5.0
Full Changelog: https://github.com/SeldonIO/MLServer/compare/1.3.5...1.4.0
Full Changelog: https://github.com/SeldonIO/MLServer/compare/1.3.4...1.3.5
Full Changelog: https://github.com/SeldonIO/MLServer/compare/1.3.3...1.3.4
Full Changelog: https://github.com/SeldonIO/MLServer/compare/1.3.2...1.3.3
Full Changelog: https://github.com/SeldonIO/MLServer/compare/1.3.1...1.3.2
More often than not, your custom runtimes will depend on external third-party dependencies which are not included within the main MLServer package - or on different versions of the same package (e.g. scikit-learn==1.1.0 vs scikit-learn==1.2.0). In these cases, to load your custom runtime, MLServer will need access to these dependencies.
Under the hood, each of these environments will run their own separate pool of workers.
[mlserver.register()](https://mlserver.readthedocs.io/en/latest/reference/api/metrics.html#mlserver.register): Register a new metric.
[mlserver.log()](https://mlserver.readthedocs.io/en/latest/reference/api/metrics.html#mlserver.log): Log a new set of metric / value pairs.
MLServer 1.3.0 now includes an autogenerated Swagger UI which can be used to interact dynamically with the Open Inference Protocol. The autogenerated Swagger UI can be accessed under the /v2/docs endpoint.
The model-specific autogenerated Swagger UI can be accessed under the following endpoints:
/v2/models/{model_name}/docs
/v2/models/{model_name}/versions/{model_version}/docs
MLServer now includes improved codec support for all the main types that can be returned by HuggingFace models - ensuring that the values returned via the Open Inference Protocol are more semantic and meaningful.
Internally, MLServer leverages a Model Repository implementation which is used to discover and find different models (and their versions) available to load. The latest version of MLServer will now allow you to swap this for your own model repository implementation - letting you integrate against your own model repository workflows.
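As a sketch of what such an integration could look like - assuming the ModelRepository base class exposes list() and find() as in current MLServer versions; the class below and its discovery logic are hypothetical:

```python
from typing import List

from mlserver.repository import ModelRepository
from mlserver.settings import ModelSettings


class MyCustomRepository(ModelRepository):
    """Hypothetical repository that discovers models from a custom source."""

    async def list(self) -> List[ModelSettings]:
        # Return the settings of every model known to your repository
        # (e.g. fetched from an internal model registry).
        return []

    async def find(self, name: str) -> List[ModelSettings]:
        # Return all versions of the model matching `name`
        return [m for m in await self.list() if m.name == name]
```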
The latest version of MLServer includes a few optimisations around image size, which help reduce the size of the official set of images by more than ~60% - making them more convenient to use and integrate within your workloads. In the case of the full seldonio/mlserver:1.3.0 image (including all runtimes and dependencies), this means going from 10GB down to ~3GB.
Full Changelog: https://github.com/SeldonIO/MLServer/compare/1.2.3...1.2.4
Full Changelog: https://github.com/SeldonIO/MLServer/compare/1.2.2...1.2.3
Full Changelog: https://github.com/SeldonIO/MLServer/compare/1.2.1...1.2.2
Full Changelog: https://github.com/SeldonIO/MLServer/compare/1.2.0...1.2.1
To make it easier to write your own custom runtimes, MLServer now ships with an mlserver init command that will generate a templated project. This project will include a skeleton with folders, unit tests, Dockerfiles, etc. for you to fill in.
For example, if we assume a flat model repository where each folder represents a model, you would end up with a folder structure like the one below:
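An illustrative layout (model names and artefact files are hypothetical):

```
models/
├── model-a/
│   ├── model-settings.json
│   └── model-a.joblib
└── model-b/
    ├── model-settings.json
    └── model-b.joblib
```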
The 1.2.0 release of MLServer includes a number of fixes around the parallel inference pool, focused on improving the architecture to optimise memory usage and reduce latency. These changes include (but are not limited to):
The main MLServer process won’t load an extra replica of the model anymore. Instead, all computing will occur on the parallel inference pool.
The worker pool will now ensure that all requests are executed on each worker’s AsyncIO loop, thus optimising compute time vs IO time.
Several improvements around logging from the inference workers.
MLServer has now dropped support for Python 3.7. Going forward, only 3.8, 3.9 and 3.10 will be supported (with 3.8 being used in our official set of images).
In line with MLServer’s close relationship with the MLflow team, this release of MLServer introduces support for the recently released MLflow 2.0. This introduces changes to the drop-in MLflow “scoring protocol” support, in the MLflow runtime for MLServer, to ensure it’s aligned with MLflow 2.0.
MLServer is also shipped as a dependency of MLflow, therefore you can try it out today by installing MLflow as:
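For instance (the exact install target is an assumption, since whether MLServer sits behind MLflow's extras group varies by MLflow version):

```bash
pip install mlflow
# Depending on the MLflow version, MLServer support may require the extras group:
# pip install "mlflow[extras]"
```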
Full Changelog: https://github.com/SeldonIO/MLServer/compare/1.1.0...1.2.0
made their first contribution in https://github.com/SeldonIO/MLServer/pull/1636
made their first contribution in https://github.com/SeldonIO/MLServer/pull/1679
made their first contribution in https://github.com/SeldonIO/MLServer/pull/1703
made their first contribution in https://github.com/SeldonIO/MLServer/pull/1752
made their first contribution in https://github.com/SeldonIO/MLServer/pull/1751
made their first contribution in https://github.com/SeldonIO/MLServer/pull/1773
Update CHANGELOG in https://github.com/SeldonIO/MLServer/pull/1592
build: Migrate away from Node v16 actions in https://github.com/SeldonIO/MLServer/pull/1596
build: Bump version and improve release doc in https://github.com/SeldonIO/MLServer/pull/1602
build: Upgrade stale packages (fastapi, starlette, tensorflow, torch) in https://github.com/SeldonIO/MLServer/pull/1603
fix(ci): tests and security workflow fixes in https://github.com/SeldonIO/MLServer/pull/1608
Re-generate License Info in https://github.com/SeldonIO/MLServer/pull/1612
fix(ci): Missing quote in CI test for all_runtimes in https://github.com/SeldonIO/MLServer/pull/1617
build(docker): Bump dependencies in https://github.com/SeldonIO/MLServer/pull/1618
docs: List supported Python versions in https://github.com/SeldonIO/MLServer/pull/1591
fix(ci): Have separate smaller tasks for release in https://github.com/SeldonIO/MLServer/pull/1619
Free up some space for GH actions in https://github.com/SeldonIO/MLServer/pull/1282
Introduce tracing with OpenTelemetry in https://github.com/SeldonIO/MLServer/pull/1281
Update release CI to use Poetry in https://github.com/SeldonIO/MLServer/pull/1283
Re-generate License Info in https://github.com/SeldonIO/MLServer/pull/1284
Add support for white-box explainers to alibi-explain runtime in https://github.com/SeldonIO/MLServer/pull/1279
Update CHANGELOG in https://github.com/SeldonIO/MLServer/pull/1294
Fix build-wheels.sh error when copying to output path in https://github.com/SeldonIO/MLServer/pull/1286
Fix typo in https://github.com/SeldonIO/MLServer/pull/1289
feat(logging): Distinguish logs from different models in https://github.com/SeldonIO/MLServer/pull/1302
Make sure we use our Response class in https://github.com/SeldonIO/MLServer/pull/1314
Adding Quick-Start Guide to docs in https://github.com/SeldonIO/MLServer/pull/1315
feat(logging): Provide JSON-formatted structured logging as option in https://github.com/SeldonIO/MLServer/pull/1308
Bump in conda version and mamba solver in https://github.com/SeldonIO/MLServer/pull/1298
feat(huggingface): Merge model settings in https://github.com/SeldonIO/MLServer/pull/1337
feat(huggingface): Load local artefacts in HuggingFace runtime in https://github.com/SeldonIO/MLServer/pull/1319
Document and test behaviour around NaN in https://github.com/SeldonIO/MLServer/pull/1346
Address flakiness on 'mlserver build' tests in https://github.com/SeldonIO/MLServer/pull/1363
Bump Poetry and lockfiles in https://github.com/SeldonIO/MLServer/pull/1369
Bump Miniforge3 to 23.3.1 in https://github.com/SeldonIO/MLServer/pull/1372
Re-generate License Info in https://github.com/SeldonIO/MLServer/pull/1373
Improved huggingface batch logic in https://github.com/SeldonIO/MLServer/pull/1336
Add inference params support to MLFlow's custom invocation endpoint (… in https://github.com/SeldonIO/MLServer/pull/1375
Increase build space for runtime builds in https://github.com/SeldonIO/MLServer/pull/1385
Fix minor typo in sklearn README in https://github.com/SeldonIO/MLServer/pull/1402
Add catboost classifier support in https://github.com/SeldonIO/MLServer/pull/1403
added model_kwargs to huggingface model in https://github.com/SeldonIO/MLServer/pull/1417
Re-generate License Info in https://github.com/SeldonIO/MLServer/pull/1456
Local response cache implementation in https://github.com/SeldonIO/MLServer/pull/1440
fix link to custom runtimes in https://github.com/SeldonIO/MLServer/pull/1467
Improve typing on Environment class in https://github.com/SeldonIO/MLServer/pull/1469
build(dependabot): Change reviewers in https://github.com/SeldonIO/MLServer/pull/1548
MLServer changes from internal fork - deps and CI updates in https://github.com/SeldonIO/MLServer/pull/1588
made their first contribution in https://github.com/SeldonIO/MLServer/pull/1281
made their first contribution in https://github.com/SeldonIO/MLServer/pull/1286
made their first contribution in https://github.com/SeldonIO/MLServer/pull/1289
made their first contribution in https://github.com/SeldonIO/MLServer/pull/1315
made their first contribution in https://github.com/SeldonIO/MLServer/pull/1337
made their first contribution in https://github.com/SeldonIO/MLServer/pull/1336
made their first contribution in https://github.com/SeldonIO/MLServer/pull/1375
made their first contribution in https://github.com/SeldonIO/MLServer/pull/1417
made their first contribution in https://github.com/SeldonIO/MLServer/pull/1467
Rename HF codec to hf in https://github.com/SeldonIO/MLServer/pull/1268
Publish is_drift metric to Prom in https://github.com/SeldonIO/MLServer/pull/1263
made their first contribution in https://github.com/SeldonIO/MLServer/pull/1263
Silent logging in https://github.com/SeldonIO/MLServer/pull/1230
Fix mlserver infer with BYTES in https://github.com/SeldonIO/MLServer/pull/1213
made their first contribution in https://github.com/SeldonIO/MLServer/pull/1230
Add default LD_LIBRARY_PATH env var in https://github.com/SeldonIO/MLServer/pull/1120
Adding cassava tutorial (mlserver + seldon core) in https://github.com/SeldonIO/MLServer/pull/1156
Add docs around converting to / from JSON in https://github.com/SeldonIO/MLServer/pull/1165
Document SKLearn available outputs in https://github.com/SeldonIO/MLServer/pull/1167
Fix minor typo in alibi-explain tests in https://github.com/SeldonIO/MLServer/pull/1170
Add support for .ubj models and improve XGBoost docs in https://github.com/SeldonIO/MLServer/pull/1168
Fix content type annotations for pandas codecs in https://github.com/SeldonIO/MLServer/pull/1162
Added option to configure the grpc histogram in https://github.com/SeldonIO/MLServer/pull/1143
Add OS classifiers to project's metadata in https://github.com/SeldonIO/MLServer/pull/1171
Don't use qsize for parallel worker queue in https://github.com/SeldonIO/MLServer/pull/1169
Fix small typo in Python API docs in https://github.com/SeldonIO/MLServer/pull/1174
Fix star import in mlserver.codecs.* in https://github.com/SeldonIO/MLServer/pull/1172
made their first contribution in https://github.com/SeldonIO/MLServer/pull/1143
made their first contribution in https://github.com/SeldonIO/MLServer/pull/1174
Use default initialiser if not using a custom env in https://github.com/SeldonIO/MLServer/pull/1104
Add support for online drift detectors in https://github.com/SeldonIO/MLServer/pull/1108
added intera and inter op parallelism parameters to the hugggingface … in https://github.com/SeldonIO/MLServer/pull/1081
Fix settings reference in runtime docs in https://github.com/SeldonIO/MLServer/pull/1109
Bump Alibi libs requirements in https://github.com/SeldonIO/MLServer/pull/1121
Add default LD_LIBRARY_PATH env var in https://github.com/SeldonIO/MLServer/pull/1120
Ignore both .metrics and .envs folders in https://github.com/SeldonIO/MLServer/pull/1132
made their first contribution in https://github.com/SeldonIO/MLServer/pull/1108
Move OpenAPI schemas into Python package
WARNING: The 1.3.0 release has been yanked from PyPI due to a packaging issue. This has now been resolved in versions >= 1.3.1.
In MLServer 1.3.0, it is now possible to load custom runtimes with their own custom environment, through an environment tarball, whose path can be specified within your model-settings.json file. This custom environment will get provisioned on the fly after loading a model - alongside the default environment and any other custom environments.
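A minimal sketch of such a configuration, assuming the environment_tarball parameter name used in the MLServer docs (the model name, implementation path and tarball path are illustrative):

```json
{
  "name": "my-model",
  "implementation": "models.MyCustomRuntime",
  "parameters": {
    "environment_tarball": "./environments/custom-runtime-env.tar.gz"
  }
}
```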
The MLServer framework now includes a simple interface that allows you to register and keep track of any custom metrics:
Custom metrics will generally be registered in the [load()](https://mlserver.readthedocs.io/en/latest/reference/api/model.html#mlserver.MLModel.load) method and then used in the [predict()](https://mlserver.readthedocs.io/en/latest/reference/api/model.html#mlserver.MLModel.predict) method of your custom runtime. These metrics can then be polled and queried via Prometheus.
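A minimal sketch of a custom runtime doing exactly that (the runtime class, metric name and output values are illustrative):

```python
import mlserver
import numpy as np
from mlserver import MLModel
from mlserver.codecs import NumpyCodec
from mlserver.types import InferenceRequest, InferenceResponse


class MetricsRuntime(MLModel):
    async def load(self) -> bool:
        # Register the custom metric once, at load time
        mlserver.register("my_custom_metric", "An example custom metric")
        return True

    async def predict(self, payload: InferenceRequest) -> InferenceResponse:
        # Log a new value for the metric on every inference request
        mlserver.log(my_custom_metric=34)
        output = NumpyCodec.encode_output(name="foo", payload=np.array([1, 2, 3]))
        return InferenceResponse(model_name=self.name, outputs=[output])
```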
Alongside the server-wide API docs, MLServer now also exposes a set of API docs tailored to individual models, showing the specific endpoints available for each one.
Massive thanks to for taking the lead on improving the HuggingFace runtime!
This is exposed via a flag in your settings.json configuration file.
Thanks to (aka ) for his effort contributing this feature!
MLServer 1.3.0 introduces a new set of metrics to increase visibility around two of its internal queues:
Adaptive batching queue: used to accumulate request batches on the fly.
Parallel inference queue: used to send over requests to the inference worker pool.
Many thanks to for taking the time to implement this highly requested feature!
Alongside its built-in inference runtimes, MLServer also exposes a Python framework that you can use to extend MLServer and write your own codecs and inference runtimes. The official MLServer docs now include a section documenting the main components of this framework in more detail.
made their first contribution in https://github.com/SeldonIO/MLServer/pull/864
made their first contribution in https://github.com/SeldonIO/MLServer/pull/692
made their first contribution in https://github.com/SeldonIO/MLServer/pull/849
made their first contribution in https://github.com/SeldonIO/MLServer/pull/860
made their first contribution in https://github.com/SeldonIO/MLServer/pull/950
made their first contribution in https://github.com/SeldonIO/MLServer/pull/1033
made their first contribution in https://github.com/SeldonIO/MLServer/pull/1064
MLServer now exposes an alternative interface which can be used to write custom runtimes. This interface can be enabled by decorating your predict() method with the mlserver.codecs.decode_args decorator, and it lets you specify in the method signature both how you want your request payload to be decoded and how to encode the response back. Based on the information provided in the method signature, MLServer will automatically decode the request payload into the different inputs specified as keyword arguments. Under the hood, this is implemented through MLServer's codecs and content types support.
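For instance, a runtime using this interface could look like the following sketch (the runtime class and argument names are illustrative):

```python
import numpy as np
from mlserver import MLModel
from mlserver.codecs import decode_args


class DoublingRuntime(MLModel):
    @decode_args
    async def predict(self, payload: np.ndarray) -> np.ndarray:
        # `payload` arrives already decoded into a NumPy array, and the
        # returned array is encoded back into a V2 response automatically.
        return payload * 2
```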
MLServer now lets you load custom runtimes dynamically into a running instance of MLServer. Once you have your custom runtime ready, all you need to do is move it to your model folder, next to your model-settings.json configuration file.
This release of MLServer introduces a new mlserver infer command, which will let you run inference over a large batch of input data on the client side. Under the hood, this command will stream a large set of inference requests from a specified input file, arrange them in microbatches, orchestrate the request / response lifecycle, and finally write the obtained responses back into an output file.
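Usage looks roughly like this (flag names follow the batch-processing docs; the host, model name and file paths are placeholders):

```bash
mlserver infer -u localhost:8080 -m my-model \
  -i /tmp/input-requests.txt -o /tmp/output-responses.txt --workers 10
```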
The official set of MLServer images has now moved to use Red Hat's Universal Base Image (UBI) as a base image. This ensures support to run MLServer in OpenShift clusters, as well as providing a well-maintained baseline for our images.
To learn more about how to use MLServer directly from the MLflow CLI, check out the MLflow documentation.
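For example, once MLflow is installed, an MLflow model can be served through MLServer straight from the MLflow CLI (the --enable-mlserver flag belongs to MLflow; the model path is a placeholder):

```bash
mlflow models serve -m ./my-mlflow-model --enable-mlserver
```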
made their first contribution in https://github.com/SeldonIO/MLServer/pull/633
made their first contribution in https://github.com/SeldonIO/MLServer/pull/711
made their first contribution in https://github.com/SeldonIO/MLServer/pull/720
made their first contribution in https://github.com/SeldonIO/MLServer/pull/742
made their first contribution in https://github.com/SeldonIO/MLServer/pull/776
made their first contribution in https://github.com/SeldonIO/MLServer/pull/839