Content Types (and Codecs)
Machine learning models generally expect their inputs to be passed down as a particular Python type. Most commonly, this type ranges from "general purpose" NumPy arrays or Pandas DataFrames to more granular definitions, like datetime objects, Pillow images, etc. Unfortunately, the definition of the V2 Inference Protocol doesn't cover any of these specific use cases. This protocol can be thought of as a wider "lower level" spec, which only defines what fields a payload should have.
To account for this gap, MLServer introduces support for content types, which offer a way to let MLServer know how it should "decode" V2-compatible payloads. When shaped in the right way, these payloads should "encode" all the information required to extract the higher level Python type that will be required for a model.
To illustrate the above, we can think of a Scikit-Learn pipeline, which takes in a Pandas DataFrame and returns a NumPy Array. Without the use of content types, the V2 payload itself would probably lack information about how it should be treated by MLServer. Likewise, the Scikit-Learn pipeline wouldn't know how to treat a raw V2 payload. In this scenario, content types allow us to specify the actual "higher level" information encoded within the V2 protocol payloads.
Usage
To let MLServer know that a particular payload must be decoded / encoded as a different Python data type (e.g. NumPy Array, Pandas DataFrame, etc.), you can specify it through the content_type field of the parameters section of your request.
As an example, we can consider the following dataframe, containing two columns: First Name and Age.

| First Name | Age |
| ---------- | --- |
| Joanne     | 34  |
| Michael    | 22  |
This table could be specified in the V2 protocol as the following payload, where we declare that:

- The whole set of inputs should be decoded as a Pandas DataFrame (i.e. setting the content type as pd).
- The First Name column should be decoded as a UTF-8 string (i.e. setting the content type as str).
{
  "parameters": {
    "content_type": "pd"
  },
  "inputs": [
    {
      "name": "First Name",
      "datatype": "BYTES",
      "parameters": {
        "content_type": "str"
      },
      "shape": [2],
      "data": ["Joanne", "Michael"]
    },
    {
      "name": "Age",
      "datatype": "INT32",
      "shape": [2],
      "data": [34, 22]
    }
  ]
}
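The mapping from columns to a payload of this shape can be sketched in plain Python. The helper below, build_pd_payload, is a hypothetical name introduced only for illustration (it is not part of MLServer), and it assumes the caller already knows the V2 datatype of each column:

```python
import json


def build_pd_payload(columns, column_content_types=None):
    """Hypothetical helper: build a V2-style request dict from columns.

    `columns` maps a column name to a (datatype, values) pair.
    """
    column_content_types = column_content_types or {}
    inputs = []
    for name, (datatype, values) in columns.items():
        entry = {
            "name": name,
            "datatype": datatype,
            "shape": [len(values)],
            "data": list(values),
        }
        # The per-input content type is optional, so only add it when requested.
        if name in column_content_types:
            entry["parameters"] = {"content_type": column_content_types[name]}
        inputs.append(entry)
    # The request-level content type marks the whole set of inputs as a DataFrame.
    return {"parameters": {"content_type": "pd"}, "inputs": inputs}


payload = build_pd_payload(
    {"First Name": ("BYTES", ["Joanne", "Michael"]), "Age": ("INT32", [34, 22])},
    column_content_types={"First Name": "str"},
)
print(json.dumps(payload, indent=2))
```

In practice the codecs described in the next section take care of this step, so a hand-rolled helper like this is mostly useful for understanding the payload layout.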
To learn more about the available content types and how to use them, see the Available Content Types section below.
Codecs
Under the hood, the conversion between content types is implemented using codecs. In the MLServer architecture, a codec is an abstraction that knows how to encode and decode high-level Python types to and from the V2 Inference Protocol.
Depending on the high-level Python type, encoding / decoding operations may require access to multiple input or output heads. For example, a Pandas DataFrame would need to aggregate all of the input / output heads present in a V2 Inference Protocol response. However, a NumPy array or a list of strings could be encoded directly as an input head within a larger request.
To account for this, codecs can work at either the request / response level (known as request codecs), or the input / output level (known as input codecs). Each of these codecs exposes the following public interface, where Any represents a high-level Python datatype (e.g. a Pandas DataFrame, a NumPy Array, etc.):
Request Codecs
- encode_request() (mlserver.codecs.RequestCodec.encode_request)
- decode_request() (mlserver.codecs.RequestCodec.decode_request)
- encode_response() (mlserver.codecs.RequestCodec.encode_response)
- decode_response() (mlserver.codecs.RequestCodec.decode_response)

Input Codecs
- encode_input() (mlserver.codecs.InputCodec.encode_input)
- decode_input() (mlserver.codecs.InputCodec.decode_input)
- encode_output() (mlserver.codecs.InputCodec.encode_output)
- decode_output() (mlserver.codecs.InputCodec.decode_output)
Note that these methods can also be used as helpers to encode requests and decode responses on the client side. This can help to abstract away from the user most of the details about the underlying structure of V2-compatible payloads.
For example, we could use codecs to encode the DataFrame from the example above into a V2-compatible request simply as:
import pandas as pd
from mlserver.codecs import PandasCodec
dataframe = pd.DataFrame({'First Name': ["Joanne", "Michael"], 'Age': [34, 22]})
inference_request = PandasCodec.encode_request(dataframe)
print(inference_request)
For a full end-to-end example on how content types and codecs work under the hood, feel free to check out this Content Type Decoding example.
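To make the split between input codecs and request codecs more concrete, here is a toy sketch that works on plain dictionaries. The classes ToyStringInputCodec and ToyColumnsRequestCodec are hypothetical and are not MLServer classes; they only mirror the method names listed above to show that an input codec handles a single input head while a request codec aggregates all of them:

```python
class ToyStringInputCodec:
    """Hypothetical input-level codec: one list of strings <-> one input entry."""

    @staticmethod
    def encode_input(name, payload):
        return {
            "name": name,
            "datatype": "BYTES",
            "shape": [len(payload)],
            "data": list(payload),
            "parameters": {"content_type": "str"},
        }

    @staticmethod
    def decode_input(request_input):
        return list(request_input["data"])


class ToyColumnsRequestCodec:
    """Hypothetical request-level codec: dict of columns <-> whole request."""

    @staticmethod
    def encode_request(payload):
        # A request codec needs every input head, so it delegates per column.
        inputs = [
            ToyStringInputCodec.encode_input(name, values)
            for name, values in payload.items()
        ]
        return {"inputs": inputs}

    @staticmethod
    def decode_request(request):
        return {
            entry["name"]: ToyStringInputCodec.decode_input(entry)
            for entry in request["inputs"]
        }


columns = {"first": ["Joanne", "Michael"], "last": ["Doe", "Smith"]}
request = ToyColumnsRequestCodec.encode_request(columns)
decoded = ToyColumnsRequestCodec.decode_request(request)
```

The real codecs operate on the classes from mlserver.types rather than raw dictionaries, so this sketch only illustrates the division of responsibilities.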
Converting to / from JSON
When using MLServer's request codecs, the output of encoding payloads will always be one of the classes within the mlserver.types package (i.e. InferenceRequest or InferenceResponse). Therefore, if you want to use them with requests (or any other package outside of MLServer) you will need to convert them to a Python dict or a JSON string.
Luckily, these classes leverage Pydantic under the hood. Therefore, you can just call the .model_dump() or .model_dump_json() method to convert them. Likewise, to read them back from JSON, we can always pass the JSON fields as kwargs to the class' constructor (or use any of the other methods available within Pydantic).
For example, if we want to send an inference request to model foo, we could do something along the following lines:
import pandas as pd
import requests

from mlserver.codecs import PandasCodec
from mlserver.types import InferenceResponse

dataframe = pd.DataFrame({'First Name': ["Joanne", "Michael"], 'Age': [34, 22]})
inference_request = PandasCodec.encode_request(dataframe)

# raw_request will be a Python dictionary compatible with `requests`'s `json` kwarg
raw_request = inference_request.model_dump()

# `requests` needs an explicit scheme (http://) in the URL
response = requests.post("http://localhost:8080/v2/models/foo/infer", json=raw_request)

# raw_response will be a dictionary (loaded from the response's JSON),
# therefore we can pass it as the InferenceResponse constructor's kwargs
raw_response = response.json()
inference_response = InferenceResponse(**raw_response)
Support for NaN values
The NaN (Not a Number) value is used in NumPy and other scientific libraries to describe an invalid or missing value (e.g. the result of dividing zero by zero). In some scenarios, it may be desirable to let your models receive and / or output NaN values (e.g. these can sometimes be useful with gradient-boosted trees, like XGBoost models). This is why MLServer supports encoding NaN values in your request / response payloads under some conditions.
In order to send / receive NaN values, you must ensure that:

- You are using the REST interface.
- The input / output entry containing NaN values uses either the FP16, FP32 or FP64 datatypes.
- You are using either the Pandas codec or the NumPy codec.

Assuming those conditions are satisfied, any null value within your tensor payload will be converted to NaN.
For example, if you take the following NumPy array:
import numpy as np
foo = np.array([[1.2, 2.3], [np.nan, 4.5]])
We could encode it as:
{
  "inputs": [
    {
      "name": "foo",
      "parameters": {
        "content_type": "np"
      },
      "data": [1.2, 2.3, null, 4.5],
      "datatype": "FP64",
      "shape": [2, 2]
    }
  ]
}
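The null-to-NaN conversion can be illustrated with the standard library alone. This is only a sketch of the idea and not MLServer's implementation: the nulls_to_nan helper and the FLOAT_DATATYPES constant are hypothetical names, and the datatype check simply mirrors the conditions listed in this section:

```python
import json
import math

# JSON has no NaN literal, so missing values travel as `null`.
raw = '{"name": "foo", "datatype": "FP64", "shape": [2, 2], "data": [1.2, 2.3, null, 4.5]}'
entry = json.loads(raw)

# Assumption: only floating-point datatypes can carry NaN values.
FLOAT_DATATYPES = {"FP16", "FP32", "FP64"}


def nulls_to_nan(entry):
    """Hypothetical helper: replace None with NaN for floating-point entries."""
    if entry["datatype"] not in FLOAT_DATATYPES:
        return list(entry["data"])
    return [math.nan if value is None else value for value in entry["data"]]


values = nulls_to_nan(entry)
print(values)
```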
Model Metadata
Content types can also be defined as part of the model's metadata. This lets the user pre-configure which content types a model should use by default to decode / encode its requests / responses, without the need to specify them on each request.
For example, to configure the content type values of the example above, one could create a model-settings.json file like the one below:
{
  "parameters": {
    "content_type": "pd"
  },
  "inputs": [
    {
      "name": "First Name",
      "datatype": "BYTES",
      "parameters": {
        "content_type": "str"
      },
      "shape": [-1]
    },
    {
      "name": "Age",
      "datatype": "INT32",
      "shape": [-1]
    }
  ]
}
It's important to keep in mind that content types passed explicitly as part of the request will always take precedence over the model's metadata. Therefore, we can leverage this to override the model's metadata when needed.
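The precedence rule can be summarised in a few lines of Python. The resolve_content_type function below is a hypothetical illustration of the rule as stated above, not the lookup MLServer actually performs internally:

```python
def resolve_content_type(request_parameters, metadata_parameters):
    """Hypothetical helper: request-level content type wins over metadata."""
    request_value = (request_parameters or {}).get("content_type")
    if request_value is not None:
        return request_value
    # Fall back to the default configured in the model's metadata, if any.
    return (metadata_parameters or {}).get("content_type")


metadata = {"content_type": "pd"}

# No content type on the request: the model's metadata applies.
from_metadata = resolve_content_type(None, metadata)

# Explicit content type on the request: it overrides the metadata.
overridden = resolve_content_type({"content_type": "np"}, metadata)
```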
Available Content Types
Out of the box, MLServer supports the following list of content types. However, this can be extended through the use of 3rd-party or custom runtimes.
NumPy Array
The np content type will decode / encode V2 payloads to a NumPy Array, taking into account the following:

- The datatype field will be matched to the closest NumPy dtype.
- The shape field will be used to reshape the flattened array expected by the V2 protocol into the expected tensor shape.
For example, if we think of the following NumPy Array:
import numpy as np
foo = np.array([[1, 2], [3, 4]])
We could encode it as the input foo in a V2 protocol request as:
{
  "inputs": [
    {
      "name": "foo",
      "parameters": {
        "content_type": "np"
      },
      "data": [1, 2, 3, 4],
      "datatype": "INT32",
      "shape": [2, 2]
    }
  ]
}
When using the NumPy Array content type at the request-level, it will decode the entire request by considering only the first input element. This can be used as a helper for models which only expect a single tensor.
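The flatten and reshape steps described in this section can be sketched directly with NumPy. This is an illustration on plain dictionaries rather than MLServer's own codec, and it hard-codes the assumption that the INT32 datatype corresponds to the np.int32 dtype:

```python
import numpy as np

foo = np.array([[1, 2], [3, 4]], dtype=np.int32)

# Encoding: flatten in row-major order and record the original shape.
encoded = {
    "name": "foo",
    "parameters": {"content_type": "np"},
    "data": foo.flatten().tolist(),
    "datatype": "INT32",  # assumed to pair with np.int32
    "shape": list(foo.shape),
}

# Decoding: rebuild the flat array, then reshape it using the shape field.
decoded = np.array(encoded["data"], dtype=np.int32).reshape(encoded["shape"])
```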
Pandas DataFrame
The pd content type will decode / encode a V2 request into a Pandas DataFrame. For this, it will expect that the DataFrame is shaped in a columnar way. That is:

- Each entry of the inputs list (or outputs, in the case of responses) will represent a column of the DataFrame.
- Each of these entries will contain all the row elements for that particular column.
- The shape field of each input (or output) entry will contain (at least) the number of rows included in the dataframe.
For example, if we consider the following dataframe:
| A  | B  | C  |
| -- | -- | -- |
| a1 | b1 | c1 |
| a2 | b2 | c2 |
| a3 | b3 | c3 |
| a4 | b4 | c4 |
We could encode it to the V2 Inference Protocol as:
{
  "parameters": {
    "content_type": "pd"
  },
  "inputs": [
    {
      "name": "A",
      "data": ["a1", "a2", "a3", "a4"],
      "datatype": "BYTES",
      "shape": [4]
    },
    {
      "name": "B",
      "data": ["b1", "b2", "b3", "b4"],
      "datatype": "BYTES",
      "shape": [4]
    },
    {
      "name": "C",
      "data": ["c1", "c2", "c3", "c4"],
      "datatype": "BYTES",
      "shape": [4]
    }
  ]
}
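Going the other way, the columnar layout makes it straightforward to recover columns and rows from a payload. The snippet below is a plain-Python sketch of that idea and does not use MLServer's Pandas codec:

```python
payload = {
    "parameters": {"content_type": "pd"},
    "inputs": [
        {"name": "A", "data": ["a1", "a2", "a3", "a4"], "datatype": "BYTES", "shape": [4]},
        {"name": "B", "data": ["b1", "b2", "b3", "b4"], "datatype": "BYTES", "shape": [4]},
        {"name": "C", "data": ["c1", "c2", "c3", "c4"], "datatype": "BYTES", "shape": [4]},
    ],
}

# Each input entry is one column, keyed by its name.
columns = {entry["name"]: entry["data"] for entry in payload["inputs"]}

# Transposing the columns yields the rows of the original table.
rows = list(zip(*columns.values()))
```

A dictionary of equally sized lists like columns is also a valid argument to the pandas.DataFrame constructor, which is why this layout maps naturally onto a DataFrame.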
UTF-8 String
The str content type lets you encode / decode a V2 input into a UTF-8 Python string, taking into account the following:

- The expected datatype is BYTES.
- The shape field represents the number of "strings" that are encoded in the payload (e.g. the ["hello world", "one more time"] payload will have a shape of 2 elements).

For example, if we consider the following list of strings:
foo = ["bar", "bar2"]
We could encode it to the V2 Inference Protocol as:
{
  "parameters": {
    "content_type": "str"
  },
  "inputs": [
    {
      "name": "foo",
      "data": ["bar", "bar2"],
      "datatype": "BYTES",
      "shape": [2]
    }
  ]
}
When using the str content type at the request-level, it will decode the entire request by considering only the first input element. This can be used as a helper for models which only expect a single string or a set of strings.
Base64
The base64 content type will decode a Base64-encoded V2 payload into raw Python bytes (and vice versa), taking into account the following:

- The expected datatype is BYTES.
- The data field should contain the base64-encoded binary strings.
- The shape field represents the number of binary strings that are encoded in the payload.
For example, if we think of the following "bytes array":
foo = b"Python is fun"
We could encode it as the input foo of a V2 request as:
{
  "inputs": [
    {
      "name": "foo",
      "parameters": {
        "content_type": "base64"
      },
      "data": ["UHl0aG9uIGlzIGZ1bg=="],
      "datatype": "BYTES",
      "shape": [1]
    }
  ]
}
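The Base64 string in the payload above can be reproduced with Python's standard base64 module. This sketch works on a plain dictionary and is not MLServer's own codec:

```python
import base64

foo = b"Python is fun"

# Base64 output is plain ASCII, so it can be embedded in a JSON string.
encoded = base64.b64encode(foo).decode("ascii")

request_input = {
    "name": "foo",
    "parameters": {"content_type": "base64"},
    "data": [encoded],
    "datatype": "BYTES",
    "shape": [1],
}

# Decoding each element recovers the original bytes.
decoded = [base64.b64decode(item) for item in request_input["data"]]
```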
Datetime
The datetime content type will decode a V2 input into a Python datetime.datetime object, taking into account the following:

- The expected datatype is BYTES.
- The data field should contain the dates serialised following the ISO 8601 standard.
- The shape field represents the number of datetimes that are encoded in the payload.

For example, if we think of the following datetime object:
import datetime
foo = datetime.datetime(2022, 1, 11, 11, 0, 0)
We could encode it as the input foo of a V2 request as:
{
  "inputs": [
    {
      "name": "foo",
      "parameters": {
        "content_type": "datetime"
      },
      "data": ["2022-01-11T11:00:00"],
      "datatype": "BYTES",
      "shape": [1]
    }
  ]
}
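The ISO 8601 round trip can be reproduced with the standard datetime module. As with the other sketches, this operates on a plain dictionary and only illustrates the serialisation format, not MLServer's codec itself:

```python
import datetime

foo = datetime.datetime(2022, 1, 11, 11, 0, 0)

# isoformat() serialises the datetime following ISO 8601.
serialised = foo.isoformat()

request_input = {
    "name": "foo",
    "parameters": {"content_type": "datetime"},
    "data": [serialised],
    "datatype": "BYTES",
    "shape": [1],
}

# fromisoformat() parses the string back into a datetime object.
decoded = [datetime.datetime.fromisoformat(item) for item in request_input["data"]]
```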