Creating your Python inference class
To run a custom Python model on Seldon Core, you first need to wrap the model in its own Python class.
Model
To wrap your machine learning model, create a class that has a predict method with the following signature:
```python
def predict(self, X: Union[np.ndarray, List, str, bytes, Dict], names: Optional[List[str]], meta: Optional[Dict] = None) -> Union[np.ndarray, List, str, bytes, Dict]:
```

Your predict method will receive the input X (typically a numpy array), an iterable of column names (if they exist in the input features), and an optional dictionary of metadata. It should return the result of the prediction as one of:
Numpy array
List of values
String or Bytes
Dictionary
A simple example is shown below:
```python
class MyModel(object):
    """
    Model template. You can load your model parameters in __init__ from a
    location accessible at runtime.
    """

    def __init__(self):
        """
        Add any initialization parameters. These will be passed at runtime
        from the graph definition parameters defined in your seldondeployment
        kubernetes resource manifest.
        """
        print("Initializing")

    def predict(self, X, features_names=None):
        """
        Return a prediction.

        Parameters
        ----------
        X : array-like
        features_names : array of feature names (optional)
        """
        print("Predict called - will run identity function")
        return X
```

Returning class names
You can also provide a method class_names to return the column names for your prediction, with the signature shown below:
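A sketch of the expected signature (Iterable here comes from Python's typing module):

```python
def class_names(self) -> Iterable[str]:
```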
Examples
You can follow various notebook examples.
Transformers
Seldon Core allows you to create components to transform features either in the input request direction (input transformer) or the output response direction (output transformer). For these components, create methods with the signatures below:
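These follow the same pattern as predict; a sketch of the expected signatures:

```python
def transform_input(self, X: Union[np.ndarray, List, str, bytes, Dict], names: Optional[List[str]], meta: Optional[Dict] = None) -> Union[np.ndarray, List, str, bytes, Dict]:

def transform_output(self, X: Union[np.ndarray, List, str, bytes, Dict], names: Optional[List[str]], meta: Optional[Dict] = None) -> Union[np.ndarray, List, str, bytes, Dict]:
```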
Combiners
Seldon Core allows you to create components that combine responses from multiple models into a single response. To create a class for this, add a method with the signature below:
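A sketch of the expected signature; the method receives one payload per child model together with the corresponding feature names:

```python
def aggregate(self, features_list: List[Union[np.ndarray, str, bytes]], feature_names_list: List) -> Union[np.ndarray, List, str, bytes, Dict]:
```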
A simple example that averages a set of responses is shown below:
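(An illustrative sketch; it assumes each child model returns a numeric array of the same shape.)

```python
import numpy as np

class MyCombiner:
    def aggregate(self, features_list, feature_names_list=None):
        # average the predictions returned by the child models element-wise
        return np.mean(np.array(features_list), axis=0)
```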
Routers
Routers provide functionality to direct a request to one of a set of child components. For this, create a method with the signature shown below that returns the id of the child component to route the request to. The id is the index of the child connected to the router.
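A sketch of the expected signature:

```python
def route(self, features: Union[np.ndarray, str, bytes], feature_names: Optional[List[str]]) -> int:
```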
To see examples of this you can follow the various example routers that are part of Seldon Core.
Adding Custom Metrics
To return metrics associated with a call create a method with signature as shown below:
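A sketch of the expected signature:

```python
def metrics(self) -> List[Dict]:
```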
This method should return a list of metric dictionaries, as described in the custom metrics docs.
An illustrative example is shown below:
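The metric types below (COUNTER, GAUGE, TIMER) are those described in the custom metrics docs; the keys and values are illustrative:

```python
class ModelWithMetrics:
    def predict(self, X, features_names=None):
        return X

    def metrics(self):
        return [
            {"type": "COUNTER", "key": "mycounter", "value": 1},  # incremented on each call
            {"type": "GAUGE", "key": "mygauge", "value": 100},    # set to the given value
            {"type": "TIMER", "key": "mytimer", "value": 20.2},   # a timing in msecs
        ]
```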
Note: prior to Seldon Core 1.1, custom metrics were always returned to the client. From Seldon Core 1.1 you can control this behaviour by setting the INCLUDE_METRICS_IN_CLIENT_RESPONSE environment variable to either true or false. Regardless of the value of this environment variable, custom metrics will always be exposed to Prometheus.
Prior to Seldon Core 1.1.0, not implementing custom metrics logs a message at the info level on each predict call. Starting with Seldon Core 1.1.0, this is logged at the debug level. To suppress this message, implement a metrics function returning an empty list:
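```python
def metrics(self):
    # returning an empty list suppresses the "no custom metrics" log message
    return []
```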
Returning Tags
If you wish to add arbitrary tags to the returned metadata, you can provide a tags method with the signature shown below:
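A sketch of the expected signature:

```python
def tags(self) -> Dict:
```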
A simple example is shown below:
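(The tag key and value are illustrative.)

```python
class ModelWithTags:
    def predict(self, X, features_names=None):
        return X

    def tags(self):
        return {"system": "production"}
```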
Runtime Metrics and Tags
Starting from Seldon Core 1.3, metrics and tags can also be defined on the output of predict, transform_input, transform_output, send_feedback, route and aggregate by returning a SeldonResponse object, as in the sketch below.
This is thread-safe.
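A sketch, assuming SeldonResponse is importable from seldon_core.user_model; the tag and metric values are illustrative:

```python
from seldon_core.user_model import SeldonResponse

class Model:
    def predict(self, X, features_names=None):
        runtime_metrics = [{"type": "COUNTER", "key": "instance_counter", "value": len(X)}]
        runtime_tags = {"runtime": "tag", "shared": "right one"}
        return SeldonResponse(data=X, metrics=runtime_metrics, tags=runtime_tags)

    def tags(self):
        return {"static": "tag", "shared": "not right one"}
```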
Note that tags and metrics defined through SeldonResponse take priority. In the example above, the returned tags will be:
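```json
{"runtime": "tag", "static": "tag", "shared": "right one"}
```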
REST Health Endpoint
If you wish to add a REST health endpoint, you can implement the health_status method with the signature shown below:
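A sketch of the expected signature, mirroring the return types of predict:

```python
def health_status(self) -> Union[np.ndarray, List, str, bytes, Dict]:
```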
You can use this to verify that your service can respond to HTTP calls after you have built your Docker image, and also as Kubernetes liveness and readiness probes to verify that your model is healthy.
A simple example is shown below:
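(A sketch that runs a quick self-check prediction; the sample input is illustrative.)

```python
class ModelWithHealthEndpoint:
    def predict(self, X, features_names=None):
        return X

    def health_status(self):
        # run a trivial prediction as a self-check; a failed assertion
        # (or any exception) marks the model as unhealthy
        response = self.predict([1, 2], ["f1", "f2"])
        assert len(response) == 2, "health check returning bad predictions"
        return response
```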
When you use seldon-core-microservice to start the HTTP server, you can verify that the model is up and running by checking the /health/status endpoint:
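For example (the port depends on your Seldon Core version and configuration):

```bash
curl localhost:9000/health/status
```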
Additionally, you can also use the /health/ping endpoint if you want a lightweight call that just checks that the HTTP server is up:
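```bash
curl localhost:9000/health/ping
```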
You can also override the default liveness and readiness probes and use HTTP health endpoints by adding them in your SeldonDeployment YAML. You can modify the parameters for the probes to suit your reliability needs without putting too much stress on the container. Read more about these probes in the kubernetes documentation. An example is shown below:
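An illustrative sketch (names, ports and thresholds are placeholders to adapt):

```yaml
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: my-model
spec:
  predictors:
    - name: default
      graph:
        name: classifier
        type: MODEL
      componentSpecs:
        - spec:
            containers:
              - name: classifier
                livenessProbe:
                  httpGet:
                    path: /health/ping
                    port: http
                  initialDelaySeconds: 5
                  periodSeconds: 5
                readinessProbe:
                  httpGet:
                    path: /health/status
                    port: http
                  initialDelaySeconds: 5
                  periodSeconds: 5
```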
However, note that if executor.fullHealthChecks is set to true, the Seldon orchestrator will call your health_status method to check that the model is ready.
Low-level Methods
If you want more control, you can provide low-level methods that take the raw protocol buffer payloads as input. The signatures for these are shown below for release seldon_core>=0.2.6.1:
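A sketch of these signatures, assuming the message types come from seldon_core.proto.prediction_pb2:

```python
from seldon_core.proto import prediction_pb2

def predict_raw(self, msg: prediction_pb2.SeldonMessage) -> prediction_pb2.SeldonMessage:

def send_feedback_raw(self, feedback: prediction_pb2.Feedback) -> prediction_pb2.SeldonMessage:

def transform_input_raw(self, msg: prediction_pb2.SeldonMessage) -> prediction_pb2.SeldonMessage:

def transform_output_raw(self, msg: prediction_pb2.SeldonMessage) -> prediction_pb2.SeldonMessage:

def route_raw(self, msg: prediction_pb2.SeldonMessage) -> prediction_pb2.SeldonMessage:

def aggregate_raw(self, msgs: prediction_pb2.SeldonMessageList) -> prediction_pb2.SeldonMessage:
```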
User Defined Exceptions
If you want to handle custom exceptions, define a field model_error_handler as shown below:
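A sketch; the Python wrapper serves REST with Flask, so the field is a flask Blueprint:

```python
import flask

model_error_handler = flask.Blueprint("error_handlers", __name__)
```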
An example is as follows:
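An illustrative sketch: a custom exception carrying an application error code and an HTTP status code, plus a handler that turns it into a JSON response (names and codes are placeholders):

```python
import flask
from flask import jsonify

class UserCustomException(Exception):
    status_code = 404

    def __init__(self, message, application_error_code, http_status_code):
        Exception.__init__(self)
        self.message = message
        if http_status_code is not None:
            self.status_code = http_status_code
        self.application_error_code = application_error_code

    def to_dict(self):
        return {
            "status": {
                "status": self.status_code,
                "message": self.message,
                "app_code": self.application_error_code,
            }
        }

class MyModel:
    model_error_handler = flask.Blueprint("error_handlers", __name__)

    @model_error_handler.app_errorhandler(UserCustomException)
    def handle_custom_error(error):
        # convert the custom exception into a JSON error response
        response = jsonify(error.to_dict())
        response.status_code = error.status_code
        return response

    def predict(self, X, features_names=None):
        raise UserCustomException("Test-Error-Msg", 1402, 402)
```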
Multi-value numpy arrays
By default, when using the data ndarray parameter, the conversion to ndarray coerces all inner values into a single common type. If your model takes arrays with mixed value types as input, you can override the predict_raw function, which gives you access to the raw request, and create the numpy array yourself:
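A minimal sketch, assuming the REST JSON payload is passed to predict_raw as a dict:

```python
import numpy as np

class Model:
    def predict_raw(self, request):
        data = request["data"]["ndarray"]
        # dtype=object keeps each inner value's original type instead of
        # coercing everything to a single common type
        X = np.array(data, dtype=object)
        return X.tolist()
```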
Gunicorn and load
If the wrapped Python class is served under Gunicorn, then as part of the initialization of each Gunicorn worker a load method will be called on your class if it has one. You should use this method to load and initialise your model. This is important for Tensorflow models, which need their session created in each worker process. The Tensorflow MNIST example does this.
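A sketch; the model path and joblib format are placeholders:

```python
import joblib

class Model:
    def load(self):
        # called once per gunicorn worker process at startup
        self._model = joblib.load("/path/to/model.joblib")

    def predict(self, X, features_names=None):
        return self._model.predict(X)
```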
Integer numbers
The json package in Python parses numbers with no decimal part as integers. Therefore, a tensor containing only numbers without a decimal part will get parsed as an integer tensor.
To illustrate the above, we can consider the following example:
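(An illustrative payload.)

```json
{
  "data": {
    "ndarray": [
      [1, 2],
      [3, 4]
    ]
  }
}
```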
By default, the json package will parse the array in the data.ndarray field as an array of Python integers. Since there are no floating-point values, numpy will then create a tensor with dtype = np.int32.
If we want to force a different behaviour, we can use the underlying predict_raw() method to control the deserialisation of the input payload. For example, with the payload above, we could force the resulting tensor to always use dtype = np.float64 by implementing predict_raw() as:
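A minimal sketch, again assuming the REST JSON payload arrives as a dict:

```python
import numpy as np

class Model:
    def predict_raw(self, request):
        data = request["data"]["ndarray"]
        # force float64 regardless of whether the JSON values had a decimal part
        X = np.array(data, dtype=np.float64)
        return X.tolist()
```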
Incubating features
REST Metadata Endpoint
The Python wrapper will automatically expose a /metadata endpoint to return metadata about the loaded model. It is up to the developer to implement a metadata method in their class that returns a dict containing the model metadata.
See metadata documentation for more details.
Example format:
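An illustrative sketch following the schema in the Validation section below; all values are placeholders:

```python
class Model:
    def metadata(self):
        return {
            "name": "model-name",
            "versions": ["model-version"],
            "platform": "platform-name",
            "inputs": [{"name": "input", "datatype": "BYTES", "shape": [1]}],
            "outputs": [{"name": "output", "datatype": "BYTES", "shape": [1]}],
        }
```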
Validation
The output of the developer-defined metadata method will be validated to follow the V2 dataplane proposal protocol; see this GitHub issue for details:
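```
$metadata_model_response =
{
  "name" : $string,
  "versions" : [ $string, ... ] #optional,
  "platform" : $string,
  "inputs" : [ $metadata_tensor, ... ],
  "outputs" : [ $metadata_tensor, ... ]
}
```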
with
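```
$metadata_tensor =
{
  "name" : $string,
  "datatype" : $string,
  "shape" : [ $number, ... ]
}
```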
If validation fails, the server will reply with a 500 MICROSERVICE_BAD_METADATA response when metadata is requested.
Examples:
Next Steps
After you have created the component, you need to create a Docker image that can be managed by Seldon Core. Follow the documentation to do this with s2i or Docker.