Health

Server Live

get /v2/health/live

The “server live” API indicates if the inference server is able to receive and respond to metadata and inference requests. The “server live” API can be used directly to implement the Kubernetes livenessProbe.

Responses
200 OK

No content
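As an illustration, a liveness check is simply an HTTP GET against this endpoint, with a 200 response meaning the server process is up. A minimal sketch in Python, assuming the server is reachable at localhost:8080 (the host and port depend on your deployment):

```python
import requests

# Hypothetical base URL; adjust host/port for your deployment.
BASE_URL = "http://localhost:8080"


def is_live(base_url: str = BASE_URL) -> bool:
    """Return True if the liveness endpoint answers with HTTP 200."""
    try:
        response = requests.get(f"{base_url}/v2/health/live", timeout=5)
        return response.status_code == 200
    except requests.RequestException:
        # Connection failures count as "not live".
        return False


if __name__ == "__main__":
    print("live" if is_live() else "not live")
```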


Server Ready

get /v2/health/ready

The “server ready” health API indicates if all the models are ready for inferencing. The “server ready” health API can be used directly to implement the Kubernetes readinessProbe.

Responses
200 OK

No content
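Beyond the Kubernetes readinessProbe, the same endpoint can back a client-side wait: poll it until every model is loaded before sending traffic. A minimal sketch of that polling pattern, assuming the same hypothetical localhost:8080 base URL as above:

```python
import time

import requests

BASE_URL = "http://localhost:8080"  # hypothetical; adjust for your deployment


def wait_until_ready(base_url: str = BASE_URL,
                     interval: float = 2.0,
                     timeout: float = 120.0) -> bool:
    """Poll /v2/health/ready until it returns 200 or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            response = requests.get(f"{base_url}/v2/health/ready", timeout=5)
            if response.status_code == 200:
                return True
        except requests.RequestException:
            pass  # server not reachable yet; keep polling
        time.sleep(interval)
    return False
```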


Model Ready

get /v2/models/{model_name}/ready

The “model ready” health API indicates if a specific model is ready for inferencing. The model name and (optionally) version must be available in the URL. If a version is not provided, the server may choose a version based on its own policies.

The model readiness endpoint reports that an individual model is loaded and ready to serve. It is intended only to give customers visibility into the model’s state and is not intended to be used as a Kubernetes readiness probe for the MLServer container. Using a model-specific health endpoint as the container readiness probe can cause a deadlock under the current implementation, because:

- the Seldon agent does not begin the model download until the Pod’s IP is visible in endpoints;
- the Pod’s IP is only published after the Pod is Ready, i.e. after all internal readiness checks have passed;
- the MLServer container only becomes Ready once the model is loaded.

This would result in the agent never downloading the model and the Pod never becoming Ready. For container-level readiness checks we recommend the server-level readiness endpoints instead: they indicate that the MLServer process is up and accepting health checks, and they do not deadlock the agent/model-loading flow.

Path parameters
model_name (string, required)
Responses
200 OK

No content
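A per-model check follows the same pattern, with the model name (and, if you pin one, the version) in the path; the versioned form of the path comes from the V2 inference protocol. A minimal sketch, assuming the same hypothetical base URL and a placeholder model name:

```python
from typing import Optional

import requests

BASE_URL = "http://localhost:8080"  # hypothetical; adjust for your deployment


def is_model_ready(model_name: str,
                   model_version: Optional[str] = None,
                   base_url: str = BASE_URL) -> bool:
    """Return True if the named model (optionally a specific version) is ready."""
    path = f"/v2/models/{model_name}/ready"
    if model_version is not None:
        # Versioned form defined by the V2 inference protocol.
        path = f"/v2/models/{model_name}/versions/{model_version}/ready"
    try:
        return requests.get(f"{base_url}{path}", timeout=5).status_code == 200
    except requests.RequestException:
        return False


if __name__ == "__main__":
    print(is_model_ready("my-model"))  # "my-model" is a placeholder name
```

As noted above, use this for visibility into a model's state, not as the MLServer container's readiness probe.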

