Troubleshooting Deployments
If your Seldon Deployment does not seem to be running, here are some tips to diagnose the issue.
My model does not seem to be running
Check whether the Seldon Deployment is running:
kubectl get sdep
If it exists, check its status. For a Seldon Deployment called <name>:
kubectl get sdep <name> -o jsonpath='{.status}'
This might look like:
>kubectl get sdep
NAME AGE
mymodel 1m
>kubectl get sdep mymodel -o jsonpath='{.status}'
map[predictorStatus:[map[name:mymodel-mymodel-7cd068f replicas:1 replicasAvailable:1]] state:Available]
If you have the jq tool installed you can get a nicer output with:
>kubectl get sdep mymodel -o json | jq .status
{
  "predictorStatus": [
    {
      "name": "mymodel-mymodel-7cd068f",
      "replicas": 1,
      "replicasAvailable": 1
    }
  ],
  "state": "Available"
}
For a model with invalid json/yaml an example is shown below:
Check all events on the SeldonDeployment
This will show each event from the operator including create, update, delete and error events.
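One way to list these events is sketched below. The helper name `sdep_events` and the `default` namespace fallback are assumptions made for illustration; the commands themselves are standard kubectl.

```shell
# Hypothetical helper: show the events recorded for one SeldonDeployment.
# Usage: sdep_events <name> [namespace]   (namespace falls back to "default", an assumption)
sdep_events() {
  name="$1"
  namespace="${2:-default}"
  # "describe" prints the resource followed by its recent events.
  kubectl describe sdep "$name" -n "$namespace"
  # The same events as a list, sorted by creation time.
  kubectl get events -n "$namespace" \
    --field-selector "involvedObject.kind=SeldonDeployment,involvedObject.name=$name" \
    --sort-by=.metadata.creationTimestamp
}
```

For example, `sdep_events mymodel` inspects the deployment used in the earlier output.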
My Seldon Deployment remains in "creating" state
Check if the pods are running successfully.
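A minimal sketch of that check, assuming the pods live in a namespace you pass in (the helper name `check_pods` is hypothetical):

```shell
# Hypothetical helper: list pods in a namespace, then optionally describe one of them.
# Usage: check_pods [namespace] [pod-name]
check_pods() {
  namespace="${1:-default}"
  pod="${2:-}"
  # The STATUS and RESTARTS columns reveal states such as Pending or CrashLoopBackOff.
  kubectl get pods -n "$namespace" -o wide
  if [ -n "$pod" ]; then
    # The Events section at the end of the output reports scheduling,
    # image pull and probe failures for that pod.
    kubectl describe pod "$pod" -n "$namespace"
  fi
}
```

Running `check_pods default` first and then `check_pods default <pod-name>` on any pod that is not Running usually narrows down why the deployment is stuck.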
I get 500s when calling my model over the API
Check the logs of your running model pods.
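As a sketch, the logs of every container in a model pod can be fetched in one call. The helper name `model_logs`, the `default` namespace fallback and the 200-line tail are illustrative assumptions.

```shell
# Hypothetical helper: print recent logs from all containers of one pod.
# Usage: model_logs <pod-name> [namespace]
model_logs() {
  pod="$1"
  namespace="${2:-default}"
  # --all-containers covers the model container and any sidecars in the pod.
  kubectl logs "$pod" -n "$namespace" --all-containers=true --tail=200
}
```

If a container has restarted, adding `--previous` to the `kubectl logs` call shows the logs of the previous instance, which is often where the stack trace behind a 500 ends up.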
My Seldon Deployment is not listed
Check the logs of the Seldon Operator. This is the pod which handles the Seldon Deployment graphs sent to Kubernetes. On a default installation, you can find the operator pod in the seldon-system namespace. The pod is labelled control-plane=seldon-controller-manager, so to get the logs you can run:
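A minimal sketch of that command, built from the namespace and label named above. The wrapper name `operator_logs` and the 100-line default are assumptions; adjust the namespace if Seldon Core was installed elsewhere.

```shell
# Hypothetical wrapper: print the Seldon Operator logs on a default installation.
# Usage: operator_logs [number-of-lines]
operator_logs() {
  lines="${1:-100}"
  # With a label selector, kubectl shows only the last 10 lines per pod
  # unless --tail is given, so the line count is set explicitly.
  kubectl logs -n seldon-system -l control-plane=seldon-controller-manager --tail="$lines"
}
```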
Invalid memory address
In some cases, you will see an error message in the operator logs like the following:
This error can be caused by empty or unexpected values in the SeldonDeployment spec. The main cause is usually a misconfiguration of the mutating webhook. To fix it, you can try to re-install Seldon Core in your cluster.
I have tried the above and I'm still confused
Contact our Slack Community
Create an issue on Seldon Core's GitHub repo. Please include any diagnostics from the above suggestions to help us diagnose your issue.