Seldon Core Autoscaling
Seldon Core provides native autoscaling features for both Models and Servers, enabling automatic scaling based on inference load. The diagram below depicts an autoscaling implementation that uses both Model and Server autoscaling features native to Seldon Core (i.e. this implementation doesn't leverage HPA for autoscaling, an approach we cover here)

Model Autoscaling
Models can automatically scale their replicas based on load. Enable it by setting MinReplicas
or MaxReplicas
in your model spec. For more detail on this setup see here
Server Autoscaling
Server autoscaling automatically scales Servers based on Model needs. This implementation supports scaling in a Multi-Model Serving setup where multiple models are hosted on shared inference servers. For more detail on this setup see here
Last updated
Was this helpful?