Models
This section covers optimizing model performance in Seldon Core 2, from initial load testing through infrastructure setup to inference optimization. Each subsection provides detailed guidance on one aspect of performance tuning:
Learn how to conduct effective load testing to understand your model's performance characteristics:
- Determining load saturation points
- Understanding closed-loop vs. open-loop testing
- Determining the right number of replicas based on your configuration (model, infrastructure, etc.)
- Setting up reproducible test environments
- Interpreting test results for autoscaling configuration
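The closed-loop vs. open-loop distinction above can be sketched in code. In a closed-loop test each client waits for a response before sending its next request, so the offered load adapts to the model's speed; in an open-loop test requests arrive at a fixed rate regardless of completion times, which is closer to real traffic near saturation. The sketch below simulates both against a stubbed `infer` function (a hypothetical stand-in; replace it with a real HTTP/gRPC call to your deployed model):

```python
import statistics
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def infer(payload):
    # Hypothetical stand-in for a real inference request; swap in an
    # HTTP/gRPC call to your model endpoint when load testing for real.
    time.sleep(0.01)
    return {"ok": True}


def closed_loop(n_clients, requests_per_client):
    """Closed loop: each client sends the next request only after the
    previous one returns, so load backs off as latency grows."""
    latencies, lock = [], threading.Lock()

    def client():
        for _ in range(requests_per_client):
            t0 = time.perf_counter()
            infer(None)
            with lock:
                latencies.append(time.perf_counter() - t0)

    with ThreadPoolExecutor(max_workers=n_clients) as ex:
        for _ in range(n_clients):
            ex.submit(client)
    return latencies


def open_loop(rate_per_s, duration_s):
    """Open loop: requests are issued at a fixed rate, independent of how
    long earlier requests take, exposing queueing near saturation."""
    latencies, lock = [], threading.Lock()

    def one():
        t0 = time.perf_counter()
        infer(None)
        with lock:
            latencies.append(time.perf_counter() - t0)

    threads, interval = [], 1.0 / rate_per_s
    end = time.perf_counter() + duration_s
    while time.perf_counter() < end:
        t = threading.Thread(target=one)
        t.start()
        threads.append(t)
        time.sleep(interval)
    for t in threads:
        t.join()
    return latencies


closed = closed_loop(n_clients=4, requests_per_client=25)
opened = open_loop(rate_per_s=200, duration_s=0.2)
print(f"closed-loop p50: {statistics.median(closed) * 1000:.1f} ms")
print(f"open-loop   p50: {statistics.median(opened) * 1000:.1f} ms")
```

Comparing the two latency distributions as you raise concurrency (closed loop) or request rate (open loop) is one way to locate the saturation point that informs replica counts and autoscaling thresholds.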
Explore different approaches to optimize inference performance:
- Choosing between gRPC and REST protocols
- Implementing adaptive batching
- Optimizing input dimensions
- Configuring parallel processing with workers
- Understanding CPU vs. GPU utilization
- Optimizing your model artefact
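Adaptive batching, mentioned above, groups concurrent requests and runs one vectorised model call instead of many small ones, trading a small amount of latency for higher throughput. The toy sketch below shows the core mechanism: flush when the batch is full or when a timeout expires. The parameter names (`max_batch_size`, `max_batch_time`) are illustrative of this family of settings, not a specific server's API:

```python
import queue
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class AdaptiveBatcher:
    """Toy adaptive batcher: collects requests until the batch is full or a
    time window expires, then runs one batched prediction for all of them."""

    def __init__(self, predict_fn, max_batch_size=4, max_batch_time=0.01):
        self.predict_fn = predict_fn          # vectorised model call
        self.max_batch_size = max_batch_size  # flush when this many queued
        self.max_batch_time = max_batch_time  # or when this window elapses
        self.q = queue.Queue()

    def submit(self, x):
        """Called by request handlers; blocks until the batch containing
        this input has been processed, then returns its result."""
        slot = {"x": x, "event": threading.Event()}
        self.q.put(slot)
        slot["event"].wait()
        return slot["y"]

    def run_once(self):
        """Gather one batch and process it (run in a background worker)."""
        batch = [self.q.get()]  # block for the first request
        deadline = time.monotonic() + self.max_batch_time
        while len(batch) < self.max_batch_size:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            try:
                batch.append(self.q.get(timeout=remaining))
            except queue.Empty:
                break
        ys = self.predict_fn([s["x"] for s in batch])
        for s, y in zip(batch, ys):
            s["y"] = y
            s["event"].set()


def predict(xs):
    # Stand-in for a vectorised model: batching amortises per-call overhead.
    return [x * 2 for x in xs]


batcher = AdaptiveBatcher(predict)
threading.Thread(
    target=lambda: [batcher.run_once() for _ in iter(int, 1)], daemon=True
).start()

# Eight concurrent callers are served by at most a few batched model calls.
with ThreadPoolExecutor(max_workers=8) as ex:
    results = list(ex.map(batcher.submit, range(8)))
print(results)  # [0, 2, 4, 6, 8, 10, 12, 14]
```

Tuning the batch size against the batch window is the central trade-off: a larger window raises throughput but adds up to that much latency to each request.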
Understand how to configure the underlying infrastructure for optimal model performance:
- Choosing between CPU and GPU deployments
- Setting appropriate CPU specifications
- Configuring thread affinity
- Managing memory allocation
- Optimizing resource utilization
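A common infrastructure pitfall behind the points above is CPU oversubscription: if each inference worker spawns as many intra-op threads as there are cores, a replica running several workers ends up with far more runnable threads than cores, causing context-switch overhead and latency spikes. A minimal sketch of one sizing policy (the function and policy are illustrative, not a Seldon Core API):

```python
import os


def worker_threads_per_replica(cpu_limit_millicores: int, workers: int) -> int:
    """Divide a container's CPU limit across its inference workers so that
    total thread count does not oversubscribe the available cores."""
    cores = max(1, cpu_limit_millicores // 1000)  # e.g. "4000m" -> 4 cores
    return max(1, cores // workers)


# Example: a replica limited to 4 CPUs running 2 parallel workers gives
# each worker 2 intra-op threads (illustrative policy).
threads = worker_threads_per_replica(4000, 2)

# Many BLAS/ML runtimes respect this environment variable for thread count.
os.environ["OMP_NUM_THREADS"] = str(threads)
print(threads)  # 2
```

The same budget logic extends to memory: the per-worker allocation (model copy plus request buffers) times the worker count should fit inside the container's memory limit with headroom for spikes.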
Each of these aspects plays a crucial role in achieving optimal model performance. We recommend starting with load testing to establish a baseline, then using the insights gained to inform your infrastructure setup and inference optimization strategies.