Models

This section covers how to optimize model performance in Seldon Core 2, from initial load testing to infrastructure setup and inference optimization. Each subsection gives detailed guidance on one area of performance tuning.

Learn how to conduct effective load testing to understand your model's performance characteristics:

  • Determining load saturation points

  • Understanding closed-loop vs. open-loop testing

  • Determining the right number of replicas based on your configuration (model, infrastructure, etc.)

  • Setting up reproducible test environments

  • Interpreting test results for autoscaling configuration
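As a rough illustration of how load test results can feed into replica sizing, the sketch below applies Little's law to a closed-loop test and then estimates a replica count from a measured saturation point. The function names, the `headroom` parameter, and the example numbers are hypothetical and are not part of Seldon Core 2. The sketch also assumes that throughput scales linearly with replicas and that the closed-loop test has no think time, neither of which is guaranteed in practice.

```python
import math


def closed_loop_throughput(concurrent_users: int, mean_latency_s: float) -> float:
    """Little's law: requests/s offered by a closed-loop test with no think time.

    Each virtual user waits for a response before sending the next request,
    so the offered load is bounded by concurrency / latency.
    """
    if mean_latency_s <= 0:
        raise ValueError("mean_latency_s must be positive")
    return concurrent_users / mean_latency_s


def replicas_needed(target_rps: float,
                    saturation_rps_per_replica: float,
                    headroom: float = 0.7) -> int:
    """Estimate replicas so each one runs at `headroom` of its saturation point.

    Assumption (hypothetical model): throughput scales linearly with replicas.
    """
    if saturation_rps_per_replica <= 0:
        raise ValueError("saturation_rps_per_replica must be positive")
    if not 0 < headroom <= 1:
        raise ValueError("headroom must be in (0, 1]")
    usable_rps = saturation_rps_per_replica * headroom
    return max(1, math.ceil(target_rps / usable_rps))


if __name__ == "__main__":
    # Illustrative numbers only: 8 users at 50 ms mean latency.
    print(closed_loop_throughput(8, 0.05))
    # Illustrative numbers only: 500 req/s target, 160 req/s saturation per replica.
    print(replicas_needed(500, 160, headroom=0.7))
```

An estimate like this is only a starting point. Re-running the load test at the computed replica count shows whether the linear-scaling assumption holds for your model and infrastructure.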

Explore different approaches to optimize inference performance:

  • Choosing between gRPC and REST protocols

  • Implementing adaptive batching

  • Optimizing input dimensions

  • Configuring parallel processing with workers

  • Understanding CPU vs. GPU utilization

  • Optimizing your model artefact

Understand how to configure the underlying infrastructure for optimal model performance:

  • Choosing between CPU and GPU deployments

  • Setting appropriate CPU specifications

  • Configuring thread affinity

  • Managing memory allocation

  • Optimizing resource utilization

Each of these areas affects model performance. We recommend starting with load testing to establish a baseline, then using those results to guide your infrastructure setup and inference optimization.
