Models

Seldon’s LLM Module provides an interface for deploying a wide variety of Large Language Models, whether third-party hosted LLMs such as OpenAI and Gemini models, or open-source and custom LLMs running on your own infrastructure. To support both cases, the LLM Module offers two separate runtimes: API and Local. Each runtime ensures that requests and responses conform to Seldon’s Open Inference Protocol standard, so models can be plugged directly into Core 2 pipelines (a request sketch follows the list below).

  • API: This runtime provides a pass-through for calling OpenAI and Gemini models hosted by the relevant third-party provider.

  • Local: This runtime integrates with DeepSpeed, vLLM, and Hugging Face Transformers serving backends to deploy models on your own (local or cloud) infrastructure. These backends provide LLM-specific serving optimizations that manage resource usage and improve performance.
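
Because both runtimes expose the same Open Inference Protocol (V2) interface, a deployed model can be queried with a standard V2 REST request regardless of which backend serves it. The sketch below is illustrative only: the host, model name (`my-llm`), and input tensor name (`text`) are placeholder assumptions, and the exact tensor names your deployment expects depend on the runtime and model configuration.

```python
# Minimal sketch of an Open Inference Protocol (V2) request to a deployed LLM.
# Host, model name, and tensor name are placeholders, not Seldon defaults.
import requests

INFER_URL = "http://<seldon-mesh-host>/v2/models/my-llm/infer"  # placeholder host/model

payload = {
    "inputs": [
        {
            "name": "text",      # assumed input tensor name
            "shape": [1],
            "datatype": "BYTES",
            "data": ["Summarise the Open Inference Protocol in one sentence."],
        }
    ]
}

resp = requests.post(INFER_URL, json=payload, timeout=60)
resp.raise_for_status()

# The V2 response mirrors the request: a list of named output tensors.
for output in resp.json()["outputs"]:
    print(output["name"], output["data"])
```

The same request shape works whether the model is an OpenAI or Gemini pass-through served by the API runtime or a model served locally via DeepSpeed, vLLM, or Transformers; only the deployment behind the endpoint differs.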
