API
The API runtime of our LLM Module provides a gateway to different LLM providers, starting with OpenAI and Gemini. It wraps the OpenAI and Gemini suites of models, allowing you to easily deploy and integrate models such as GPT-3.5 Turbo, GPT-4, and Gemini 1.5 Flash into Seldon Core 2 pipelines. The runtime also supports Azure OpenAI deployments.
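For context, the kind of provider call the runtime wraps looks like the sketch below, which uses the OpenAI Python SDK directly rather than the runtime's own interface; the model choice and prompt are illustrative only.

```python
# Minimal sketch of a direct OpenAI chat-completion call, i.e. the kind
# of provider request the API runtime wraps. The model and prompt are
# illustrative; this is not the runtime's own interface.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is a Seldon Core 2 pipeline?"},
    ],
)
print(response.choices[0].message.content)
```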
Typical LLM applications are split into a series of components beyond the Large Language Model itself. These may include an additional model for creating embeddings, a vector database to store those embeddings, memory stores, and prompt templates, among others. The language model is the component responsible for acting on text, typically provided by a user or arising from your system's interactions with other applications. A straightforward example of a chat app is illustrated in the diagram below:
Here, we combine the different components of an LLM-based application into a pipeline. The component of interest in this diagram is the OpenAI model in the centre. The pipeline is a question-answer chat application in which the LLM answers a user's questions. The memory components keep a history of the conversation so that the user can ask the LLM follow-up questions (see Conversational Memory).
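To make the role of the memory components concrete, the sketch below keeps the conversation history by hand using the OpenAI SDK; in the pipeline this bookkeeping is performed by the memory components rather than by application code, and the `chat` helper is hypothetical.

```python
# Hand-rolled conversational memory: the history list plays the role the
# memory components play in the pipeline above. `chat` is a hypothetical
# helper for illustration.
from openai import OpenAI

client = OpenAI()
history = [{"role": "system", "content": "You are a helpful assistant."}]

def chat(user_message: str) -> str:
    """Append the user turn, query the model, and record its reply."""
    history.append({"role": "user", "content": user_message})
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=history,
    )
    answer = response.choices[0].message.content
    history.append({"role": "assistant", "content": answer})
    return answer

print(chat("Who wrote 'The Selfish Gene'?"))
print(chat("What else have they written?"))  # follow-up resolved via history
```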
The API runtime is designed to slot into a Seldon Core 2 pipeline with minimal set-up and configuration. It can also be run as a stand-alone runtime with a selection of the available OpenAI and Gemini models, providing natural-language output for chat and text-completion applications, embeddings for use with a vector database, and image generation from models such as DALL·E 3. For a list of models compatible with these endpoints, see the OpenAI documentation and Gemini documentation.
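As a rough sketch of querying a stand-alone deployment, the request below follows the Open Inference Protocol (V2) that Seldon Core 2 models expose over REST; the host, model name, and input tensor name are assumptions for illustration, so check the runtime's own reference for the exact request schema.

```python
# Sketch of a V2 (Open Inference Protocol) REST request to a stand-alone
# deployment. The host, model name, and tensor name are illustrative
# assumptions; consult the runtime docs for the actual schema.
import requests

url = "http://seldon-mesh/v2/models/chat-completions/infer"
headers = {"Seldon-Model": "chat-completions"}  # routing header used by the Seldon mesh

payload = {
    "inputs": [
        {
            "name": "content",  # hypothetical input tensor name
            "shape": [1],
            "datatype": "BYTES",
            "data": ["What is a vector database?"],
        }
    ]
}

response = requests.post(url, json=payload, headers=headers, timeout=30)
response.raise_for_status()
print(response.json()["outputs"])
```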