# API

The API runtime of our LLM Module provides a gateway for different LLM providers, starting with OpenAI and Gemini. This runtime provides a wrapper around the OpenAI and Gemini suite of models and allows you to easily deploy and integrate models like GPT-3.5-turbo, GPT-4, Gemini 1.5 Flash, etc. into [Seldon Core 2 pipelines](https://docs.seldon.io/projects/seldon-core/en/v2/contents/pipelines/index.html). This runtime also includes support for OpenAI Azure Deployments.

Typical LLM applications are split up into a series of components aside from your Large Language Model. These may include an additional model for creating embeddings, a vector database to store these embeddings, memory stores and prompt templates, among others. The language model is the component responsible for acting on text that is typically provided by a user or from the interactions of your system with other applications. As straightforward example of a chat app is illustrated in the diagram below:

{% @mermaid/diagram content="flowchart LR
input(\[input])
output(\[output])
filesys\[(FILE SYSTEM)]
memory\_1
memory\_2
OAI\["OpenAI"]

```
input --> memory_1 --> OAI --> output
filesys <--> memory_1
memory_2 --> filesys
memory_2 --> output
OAI --> memory_2

%% Styling for OpenAI node
style OAI fill:#407,stroke:#333,stroke-width:2px,color:#fff" %}
```

Here, we combine different components of an LLM-based application into a pipeline. The component of interest in this diagram is the OpenAI model in the centre of the diagram. The pipeline is a question-answer chat application where a user's questions will be answered by the LLM. The memory components keep a history of the conversation so that the user can ask follow-up questions to the LLM (see [Conversational Memory](/llm-module/components/memory.md)).

The API runtime is designed to slot into a [Seldon Core 2 pipeline](https://docs.seldon.io/projects/seldon-core/en/v2/contents/pipelines/index.html) with minimal set-up and configuration. It can also be run as a stand-alone runtime with a selection of the available OpenAI and Gemini models in order to provide natural language output for chat and text completions applications, embeddings to use in combination with a vector database, and image generation from models such as Dalle-3. For a list of compatible models for those endpoints see the [OpenAI documentation](https://platform.openai.com/docs/models/overview) and [Gemini documentation](https://ai.google.dev/gemini-api/docs/models/gemini).


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.seldon.ai/llm-module/components/models/api.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
