# OpenAI Client

## Overview

The **OpenAI Client Translator** provides a compatibility layer between the **OpenAI API protocol** and the **Open Inference Protocol (OIP)**. This allows users to interact with models deployed via **Seldon Core** using the familiar **OpenAI Python client** and API specification (targeting **OpenAI API v1**).

With this translator, users can send requests to OpenAI-compatible models deployed on Seldon using standard OpenAI interfaces such as:

* `chat.completions.create`
* `embeddings.create`
* `images.generate`

Core 2.10 introduced a translation layer that converts OpenAI API requests into Open Inference Protocol requests, enabling interoperability between Seldon Core and OpenAI-compatible clients. This gives you a single way to interact with both remote OpenAI models and local models deployed on-premises.

The translator fully supports the following OpenAI API functionalities:

* ✅ Chat completions
* ✅ Embeddings
* ✅ Image generation

{% hint style="info" %}
The **legacy completions endpoint** is deprecated and **not included**. It may be added in the future if required.
{% endhint %}

In terms of runtime compatibility, the translator currently supports:

* OpenAI runtime
* Local runtime
* Local embeddings runtime

This means that you can deploy models on-premises and interact with them using the OpenAI client. Additionally, streaming responses are supported for chat completions when using the OpenAI runtime or the local runtime.

## Usage Examples

### Python Client Examples

You can send requests to your models deployed via Seldon Core using the OpenAI Python client as follows:

#### Chat Completions

```python
from openai import OpenAI

client = OpenAI(
    api_key="dummy-key",
    base_url="http://localhost:9000/v2/models/chatgpt/infer"
)

completion = client.chat.completions.create(
    model="chatgpt",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "assistant", "content": "Hello! How can I help you?"},
        {"role": "user", "content": "What is the capital of Romania?"}
    ],
)

print(completion)
```
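The overview notes that streaming is supported for chat completions on the OpenAI and local runtimes. A minimal sketch of consuming such a stream follows. It assumes the translator returns standard OpenAI-style chunks (each exposing `choices[0].delta.content`); the helper names `collect_stream_text` and `stream_chat` are hypothetical and not part of Seldon Core or the OpenAI client.

```python
def collect_stream_text(chunks):
    """Join the text deltas of a streamed chat completion into one string."""
    parts = []
    for chunk in chunks:
        # Some chunks may carry no choices at all; skip them.
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta
        # Role-only and terminating chunks have no text content.
        if delta.content:
            parts.append(delta.content)
    return "".join(parts)


def stream_chat(client, model, messages):
    """Request a streamed chat completion and return the assembled reply."""
    stream = client.chat.completions.create(
        model=model,
        messages=messages,
        stream=True,  # ask for incremental chunks instead of a single response
    )
    return collect_stream_text(stream)
```

With a `client` configured as in the example above, `stream_chat(client, "chatgpt", [{"role": "user", "content": "Hello!"}])` would return the full reply once the stream ends. To display tokens as they arrive, print each `delta.content` inside the loop instead of collecting it.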

#### Embeddings

```python
from openai import OpenAI

client = OpenAI(
    api_key="dummy-key",
    base_url="http://localhost:9000/v2/models/openai-embeddings/infer"
)

embedding = client.embeddings.create(
    model="openai-embeddings",
    input=["This is a test", "This is another test"]
)

print(embedding)
```
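Printing the whole response is handy for a first check, but most code only needs the vectors. The sketch below assumes the translated response follows the OpenAI embeddings shape, where each entry in `data` has an `index` and an `embedding`; the helper name `embedding_vectors` is hypothetical.

```python
def embedding_vectors(response):
    """Return the embedding vectors from an embeddings response, in input order."""
    # Each item carries an `index` that matches its position in `input`.
    ordered = sorted(response.data, key=lambda item: item.index)
    return [item.embedding for item in ordered]
```

Called as `embedding_vectors(embedding)` on the response above, this would yield one list of floats per input string.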

#### Image Generation

```python
from openai import OpenAI

client = OpenAI(
    api_key="dummy-key",
    base_url="http://localhost:9000/v2/models/openai-images/infer"
)

image = client.images.generate(
    model="openai-images",
    prompt="A beautiful beach in Costa Rica at sunset",
    n=1,
    size="512x512"
)

print(image)
```

{% hint style="info" %}

* Each **base URL** includes the specific model name. This differs from the standard OpenAI setup, where the base URL is global (e.g., `https://api.openai.com/v1`) and the model name is provided per request. In Core 2, the model name is part of the base URL to align with the internal routing structure. A sanity check ensures that the model in the request matches the one in the URL.
* The `api_key` parameter is required by the OpenAI client but is **not used for authentication** here. You can provide any dummy value. Actual authentication keys should be provided via secrets and loaded as environment variables on the server.
  {% endhint %}
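Because the model name appears both in the base URL and in each request, it can help to derive both from a single value so they cannot drift apart. The following is a minimal sketch of that idea. The helper names `seldon_base_url` and `client_kwargs` are hypothetical, and the URL layout is taken from the examples on this page.

```python
def seldon_base_url(host, model_name):
    """Build the per-model base URL used by the examples on this page."""
    return f"{host.rstrip('/')}/v2/models/{model_name}/infer"


def client_kwargs(host, model_name, api_key="dummy-key"):
    """Keyword arguments for `OpenAI(...)` targeting a single deployed model."""
    # The API key is a placeholder: the OpenAI client requires one,
    # but it is not used for authentication here.
    return {"api_key": api_key, "base_url": seldon_base_url(host, model_name)}
```

A client could then be created with `OpenAI(**client_kwargs("http://localhost:9000", "chatgpt"))`, passing the same `"chatgpt"` string as the `model` argument of each request.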

### curl Examples

You can also send OpenAI-compatible requests directly via `curl`:

#### Chat Completions

```bash
curl http://localhost:9000/v2/models/chatgpt/infer/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "chatgpt",
    "messages": [
      {"role": "developer", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Hello!"}
    ]
  }'
```

#### Embeddings

```bash
curl http://localhost:9000/v2/models/openai-embeddings/infer/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "input": "The food was delicious and the waiter...",
    "model": "openai-embeddings",
    "encoding_format": "float"
  }'
```

#### Image Generation

```bash
curl http://localhost:9000/v2/models/openai-images/infer/images/generations \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai-images",
    "prompt": "A cute baby sea otter",
    "n": 1,
    "size": "1024x1024"
  }'
```

{% hint style="info" %}
Note that these endpoints differ slightly from standard Seldon Core model inference endpoints. Each request path includes the corresponding OpenAI API route (e.g., `/chat/completions`, `/embeddings`, `/images/generations`).
{% endhint %}
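The same path composition can be reproduced without the OpenAI client. The sketch below uses only the Python standard library to prepare a request equivalent to the `curl` calls above. The names `OPENAI_ROUTES` and `build_request` are hypothetical, and the host and model names are the ones assumed throughout this page.

```python
import json
import urllib.request

# OpenAI API routes appended to the model's infer path, as in the curl examples.
OPENAI_ROUTES = {
    "chat": "/chat/completions",
    "embeddings": "/embeddings",
    "images": "/images/generations",
}


def build_request(host, model_name, kind, payload):
    """Prepare (but do not send) a POST request for one OpenAI-style route."""
    url = f"{host.rstrip('/')}/v2/models/{model_name}/infer{OPENAI_ROUTES[kind]}"
    # Set "model" last so the body always matches the model named in the URL.
    body = json.dumps({**payload, "model": model_name}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen(...)` would send it. For example, `build_request("http://localhost:9000", "chatgpt", "chat", {"messages": [{"role": "user", "content": "Hello!"}]})` targets the same URL as the chat completions `curl` example.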

