OpenAI Client

Overview

The OpenAI Client Translator provides a compatibility layer between the OpenAI API protocol and the Open Inference Protocol (OIP). This allows users to interact with models deployed via Seldon Core using the familiar OpenAI Python client and API specification (targeting OpenAI API v1).

With this translator, users can send requests to OpenAI-compatible models deployed on Seldon using standard OpenAI interfaces such as:

  • chat.completions.create

  • embeddings.create

  • images.generate

Core 2.10 introduced a translation layer that seamlessly translates OpenAI API requests into Open Inference Protocol requests, enabling interoperability between Seldon Core and OpenAI-compatible clients. This provides a unified way of interacting with both remote OpenAI models and local models deployed on-premises.
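
For illustration, the sketch below shows how a chat request might be represented as an Open Inference Protocol (V2) inference payload after translation. The tensor names ("role", "content") and the encoding are assumptions made for illustration; the translator's actual internal representation is an implementation detail and may differ.

# A minimal sketch of an Open Inference Protocol (V2) payload that a chat
# request could translate into. The tensor names and encoding here are
# illustrative assumptions, not the translator's actual representation.
oip_request = {
    "inputs": [
        {
            "name": "role",
            "shape": [2],
            "datatype": "BYTES",
            "data": ["system", "user"],
        },
        {
            "name": "content",
            "shape": [2],
            "datatype": "BYTES",
            "data": ["You are a helpful assistant.", "What is the capital of Romania?"],
        },
    ]
}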

The translator offers full support for the following OpenAI API functionalities:

  • ✅ Chat completions

  • ✅ Embeddings

  • ✅ Image generation

The legacy completions endpoint is deprecated and not included. It may be added in the future if required.

In terms of runtime compatibility, the translator currently supports:

  • OpenAI runtime

  • Local runtime

  • Local embeddings runtime

This means that you can deploy models on-premises and interact with them using the OpenAI client. Additionally, streaming responses are supported for chat completions when using the OpenAI or local runtime, as shown in the sketch below.
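
Streaming uses the standard OpenAI client interface: pass stream=True and iterate over the returned chunks. This is a minimal sketch assuming the chatgpt model deployment used in the examples below.

from openai import OpenAI

client = OpenAI(
    api_key="dummy-key",
    base_url="http://localhost:9000/v2/models/chatgpt/infer"
)

# Request a streamed response; tokens arrive incrementally in chunks.
stream = client.chat.completions.create(
    model="chatgpt",
    messages=[{"role": "user", "content": "What is the capital of Romania?"}],
    stream=True,
)

for chunk in stream:
    # Each chunk carries a partial delta; the content can be None
    # (e.g., for the final chunk), so guard before printing.
    if chunk.choices and chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="", flush=True)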

Usage Examples

Python Client Examples

You can send requests to your models deployed via Seldon Core using the OpenAI Python client as follows:

Chat Completions

from openai import OpenAI

client = OpenAI(
    api_key="dummy-key",
    base_url="http://localhost:9000/v2/models/chatgpt/infer"
)

completion = client.chat.completions.create(
    model="chatgpt",
    messages=[
        {"role": "user", "content": "You are a helpful assistant."},
        {"role": "assistant", "content": "Hello! How can I help you?"},
        {"role": "user", "content": "What is the capital of Romania?"}
    ],
)

print(completion)

Embeddings

from openai import OpenAI

client = OpenAI(
    api_key="dummy-key",
    base_url="http://localhost:9000/v2/models/openai-embeddings/infer"
)

embedding = client.embeddings.create(
    model="openai-embeddings",
    input=["This is a test", "This is another test"]
)

print(embedding)

Image Generation

from openai import OpenAI

client = OpenAI(
    api_key="dummy-key",
    base_url="http://localhost:9000/v2/models/openai-images/infer"
)

image = client.images.generate(
    model="openai-images",
    prompt="A beautiful beach in Costa Rica at sunset",
    n=1,
    size="512x512"
)

print(image)
  • Each base URL includes the specific model name. This differs from the standard OpenAI setup, where the base URL is typically global (e.g., https://api.openai.com/v1) and the model name is provided per request. In Core 2, the model name is part of the base URL to align with the internal routing structure, and a sanity check ensures that the model named in the request matches the one in the URL. A convenience helper for building these per-model URLs is sketched after these notes.

  • The api_key parameter is required by the OpenAI client but is not used for authentication here. You can provide any dummy value — actual authentication keys should be provided via secrets and loaded as environment variables on the server.
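
Because each model gets its own base URL, a small helper can make it easier to work with several deployments. This is purely a convenience sketch, not part of the translator; seldon_base_url is a hypothetical helper name.

from openai import OpenAI

def seldon_base_url(host: str, model_name: str) -> str:
    # Hypothetical helper: Core 2 routes requests per model, so the
    # model name appears in the path rather than in a global base URL.
    return f"{host}/v2/models/{model_name}/infer"

client = OpenAI(
    api_key="dummy-key",  # placeholder; real keys are configured server-side
    base_url=seldon_base_url("http://localhost:9000", "chatgpt"),
)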

curl Examples

You can also send OpenAI-compatible requests directly via curl:

Chat Completions

curl http://localhost:9000/v2/models/chatgpt/infer/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "chatgpt",
    "messages": [
      {"role": "developer", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Hello!"}
    ]
  }'

Embeddings

curl http://localhost:9000/v2/models/openai-embeddings/infer/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "input": "The food was delicious and the waiter...",
    "model": "openai-embeddings",
    "encoding_format": "float"
  }'

Image Generation

curl http://localhost:9000/v2/models/openai-images/infer/images/generations \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai-images",
    "prompt": "A cute baby sea otter",
    "n": 1,
    "size": "1024x1024"
  }'

Note that these endpoints differ slightly from standard Seldon Core model inference endpoints. Each request path includes the corresponding OpenAI API route (e.g., /chat/completions, /embeddings, /images/generations).
