OpenAI Function Calling

In this example we demonstrate how to use function calling with the OpenAI runtime on Seldon Core 2, closely following the official OpenAI function calling guide.

We begin by deploying an OpenAI model on Seldon Core 2. The namespace we are using in this example is seldon, but feel free to replace it based on your configuration.

For this tutorial you will need to create a secret with the OpenAI API key and deploy the API Runtime server. Please check our installation tutorial to see how you can do so.
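As a sketch, creating the secret could look like the following; the secret name openai-api-key matches the one deleted at the end of this tutorial, but the exact keys expected inside it depend on your installation, so treat this as illustrative:

!kubectl create secret generic openai-api-key \
    --from-literal=OPENAI_API_KEY="<your-openai-api-key>" \
    -n seldon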

We can deploy the OpenAI model using the following manifest file:

!cat manifests/openai-chat-completions.yaml
apiVersion: mlops.seldon.io/v1alpha1
kind: Model
metadata:
  name: openai-chat-completions
spec:
  storageUri: "gs://seldon-models/llm-runtimes/examples/api/openai-func-call/models/openai_chat_completions"
  requirements:
  - openai

The MLServer model-settings file is:

!cat models/openai_chat_completions/model-settings.json
{
  "name": "openai_chat_completions",
  "implementation": "mlserver_llm_api.LLMRuntime",
  "parameters": {
    "extra": {
      "provider_id": "openai",
      "config": {
        "model_id": "gpt-4o",
        "model_type": "chat.completions"
      }
    }
  }
}

As you can see, we are deploying a "gpt-4o" model to be used for chat completions.

To deploy the model, run the following command:

!kubectl apply -f manifests/openai-chat-completions.yaml -n seldon
!kubectl wait --for condition=ready --timeout=600s model --all -n seldon
model.mlops.seldon.io/openai-chat-completions created
model.mlops.seldon.io/openai-chat-completions condition met
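
You can optionally inspect the deployed model before sending requests:

!kubectl get model openai-chat-completions -n seldon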

Function calling

Now that we have our model deployed, we are ready to send some requests.

We begin by defining the actual function we are about to call, together with the associated tools definition, which is a list of JSON-encoded strings.

import json
from datetime import datetime

def get_delivery_date(order_id: str):
    # Mock lookup: a real implementation would query an order system;
    # here we simply return the current timestamp for the given order_id.
    return datetime.now().strftime('%Y-%m-%d %H:%M:%S')
    
tools = [
    json.dumps({
        "type": "function",
        "function": {
            "name": "get_delivery_date",
            "description": "Get the delivery date for a customer's order. Call this whenever you need to know the delivery date, for example when a customer asks 'Where is my package'",
            "parameters": {
                "type": "object",
                "properties": {
                    "order_id": {
                        "type": "string",
                        "description": "The customer's order ID."
                    }
                },
                "required": ["order_id"],
                "additionalProperties": False
            }
        }
    })
]
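
Since the model only returns the name of the function to call, it is convenient to keep a mapping from tool names to the corresponding Python callables. The available_functions mapping below is our own convention, not something the runtime requires:

available_functions = {
    "get_delivery_date": get_delivery_date,
}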

We can now construct the first request in our conversation with "gpt-4o".

import subprocess

def get_mesh_ip():
    # Fetch the external IP of the seldon-mesh LoadBalancer service.
    cmd = "kubectl get svc seldon-mesh -n seldon -o jsonpath='{.status.loadBalancer.ingress[0].ip}'"
    return subprocess.check_output(cmd, shell=True).decode('utf-8')

inference_request = {
    "inputs": [
        {
            "name": "role",
            "shape": [2],
            "datatype": "BYTES",
            "data": [
                "system", 
                "user"
            ]
        },
        {
            "name": "content",
            "shape": [2],
            "datatype": "BYTES",
            "data": [
                json.dumps(["You are a helpful customer support assistant. Use the supplied tools to assist the user."]), 
                json.dumps(["Where is my package?"])
            ]
        },
        {
            "name": "type",
            "shape": [2],
            "datatype": "BYTES",
            "data": [
                json.dumps(["text"]), 
                json.dumps(["text"])
            ]
        },
        {
            "name": "tools",
            "shape": [1],
            "datatype": "BYTES",
            "data": tools
        }
    ],
    "parameters": {
        "llm_parameters": {
            "temperature": 0.0,  # we set temperature to 0.0 to get deterministic results
        }
    },
}
import pprint
import requests

endpoint = f"http://{get_mesh_ip()}/v2/models/openai-chat-completions/infer"
response = requests.post(endpoint, json=inference_request)
pprint.pprint(response.json(), depth=4)
{'id': '211f597c-0176-484e-b042-d95a7175a111',
 'model_name': 'openai-chat-completions_1',
 'model_version': '1',
 'outputs': [{'data': ['assistant'],
              'datatype': 'BYTES',
              'name': 'role',
              'parameters': {'content_type': 'str'},
              'shape': [1, 1]},
             {'data': ['Could you please provide me with your order ID so I '
                       'can check the delivery status for you?'],
              'datatype': 'BYTES',
              'name': 'content',
              'parameters': {'content_type': 'str'},
              'shape': [1, 1]},
             {'data': ['text'],
              'datatype': 'BYTES',
              'name': 'type',
              'parameters': {'content_type': 'str'},
              'shape': [1, 1]},
             {'data': ['{\n'
                       '  "id": "chatcmpl-ANHzH0nrtVkfwc22kHbTs7wGEq9z4",\n'
                       '  "choices": [\n'
                       '    {\n'
                       '      "finish_reason": "stop",\n'
                       '      "index": 0,\n'
                       '      "logprobs": null,\n'
                       '      "message": {\n'
                       '        "content": "Could you please provide me with '
                       'your order ID so I can check the delivery status for '
                       'you?",\n'
                       '        "refusal": null,\n'
                       '        "role": "assistant"\n'
                       '      }\n'
                       '    }\n'
                       '  ],\n'
                       '  "created": 1730114051,\n'
                       '  "model": "gpt-4o-2024-08-06",\n'
                       '  "object": "chat.completion",\n'
                       '  "system_fingerprint": "fp_72bbfa6014",\n'
                       '  "usage": {\n'
                       '    "completion_tokens": 20,\n'
                       '    "prompt_tokens": 99,\n'
                       '    "total_tokens": 119,\n'
                       '    "completion_tokens_details": {\n'
                       '      "reasoning_tokens": 0\n'
                       '    },\n'
                       '    "prompt_tokens_details": {\n'
                       '      "cached_tokens": 0\n'
                       '    }\n'
                       '  }\n'
                       '}'],
              'datatype': 'BYTES',
              'name': 'output_all',
              'parameters': {'content_type': 'str'},
              'shape': [1, 1]}],
 'parameters': {}}
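
The V2 inference request above is fairly verbose. As the conversation grows, a small helper can assemble the role, content, and type tensors from a plain list of messages. The build_chat_request helper below is our own sketch (it is not part of the runtime and does not cover the tool_calls and tool_call_id tensors used later):

def build_chat_request(messages, tools=None, temperature=0.0):
    """Assemble a V2 inference request from a list of {role, content} dicts."""
    n = len(messages)
    inputs = [
        {"name": "role", "shape": [n], "datatype": "BYTES",
         "data": [m["role"] for m in messages]},
        {"name": "content", "shape": [n], "datatype": "BYTES",
         "data": [json.dumps([m["content"]]) for m in messages]},
        {"name": "type", "shape": [n], "datatype": "BYTES",
         "data": [json.dumps(["text"]) for _ in messages]},
    ]
    if tools:
        inputs.append({"name": "tools", "shape": [len(tools)],
                       "datatype": "BYTES", "data": tools})
    return {
        "inputs": inputs,
        "parameters": {"llm_parameters": {"temperature": temperature}},
    }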

We now construct and send another request, appending the LLM's reply and the user's follow-up answer to the conversation.

inference_request = {
    "inputs": [
        {
            "name": "role",
            "shape": [3],
            "datatype": "BYTES",
            "data": [
                "system", 
                "user",
                "assistant",
                "user",
            ]
        },
        {
            "name": "content",
            "shape": [3],
            "datatype": "BYTES",
            "data": [
                json.dumps(["You are a helpful customer support assistant. Use the supplied tools to assist the user."]), 
                json.dumps(["Where is my package?"]),
                json.dumps(["Could you please provide me with your order ID so I can check the delivery status for you?"]),
                json.dumps(["I think it is order_12345"])
            ]
        },
        {
            "name": "type",
            "shape": [3],
            "datatype": "BYTES",
            "data": [
                json.dumps(["text"]), 
                json.dumps(["text"]),
                json.dumps(["text"]),
                json.dumps(["text"])
            ]
        },
        {
            "name": "tools",
            "shape": [1],
            "datatype": "BYTES",
            "data": tools
        }
    ],
    "parameters": {
        "llm_parameters": {
            "temperature": 0.0,  # we set temperature to 0.0 to get deterministic results
        }
    },
}
endpoint = f"http://{get_mesh_ip()}/v2/models/openai-chat-completions/infer"
response = requests.post(endpoint, json=inference_request)
pprint.pprint(response.json(), depth=4)
{'id': '891cb35c-d15e-430c-b5e4-3f7fb89f4fca',
 'model_name': 'openai-chat-completions_1',
 'model_version': '1',
 'outputs': [{'data': ['assistant'],
              'datatype': 'BYTES',
              'name': 'role',
              'parameters': {'content_type': 'str'},
              'shape': [1, 1]},
             {'data': [''],
              'datatype': 'BYTES',
              'name': 'content',
              'parameters': {'content_type': 'str'},
              'shape': [1, 1]},
             {'data': ['text'],
              'datatype': 'BYTES',
              'name': 'type',
              'parameters': {'content_type': 'str'},
              'shape': [1, 1]},
             {'data': ['{"id":"call_EWtFXqSLDUmfp04D3YKXjS4m","function":{"arguments":"{\\"order_id\\":\\"order_12345\\"}","name":"get_delivery_date"},"type":"function"}'],
              'datatype': 'BYTES',
              'name': 'tool_calls',
              'parameters': {'content_type': 'str'},
              'shape': [1, 1]},
             {'data': ['{\n'
                       '  "id": "chatcmpl-ANHzKtAxHYPsln7wCCyIdlvpNsPbT",\n'
                       '  "choices": [\n'
                       '    {\n'
                       '      "finish_reason": "tool_calls",\n'
                       '      "index": 0,\n'
                       '      "logprobs": null,\n'
                       '      "message": {\n'
                       '        "content": null,\n'
                       '        "refusal": null,\n'
                       '        "role": "assistant",\n'
                       '        "tool_calls": [\n'
                       '          {\n'
                       '            "id": "call_EWtFXqSLDUmfp04D3YKXjS4m",\n'
                       '            "function": {\n'
                       '              "arguments": '
                       '"{\\"order_id\\":\\"order_12345\\"}",\n'
                       '              "name": "get_delivery_date"\n'
                       '            },\n'
                       '            "type": "function"\n'
                       '          }\n'
                       '        ]\n'
                       '      }\n'
                       '    }\n'
                       '  ],\n'
                       '  "created": 1730114054,\n'
                       '  "model": "gpt-4o-2024-08-06",\n'
                       '  "object": "chat.completion",\n'
                       '  "system_fingerprint": "fp_72bbfa6014",\n'
                       '  "usage": {\n'
                       '    "completion_tokens": 19,\n'
                       '    "prompt_tokens": 134,\n'
                       '    "total_tokens": 153,\n'
                       '    "completion_tokens_details": {\n'
                       '      "reasoning_tokens": 0\n'
                       '    },\n'
                       '    "prompt_tokens_details": {\n'
                       '      "cached_tokens": 0\n'
                       '    }\n'
                       '  }\n'
                       '}'],
              'datatype': 'BYTES',
              'name': 'output_all',
              'parameters': {'content_type': 'str'},
              'shape': [1, 1]}],
 'parameters': {}}

The model has now responded with a tool_calls output containing the JSON-encoded arguments with which to call our function. We decode the inference response and call the function to obtain the result.

# extract tool_calls
tool_calls = response.json()["outputs"][3]["data"]
tool_calls_dec = [json.loads(tool_call) for tool_call in tool_calls]

# decode args
args = json.loads(tool_calls_dec[0]["function"]["arguments"])

# call function
delivery_date = get_delivery_date(**args)
delivery_date
'2024-10-28 11:14:16'
tool_content = json.dumps({
    "order_id": "order_12345",
    "delivery_date": delivery_date
})

Now that we have the delivery date, we feed it back to the LLM.

inference_request = {
    "inputs": [
        {
            "name": "role",
            "shape": [6],
            "datatype": "BYTES",
            "data": [
                "system", 
                "user",
                "assistant",
                "user",
                "assistant",
                "tool",
            ]
        },
        {
            "name": "content",
            "shape": [6],
            "datatype": "BYTES",
            "data": [
                json.dumps(["You are a helpful customer support assistant. Use the supplied tools to assist the user."]), 
                json.dumps(["Where is my package?"]),
                json.dumps(["Could you please provide me with your order ID so I can check the delivery status for you?"]),
                json.dumps(["I think it is order_12345"]),
                json.dumps([""]),
                json.dumps([tool_content])
            ]
        },
        {
            "name": "type",
            "shape": [6],
            "datatype": "BYTES",
            "data": [
                json.dumps(["text"]), 
                json.dumps(["text"]),
                json.dumps(["text"]),
                json.dumps(["text"]),
                json.dumps(["text"]),
                json.dumps(["text"])
            ]
        },
        {
            "name": "tool_calls",
            "shape": [6],
            "datatype": "BYTES",
            "data": [
                "", 
                "",
                "",
                "",
                json.dumps(tool_calls),
                ""
            ]
        },
        {
            "name": "tool_call_id",
            "shape": [6],
            "datatype": "BYTES",
            "data": [
                "", 
                "",
                "",
                "",
                "",
                tool_calls_dec[0]['id']
            ]
        },
        {
            "name": "tools",
            "shape": [1],
            "datatype": "BYTES",
            "data": tools
        }
    ],
    "parameters": {
        "llm_parameters": {
            "temperature": 0.0,  # we set temperature to 0.0 to get deterministic results
        }
    },
}
endpoint = f"http://{get_mesh_ip()}/v2/models/openai-chat-completions/infer"
response = requests.post(endpoint, json=inference_request)
pprint.pprint(response.json(), depth=4)
{'id': 'c73f9993-bf29-40bb-84c9-319e765cba02',
 'model_name': 'openai-chat-completions_1',
 'model_version': '1',
 'outputs': [{'data': ['assistant'],
              'datatype': 'BYTES',
              'name': 'role',
              'parameters': {'content_type': 'str'},
              'shape': [1, 1]},
             {'data': ['Your package is scheduled to be delivered on October '
                       '28, 2024, at 11:14 AM. If you have any more questions '
                       'or need further assistance, feel free to ask!'],
              'datatype': 'BYTES',
              'name': 'content',
              'parameters': {'content_type': 'str'},
              'shape': [1, 1]},
             {'data': ['text'],
              'datatype': 'BYTES',
              'name': 'type',
              'parameters': {'content_type': 'str'},
              'shape': [1, 1]},
             {'data': ['{\n'
                       '  "id": "chatcmpl-ANHzSjeu51PsLan3zi8T0vzKTp6gM",\n'
                       '  "choices": [\n'
                       '    {\n'
                       '      "finish_reason": "stop",\n'
                       '      "index": 0,\n'
                       '      "logprobs": null,\n'
                       '      "message": {\n'
                       '        "content": "Your package is scheduled to be '
                       'delivered on October 28, 2024, at 11:14 AM. If you '
                       'have any more questions or need further assistance, '
                       'feel free to ask!",\n'
                       '        "refusal": null,\n'
                       '        "role": "assistant"\n'
                       '      }\n'
                       '    }\n'
                       '  ],\n'
                       '  "created": 1730114062,\n'
                       '  "model": "gpt-4o-2024-08-06",\n'
                       '  "object": "chat.completion",\n'
                       '  "system_fingerprint": "fp_72bbfa6014",\n'
                       '  "usage": {\n'
                       '    "completion_tokens": 40,\n'
                       '    "prompt_tokens": 190,\n'
                       '    "total_tokens": 230,\n'
                       '    "completion_tokens_details": {\n'
                       '      "reasoning_tokens": 0\n'
                       '    },\n'
                       '    "prompt_tokens_details": {\n'
                       '      "cached_tokens": 0\n'
                       '    }\n'
                       '  }\n'
                       '}'],
              'datatype': 'BYTES',
              'name': 'output_all',
              'parameters': {'content_type': 'str'},
              'shape': [1, 1]}],
 'parameters': {}}

We can observe that the date we provided is included in the LLM's response. One inconvenience with the request above is that we must keep track of the conversation history ourselves and pad the tool_calls and tool_call_id tensors with empty strings. These two problems are addressed by the memory component, which keeps track of the conversation history automatically.

Parallel function calling

Besides single function calls, the OpenAI runtime also supports parallel function calling, in which the model requests several tool invocations in one response. Below we assume the model has already responded with three check_weather tool calls and construct the follow-up request containing the corresponding tool messages.

tool_calls=[
    json.dumps({
        "id": "call_62136355",
        "function": {
            "name": "check_weather",
            "arguments": json.dumps({"city": "New York"})
        },
        "type": "function"
    }),
    json.dumps({
        "id": "call_62136356",
        "function": {
            "name": "check_weather",
            "arguments": json.dumps({"city": "London"})
        },
        "type": "function"
    }),
    json.dumps({
        "id": "call_62136357",
        "function": {
            "name": "check_weather",
            "arguments": json.dumps({"city": "Tokyo"})
        },
        "type": "function"
    })
]

# decode tool calls
tool_calls_dec = [json.loads(tool_call) for tool_call in tool_calls]

# Assume we have fetched the weather data from somewhere
weather_data = {
    "New York": {"temperature": "22°C", "condition": "Sunny"},
    "London": {"temperature": "15°C", "condition": "Cloudy"},
    "Tokyo": {"temperature": "25°C", "condition": "Rainy"}
}
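
For illustration, here is how the decoded tool calls could be dispatched locally. The check_weather function below is our own stand-in backed by the table above, and each result would still be wrapped as json.dumps([...]) when placed in the content tensor of the request:

def check_weather(city: str) -> dict:
    # Stand-in implementation: look the city up in the table above.
    return weather_data[city]

# Dispatch each decoded tool call and collect one tool message per call.
tool_results = []
for tool_call in tool_calls_dec:
    args = json.loads(tool_call["function"]["arguments"])
    tool_results.append(
        json.dumps({"city": args["city"], "weather": check_weather(**args)})
    )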
inference_request = {
    "inputs": [
        {
            "name": "role",
            "shape": [6],
            "datatype": "BYTES",
            "data": [
                "system", 
                "user",
                "assistant",
                "tool",
                "tool",
                "tool"
            ]
        },
        {
            "name": "content",
            "shape": [6],
            "datatype": "BYTES",
            "data": [
                json.dumps(["You are a helpful assistant providing weather updates."]), 
                json.dumps(["Can you tell me the weather in New York, London, and Tokyo?"]),
                json.dumps([""]),
                json.dumps([
                    json.dumps({"city": "New York", "weather": weather_data["New York"]}),
                ]),
                json.dumps([
                    json.dumps({"city": "London", "weather": weather_data["London"]}),
                ]),
                json.dumps([
                    json.dumps({"city": "Tokyo", "weather": weather_data["Tokyo"]}),
                ])
            ]
        },
        {
            "name": "type",
            "shape": [6],
            "datatype": "BYTES",
            "data": [
                json.dumps(["text"]), 
                json.dumps(["text"]),
                json.dumps(["text"]),
                json.dumps(["text"]),
                json.dumps(["text"]),
                json.dumps(["text"]),
            ]
        },
        {
            "name": "tool_calls",
            "shape": [6],
            "datatype": "BYTES",
            "data": [
                "", 
                "",
                json.dumps(tool_calls),
                "",
                "",
                ""
            ]
        },
        {
            "name": "tool_call_id",
            "shape": [6],
            "datatype": "BYTES",
            "data": [
                "", 
                "",
                "",
                tool_calls_dec[0]['id'],
                tool_calls_dec[1]['id'],
                tool_calls_dec[2]['id']
            ]
        }
    ]
}
endpoint = f"http://{get_mesh_ip()}/v2/models/openai-chat-completions/infer"
response = requests.post(endpoint, json=inference_request)
pprint.pprint(response.json(), depth=4)
{'id': 'c68838d4-d10c-4c80-92da-ecd5770e94a9',
 'model_name': 'openai-chat-completions_1',
 'model_version': '1',
 'outputs': [{'data': ['assistant'],
              'datatype': 'BYTES',
              'name': 'role',
              'parameters': {'content_type': 'str'},
              'shape': [1, 1]},
             {'data': ['Here is the current weather for the cities you asked '
                       'about:\n'
                       '\n'
                       '- **New York:** 22°C and Sunny\n'
                       '- **London:** 15°C and Cloudy\n'
                       '- **Tokyo:** 25°C and Rainy\n'
                       '\n'
                       'If you need more details, feel free to ask!'],
              'datatype': 'BYTES',
              'name': 'content',
              'parameters': {'content_type': 'str'},
              'shape': [1, 1]},
             {'data': ['text'],
              'datatype': 'BYTES',
              'name': 'type',
              'parameters': {'content_type': 'str'},
              'shape': [1, 1]},
             {'data': ['{\n'
                       '  "id": "chatcmpl-ANHzVWM6B4y67U3BRQWVo2XAMaVBG",\n'
                       '  "choices": [\n'
                       '    {\n'
                       '      "finish_reason": "stop",\n'
                       '      "index": 0,\n'
                       '      "logprobs": null,\n'
                       '      "message": {\n'
                       '        "content": "Here is the current weather for '
                       'the cities you asked about:\\n\\n- **New York:** 22°C '
                       'and Sunny\\n- **London:** 15°C and Cloudy\\n- '
                       '**Tokyo:** 25°C and Rainy\\n\\nIf you need more '
                       'details, feel free to ask!",\n'
                       '        "refusal": null,\n'
                       '        "role": "assistant"\n'
                       '      }\n'
                       '    }\n'
                       '  ],\n'
                       '  "created": 1730114065,\n'
                       '  "model": "gpt-4o-2024-08-06",\n'
                       '  "object": "chat.completion",\n'
                       '  "system_fingerprint": "fp_90354628f2",\n'
                       '  "usage": {\n'
                       '    "completion_tokens": 56,\n'
                       '    "prompt_tokens": 189,\n'
                       '    "total_tokens": 245,\n'
                       '    "completion_tokens_details": {\n'
                       '      "reasoning_tokens": 0\n'
                       '    },\n'
                       '    "prompt_tokens_details": {\n'
                       '      "cached_tokens": 0\n'
                       '    }\n'
                       '  }\n'
                       '}'],
              'datatype': 'BYTES',
              'name': 'output_all',
              'parameters': {'content_type': 'str'},
              'shape': [1, 1]}],
 'parameters': {}}

Note that the LLM's answer covers all of the requested cities.

Tool choice

In addition to parallel function calling, you can configure the function-calling behaviour through the tool_choice tensor. To disable function calling and force the model to generate only a user-facing message, either provide no tools or set tool_choice to "none". Conversely, you can force the model to call at least one tool by setting tool_choice to "required", or force a specific tool by passing its definition, as the sketch below and the two examples that follow demonstrate.
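
For example, based on the tool_choice tensor format used in the requests below, disabling function calling would amount to the following input tensor (a sketch):

{
    "name": "tool_choice",
    "shape": [1],
    "datatype": "BYTES",
    "data": ["none"]
}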

tools = [
    json.dumps({
        "type": "function",
        "function": {
            "name": "get_weather",
            "strict": True,
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string"},
                    "unit": {"type": "string", "enum": ["c", "f"]},
                },
                "required": ["location", "unit"],
                "additionalProperties": False,
            },
        },
    }),
    json.dumps({
        "type": "function",
        "function": {
            "name": "get_stock_price",
            "strict": True,
            "parameters": {
                "type": "object",
                "properties": {
                    "symbol": {"type": "string"},
                },
                "required": ["symbol"],
                "additionalProperties": False,
            },
        },
    }),
]
inference_request = {
    "inputs": [
        {
            "name": "role",
            "shape": [1],
            "datatype": "BYTES",
            "data": ["user"]
        },
        {
            "name": "content",
            "shape": [1],
            "datatype": "BYTES",
            "data": ["What is the weather like in Boston today?"]
        },
        {
            "name": "type",
            "shape": [1],
            "datatype": "BYTES",
            "data": ["text"]
        },
        {
            "name": "tools",
            "shape": [2],
            "datatype": "BYTES",
            "data": tools
        },
        {
            "name": "tool_choice",
            "shape": [1],
            "datatype": "BYTES",
            "data": ["required"]
        }
    ]
}
endpoint = f"http://{get_mesh_ip()}/v2/models/openai-chat-completions/infer"
response = requests.post(endpoint, json=inference_request)
pprint.pprint(response.json(), depth=4)
{'id': '08ec4f69-9702-4e93-9126-3df5ada10542',
 'model_name': 'openai-chat-completions_1',
 'model_version': '1',
 'outputs': [{'data': ['assistant'],
              'datatype': 'BYTES',
              'name': 'role',
              'parameters': {'content_type': 'str'},
              'shape': [1, 1]},
             {'data': [''],
              'datatype': 'BYTES',
              'name': 'content',
              'parameters': {'content_type': 'str'},
              'shape': [1, 1]},
             {'data': ['text'],
              'datatype': 'BYTES',
              'name': 'type',
              'parameters': {'content_type': 'str'},
              'shape': [1, 1]},
             {'data': ['{"id":"call_gUpJlV6hD7VMJBDr4QhMp042","function":{"arguments":"{\\"location\\":\\"Boston\\",\\"unit\\":\\"c\\"}","name":"get_weather"},"type":"function"}'],
              'datatype': 'BYTES',
              'name': 'tool_calls',
              'parameters': {'content_type': 'str'},
              'shape': [1, 1]},
             {'data': ['{\n'
                       '  "id": "chatcmpl-ANHzZJGR402hVSHXaOS7z3WrcPDlf",\n'
                       '  "choices": [\n'
                       '    {\n'
                       '      "finish_reason": "tool_calls",\n'
                       '      "index": 0,\n'
                       '      "logprobs": null,\n'
                       '      "message": {\n'
                       '        "content": null,\n'
                       '        "refusal": null,\n'
                       '        "role": "assistant",\n'
                       '        "tool_calls": [\n'
                       '          {\n'
                       '            "id": "call_gUpJlV6hD7VMJBDr4QhMp042",\n'
                       '            "function": {\n'
                       '              "arguments": '
                       '"{\\"location\\":\\"Boston\\",\\"unit\\":\\"c\\"}",\n'
                       '              "name": "get_weather"\n'
                       '            },\n'
                       '            "type": "function"\n'
                       '          }\n'
                       '        ]\n'
                       '      }\n'
                       '    }\n'
                       '  ],\n'
                       '  "created": 1730114069,\n'
                       '  "model": "gpt-4o-2024-08-06",\n'
                       '  "object": "chat.completion",\n'
                       '  "system_fingerprint": "fp_90354628f2",\n'
                       '  "usage": {\n'
                       '    "completion_tokens": 18,\n'
                       '    "prompt_tokens": 71,\n'
                       '    "total_tokens": 89,\n'
                       '    "completion_tokens_details": {\n'
                       '      "reasoning_tokens": 0\n'
                       '    },\n'
                       '    "prompt_tokens_details": {\n'
                       '      "cached_tokens": 0\n'
                       '    }\n'
                       '  }\n'
                       '}'],
              'datatype': 'BYTES',
              'name': 'output_all',
              'parameters': {'content_type': 'str'},
              'shape': [1, 1]}],
 'parameters': {}}
inference_request = {
    "inputs": [
        {
            "name": "role",
            "shape": [1],
            "datatype": "BYTES",
            "data": ["user"]
        },
        {
            "name": "content",
            "shape": [1],
            "datatype": "BYTES",
            "data": ["What is the stock price of AAPL?"]
        },
        {
            "name": "type",
            "shape": [1],
            "datatype": "BYTES",
            "data": ["text"]
        },
        {
            "name": "tools",
            "shape": [2],
            "datatype": "BYTES",
            "data": tools
        },
        {
            "name": "tool_choice",
            "shape": [1],
            "datatype": "BYTES",
            "data": [
                json.dumps({
                    "type": "function", "function": {"name": "get_stock_price"}
                })
            ]
        }
    ]
}
endpoint = f"http://{get_mesh_ip()}/v2/models/openai-chat-completions/infer"
response = requests.post(endpoint, json=inference_request)
pprint.pprint(response.json(), depth=4)
{'id': 'aacbd0d2-f847-42f6-aa03-29bcd38da7e7',
 'model_name': 'openai-chat-completions_1',
 'model_version': '1',
 'outputs': [{'data': ['assistant'],
              'datatype': 'BYTES',
              'name': 'role',
              'parameters': {'content_type': 'str'},
              'shape': [1, 1]},
             {'data': [''],
              'datatype': 'BYTES',
              'name': 'content',
              'parameters': {'content_type': 'str'},
              'shape': [1, 1]},
             {'data': ['text'],
              'datatype': 'BYTES',
              'name': 'type',
              'parameters': {'content_type': 'str'},
              'shape': [1, 1]},
             {'data': ['{"id":"call_D2jjcH59v10haTVQYieEmdgC","function":{"arguments":"{\\"symbol\\":\\"AAPL\\"}","name":"get_stock_price"},"type":"function"}'],
              'datatype': 'BYTES',
              'name': 'tool_calls',
              'parameters': {'content_type': 'str'},
              'shape': [1, 1]},
             {'data': ['{\n'
                       '  "id": "chatcmpl-ANHzdRooTX6TTxYOEf9hzsRxwQVfU",\n'
                       '  "choices": [\n'
                       '    {\n'
                       '      "finish_reason": "stop",\n'
                       '      "index": 0,\n'
                       '      "logprobs": null,\n'
                       '      "message": {\n'
                       '        "content": null,\n'
                       '        "refusal": null,\n'
                       '        "role": "assistant",\n'
                       '        "tool_calls": [\n'
                       '          {\n'
                       '            "id": "call_D2jjcH59v10haTVQYieEmdgC",\n'
                       '            "function": {\n'
                       '              "arguments": '
                       '"{\\"symbol\\":\\"AAPL\\"}",\n'
                       '              "name": "get_stock_price"\n'
                       '            },\n'
                       '            "type": "function"\n'
                       '          }\n'
                       '        ]\n'
                       '      }\n'
                       '    }\n'
                       '  ],\n'
                       '  "created": 1730114073,\n'
                       '  "model": "gpt-4o-2024-08-06",\n'
                       '  "object": "chat.completion",\n'
                       '  "system_fingerprint": "fp_90354628f2",\n'
                       '  "usage": {\n'
                       '    "completion_tokens": 6,\n'
                       '    "prompt_tokens": 81,\n'
                       '    "total_tokens": 87,\n'
                       '    "completion_tokens_details": {\n'
                       '      "reasoning_tokens": 0\n'
                       '    },\n'
                       '    "prompt_tokens_details": {\n'
                       '      "cached_tokens": 0\n'
                       '    }\n'
                       '  }\n'
                       '}'],
              'datatype': 'BYTES',
              'name': 'output_all',
              'parameters': {'content_type': 'str'},
              'shape': [1, 1]}],
 'parameters': {}}

Congrats, you've now used function calling with an OpenAI model deployed on Kubernetes! To remove the model, run the following command:

!kubectl delete -f manifests/openai-chat-completions.yaml -n seldon
model.mlops.seldon.io "openai-chat-completions" deleted
server.mlops.seldon.io "mlserver-llm-api" deleted
secret "openai-api-key" deleted
