Function Calling
In this example, we will showcase how to deploy a chatbot agent with function-calling capabilities. We will create a pipeline consisting of a model deployed using the API Runtime and two memory models. What differs from using function calling through the API Runtime as a standalone component (see example here) is the automatic memory management provided by the Memory Runtime, which relieves the user of the burden of keeping track of the conversation history.
To run this tutorial you will need to create a secret with the OpenAI API key and have the API and Memory Runtimes up and running. Please check our installation tutorial to see how you can set up all those components.
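If you have not created the secret yet, you can do so with kubectl. The command below is only a sketch: the secret name (openai-api-key) and key are assumptions on our part, so adjust them to whatever your API Runtime deployment actually references.
# NOTE: the secret name and key below are placeholders - match them to your API Runtime configuration
!kubectl create secret generic openai-api-key --from-literal=OPENAI_API_KEY="<your-openai-api-key>" -n seldon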
After deploying both of our servers, we can now deploy the models on top of them.
The model-settings.json file for the API model looks as follows:
!cat models/chatgpt/model-settings.json
{
    "name": "chatgpt",
    "implementation": "mlserver_llm_api.LLMRuntime",
    "parameters": {
        "extra": {
            "provider_id": "openai",
            "config": {
                "model_id": "gpt-4o",
                "model_type": "chat.completions"
            }
        }
    }
}
In this example, we will deploy a gpt-4o model for chat completions.
The model-settings.json file for the memory model looks as follows:
!cat models/memory/model-settings.json
{
    "name": "memory",
    "implementation": "mlserver_memory.ConversationalMemory",
    "parameters": {
        "extra": {
            "database": "filesys",
            "config": {
                "window_size": 100,
                "tensor_names": ["content", "role", "type", "tool_call_id", "tool_calls"]
            }
        }
    }
}
Note that, compared to the chat-bot example, we've included two additional tensor names: "tool_call_id" and "tool_calls".
To deploy all the models, we will use the following manifest:
!cat manifests/models.yaml
apiVersion: mlops.seldon.io/v1alpha1
kind: Model
metadata:
  name: combine-question
spec:
  storageUri: "gs://seldon-models/llm-runtimes-settings/function-calling/models/memory"
  requirements:
  - memory
---
apiVersion: mlops.seldon.io/v1alpha1
kind: Model
metadata:
  name: combine-answer
spec:
  storageUri: "gs://seldon-models/llm-runtimes-settings/function-calling/models/memory"
  requirements:
  - memory
---
apiVersion: mlops.seldon.io/v1alpha1
kind: Model
metadata:
  name: chatgpt
spec:
  storageUri: "gs://seldon-models/llm-runtimes-settings/function-calling/models/chatgpt"
  requirements:
  - openai
To deploy the models, run the following command:
!kubectl apply -f manifests/models.yaml -n seldon
!kubectl wait --for condition=ready --timeout=600s model --all -n seldon
model.mlops.seldon.io/combine-question created
model.mlops.seldon.io/combine-answer created
model.mlops.seldon.io/chatgpt created
model.mlops.seldon.io/chatgpt condition met
model.mlops.seldon.io/combine-answer condition met
model.mlops.seldon.io/combine-question condition met
At this point we should have all the necessary models up and running. We can now create the pipeline.
The manifest file for the pipeline looks as follows:
!cat manifests/pipeline.yaml
apiVersion: mlops.seldon.io/v1alpha1
kind: Pipeline
metadata:
  name: chat-app
spec:
  steps:
    - name: combine-question
      inputs:
        - chat-app.inputs.memory_id
        - chat-app.inputs.role
        - chat-app.inputs.content
        - chat-app.inputs.type
        - chat-app.inputs.tool_call_id
    - name: chatgpt
      inputs:
        - combine-question.outputs.history
        - chat-app.inputs.tools
    - name: combine-answer
      inputs:
        - chat-app.inputs.memory_id
        - chatgpt.outputs.role
        - chatgpt.outputs.content
        - chatgpt.outputs.type
        - chatgpt.outputs.tool_calls
  output:
    steps:
      - chatgpt
      - combine-answer
The main difference from the chat-bot example is that some tensor inputs are optional. Thus, we have to use outer joins for the inputs of our pipeline components. We can now deploy our pipeline by running the following command:
!kubectl apply -f manifests/pipeline.yaml -n seldon
!kubectl wait --for condition=ready --timeout=600s pipeline --all -n seldon
pipeline.mlops.seldon.io/chat-app created
pipeline.mlops.seldon.io/chat-app condition met
Once the pipeline is ready, we can start sending requests. We will begin by defining a couple of helper functions to do so:
import requests
import subprocess
from typing import Optional, Union, List


def get_mesh_ip():
    cmd = f"kubectl get svc seldon-mesh -n seldon -o jsonpath='{{.status.loadBalancer.ingress[0].ip}}'"
    return subprocess.check_output(cmd, shell=True).decode('utf-8')


def send_request(
    memory_id: str,
    content: Union[str, List[str]],
    role: Union[str, List[str]],
    type: Union[str, List[str]],
    tools: Optional[str] = None,
    tool_call_id: Optional[str] = None,
):
    if isinstance(content, str):
        content = [content]

    if isinstance(role, str):
        role = [role]

    if isinstance(type, str):
        type = [type]

    inputs = [
        {
            "name": "memory_id",
            "shape": [1],
            "datatype": "BYTES",
            "data": [memory_id],
            "parameters": {"content_type": "str"},
        },
        {
            "name": "role",
            "shape": [len(role)],
            "datatype": "BYTES",
            "data": role,
            "parameters": {"content_type": "str"},
        },
        {
            "name": "content",
            "shape": [len(content)],
            "datatype": "BYTES",
            "data": content,
            "parameters": {"content_type": "str"},
        },
        {
            "name": "type",
            "shape": [len(type)],
            "datatype": "BYTES",
            "data": type,
            "parameters": {"content_type": "str"},
        },
    ]

    if tools is not None:
        inputs.append(
            {
                "name": "tools",
                "shape": [len(tools)],
                "datatype": "BYTES",
                "data": tools,
                "parameters": {"content_type": "str"},
            }
        )

    if tool_call_id is not None:
        tool_call_id = [tool_call_id] if isinstance(tool_call_id, str) else tool_call_id
        inputs.append(
            {
                "name": "tool_call_id",
                "shape": [len(tool_call_id)],
                "datatype": "BYTES",
                "data": tool_call_id,
                "parameters": {"content_type": "str"},
            }
        )

    inference_request = {"inputs": inputs}
    endpoint = f"http://{get_mesh_ip()}/v2/pipelines/chat-app/infer"
    return requests.post(
        endpoint,
        json=inference_request,
    )
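The pipeline responds using the Open Inference Protocol, so each output tensor carries a name alongside its data. In the cells below we index outputs by position (e.g. outputs[1]); if you prefer, a small helper like the sketch below can look tensors up by name instead. This helper is an assumption on our part rather than part of the original notebook, and the exact tensor names depend on how the pipeline steps expose their outputs.
# Hypothetical helper (not part of the original example): fetch an output tensor
# from a V2 inference response by name instead of by position.
def get_output(response, name: str):
    for output in response.json()["outputs"]:
        if output["name"] == name:
            return output["data"]
    return None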
For our example, we will use the following tools and we will simulate a function which computes a delivery date:
import json
from uuid import uuid4
from datetime import datetime


def get_delivery_date(order_id: str):
    return datetime.now().strftime('%Y-%m-%d %H:%M:%S')


tools = [
    json.dumps({
        "type": "function",
        "function": {
            "name": "get_delivery_date",
            "description": "Get the delivery date for a customer's order. Call this whenever you need to know the delivery date, for example when a customer asks 'Where is my package'",
            "parameters": {
                "type": "object",
                "properties": {
                    "order_id": {
                        "type": "string",
                        "description": "The customer's order ID."
                    }
                },
                "required": ["order_id"],
                "additionalProperties": False
            }
        }
    })
]

# define memory id
memory_id = str(uuid4())
Now that we have defined the tools, we can proceed and ask "Where is my package?":
response = send_request(
    memory_id=memory_id,
    role=["system", "user"],
    content=[
        json.dumps(["You are a helpful customer support assistant. Use the supplied tools to assist the user."]),
        json.dumps(["Where is my package?"]),
    ],
    type=[
        json.dumps(["text"]),
        json.dumps(["text"]),
    ],
    tools=tools
)
To which the model replies with:
response.json()["outputs"][1]["data"][0]
'Could you please provide me with your order ID so I can check the delivery date for you?'
We can provide a dummy ID, "order_12345":
response = send_request(
    memory_id=memory_id,
    role="user",
    content="I think it is order_12345",
    type="text",
    tools=tools
)
Now, the model replies with the arguments we have to pass to our get_delivery_date function and the ID of the tool call:
response.json()["outputs"][3]['data'][0]
'{"id":"call_QTRjpGLVidVudGuyJtAvPMqp","function":{"arguments":"{\\"order_id\\":\\"order_12345\\"}","name":"get_delivery_date"},"type":"function"}'
We can now call get_delivery_date with the arguments returned by the model:
# extract tool_calls
tool_calls = response.json()["outputs"][3]["data"]
tool_calls_dec = [json.loads(tool_call) for tool_call in tool_calls]
# decode args
args = json.loads(tool_calls_dec[0]["function"]["arguments"])
# call function
delivery_date = get_delivery_date(**args)
delivery_date
'2025-06-06 09:36:15'
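In this example there is only a single registered tool, so we call get_delivery_date directly. With several tools you would typically dispatch on the function name returned in each tool call; the registry below is an illustrative sketch rather than part of the original walkthrough:
# Hypothetical registry mapping the tool names declared in `tools` to local callables.
available_functions = {"get_delivery_date": get_delivery_date}

# Dispatch every tool call returned by the model to the matching local function.
results = [
    available_functions[tc["function"]["name"]](**json.loads(tc["function"]["arguments"]))
    for tc in tool_calls_dec
]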
We now send the result of calling get_delivery_date back to the model:
response = send_request(
    memory_id=memory_id,
    role="tool",
    content=json.dumps({
        "order_id": "order_12345",
        "delivery_date": delivery_date
    }),
    type="text",
    tool_call_id=tool_calls_dec[0]["id"],
    tools=tools,
)
The model is now provided with all the information to respond to our question:
response.json()["outputs"][1]["data"][0]
'Your package with order ID "order_12345" is expected to be delivered on June 6, 2025, at 9:36 AM. If you have any other questions or need further assistance, feel free to ask!'
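Since the Memory Runtime stores the whole exchange under our memory_id, we can continue the conversation without resending any history. The follow-up below is a sketch to illustrate this; the question is our own and not part of the original walkthrough:
# Hypothetical follow-up turn: the conversation history is pulled in automatically
# by the combine-question memory step, so only the new user message is sent.
response = send_request(
    memory_id=memory_id,
    role="user",
    content="Which order ID did you use to look that up?",
    type="text",
    tools=tools,
)
response.json()["outputs"][1]["data"][0]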
To delete the pipeline run the following command:
!kubectl delete -f manifests/pipeline.yaml -n seldon
pipeline.mlops.seldon.io "chat-app" deleted
To unload the models, run the following command:
!kubectl delete -f manifests/models.yaml -n seldon
model.mlops.seldon.io "combine-question" deleted
model.mlops.seldon.io "combine-answer" deleted
model.mlops.seldon.io "chatgpt" deleted