Function Calling
In this example we will showcase how to deploy a chatbot agent with function calling capabilities. We will create a pipeline consisting of a model deployed using the API Runtime and two memory models. What differs from using function calling with the API Runtime as a standalone component (see example here) is the automatic memory management performed by the Memory Runtime, which relieves the user of the burden of keeping track of the conversation history.
To run this tutorial you will need to create a secret with the OpenAI API key and have the API and Memory Runtimes up and running. Please check our installation tutorial to see how you can set up these components.
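The exact secret layout expected by the API Runtime depends on your installation and is covered in the installation tutorial; purely as an illustration, a generic Kubernetes secret holding the key could be created as follows, where the secret name and key are hypothetical:
!kubectl create secret generic openai-api-key \
    --from-literal=OPENAI_API_KEY="<your-openai-api-key>" \
    -n seldon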
With both runtime servers deployed, we can now deploy the models on top of them.
The model-settings.json for the API model looks as follows:
!cat models/chatgpt/model-settings.json
{
  "name": "chatgpt",
  "implementation": "mlserver_llm_api.LLMRuntime",
  "parameters": {
    "extra": {
      "provider_id": "openai",
      "config": {
        "model_id": "gpt-4o",
        "model_type": "chat.completions"
      }
    }
  }
}
In this example, we will deploy a gpt-4o model for chat completions.
The model-settings.json file for the memory model looks as follows:
!cat models/memory/model-settings.json
{
  "name": "memory",
  "implementation": "mlserver_memory.ConversationalMemory",
  "parameters": {
    "extra": {
      "database": "filesys",
      "config": {
        "window_size": 100,
        "tensor_names": ["content", "role", "type", "tool_call_id", "tool_calls"]
      }
    }
  }
}
Note that we've included two additional tensor names compared to the chat-bot example: "tool_call_id" and "tool_calls".
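To make the role of these tensor names concrete, here is a purely illustrative sketch (not the runtime's actual storage format) of the fields a single remembered message carries:
# Illustration only: one remembered user turn, keyed by the tensor names
# configured above. The tool-related fields are only populated for tool-call
# and tool-response messages; window_size bounds how many turns are kept.
turn = {
    "role": "user",
    "content": "Where is my package?",
    "type": "text",
    "tool_call_id": None,
    "tool_calls": None,
}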
To deploy all the models, we will use the following manifest:
!cat manifests/models.yaml
apiVersion: mlops.seldon.io/v1alpha1
kind: Model
metadata:
  name: combine-question
spec:
  storageUri: "gs://seldon-models/llm-runtimes-settings/function-calling/models/memory"
  requirements:
    - memory
---
apiVersion: mlops.seldon.io/v1alpha1
kind: Model
metadata:
  name: combine-answer
spec:
  storageUri: "gs://seldon-models/llm-runtimes-settings/function-calling/models/memory"
  requirements:
    - memory
---
apiVersion: mlops.seldon.io/v1alpha1
kind: Model
metadata:
  name: chatgpt
spec:
  storageUri: "gs://seldon-models/llm-runtimes-settings/function-calling/models/chatgpt"
  requirements:
    - openai
To deploy the models, run the following command:
!kubectl apply -f manifests/models.yaml -n seldon
!kubectl wait --for condition=ready --timeout=600s model --all -n seldon
model.mlops.seldon.io/combine-question created
model.mlops.seldon.io/combine-answer created
model.mlops.seldon.io/chatgpt created
model.mlops.seldon.io/chatgpt condition met
model.mlops.seldon.io/combine-answer condition met
model.mlops.seldon.io/combine-question condition met
At this point we should have all the necessary models up and running. We can now create the pipeline.
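Before defining the pipeline, you can optionally confirm that all three models report as ready (an illustrative check; the exact output columns depend on your Seldon Core version):
!kubectl get models.mlops.seldon.io -n seldon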
The manifest file for the pipeline looks as follows:
!cat manifests/pipeline.yaml
apiVersion: mlops.seldon.io/v1alpha1
kind: Pipeline
metadata:
  name: chat-app
spec:
  steps:
    - name: combine-question
      inputs:
        - chat-app.inputs.memory_id
        - chat-app.inputs.role
        - chat-app.inputs.content
        - chat-app.inputs.type
        - chat-app.inputs.tool_call_id
    - name: chatgpt
      inputs:
        - combine-question.outputs.history
        - chat-app.inputs.tools
    - name: combine-answer
      inputs:
        - chat-app.inputs.memory_id
        - chatgpt.outputs.role
        - chatgpt.outputs.content
        - chatgpt.outputs.type
        - chatgpt.outputs.tool_calls
  output:
    steps:
      - chatgpt
      - combine-answer
The main difference from the chat-bot example is that some tensor inputs are optional. Thus, we have to use outer joins for the inputs of our pipeline components. We can now deploy our pipeline by running the following command:
!kubectl apply -f manifests/pipeline.yaml -n seldon
!kubectl wait --for condition=ready --timeout=600s pipeline --all -n seldon
pipeline.mlops.seldon.io/chat-app created
pipeline.mlops.seldon.io/chat-app condition met
Once the pipeline is ready, we can start sending requests. We will begin by defining a helper function to do so:
import requests
import subprocess
from typing import Optional, Union, List


def get_mesh_ip():
    # IP of the seldon-mesh LoadBalancer service, used to reach the pipeline endpoint.
    cmd = "kubectl get svc seldon-mesh -n seldon -o jsonpath='{.status.loadBalancer.ingress[0].ip}'"
    return subprocess.check_output(cmd, shell=True).decode('utf-8')


def send_request(
    memory_id: str,
    content: Union[str, List[str]],
    role: Union[str, List[str]],
    type: Union[str, List[str]],
    tools: Optional[List[str]] = None,
    tool_call_id: Optional[Union[str, List[str]]] = None,
):
    """Send one conversation turn to the chat-app pipeline via the V2 inference protocol."""
    if isinstance(content, str):
        content = [content]
    if isinstance(role, str):
        role = [role]
    if isinstance(type, str):
        type = [type]

    inputs = [
        {
            "name": "memory_id",
            "shape": [1],
            "datatype": "BYTES",
            "data": [memory_id],
            "parameters": {"content_type": "str"},
        },
        {
            "name": "role",
            "shape": [len(role)],
            "datatype": "BYTES",
            "data": role,
            "parameters": {"content_type": "str"},
        },
        {
            "name": "content",
            "shape": [len(content)],
            "datatype": "BYTES",
            "data": content,
            "parameters": {"content_type": "str"},
        },
        {
            "name": "type",
            "shape": [len(type)],
            "datatype": "BYTES",
            "data": type,
            "parameters": {"content_type": "str"},
        },
    ]

    if tools is not None:
        inputs.append(
            {
                "name": "tools",
                "shape": [len(tools)],
                "datatype": "BYTES",
                "data": tools,
                "parameters": {"content_type": "str"},
            }
        )

    if tool_call_id is not None:
        tool_call_id = [tool_call_id] if isinstance(tool_call_id, str) else tool_call_id
        inputs.append(
            {
                "name": "tool_call_id",
                "shape": [len(tool_call_id)],
                "datatype": "BYTES",
                "data": tool_call_id,
                "parameters": {"content_type": "str"},
            }
        )

    inference_request = {"inputs": inputs}
    endpoint = f"http://{get_mesh_ip()}/v2/pipelines/chat-app/infer"
    return requests.post(
        endpoint,
        json=inference_request,
    )
For our example, we will use the following tools, and we will simulate a function which computes a delivery date:
import json
from uuid import uuid4
from datetime import datetime


def get_delivery_date(order_id: str):
    return datetime.now().strftime('%Y-%m-%d %H:%M:%S')


tools = [
    json.dumps({
        "type": "function",
        "function": {
            "name": "get_delivery_date",
            "description": "Get the delivery date for a customer's order. Call this whenever you need to know the delivery date, for example when a customer asks 'Where is my package'",
            "parameters": {
                "type": "object",
                "properties": {
                    "order_id": {
                        "type": "string",
                        "description": "The customer's order ID."
                    }
                },
                "required": ["order_id"],
                "additionalProperties": False
            }
        }
    })
]

# define memory id
memory_id = str(uuid4())
Now that we have defined the tools, we can proceed and ask "Where is my package?":
response = send_request(
    memory_id=memory_id,
    role=["system", "user"],
    content=[
        json.dumps(["You are a helpful customer support assistant. Use the supplied tools to assist the user."]),
        json.dumps(["Where is my package?"]),
    ],
    type=[
        json.dumps(["text"]),
        json.dumps(["text"]),
    ],
    tools=tools
)
To which the model replies with:
response.json()["outputs"][1]["data"][0]'Could you please provide me with your order ID so I can check the delivery date for you?'We can provide a dummy id, "order_12345":
response = send_request(
    memory_id=memory_id,
    role="user",
    content="I think it is order_12345",
    type="text",
    tools=tools
)
This time, the model replies with the arguments we have to pass to our get_delivery_date method, together with the id of the tool call:
response.json()["outputs"][3]['data'][0]'{"id":"call_QTRjpGLVidVudGuyJtAvPMqp","function":{"arguments":"{\\"order_id\\":\\"order_12345\\"}","name":"get_delivery_date"},"type":"function"}'We can the get_delivery_date with the arguments returned by the model:
# extract tool_calls
tool_calls = response.json()["outputs"][3]["data"]
tool_calls_dec = [json.loads(tool_call) for tool_call in tool_calls]

# decode args
args = json.loads(tool_calls_dec[0]["function"]["arguments"])

# call function
delivery_date = get_delivery_date(**args)
delivery_date
'2025-06-06 09:36:15'
We now send the result of calling get_delivery_date back to the model:
response = send_request(
    memory_id=memory_id,
    role="tool",
    content=json.dumps({
        "order_id": "order_12345",
        "delivery_date": delivery_date
    }),
    type="text",
    tool_call_id=tool_calls_dec[0]["id"],
    tools=tools,
)
The model is now provided with all the information it needs to respond to our question:
response.json()["outputs"][1]["data"][0]'Your package with order ID "order_12345" is expected to be delivered on June 6, 2025, at 9:36 AM. If you have any other questions or need further assistance, feel free to ask!'To delete the pipeline run the following command:
To delete the pipeline, run the following command:
!kubectl delete -f manifests/pipeline.yaml -n seldon
pipeline.mlops.seldon.io "chat-app" deleted
To unload the models, run the following command:
!kubectl delete -f manifests/models.yaml -n seldon
model.mlops.seldon.io "combine-question" deleted
model.mlops.seldon.io "combine-answer" deleted
model.mlops.seldon.io "chatgpt" deleted