Function Calling

In this example we will showcase how to deploy a chat bot agent with function calling capabilities. We will create a pipeline consisting of a model deployed using the API Runtime and two memory models. What differs from using function calling with the API Runtime as a standalone component (see example here) is the automatic memory management provided by the Memory Runtime, which relieves the user of the burden of keeping track of the conversation history.

To run this tutorial you will need to create a secret containing your OpenAI API key and have the API and Memory Runtimes up and running. Please check our installation tutorial to see how you can set up all of those components.
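
For reference, such a secret can be created with a command along the following lines. The secret name and key used here are placeholders; the installation tutorial specifies the exact format the runtimes expect:

!kubectl create secret generic openai-api-key --from-literal=api-key=<your-openai-api-key> -n seldon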

After deploying both runtimes, we can now deploy the models on top of them.

The model-settings.json for the API model looks as follows:

!cat models/chatgpt/model-settings.json
{
    "name": "chatgpt",
    "implementation": "mlserver_llm_api.LLMRuntime",
    "parameters": {
      "extra": {
        "provider_id": "openai",
        "config": {
          "model_id": "gpt-4o",
          "model_type": "chat.completions"
        }
      }
    }
  }

In this example, we will deploy a gpt-4o model for chat completions.

The model-settings.json file for the memory model looks as follows:

!cat models/memory/model-settings.json
{
    "name": "memory",
    "implementation": "mlserver_memory.ConversationalMemory",
    "parameters": {
        "extra": {
            "database": "filesys",
            "config": {
                "window_size": 100,
                "tensor_names": ["content", "role", "type", "tool_call_id", "tool_calls"]
            }
        }
    }
}

Note that, compared to the chat-bot example, we've included two additional columns: "tool_call_id" and "tool_calls". These carry the extra fields present in tool-calling messages, as illustrated below.
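
To see why these columns are needed, consider the OpenAI-style chat messages exchanged during a tool call. The following sketch uses hypothetical values and only illustrates the message fields the memory has to persist:

# Assistant message requesting a tool call (all values are hypothetical):
assistant_message = {
    "role": "assistant",
    "content": None,  # no text; the model asks for a function call instead
    "tool_calls": [{
        "id": "call_abc123",
        "type": "function",
        "function": {
            "name": "get_delivery_date",
            "arguments": '{"order_id": "order_12345"}',
        },
    }],
}

# Tool message carrying the function result; "tool_call_id" links it back
# to the assistant's request above:
tool_message = {
    "role": "tool",
    "tool_call_id": "call_abc123",
    "content": '{"order_id": "order_12345", "delivery_date": "2025-06-06"}',
}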

To deploy all the models, we will use the following manifest:

!cat manifests/models.yaml
apiVersion: mlops.seldon.io/v1alpha1
kind: Model
metadata:
  name: combine-question
spec:
  storageUri: "gs://seldon-models/llm-runtimes-settings/function-calling/models/memory"
  requirements:
  - memory
---
apiVersion: mlops.seldon.io/v1alpha1
kind: Model
metadata:
  name: combine-answer
spec:
  storageUri: "gs://seldon-models/llm-runtimes-settings/function-calling/models/memory"
  requirements:
  - memory
---
apiVersion: mlops.seldon.io/v1alpha1
kind: Model
metadata:
  name: chatgpt
spec:
  storageUri: "gs://seldon-models/llm-runtimes-settings/function-calling/models/chatgpt"
  requirements:
  - openai

To deploy the models, run the following command:

!kubectl apply -f manifests/models.yaml -n seldon
!kubectl wait --for condition=ready --timeout=600s model --all -n seldon
model.mlops.seldon.io/combine-question created
model.mlops.seldon.io/combine-answer created
model.mlops.seldon.io/chatgpt created
model.mlops.seldon.io/chatgpt condition met
model.mlops.seldon.io/combine-answer condition met
model.mlops.seldon.io/combine-question condition met

At this point we should have all the necessary models up and running. We can now create the pipeline.

The manifest file for the pipeline looks as follows:

!cat manifests/pipeline.yaml
apiVersion: mlops.seldon.io/v1alpha1
kind: Pipeline
metadata:
  name: chat-app
spec:
  steps:
    - name: combine-question
      inputs:
        - chat-app.inputs.memory_id
        - chat-app.inputs.role
        - chat-app.inputs.content
        - chat-app.inputs.type
        - chat-app.inputs.tool_call_id
    - name: chatgpt
      inputs:
        - combine-question.outputs.history
        - chat-app.inputs.tools
    - name: combine-answer
      inputs:
        - chat-app.inputs.memory_id
        - chatgpt.outputs.role
        - chatgpt.outputs.content
        - chatgpt.outputs.type
        - chatgpt.outputs.tool_calls
  output:
    steps:
    - chatgpt
    - combine-answer

The main difference from the chat-bot example is that some tensor inputs (such as tools and tool_call_id) are optional. Thus, we have to use outer joins for the inputs of our pipeline steps. We can now deploy our pipeline by running the following command:

!kubectl apply -f manifests/pipeline.yaml -n seldon
!kubectl wait --for condition=ready --timeout=600s pipeline --all -n seldon
pipeline.mlops.seldon.io/chat-app created
pipeline.mlops.seldon.io/chat-app condition met

Once the pipeline is ready, we can start sending requests. We will begin by defining some helper functions to do so:

import requests
import subprocess
from typing import Optional, Union, List

def get_mesh_ip():
    # Fetch the external IP of the seldon-mesh LoadBalancer service.
    cmd = "kubectl get svc seldon-mesh -n seldon -o jsonpath='{.status.loadBalancer.ingress[0].ip}'"
    return subprocess.check_output(cmd, shell=True).decode('utf-8')


def send_request(
    memory_id: str,
    content: Union[str, List[str]],
    role: Union[str, List[str]],
    type: Union[str, List[str]],
    tools: Optional[List[str]] = None,
    tool_call_id: Optional[Union[str, List[str]]] = None,
):
    if isinstance(content, str):
        content = [content]

    if isinstance(role, str):
        role = [role]
    
    if isinstance(type, str):
        type = [type]
        
    inputs = [
        {
            "name": "memory_id",
            "shape": [1],
            "datatype": "BYTES",
            "data": [memory_id],
            "parameters": {"content_type": "str"},
        },
        {
            "name": "role",
            "shape": [len(role)],
            "datatype": "BYTES",
            "data": role,
            "parameters": {"content_type": "str"},
        },
        {
            "name": "content",
            "shape": [len(content)],
            "datatype": "BYTES",
            "data": content,
            "parameters": {"content_type": "str"},
        },
        {
            "name": "type",
            "shape": [len(type)],
            "datatype": "BYTES",
            "data": type,
            "parameters": {"content_type": "str"},
        }
    ]
    
    if tools is not None:
        inputs.append(
            {
                "name": "tools",
                "shape": [len(tools)],
                "datatype": "BYTES",
                "data": tools,
                "parameters": {"content_type": "str"},
            }
        )
        
    if tool_call_id is not None:
        tool_call_id = [tool_call_id] if isinstance(tool_call_id, str) else tool_call_id
        inputs.append(
            {
                "name": "tool_call_id",
                "shape": [len(tool_call_id)],
                "datatype": "BYTES",
                "data": tool_call_id,
                "parameters": {"content_type": "str"},
            }
        )
    
    inference_request = {"inputs": inputs}
    endpoint = f"http://{get_mesh_ip()}/v2/pipelines/chat-app/infer"
    return requests.post(
        endpoint,
        json=inference_request,
    )

For our example, we will use the following tool definition, and we will simulate a function which computes a delivery date:

import json
from uuid import uuid4
from datetime import datetime

def get_delivery_date(order_id: str):
    # Dummy implementation: ignore the order id and return the current time.
    return datetime.now().strftime('%Y-%m-%d %H:%M:%S')
tools = [
    json.dumps({
        "type": "function",
        "function": {
            "name": "get_delivery_date",
            "description": "Get the delivery date for a customer's order. Call this whenever you need to know the delivery date, for example when a customer asks 'Where is my package'",
            "parameters": {
                "type": "object",
                "properties": {
                    "order_id": {
                        "type": "string",
                        "description": "The customer's order ID."
                    }
                },
                "required": ["order_id"],
                "additionalProperties": False
            }
        }
    })
]

# define memory id
memory_id = str(uuid4())

Now that we have defined the tools, we can proceed and ask "Where is my package?":

response = send_request(
    memory_id=memory_id,
    role=["system", "user"],
    content=[
        json.dumps(["You are a helpful customer support assistant. Use the supplied tools to assist the user."]),
        json.dumps(["Where is my package?"]),
    ],
    type=[
        json.dumps(["text"]),
        json.dumps(["text"]),
    ],
    tools=tools
)

To which the model replies with:

response.json()["outputs"][1]["data"][0]
'Could you please provide me with your order ID so I can check the delivery date for you?'
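
Throughout this example we read the response outputs by position (e.g. outputs[1] for the content and, later, outputs[3] for the tool calls). Since every output tensor in a V2 inference response also carries a name, a more robust approach is to look outputs up by name. A minimal sketch, assuming the chatgpt step emits tensors named "role", "content", "type", and "tool_calls":

def get_output(response, name: str):
    # Return the "data" list of the named output tensor, if present.
    for output in response.json()["outputs"]:
        if output["name"] == name:
            return output["data"]
    return None

# e.g. get_output(response, "content")[0] instead of
# response.json()["outputs"][1]["data"][0]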

We can provide a dummy id, "order_12345":

response = send_request(
    memory_id=memory_id,
    role="user",
    content="I think it is order_12345",
    type="text",
    tools=tools
)

Now, the model replies with the arguments that we have to pass to our get_delivery_date function, along with an id for the tool call:

response.json()["outputs"][3]["data"][0]
'{"id":"call_QTRjpGLVidVudGuyJtAvPMqp","function":{"arguments":"{\\"order_id\\":\\"order_12345\\"}","name":"get_delivery_date"},"type":"function"}'

We can now call get_delivery_date with the arguments returned by the model:

# extract tool_calls
tool_calls = response.json()["outputs"][3]["data"]
tool_calls_dec = [json.loads(tool_call) for tool_call in tool_calls]

# decode args
args = json.loads(tool_calls_dec[0]["function"]["arguments"])

# call function
delivery_date = get_delivery_date(**args)
delivery_date
'2025-06-06 09:36:15'

We now send the result of calling get_delivery_date back to the model:

response = send_request(
    memory_id=memory_id,
    role="tool",
    content=json.dumps({
        "order_id": "order_12345",
        "delivery_date": delivery_date
    }),
    type="text",
    tool_call_id=tool_calls_dec[0]["id"],
    tools=tools,
)

The model now has all the information it needs to answer our question:

response.json()["outputs"][1]["data"][0]
'Your package with order ID "order_12345" is expected to be delivered on June 6, 2025, at 9:36 AM. If you have any other questions or need further assistance, feel free to ask!'
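
The steps above can be wrapped into a generic dispatch loop: whenever the model requests tool calls, execute the matching local functions and send their results back. A minimal sketch under the assumptions of this example (a single registered tool, and the tool calls exposed as the fourth output tensor):

# Map tool names to local functions (hypothetical registry):
available_functions = {"get_delivery_date": get_delivery_date}

def run_tool_calls(response, memory_id: str, tools: List[str]):
    # Decode the tool calls requested by the model.
    tool_calls = [json.loads(tc) for tc in response.json()["outputs"][3]["data"]]

    for tool_call in tool_calls:
        # Look up and invoke the matching local function.
        func = available_functions[tool_call["function"]["name"]]
        args = json.loads(tool_call["function"]["arguments"])
        result = func(**args)

        # Send the result back, linked to the request by the tool call id.
        response = send_request(
            memory_id=memory_id,
            role="tool",
            content=json.dumps({**args, "result": result}),
            type="text",
            tool_call_id=tool_call["id"],
            tools=tools,
        )
    return response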

To delete the pipeline, run the following command:

!kubectl delete -f manifests/pipeline.yaml -n seldon
pipeline.mlops.seldon.io "chat-app" deleted

To unload the models, run the following command:

!kubectl delete -f manifests/models.yaml -n seldon
model.mlops.seldon.io "combine-question" deleted
model.mlops.seldon.io "combine-answer" deleted
model.mlops.seldon.io "chatgpt" deleted
