# Function Calling

In this example, we showcase how to deploy a chatbot agent with function calling capabilities. We will create a pipeline consisting of a model deployed using the API Runtime and two memory models. Unlike using function calling with the API Runtime as a standalone component (see the example [here](/llm-module/use-cases/function-calling/openai-func-call.md)), this setup relies on the Memory Runtime to manage memory automatically, which relieves the user of the burden of keeping track of the conversation history.

To run this tutorial, you will need to create a secret with the OpenAI API key and have the API and Memory Runtimes up and running. Please check our [installation tutorial](/llm-module/introduction/installation.md) to see how to set up all of these components.

After deploying both of our servers, we can now deploy the models on top of them.

The `model-settings.json` for the API model looks as follows:

```python
!cat models/chatgpt/model-settings.json
```

```
{
    "name": "chatgpt",
    "implementation": "mlserver_llm_api.LLMRuntime",
    "parameters": {
      "extra": {
        "provider_id": "openai",
        "config": {
          "model_id": "gpt-4o",
          "model_type": "chat.completions"
        }
      }
    }
  }
```

In this example, we will deploy a `gpt-4o` model for chat completions.

The `model-settings.json` file for the memory model looks as follows:

```python
!cat models/memory/model-settings.json
```

```
{
    "name": "memory",
    "implementation": "mlserver_memory.ConversationalMemory",
    "parameters": {
        "extra": {
            "database": "filesys",
            "config": {
                "window_size": 100,
                "tensor_names": ["content", "role", "type", "tool_call_id", "tool_calls"]
            }
        }
    }
}
```

Note that, compared to the `chat-bot` example, we've included two additional columns: `"tool_call_id"` and `"tool_calls"`.

To deploy all the models, we will use the following manifest:

```python
!cat manifests/models.yaml
```

```
apiVersion: mlops.seldon.io/v1alpha1
kind: Model
metadata:
  name: combine-question
spec:
  storageUri: "gs://seldon-models/llm-runtimes-settings/function-calling/models/memory"
  requirements:
  - memory
---
apiVersion: mlops.seldon.io/v1alpha1
kind: Model
metadata:
  name: combine-answer
spec:
  storageUri: "gs://seldon-models/llm-runtimes-settings/function-calling/models/memory"
  requirements:
  - memory
---
apiVersion: mlops.seldon.io/v1alpha1
kind: Model
metadata:
  name: chatgpt
spec:
  storageUri: "gs://seldon-models/llm-runtimes-settings/function-calling/models/chatgpt"
  requirements:
  - openai
```

To deploy the models, run the following command:

```python
!kubectl apply -f manifests/models.yaml -n seldon
!kubectl wait --for condition=ready --timeout=600s model --all -n seldon
```

```
model.mlops.seldon.io/combine-question created
model.mlops.seldon.io/combine-answer created
model.mlops.seldon.io/chatgpt created
model.mlops.seldon.io/chatgpt condition met
model.mlops.seldon.io/combine-answer condition met
model.mlops.seldon.io/combine-question condition met
```

At this point we should have all the necessary models up and running. We can now create the pipeline.

The manifest file for the pipeline looks as follows:

```python
!cat manifests/pipeline.yaml
```

```
apiVersion: mlops.seldon.io/v1alpha1
kind: Pipeline
metadata:
  name: chat-app
spec:
  steps:
    - name: combine-question
      inputs:
        - chat-app.inputs.memory_id
        - chat-app.inputs.role
        - chat-app.inputs.content
        - chat-app.inputs.type
        - chat-app.inputs.tool_call_id
    - name: chatgpt
      inputs:
        - combine-question.outputs.history
        - chat-app.inputs.tools
    - name: combine-answer
      inputs:
      - chat-app.inputs.memory_id
      - chatgpt.outputs.role
      - chatgpt.outputs.content
      - chatgpt.outputs.type
      - chatgpt.outputs.tool_calls
  output:
    steps:
    - chatgpt
    - combine-answer
```

The main difference from the `chat-bot` example is that some tensor inputs are optional. Thus, we have to use outer joins for the inputs of our pipeline components. We can now deploy our pipeline by running the following command:

```python
!kubectl apply -f manifests/pipeline.yaml -n seldon
!kubectl wait --for condition=ready --timeout=600s pipeline --all -n seldon
```

```
pipeline.mlops.seldon.io/chat-app created
pipeline.mlops.seldon.io/chat-app condition met
```

Once the pipeline is ready, we can start sending requests. We will begin by defining some helper functions to do so:

```python
import requests
import subprocess
from typing import Optional, Union, List

def get_mesh_ip():
    cmd = f"kubectl get svc seldon-mesh -n seldon -o jsonpath='{{.status.loadBalancer.ingress[0].ip}}'"
    return subprocess.check_output(cmd, shell=True).decode('utf-8')


def send_request(
    memory_id: str,
    content: Union[str, List[str]],
    role: Union[str, List[str]],
    type: Union[str, List[str]],    
    tools: Optional[List[str]] = None,  # list of JSON-encoded tool definitions
    tool_call_id: Optional[Union[str, List[str]]] = None,
):
    if isinstance(content, str):
        content = [content]

    if isinstance(role, str):
        role = [role]
    
    if isinstance(type, str):
        type = [type]
        
    inputs = [
        {
            "name": "memory_id",
            "shape": [1],
            "datatype": "BYTES",
            "data": [memory_id],
            "parameters": {"content_type": "str"},
        },
        {
            "name": "role",
            "shape": [len(role)],
            "datatype": "BYTES",
            "data": role,
            "parameters": {"content_type": "str"},
        },
        {
            "name": "content",
            "shape": [len(content)],
            "datatype": "BYTES",
            "data": content,
            "parameters": {"content_type": "str"},
        },
        {
            "name": "type",
            "shape": [len(type)],
            "datatype": "BYTES",
            "data": type,
            "parameters": {"content_type": "str"},
        }
    ]
    
    if tools is not None:
        inputs.append(
            {
                "name": "tools",
                "shape": [len(tools)],
                "datatype": "BYTES",
                "data": tools,
                "parameters": {"content_type": "str"},
            }
        )
        
    if tool_call_id is not None:
        tool_call_id = [tool_call_id] if isinstance(tool_call_id, str) else tool_call_id
        inputs.append(
            {
                "name": "tool_call_id",
                "shape": [len(tool_call_id)],
                "datatype": "BYTES",
                "data": tool_call_id,
                "parameters": {"content_type": "str"},
            }
        )
    
    inference_request = {"inputs": inputs}
    endpoint = f"http://{get_mesh_ip()}/v2/pipelines/chat-app/infer"
    return requests.post(
        endpoint,
        json=inference_request,
    )
```
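Every tensor built in `send_request` has the same layout, so the repeated dictionaries could be produced by a small factory function. The following is a minimal sketch of that idea; `make_str_tensor` is a hypothetical helper name introduced here for illustration and is not part of any Seldon library.

```python
from typing import Any, Dict, List, Union


def make_str_tensor(name: str, values: Union[str, List[str]]) -> Dict[str, Any]:
    """Build one string tensor in the same layout used by send_request."""
    # Accept a single string for convenience and wrap it into a list.
    if isinstance(values, str):
        values = [values]
    return {
        "name": name,
        "shape": [len(values)],
        "datatype": "BYTES",
        "data": values,
        "parameters": {"content_type": "str"},
    }


# Illustrative values only; "demo-session" is a placeholder memory id.
inputs = [
    make_str_tensor("memory_id", "demo-session"),
    make_str_tensor("role", ["system", "user"]),
]
print(inputs[1]["shape"])
```

With such a helper, the optional `tools` and `tool_call_id` tensors reduce to a single `inputs.append(make_str_tensor(...))` call each.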

For our example, we will use the following `tools` and we will simulate a function which computes a delivery date:

```python
import json
from uuid import uuid4
from datetime import datetime

def get_delivery_date(order_id: str):
    return datetime.now().strftime('%Y-%m-%d %H:%M:%S')
    
tools = [
    json.dumps({
        "type": "function",
        "function": {
            "name": "get_delivery_date",
            "description": "Get the delivery date for a customer's order. Call this whenever you need to know the delivery date, for example when a customer asks 'Where is my package'",
            "parameters": {
                "type": "object",
                "properties": {
                    "order_id": {
                        "type": "string",
                        "description": "The customer's order ID."
                    }
                },
                "required": ["order_id"],
                "additionalProperties": False
            }
        }
    })
]

# define memory id
memory_id = str(uuid4())
```
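Because each entry in `tools` is a JSON string that carries a JSON Schema for the function parameters, the same definition can be reused to sanity-check the arguments a model proposes before the function is called. The sketch below only checks the `required` list and is not a full JSON Schema validator; `missing_required_args` is a hypothetical helper name, and the tool definition is a trimmed copy of the one above.

```python
import json
from typing import Any, Dict, List


def missing_required_args(tool_spec: str, args: Dict[str, Any]) -> List[str]:
    """Return the required parameter names that are absent from args."""
    schema = json.loads(tool_spec)["function"]["parameters"]
    return [key for key in schema.get("required", []) if key not in args]


# Trimmed copy of the tool definition used in this example.
tool_spec = json.dumps({
    "type": "function",
    "function": {
        "name": "get_delivery_date",
        "parameters": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        },
    },
})

print(missing_required_args(tool_spec, {}))                           # ['order_id']
print(missing_required_args(tool_spec, {"order_id": "order_12345"}))  # []
```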

Now that we have defined the `tools`, we can proceed and ask `"Where is my package?"`:

```python
response = send_request(
    memory_id=memory_id,
    role=["system", "user"],
    content=[
        json.dumps(["You are a helpful customer support assistant. Use the supplied tools to assist the user."]),
        json.dumps(["Where is my package?"]),
    ],
    type=[
        json.dumps(["text"]),
        json.dumps(["text"]),
    ],
    tools=tools
)
```

The model replies with:

```python
response.json()["outputs"][1]["data"][0]
```

```
'Could you please provide me with your order ID so I can check the delivery date for you?'
```
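Indexing `outputs` by position depends on the order in which the pipeline returns its tensors. A more defensive option is to look a tensor up by name. The sketch below assumes that each entry in `outputs` carries a `name` field, as in the Open Inference Protocol (V2) response format; the tensor names in the sample payload are illustrative assumptions, so inspect `response.json()` to confirm the names returned by your deployment. `get_output_data` is a hypothetical helper name.

```python
from typing import Any, Dict, List, Optional


def get_output_data(payload: Dict[str, Any], name: str) -> Optional[List[Any]]:
    """Return the data of the first output tensor called `name`, or None."""
    for output in payload.get("outputs", []):
        if output.get("name") == name:
            return output.get("data")
    return None


# Hand-written sample payload, not a captured response.
sample = {
    "outputs": [
        {"name": "role", "datatype": "BYTES", "shape": [1], "data": ["assistant"]},
        {"name": "content", "datatype": "BYTES", "shape": [1], "data": ["Hello!"]},
    ]
}

print(get_output_data(sample, "content"))     # ['Hello!']
print(get_output_data(sample, "tool_calls"))  # None
```

Returning `None` for a missing tensor is convenient here because some outputs, such as tool calls, are only present on certain turns.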

We can provide a dummy id, `"order_12345"`:

```python
response = send_request(
    memory_id=memory_id,
    role="user",
    content="I think it is order_12345",
    type="text",
    tools=tools
)
```

The model now replies with the arguments that we have to pass to our `get_delivery_date` function, along with the id of the tool call:

```python
response.json()["outputs"][3]['data'][0]
```

```
'{"id":"call_QTRjpGLVidVudGuyJtAvPMqp","function":{"arguments":"{\\"order_id\\":\\"order_12345\\"}","name":"get_delivery_date"},"type":"function"}'
```

We can now call `get_delivery_date` with the arguments returned by the model:

```python
# extract tool_calls
tool_calls = response.json()["outputs"][3]["data"]
tool_calls_dec = [json.loads(tool_call) for tool_call in tool_calls]

# decode args
args = json.loads(tool_calls_dec[0]["function"]["arguments"])

# call function
delivery_date = get_delivery_date(**args)
delivery_date
```

```
'2025-06-06 09:36:15'
```
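With more than one tool, the manual steps above generalize to a lookup table from tool name to Python callable. The following is a minimal sketch, assuming tool calls arrive as JSON strings in the format shown earlier; `TOOL_REGISTRY` and `dispatch_tool_call` are hypothetical names, and the stubbed `get_delivery_date` returns a fixed value so that the example is deterministic.

```python
import json
from typing import Any, Callable, Dict, Tuple


def get_delivery_date(order_id: str) -> str:
    # Stub standing in for the function defined earlier in this example.
    return "2025-06-06 09:36:15"


# Maps the tool name announced to the model onto the Python callable.
TOOL_REGISTRY: Dict[str, Callable[..., Any]] = {
    "get_delivery_date": get_delivery_date,
}


def dispatch_tool_call(raw_tool_call: str) -> Tuple[str, Any]:
    """Decode one tool call, run the matching function, return (id, result)."""
    call = json.loads(raw_tool_call)
    name = call["function"]["name"]
    if name not in TOOL_REGISTRY:
        raise KeyError(f"Unknown tool requested by the model: {name}")
    # The arguments are themselves a JSON-encoded string.
    args = json.loads(call["function"]["arguments"])
    return call["id"], TOOL_REGISTRY[name](**args)


raw = json.dumps({
    "id": "call_QTRjpGLVidVudGuyJtAvPMqp",
    "function": {
        "arguments": json.dumps({"order_id": "order_12345"}),
        "name": "get_delivery_date",
    },
    "type": "function",
})
call_id, result = dispatch_tool_call(raw)
print(call_id, result)
```

Rejecting unknown tool names explicitly avoids calling arbitrary functions if the model returns a name that was never offered in `tools`.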

We now send the result of calling `get_delivery_date` back to the model:

```python
response = send_request(
    memory_id=memory_id,
    role="tool",
    content=json.dumps({
        "order_id": "order_12345",
        "delivery_date": delivery_date
    }),
    type="text",
    tool_call_id=tool_calls_dec[0]["id"],
    tools=tools,
)
```

The model now has all the information it needs to answer our question:

```python
response.json()["outputs"][1]["data"][0]
```

```
'Your package with order ID "order_12345" is expected to be delivered on June 6, 2025, at 9:36 AM. If you have any other questions or need further assistance, feel free to ask!'
```
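The exchange above follows a fixed pattern: send a message, check whether the model asked for a tool, run it, and send the result back with the `tool` role until a plain text answer arrives. The sketch below captures that loop under several assumptions: the transport is injected as a `send` callable that returns a `(text, raw_tool_calls)` pair, only the first tool call of each round is handled (as in this walkthrough), and `run_turn` and `fake_send` are hypothetical names. In a real setup, `send` would wrap `send_request` and extract the relevant output tensors from the response.

```python
import json
from typing import Any, Callable, Dict, List, Optional, Tuple

# A reply is modelled as (assistant text or None, list of raw tool-call strings).
Reply = Tuple[Optional[str], List[str]]


def run_turn(
    send: Callable[..., Reply],
    tools_by_name: Dict[str, Callable[..., Any]],
    user_message: str,
    max_rounds: int = 5,
) -> Optional[str]:
    """Send a user message and resolve tool calls until the model answers."""
    text, tool_calls = send(role="user", content=user_message, type="text")
    rounds = 0
    while tool_calls:
        if rounds >= max_rounds:
            raise RuntimeError("Model kept requesting tools; giving up.")
        call = json.loads(tool_calls[0])
        args = json.loads(call["function"]["arguments"])
        result = tools_by_name[call["function"]["name"]](**args)
        # Return the tool result to the model, tagged with the call id.
        text, tool_calls = send(
            role="tool",
            content=json.dumps(result),
            type="text",
            tool_call_id=call["id"],
        )
        rounds += 1
    return text


def fake_send(**kwargs: Any) -> Reply:
    # Stand-in for the pipeline: asks for a tool once, then answers.
    if kwargs["role"] == "tool":
        return f"Tool said: {kwargs['content']}", []
    call = {
        "id": "call_1",
        "function": {
            "name": "get_delivery_date",
            "arguments": json.dumps({"order_id": "order_12345"}),
        },
        "type": "function",
    }
    return None, [json.dumps(call)]


answer = run_turn(
    fake_send,
    {"get_delivery_date": lambda order_id: {"order_id": order_id, "delivery_date": "2025-06-06"}},
    "Where is order_12345?",
)
print(answer)
```

The `max_rounds` guard is a design choice to keep a misbehaving model from looping on tool requests indefinitely.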

To delete the pipeline, run the following command:

```python
!kubectl delete -f manifests/pipeline.yaml -n seldon
```

```
pipeline.mlops.seldon.io "chat-app" deleted
```

To unload the models, run the following command:

```python
!kubectl delete -f manifests/models.yaml -n seldon
```

```
model.mlops.seldon.io "combine-question" deleted
model.mlops.seldon.io "combine-answer" deleted
model.mlops.seldon.io "chatgpt" deleted
```


