Function Calling

In this example, we will showcase how to deploy a chatbot agent with function calling capabilities. We will create a pipeline consisting of a model deployed using the API Runtime and two memory models. What differs from using function calling through the API Runtime as a standalone component (see the example here) is the automatic memory management provided by the Memory Runtime, which relieves the user of the burden of keeping track of the conversation history.

To run this tutorial, you will need to create a secret with the OpenAI API key and have the API and Memory Runtimes up and running. Please check our installation tutorial to see how you can set up all of those components.

After deploying both of our servers, we can now deploy the models on top of them.

The model-settings.json for the API model looks as follows:

!cat models/chatgpt/model-settings.json
{
  "name": "chatgpt",
  "implementation": "mlserver_llm_api.LLMRuntime",
  "parameters": {
    "extra": {
      "provider_id": "openai",
      "config": {
        "model_id": "gpt-4o",
        "model_type": "chat.completions"
      }
    }
  }
}

In this example, we will deploy a gpt-4o model for chat completions.

The model-settings.json file for the memory model looks as follows:
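Since the exact fields depend on the Memory Runtime you have installed, the snippet below is only a minimal sketch: the implementation value is a placeholder for the Memory Runtime class shipped with your installation, and the field listing the conversation tensors is illustrative.

{
  "name": "memory",
  "implementation": "<memory-runtime-implementation>",
  "parameters": {
    "extra": {
      "tensor_names": ["role", "content", "tool_call_id", "tool_calls"]
    }
  }
}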

Note that we've included two additional columns compared to the chat-bot example: "tool_call_id" and "tool_calls".

To deploy all the models, we will use the following manifest:
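A minimal sketch of such a manifest, assuming Seldon Core v2 Model resources and illustrative names, storage locations, and requirements (adjust these to match your servers), might look like this:

apiVersion: mlops.seldon.io/v1alpha1
kind: Model
metadata:
  name: chatgpt
spec:
  storageUri: "gs://<your-bucket>/models/chatgpt"
  requirements:
  - openai          # replace with a capability advertised by your API Runtime server
---
apiVersion: mlops.seldon.io/v1alpha1
kind: Model
metadata:
  name: memory-input
spec:
  storageUri: "gs://<your-bucket>/models/memory"
  requirements:
  - memory          # replace with a capability advertised by your Memory Runtime server
---
apiVersion: mlops.seldon.io/v1alpha1
kind: Model
metadata:
  name: memory-output
spec:
  storageUri: "gs://<your-bucket>/models/memory"
  requirements:
  - memory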

To deploy the models, run the following command:
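Assuming the manifest above is saved as manifests/models.yaml and Seldon Core v2 runs in the seldon-mesh namespace, this would be along the lines of:

kubectl apply -f manifests/models.yaml -n seldon-mesh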

At this point we should have all the necessary models up and running. We can now create the pipeline.

The manifest file for the pipeline looks as follows:
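As a rough sketch (the pipeline name, step names, and tensor wiring are illustrative and must match the models deployed above):

apiVersion: mlops.seldon.io/v1alpha1
kind: Pipeline
metadata:
  name: function-calling-chatbot
spec:
  steps:
  - name: memory-input
  - name: chatgpt
    inputs:
    - memory-input.outputs
    inputsJoinType: outer
  - name: memory-output
    inputs:
    - chatgpt.outputs
    inputsJoinType: outer
  output:
    steps:
    - memory-output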

The main difference from the chat-bot example is that some tensor inputs are optional. Thus, we have to use outer joins for the inputs of our pipeline components. We can now deploy our pipeline by running the following command:
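Assuming the pipeline manifest is saved as manifests/pipeline.yaml, for example:

kubectl apply -f manifests/pipeline.yaml -n seldon-mesh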

Once the pipeline is ready, we can start sending requests. We will begin by defining a helper function to do so:
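The pipeline speaks the Open Inference Protocol, so the helper can be a thin wrapper around an HTTP POST to its /v2/models/.../infer endpoint. The sketch below assumes the pipeline name used earlier, a local Seldon mesh address, and illustrative tensor names ("role", "content", "tools", "tool_calls", "tool_call_id") that must match the inputs expected by your pipeline:

import json

import requests

# Assumed address of the Seldon mesh ingress and the pipeline name used above.
MESH_URL = "http://0.0.0.0:9000"
PIPELINE = "function-calling-chatbot"


def send_message(role, content="", tools=None, tool_calls=None, tool_call_id=""):
    """Send a single chat message to the pipeline via the Open Inference Protocol."""
    inputs = [
        {"name": "role", "shape": [1], "datatype": "BYTES", "data": [role]},
        {"name": "content", "shape": [1], "datatype": "BYTES", "data": [content]},
    ]
    if tools is not None:
        inputs.append(
            {"name": "tools", "shape": [1], "datatype": "BYTES", "data": [json.dumps(tools)]}
        )
    if tool_calls is not None:
        inputs.append(
            {"name": "tool_calls", "shape": [1], "datatype": "BYTES", "data": [json.dumps(tool_calls)]}
        )
    if tool_call_id:
        inputs.append(
            {"name": "tool_call_id", "shape": [1], "datatype": "BYTES", "data": [tool_call_id]}
        )

    response = requests.post(
        f"{MESH_URL}/v2/models/{PIPELINE}/infer",
        json={"inputs": inputs},
        headers={"Seldon-Model": f"{PIPELINE}.pipeline"},
    )
    response.raise_for_status()
    return response.json()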

For our example, we will use the following tools and simulate a function that computes a delivery date:
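A sketch of the tool definition (using the OpenAI function-calling schema) together with a dummy get_delivery_date implementation could look like this:

from datetime import datetime, timedelta

# A single tool the model is allowed to call, described with the OpenAI tools schema.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_delivery_date",
            "description": "Get the delivery date for a customer's order.",
            "parameters": {
                "type": "object",
                "properties": {
                    "order_id": {
                        "type": "string",
                        "description": "The customer's order ID.",
                    }
                },
                "required": ["order_id"],
            },
        },
    }
]


def get_delivery_date(order_id):
    """Simulated lookup: every order is delivered three days from now."""
    return (datetime.now() + timedelta(days=3)).strftime("%Y-%m-%d")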

Now that we have defined the tools, we can proceed and ask "Where is my package?":
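Using the helper sketched above, the request could look like:

response = send_message(role="user", content="Where is my package?", tools=tools)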

To which the model replies with:

We can provide a dummy id, "order_12345":
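Again using the helper sketched above:

response = send_message(role="user", content="order_12345", tools=tools)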

Now the model replies with the arguments that we have to pass to our get_delivery_date method, together with the id of the tool call:

We can now call get_delivery_date with the arguments returned by the model:
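Assuming the tool call has been extracted from the response into a tool_call dictionary following the OpenAI chat-completions format, this could look like:

# tool_call is assumed to follow the OpenAI format, e.g.
# {"id": "call_abc123", "function": {"name": "get_delivery_date", "arguments": "{\"order_id\": \"order_12345\"}"}}
arguments = json.loads(tool_call["function"]["arguments"])
delivery_date = get_delivery_date(**arguments)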

We now send the result of calling get_delivery_date back to the model:
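With the helper sketched above, the result is sent as a message with the "tool" role, referencing the id of the tool call:

response = send_message(
    role="tool",
    content=json.dumps({"order_id": arguments["order_id"], "delivery_date": delivery_date}),
    tool_call_id=tool_call["id"],
)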

The model now has all the information it needs to respond to our question:

To delete the pipeline, run the following command:
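Assuming the manifest paths used above, for example:

kubectl delete -f manifests/pipeline.yaml -n seldon-mesh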

To unload the models, run the following command:
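Deleting the Model resources unloads them from the servers, for example:

kubectl delete -f manifests/models.yaml -n seldon-mesh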
