Planning
In the previous tutorial, we showcased how to deploy an agentic workflow that implements the Tool Use pattern. Recall that in this pattern, the agent receives a query and, based on its content, decides whether or not to call a tool. If the answer can be provided from the general knowledge the LLM gathered during training, it is returned immediately. Otherwise, a tool is called and its result is used to construct the answer for the user.
Although the Tool Use pattern is handy for automating some tasks, it can still require several exchanges between the user and the agent to complete more complicated ones. In our previous example, we defined two tools, get_temperature and get_wind_speed, which fetch the temperature and the wind speed for a specific location. Let us now imagine a third tool, get_weather, that returns what the weather is like given the temperature and the wind speed. For example, if the temperature is greater than 30 degrees Celsius and the wind speed is lower than 10, the weather tool can return that it is a hot day with low wind speed. What we are trying to emphasize with this example is that get_weather depends on the information provided by get_temperature and get_wind_speed: to call get_weather, we must already have the answers from get_temperature and get_wind_speed. This means that, in the Tool Use pattern, we must first ask the agent for the temperature in a specific location, then for the wind speed in the same location, and finally about the weather given the previous context. That is three interactions the user must have with the agent to fetch information about the weather in a single location.
In this tutorial, we introduce the Planning Pattern, which allows the agent to solve such dependent tasks in one go through planning. To achieve this, the LLM module makes use of the cyclic pipelines supported in Seldon Core 2. We can thus deploy a cyclic pipeline with a feedback loop that allows the agent to plan, call multiple tools, gather the results returned by those tools, call the tool that provides the final answer, and then exit the loop and return the response to the user.
The planning pipeline for our example looks like this:
We see four routes that the data flow can take:

- Temperature: when the model is asked about the current temperature in a specific location - triggered when the get_temperature tensor is present.
- Wind Speed: when the model is asked about the current wind speed in a specific location - triggered when the get_wind_speed tensor is present.
- Weather: when the model is asked about the current weather for a specific location - triggered when the get_weather tensor is present.
- Default: which immediately returns the response of the model if none of the above routes are taken.
For more details on how the triggering tensors work, and on what happens when the model is asked a question it can answer from its internal knowledge, see the previous tutorial.
Custom MLServer models
For get_temperature and get_wind_speed, we use the same models as in the previous tutorial. In this example, we add the get_weather model, which depends on the results provided by get_temperature and get_wind_speed and is defined as:
```python
def get_weather(temperature: float, windspeed: float) -> str:
    if temperature > 30 and windspeed < 10:
        return "hot day with low wind speed."
    elif temperature < 10 and windspeed > 20:
        return "cold day with high wind speed."
    elif temperature > 20 and windspeed > 20:
        return "warm day with high wind speed."
    elif temperature < 20 and windspeed < 10:
        return "cool day with low wind speed."
    else:
        return "moderate with normal wind speed."
```

Here we define a helper function, which constructs the output response in the format expected by the Memory Runtime component:
We also define some helper functions which will be used in the custom MLServer model:
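Purely as a sketch - assuming the planner emits the tool-call arguments as a JSON string inside the triggering tensor named after the tool - an argument-extraction helper could look like this:

```python
# Sketch only: pulls the JSON-encoded tool-call arguments out of the input
# tensor named after the tool (the payload layout is an assumption).
import json
from typing import Optional

from mlserver.codecs import StringCodec
from mlserver.types import InferenceRequest


def extract_arguments(payload: InferenceRequest, tensor_name: str) -> Optional[dict]:
    for request_input in payload.inputs:
        if request_input.name == tensor_name:
            return json.loads(StringCodec.decode_input(request_input)[0])
    return None
```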
With all the helper functions in place, we can define the custom MLServer model as follows:
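A minimal sketch of such a model - reusing the assumed helpers above and the get_weather function, not the tutorial's exact implementation - could be:

```python
# Sketch only: a custom MLServer model that reads the temperature and wind
# speed arguments, calls get_weather, and wraps the result for the memory.
from mlserver import MLModel
from mlserver.types import InferenceRequest, InferenceResponse


class WeatherModel(MLModel):
    async def load(self) -> bool:
        return True

    async def predict(self, payload: InferenceRequest) -> InferenceResponse:
        # extract_arguments and build_memory_response are the assumed helpers
        # sketched above; get_weather is the function defined earlier.
        args = extract_arguments(payload, "get_weather") or {}
        result = get_weather(args["temperature"], args["windspeed"])
        return build_memory_response(self.name, "get_weather", result)
```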
The model-settings.json for the weather model looks as follows:
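Its exact contents depend on how the code is packaged; as an indication of shape only (the module path below is an assumption), it could be as simple as:

```json
{
    "name": "weather",
    "implementation": "model.WeatherModel"
}
```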
Here we define a tail model, which extracts the last entry from the provided history. The model definition is the following:
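As a sketch of the idea - assuming the history arrives as string tensors and that keeping the last element of each is all that is needed - the tail model could be written as:

```python
# Sketch only: keeps the last element of every (assumed) string tensor in
# the request, i.e. the most recent entry of the chat history.
from mlserver import MLModel
from mlserver.codecs import StringCodec
from mlserver.types import InferenceRequest, InferenceResponse


class TailModel(MLModel):
    async def predict(self, payload: InferenceRequest) -> InferenceResponse:
        outputs = []
        for request_input in payload.inputs:
            values = StringCodec.decode_input(request_input)
            outputs.append(StringCodec.encode_output(request_input.name, payload=values[-1:]))
        return InferenceResponse(model_name=self.name, outputs=outputs)
```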
The corresponding model-settings.json file is the following:
There is one last model in the pipeline, an identity model specific to the cyclic pipelines in Core 2, with the following definition:
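Its behaviour is easy to sketch - assuming the delay is hard-coded rather than read from the model settings - it simply echoes the request back after a short pause:

```python
# Sketch only: forwards every input unchanged after a short (assumed 1 ms)
# delay; see the note below on why the delay is needed.
import asyncio

from mlserver import MLModel
from mlserver.types import InferenceRequest, InferenceResponse, ResponseOutput


class IdentityModel(MLModel):
    async def predict(self, payload: InferenceRequest) -> InferenceResponse:
        await asyncio.sleep(0.001)
        outputs = [
            ResponseOutput(
                name=request_input.name,
                shape=request_input.shape,
                datatype=request_input.datatype,
                data=request_input.data,
                parameters=request_input.parameters,
            )
            for request_input in payload.inputs
        ]
        return InferenceResponse(model_name=self.name, outputs=outputs)
```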
The identity model simply forwards all the inputs and adds a custom delay. In our case, the custom delay is 1 millisecond, which corresponds to the joining window interval from Kafka Streams. This delay is required to avoid infinite loops caused by the joining operations in Kafka Streams. For further details about how Kafka Streams works, see the following link.
The model-settings.json file for the identity model is the following:
We are now ready to deploy all the models. The manifest file for all the models is the following:
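To give an idea of its shape - the names, storage URIs, and requirements below are placeholders rather than the tutorial's actual values - each custom model is declared as a Seldon Core 2 Model resource along these lines:

```yaml
# Sketch only: one Model resource per custom MLServer model; the storageUri
# is a placeholder for wherever the model code and settings are stored.
apiVersion: mlops.seldon.io/v1alpha1
kind: Model
metadata:
  name: weather
spec:
  storageUri: "gs://<your-bucket>/models/weather"
  requirements:
    - mlserver   # assumed requirement tag matching the MLServer runtime
```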
We can load all the models by running the following command:
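For example, assuming the manifest above is saved as models.yaml and Core 2 runs in the seldon-mesh namespace (both assumptions about your setup):

```bash
kubectl apply -f models.yaml -n seldon-mesh
```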
At this point, we have all the models up and running and we are ready to define the computation pipeline. The manifest file for the pipeline is the following:
Note that we set allowCycles to true and the maxStepRevisits to 3. The allowCycles flag is required to enable cyclic pipelines in Seldon Core 2, and the maxStepRevisits defines how many times the pipeline can revisit the same step before it stops. In our case, we set it to 3, because we expect the pipeline to revisit the get_temperature, get_wind_speed, and get_weather steps at most once each.
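As an indication only - all other fields of the pipeline are omitted here - these two flags sit at the top level of the Pipeline spec:

```yaml
# Sketch only: the cycle-related fields of the Pipeline spec; the steps,
# triggers, and output mapping of the tutorial's pipeline are not shown.
spec:
  allowCycles: true
  maxStepRevisits: 3
```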
We can now deploy the pipeline by running the following commands:
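For instance, assuming the manifest is saved as pipeline.yaml and the pipeline is named planning (both assumptions):

```bash
kubectl apply -f pipeline.yaml -n seldon-mesh
kubectl wait --for=condition=ready pipelines --all -n seldon-mesh --timeout=300s
```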
Before sending the requests to the pipeline, we define a helper function to extract the IP address of the seldon-mesh service. This will help us define the endpoint to which we want to send the requests.
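One possible version of such a helper - assuming Core 2 exposes a LoadBalancer service called seldon-mesh in the seldon-mesh namespace - is the following:

```python
# Sketch only: reads the external IP of the seldon-mesh LoadBalancer service
# via kubectl; the namespace and service name are assumptions about the install.
import subprocess


def get_mesh_ip(namespace: str = "seldon-mesh") -> str:
    return subprocess.check_output(
        [
            "kubectl", "get", "svc", "seldon-mesh",
            "-n", namespace,
            "-o", "jsonpath={.status.loadBalancer.ingress[0].ip}",
        ],
        text=True,
    ).strip()
```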
To inform the LLM about the functions we want to use (i.e., get_temperature, get_wind_speed, and get_weather), we define the tools object, which contains all the metadata needed by the LLM (e.g., name, description, arguments, etc.) to be able to provide the appropriate arguments to those functions. See the OpenAI tutorial on function calling for more details.
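Following the OpenAI function-calling schema, the get_weather entry could look like the sketch below; the descriptions and parameter names are illustrative, and get_temperature and get_wind_speed get analogous entries:

```python
# Sketch only: OpenAI-style tool definition for get_weather; descriptions
# and parameter names are illustrative, not the tutorial's exact values.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Describe the weather given a temperature and a wind speed.",
            "parameters": {
                "type": "object",
                "properties": {
                    "temperature": {"type": "number", "description": "Temperature in degrees Celsius."},
                    "windspeed": {"type": "number", "description": "Wind speed."},
                },
                "required": ["temperature", "windspeed"],
            },
        },
    },
    # ... analogous entries for get_temperature and get_wind_speed ...
]
```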
We are now ready to start interacting with the pipeline. We begin by defining the memory UUID which will uniquely identify a chat history:
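For example (the variable name is ours):

```python
import uuid

# Unique identifier for this conversation's chat history.
memory_id = str(uuid.uuid4())
```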
We also define a helper function to send requests to the pipeline to avoid repetitive code:
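A sketch of such a helper is shown below; the endpoint path, the Seldon-Model header, the pipeline name, and the input tensor names are assumptions about how the deployment is wired, not the tutorial's exact request format:

```python
# Sketch only: sends a question to the pipeline over the V2 inference
# protocol; tensor names, pipeline name, and memory handling are assumed.
import requests


def ask(question: str, mesh_ip: str, memory_id: str, pipeline: str = "planning") -> dict:
    payload = {
        "inputs": [
            {"name": "role", "shape": [1], "datatype": "BYTES", "data": ["user"]},
            {"name": "content", "shape": [1], "datatype": "BYTES", "data": [question]},
            {"name": "memory_id", "shape": [1], "datatype": "BYTES", "data": [memory_id]},
        ]
    }
    response = requests.post(
        f"http://{mesh_ip}/v2/models/{pipeline}/infer",
        json=payload,
        headers={"Seldon-Model": f"{pipeline}.pipeline"},
        timeout=120,
    )
    response.raise_for_status()
    return response.json()
```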
We can now ask the agent to provide details about the weather in a particular location:
As we can see, the answer shows that the model can plan the necessary steps to answer our question: it fetches the temperature and the wind speed before responding about the weather.
We can check that the tools are called by inspecting the history of the conversation:
In the conversation history, we can see exactly how all three tools are called.
One can still ask simpler questions that require calling a single tool or none at all:
To unload the pipeline and the models, run the following commands:
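Assuming the manifests from earlier were saved as pipeline.yaml and models.yaml (an assumption about your file layout):

```bash
kubectl delete -f pipeline.yaml -n seldon-mesh
kubectl delete -f models.yaml -n seldon-mesh
```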
If the pipeline encounters a failure (e.g., a model inference error), it will halt and return an error message to the user. Since the system prompt is included with every request to the LLM, there's no risk of the chat history becoming inconsistent due to a failure. This is because the memory component retrieves all previous messages up to the last system prompt to provide context to the LLM.
Congrats! You've just deployed an agentic workflow with the LLM module!