Quickstart

In this tutorial we will show you how to create a Machine Learning pipeline for loan classification. The pipeline is composed of a Large Language Model (LLM) prompted for entity extraction, a custom MLServer model that preprocesses the JSON returned by the LLM, and a Random Forest classifier that outputs the loan status (a binary variable, either "Yes" (1) or "No" (0)).

Our pipeline will look like this:

The workflow for the diagram above is the following:

  1. The input to the pipeline is natural-language text: a loan application in which the applicant provides the information our pipeline uses to determine whether or not to automatically approve the loan.

  2. The LLM then processes that text, extracting the information we are interested in and outputting the relevant fields in JSON format (an illustrative example is shown after this list). Note that the LLM knows what to extract based on the system prompt we provide.

  3. The JSON is then passed to the preprocessor model, which parses it and converts it into a NumPy array for the Random Forest classifier.

  4. Finally, the Random Forest classifier outputs the prediction. We return both the preprocessed JSON and the classification label.
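For illustration, the JSON returned by the LLM could look something like the snippet below. The exact fields depend on the system prompt and on the features the classifier was trained on, so treat the field names and values here as assumptions:

{
    "ApplicantIncome": 4500,
    "CoapplicantIncome": 1200,
    "LoanAmount": 150000,
    "Loan_Amount_Term": 360,
    "Credit_History": 1
}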

Now that the task is clear, we are ready to start deploying the models and constructing the computational pipeline. For this, you will need an up-and-running MLServer instance (included with the Seldon Core 2 installation) to serve the preprocessor model and the Random Forest classifier, as well as an instance of the Local Runtime to serve the LLM. We will load a mistralai/Mistral-7B-Instruct-v0.2 model on our infrastructure, so make sure you have enough resources to do so.

To run this tutorial we used four NVIDIA GeForce RTX 2080 Ti GPUs. Other configurations are also possible.

We begin by defining the model settings for the Random Forest classifier. Note that we have already trained the classifier, so you don't need to worry about that step. The associated model-settings.json file is:

!cat models/loan-model/model-settings.json
{
    "name": "loan-model",
    "implementation": "mlserver_sklearn.SKLearnModel",
    "parameters": {
        "uri": "./loan-predictor-rf.joblib",
        "version": "v0.1.0"
    }
}

The associated manifest file is:
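A Seldon Core 2 Model resource for this classifier would look roughly like the sketch below; the storage URI is an assumption and should point at the bucket holding the model artifacts:

apiVersion: mlops.seldon.io/v1alpha1
kind: Model
metadata:
  name: loan-model
spec:
  storageUri: "gs://<your-bucket>/loan-model"
  requirements:
  - sklearn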

To deploy the Random Forest classifier, run the following command:
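Assuming the manifest above is saved as manifests/loan-model.yaml and Seldon Core 2 runs in the seldon-mesh namespace (both assumptions), the deployment could look like this:

!kubectl apply -f manifests/loan-model.yaml -n seldon-mesh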

Now we can move on to the preprocessor model. Since this is a custom model, we first need to provide an implementation for it. Our implementation is the following:
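A minimal sketch of what such an MLServer custom model could look like is shown below; the class name, the module path, and the extracted field names are assumptions and should be adapted to the features your classifier expects:

# model.py - illustrative sketch of the JSON preprocessor (field names are assumptions)
import json

import numpy as np
from mlserver import MLModel
from mlserver.codecs import NumpyCodec, StringCodec
from mlserver.types import InferenceRequest, InferenceResponse


class LoanPreprocessor(MLModel):
    async def load(self) -> bool:
        # Order of the features expected by the Random Forest classifier (assumed).
        self._feature_order = [
            "ApplicantIncome",
            "CoapplicantIncome",
            "LoanAmount",
            "Loan_Amount_Term",
            "Credit_History",
        ]
        return True

    async def predict(self, payload: InferenceRequest) -> InferenceResponse:
        # The LLM output arrives as a string tensor holding the JSON document.
        raw = StringCodec.decode_input(payload.inputs[0])[0]
        fields = json.loads(raw)

        # Convert the parsed fields into the NumPy array the classifier expects.
        features = np.array(
            [[float(fields[name]) for name in self._feature_order]], dtype=np.float32
        )

        return InferenceResponse(
            model_name=self.name,
            outputs=[NumpyCodec.encode_output(name="predict", payload=features)],
        )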

You don't have to do anything with this code since we have already uploaded all our artifacts to the Google Cloud Storage bucket; it is included only for clarity.

The model-settings.json for the preprocessor model looks like this:
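Assuming the preprocessor class above lives in model.py, a sketch of the settings file could be the following (the model name and the file path are assumptions):

!cat models/preprocessor/model-settings.json
{
    "name": "preprocessor",
    "implementation": "model.LoanPreprocessor"
}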

The associated manifest file is:
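A sketch of the corresponding Seldon Core 2 Model resource, again with an assumed storage URI; depending on your setup you may also need a requirements entry that routes the model to the MLServer server:

apiVersion: mlops.seldon.io/v1alpha1
kind: Model
metadata:
  name: preprocessor
spec:
  storageUri: "gs://<your-bucket>/preprocessor"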

To deploy the preprocessor, run the following command:
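Assuming the manifest is saved as manifests/preprocessor.yaml (an assumption):

!kubectl apply -f manifests/preprocessor.yaml -n seldon-mesh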

We are now ready to deploy the last model, which is the actual LLM. The model-settings.json is:
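A sketch of how these settings could be laid out is shown below; the implementation class, the nesting of the fields, and the max_model_len value are assumptions, so check the LLM Module documentation for the precise schema:

!cat models/mistral-llm/model-settings.json
{
    "name": "mistral-llm",
    "implementation": "<LLM runtime implementation class>",
    "parameters": {
        "extra": {
            "backend": "vllm",
            "model_type": "chat.completions",
            "config": {
                "model": "mistralai/Mistral-7B-Instruct-v0.2",
                "tensor_parallel_size": 4,
                "dtype": "float16",
                "max_model_len": 8192
            },
            "default_generate_kwargs": {
                "max_tokens": 1024
            }
        }
    }
}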

For this model we specified the following:

  1. The vllm backend used by our inference engine.

  2. The model_type, which specifies that the model is going to be used for chat completion.

  3. The model, which references the model name on the Hugging Face model hub.

  4. We split the model across 4 GPUs by specifying tensor_parallel_size=4.

  5. Our hardware only supports float16 for storing the weights, but feel free to remove this setting or replace it with bfloat16 if your hardware supports it.

  6. We specify the maximum number of tokens to be processed through max_model_len.

  7. Finally, the maximum number of tokens to be generated is set to 1024 through max_tokens.

The manifest file for the LLM is:
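A sketch of the Model resource for the LLM; the storage URI and the requirements tag that routes the model to the Local Runtime are assumptions, so consult the LLM Module documentation for the exact values:

apiVersion: mlops.seldon.io/v1alpha1
kind: Model
metadata:
  name: mistral-llm
spec:
  storageUri: "gs://<your-bucket>/mistral-llm"
  requirements:
  - llm-local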

To deploy the model, run the following command:
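Assuming the manifest is saved as manifests/mistral-llm.yaml (an assumption):

!kubectl apply -f manifests/mistral-llm.yaml -n seldon-mesh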

Now that all our models are deployed, we can define and deploy the pipeline. The manifest file which contains the definition of the pipeline is the following:
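A sketch of such a Pipeline resource, wiring the three models together and returning both the preprocessor output and the classifier output; the model and pipeline names match the assumed names used above:

apiVersion: mlops.seldon.io/v1alpha1
kind: Pipeline
metadata:
  name: loan-pipeline
spec:
  steps:
  - name: mistral-llm
  - name: preprocessor
    inputs:
    - mistral-llm
  - name: loan-model
    inputs:
    - preprocessor
  output:
    steps:
    - preprocessor
    - loan-model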

To deploy the pipeline, run the following commands:
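For example, applying the manifest and waiting for the pipeline to become ready (the path and namespace are the assumed ones from above):

!kubectl apply -f manifests/loan-pipeline.yaml -n seldon-mesh
!kubectl wait --for condition=ready --timeout=300s pipeline loan-pipeline -n seldon-mesh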

We are now ready to perform inference through our pipeline, but before that we define a helper function to get the IP of the mesh:
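A sketch of such a helper, assuming Seldon Core 2 exposes a LoadBalancer service called seldon-mesh in the seldon-mesh namespace:

import subprocess


def get_mesh_ip(namespace: str = "seldon-mesh") -> str:
    # Read the external IP of the seldon-mesh LoadBalancer service via kubectl.
    cmd = (
        "kubectl get svc seldon-mesh -n " + namespace +
        " -o jsonpath={.status.loadBalancer.ingress[0].ip}"
    )
    return subprocess.check_output(cmd, shell=True).decode().strip()


MESH_IP = get_mesh_ip()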

We also define the system prompt for the LLM, which contains a description of the behaviour we expect from the model.
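An example of what such a system prompt could look like; the exact wording and the list of fields to extract are assumptions:

system_prompt = (
    "You are a loan-application assistant. Extract the fields relevant for loan "
    "approval from the application text and return them as a single valid JSON "
    "object with no additional commentary. The JSON must contain the keys: "
    "ApplicantIncome, CoapplicantIncome, LoanAmount, Loan_Amount_Term, Credit_History."
)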

The prompt we will send to the model, i.e. the actual text to be processed and from which we want to extract the information, is the following:
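A hypothetical loan application, used here purely for illustration:

prompt = (
    "Hello, my name is John and I would like to apply for a loan. My monthly income "
    "is 4500 USD and my wife, who is co-applying, earns 1200 USD per month. We would "
    "like to borrow 150,000 USD over a term of 360 months, and I have always repaid "
    "my previous loans on time."
)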

We can now send the request:
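A sketch of the request using the Open Inference Protocol; the pipeline name follows the manifest above, while the names of the input tensors expected by the chat-completion runtime (role and content) are assumptions:

import requests

inference_request = {
    "inputs": [
        {"name": "role", "shape": [2], "datatype": "BYTES", "data": ["system", "user"]},
        {"name": "content", "shape": [2], "datatype": "BYTES", "data": [system_prompt, prompt]},
    ]
}

response = requests.post(
    f"http://{MESH_IP}/v2/models/loan-pipeline/infer",
    json=inference_request,
    headers={"Seldon-Model": "loan-pipeline.pipeline"},
)
print(response.json())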

The outputs we get are the following:

We can see that the LLM successfully generated a valid JSON document, which was then processed and sent to the Random Forest classifier for prediction.

Congrats! You just deployed a computational pipeline which combines an LLM with a classical machine learning model.

To delete the pipeline and unload the models, run the following commands:
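For example, deleting the resources created above (the paths are the assumed ones used throughout this tutorial):

!kubectl delete -f manifests/loan-pipeline.yaml -n seldon-mesh
!kubectl delete -f manifests/mistral-llm.yaml -n seldon-mesh
!kubectl delete -f manifests/preprocessor.yaml -n seldon-mesh
!kubectl delete -f manifests/loan-model.yaml -n seldon-mesh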
