Example

In this example we demonstrate how to deploy a Vector DB client using our Retrieval Runtime with Seldon Core 2. To run this notebook you will need Seldon Core 2 up and running on your local machine, with the Retrieval Runtime running on Core. Please check our installation tutorial to see how to do so.

Before deploying the Retrieval Runtime in our cluster, we need the database server up and running and populated. We will use the following synthetic data to populate our database, consisting of a list of sentences together with their associated sentiment (i.e., metadata stored alongside each sentence) and their embedding. For demonstration purposes, we will use the following dummy one-hot embeddings:

sentences = [
    "The weather is lovely today.",
    "It's so sunny outside!",
    "He drove to the stadium.",
    "The weather is horrible today.",
    "It's so cloudy outside!",
    "She walked to the park.",
]

sentiment = [
    "positive",
    "positive",
    "neutral",
    "negative",
    "negative",
    "neutral",
]

embeddings = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
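Because the dummy embeddings are one-hot, the matches are easy to reason about under cosine distance: a query of [1.0, 0.0, 0.0] is at distance 0 from sentences 0 and 3, and at distance 1 from all the others. A quick sketch in plain Python, with no external dependencies:

```python
import math

def cosine_distance(a, b):
    # Cosine distance = 1 - cosine similarity.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

embeddings = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

query = [1.0, 0.0, 0.0]
distances = [cosine_distance(query, e) for e in embeddings]
print(distances)  # entries 0 and 3 are exact matches (distance 0.0)
```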

Now that we have defined our dummy data, we have to populate the databases with it. In this example we demonstrate how to populate pgvector and Qdrant databases.

We also include a helper method to extract the mesh IP. This will be useful for sending requests to the retrieval model.

import subprocess

def get_mesh_ip():
    cmd = "kubectl get svc seldon-mesh -n seldon -o jsonpath='{.status.loadBalancer.ingress[0].ip}'"
    return subprocess.check_output(cmd, shell=True).decode("utf-8").strip()

Before proceeding any further, ensure you have the following Python packages installed:

!pip install requests pg8000==1.31.2 qdrant-client==1.12.0 -q

PGVector

We begin by deploying a pgvector service, defined as follows:

!cat manifests/pgvector-deployment.yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: postgres
  labels:
    app: postgres
spec:
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
      - name: postgres
        image: ankane/pgvector
        env:
        - name: POSTGRES_USER
          value: admin
        - name: POSTGRES_PASSWORD
          value: admin
        - name: POSTGRES_DB
          value: db
        - name: POSTGRES_HOST_AUTH_METHOD
          value: trust
        ports:
        - containerPort: 5432
        readinessProbe:
          exec:
            command: ["pg_isready", "-U", "admin", "-d", "db"]
          initialDelaySeconds: 5
          periodSeconds: 1
---
apiVersion: v1
kind: Service
metadata:
  name: pgvector-db
  labels:
    app: pgvector-db
spec:
  type: ClusterIP  # Specifies that this is an internal service
  ports:
    - port: 5432
      targetPort: 5432
  selector:
    app: postgres

To apply it to our local cluster, run the following command:

!kubectl apply -f manifests/pgvector-deployment.yaml -n seldon
!kubectl rollout status deployment/postgres -n seldon --timeout=600s
deployment.apps/postgres created
service/pgvector-db created
Waiting for deployment "postgres" rollout to finish: 0 of 1 updated replicas are available...
deployment "postgres" successfully rolled out

We should now have the pgvector server up and running in our local cluster. To populate the database with the dummy data defined above, we need to port-forward localhost:5432 to the container port 5432.

To do so, open a new terminal and run the following command:

kubectl port-forward postgres-c9784f947-4msmr 5432:5432 -n seldon

or feel free to do port-forwarding directly from a tool like k9s.

Note that the suffix of your pod name will differ from c9784f947-4msmr. Please modify the command above so that it matches your local configuration.
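Alternatively, to avoid looking up the generated pod name, kubectl can port-forward against the Deployment or the Service directly:

```shell
kubectl port-forward deployment/postgres 5432:5432 -n seldon
# or, via the Service defined in the manifest:
kubectl port-forward svc/pgvector-db 5432:5432 -n seldon
```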

To populate the database, run the following script:

import pg8000

client = pg8000.connect(
    host='localhost',
    port=5432,
    database="db",
    user="admin",
    password="admin",
)

client.run("CREATE EXTENSION IF NOT EXISTS vector")
client.run(
    "CREATE TABLE IF NOT EXISTS embedding_table (id SERIAL PRIMARY KEY, "
    "embedding vector, text text, sentiment text)"
)

for sentence, label, embedding in zip(sentences, sentiment, embeddings):
    client.run(
        "INSERT INTO embedding_table(embedding, text, sentiment) VALUES (:embedding, :text, :sentiment)",
        embedding=str(embedding),
        text=sentence,
        sentiment=label,
    )

client.commit()
client.close()

At this point, the data should be in the pgvector database. We are now ready to deploy a retrieval model through which we will query the database.

We begin by defining the model-settings.json for the pgvector client as follows:

!cat models/pgvector-client/model-settings.json
{
    "name": "pgvector-client",
    "implementation": "mlserver_vector_db.VectorDBRuntime",
    "parameters": {
        "extra": {
            "provider_id": "pgvector",
            "config": {
                "host": "pgvector-db",
                "port": 5432,
                "database": "db",
                "user": "admin",
                "password": "admin",
                "table": "embedding_table",
                "embedding_column": "embedding",
                "search_parameters": {
                    "columns": ["id", "text", "sentiment"],
                    "where": {
                        "AND": [
                            {"id": {"lt": 10}}, 
                            {"sentiment": "negative"}
                        ]
                    },
                    "limit": 1
                }
            }
        }
    }
}

Besides configurations such as provider_id, the URL, and authentication-related settings, we also included the search parameter configuration. Our runtime offers the option to perform filtered search on the metadata stored alongside the text and the embedding vector. In this case, we look for a text with negative sentiment among the first 10 entries of our table, and we return just one entry in the response. Feel free to modify the filtering options based on your needs.
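For intuition, the search_parameters above roughly correspond to a similarity search with a WHERE clause for the metadata filter (`<=>` is pgvector's cosine-distance operator). The snippet below only builds an illustrative SQL string; the actual query generated by the runtime may differ:

```python
# Illustrative only: the SQL a filtered pgvector similarity search roughly
# corresponds to. The Retrieval Runtime generates its own query internally.
def build_query(table, embedding_column, columns, limit):
    cols = ", ".join(columns)
    return (
        f"SELECT {cols}, {embedding_column} <=> :query AS distance "
        f"FROM {table} "
        "WHERE id < 10 AND sentiment = 'negative' "
        "ORDER BY distance "
        f"LIMIT {limit}"
    )

sql = build_query("embedding_table", "embedding", ["id", "text", "sentiment"], limit=1)
print(sql)
```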

The associated manifest file is the following:

!cat manifests/pgvector-client.yaml
apiVersion: mlops.seldon.io/v1alpha1
kind: Model
metadata:
  name: pgvector-client
spec:
  storageUri: "gs://seldon-models/llm-runtimes-settings/retrieval/example/models/pgvector-client"
  requirements:
  - vector-db

We can deploy our retrieval client by running the following command:

!kubectl apply -f manifests/pgvector-client.yaml -n seldon
!kubectl wait --for condition=ready --timeout=600s model --all -n seldon
model.mlops.seldon.io/pgvector-client created
model.mlops.seldon.io/pgvector-client condition met

Once the vector-db client is up and running, we are ready to send a request.

import requests
from pprint import pprint

inference_request = {
    "inputs": [
        {
            "name": "embedding",
            "shape": [1, 3],
            "datatype": "FP64",
            "parameters": {"content_type": "np"},
            "data": [1.0, 0.0, 0.0]
        }
    ]
}


endpoint = f"http://{get_mesh_ip()}/v2/models/pgvector-client/infer"
response = requests.post(endpoint, json=inference_request)
pprint(response.json(), depth=4)
{'id': '95fb3848-1d6b-46b1-b0ff-f8a228a1b0ad',
 'model_name': 'pgvector-client_1',
 'outputs': [{'data': ['{"id": 4, "text": "The weather is horrible today.", '
                       '"sentiment": "negative", "distance": 0.0}'],
              'datatype': 'BYTES',
              'name': 'documents',
              'parameters': {'content_type': 'str'},
              'shape': [1, 1]}],
 'parameters': {}}

We can see that although our dummy embedding matched two entries in our vector database, only the one associated with a negative sentiment was returned. This is because we specified in our filtering configuration that we are only interested in negative entries.

If you want to override the search parameters without redeploying the model, you can send them in the request parameters. The following example modifies the number of entries returned (i.e., limit=2).

import requests
from pprint import pprint

inference_request = {
    "inputs": [
        {
            "name": "embedding",
            "shape": [1, 3],
            "datatype": "FP64",
            "parameters": {"content_type": "np"},
            "data": [1.0, 0.0, 0.0]
        }
    ],
    "parameters": {
        "search_parameters": {
            "limit": 2,
        }
    }
}


endpoint = f"http://{get_mesh_ip()}/v2/models/pgvector-client/infer"
response = requests.post(endpoint, json=inference_request)
pprint(response.json(), depth=4)
{'id': '67f62b0f-f9f1-4236-9e6f-2d5bdfd3735b',
 'model_name': 'pgvector-client_1',
 'outputs': [{'data': ['{"id": 4, "text": "The weather is horrible today.", '
                       '"sentiment": "negative", "distance": 0.0}',
                       '{"id": 5, "text": "It\'s so cloudy outside!", '
                       '"sentiment": "negative", "distance": 1.0}'],
              'datatype': 'BYTES',
              'name': 'documents',
              'parameters': {'content_type': 'str'},
              'shape': [2, 1]}],
 'parameters': {}}

As we can see, the same request now returns two entries instead of one.

To remove the client model and the pgvector deployment, run the following commands:

!kubectl delete -f manifests/pgvector-client.yaml -n seldon
!kubectl delete -f manifests/pgvector-deployment.yaml -n seldon
model.mlops.seldon.io "pgvector-client" deleted
deployment.apps "postgres" deleted
service "pgvector-db" deleted

Qdrant

We follow the same steps for Qdrant. We begin by deploying the Qdrant server on our k8s cluster:

!kubectl apply -f manifests/qdrant-deployment.yaml -n seldon
!kubectl rollout status deployment/qdrant -n seldon --timeout=600s
deployment.apps/qdrant created
service/qdrant-db created
Waiting for deployment "qdrant" rollout to finish: 0 of 1 updated replicas are available...
deployment "qdrant" successfully rolled out

We can now populate the collection with our dummy dataset. We follow the same procedure as above, port-forwarding localhost:6333 to the container port 6333.

To populate the database, run the following script:

from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams
from qdrant_client.models import PointStruct

qdrant = QdrantClient(location="http://localhost:6333")
collection_name = "collection"

if collection_name in [c.name for c in qdrant.get_collections().collections]:
    qdrant.delete_collection(collection_name)

qdrant.create_collection(
    collection_name=collection_name,
    vectors_config=VectorParams(size=3, distance=Distance.COSINE),
)

qdrant.upsert(
    collection_name=collection_name,
    points=[
        PointStruct(
            id=i, 
            vector=embeddings[i], 
            payload={"text": sentences[i], "sentiment": sentiment[i]}
        )
        for i in range(len(sentences))
    ],
)

qdrant.close()

Once the collection is populated, we can deploy the retrieval client with the following model-settings.json:

!cat models/qdrant-client/model-settings.json
{
    "name": "qdrant-client",
    "implementation": "mlserver_vector_db.VectorDBRuntime",
    "parameters": {
        "extra": {
            "provider_id": "qdrant",
            "config": {
                "collection_name": "collection",
                "init_kwargs": {
                    "url": "qdrant-db",
                    "port": 6333
                },
                "search_parameters": {
                    "limit": 1
                }
            },
            "extract_utils": {
                "keys": ["text"],
                "keys_mapping": {"text": "document"}
            }
        }
    }
}

One thing to notice in the configuration settings is the remapping option: the retrieval runtime allows you to extract particular fields from each result, rename them, and return them in the response. This is quite useful when you integrate the vector-db runtime into a more complex pipeline. For example, in a RAG application, the prompt used by the LLM might expect the key document instead of text. This configuration renames the text entry to document so that the modules can be combined together.
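To build intuition for extract_utils, here is a plain-Python sketch of extracting keys from each retrieved payload and renaming them according to keys_mapping. This is illustrative only, not the runtime's actual implementation:

```python
# Illustrative sketch of the extract/rename step configured via
# "extract_utils"; not the runtime's actual implementation.
def extract(payloads, keys, keys_mapping):
    extracted = []
    for payload in payloads:
        # Keep only the requested keys, renaming them where a mapping exists.
        extracted.append(
            {keys_mapping.get(k, k): payload[k] for k in keys if k in payload}
        )
    return extracted

payloads = [
    {"text": "The weather is lovely today.", "sentiment": "positive"},
]
docs = extract(payloads, keys=["text"], keys_mapping={"text": "document"})
print(docs)  # [{'document': 'The weather is lovely today.'}]
```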

The associated manifest file is the following:

!cat manifests/qdrant-client.yaml
apiVersion: mlops.seldon.io/v1alpha1
kind: Model
metadata:
  name: qdrant-client
spec:
  storageUri: "gs://seldon-models/llm-runtimes-settings/retrieval/example/models/qdrant-client"
  requirements:
  - vector-db

To deploy the retrieval client on k8s, run the following command:

!kubectl apply -f manifests/qdrant-client.yaml -n seldon
!kubectl wait --for condition=ready --timeout=600s model --all -n seldon
model.mlops.seldon.io/qdrant-client created
model.mlops.seldon.io/qdrant-client condition met

Once deployed, we can send a request.

import requests
from pprint import pprint

inference_request = {
    "inputs": [
        {
            "name": "embedding",
            "shape": [1, 3],
            "datatype": "FP64",
            "parameters": {"content_type": "np"},
            "data": [1.0, 0.0, 0.0]
        }
    ],
}


endpoint = f"http://{get_mesh_ip()}/v2/models/qdrant-client/infer"
response = requests.post(endpoint, json=inference_request)
pprint(response.json(), depth=4)
{'id': '54d27e02-1316-41d8-be04-ec03fe4a4240',
 'model_name': 'qdrant-client_1',
 'outputs': [{'data': ['{"document": "The weather is lovely today."}'],
              'datatype': 'BYTES',
              'name': 'documents',
              'parameters': {'content_type': 'str'},
              'shape': [1, 1]}],
 'parameters': {}}

As we can see in the response, we retrieve the appropriate sentence, and the key has been renamed to document. Note that the response differs from the pgvector one because we didn't apply any filtering on the metadata (i.e., on the sentence sentiment).

To clean up the model and the server, run the following commands:

!kubectl delete -f manifests/qdrant-client.yaml -n seldon
!kubectl delete -f manifests/qdrant-deployment.yaml -n seldon
model.mlops.seldon.io "qdrant-client" deleted
deployment.apps "qdrant" deleted
service "qdrant-db" deleted
