# Example

In this example we demonstrate how to deploy a Vector DB client using our Retrieval Runtime with Seldon Core 2. To run this notebook you will need Seldon Core 2 up and running on your local machine, with the Retrieval Runtime deployed on Core 2. Please check our [installation tutorial](/llm-module/introduction/installation.md) to see how to do so.

Before deploying the Retrieval Runtime client in our cluster, the database server must be up, running, and populated. We will populate it with the following synthetic data: a list of sentences, each with an associated sentiment (i.e., metadata stored alongside the text) and an embedding. For demonstration purposes, we use dummy one-hot embeddings:

```python
sentences = [
    "The weather is lovely today.",
    "It's so sunny outside!",
    "He drove to the stadium.",
    "The weather is horrible today.",
    "It's so cloudy outside!",
    "She walked to the park.",
]

sentiment = [
    "positive",
    "positive",
    "neutral",
    "negative",
    "negative",
    "neutral",
]

embeddings = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
```
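Because the embeddings are one-hot, it is easy to predict which sentences a query vector will match. The following is a minimal sketch, assuming cosine distance is the metric (the Qdrant collection later in this example is created with cosine distance, and the `pgvector` responses report distances of 0.0 and 1.0, which is consistent with it). The helper `cosine_distance` and the variable `query` are illustrative names, not part of the runtime.

```python
import math

# Dummy one-hot embeddings, copied from the block above.
embeddings = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# The query vector used in the requests later in this example.
query = [1.0, 0.0, 0.0]
distances = [cosine_distance(query, e) for e in embeddings]

# Positions (0-based) of the sentences that match the query exactly.
exact_matches = [i for i, d in enumerate(distances) if d == 0.0]
print(distances)
print(exact_matches)
```

Two sentences share the query's embedding, so an unfiltered nearest-neighbour search has two equally good candidates.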

Now that we have defined our dummy data, we need to populate the databases with it. In this example we demonstrate how to populate `pgvector` and `qdrant` databases.

We also include a helper function to extract the mesh IP. This will be useful for sending requests to the `retrieval` model.

```python
import subprocess

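def get_mesh_ip():
    # Read the external IP of the `seldon-mesh` LoadBalancer service.
    cmd = "kubectl get svc seldon-mesh -n seldon -o jsonpath='{.status.loadBalancer.ingress[0].ip}'"
    # Strip surrounding whitespace so the value can be embedded in a URL.
    return subprocess.check_output(cmd, shell=True).decode('utf-8').strip()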

```

Before proceeding any further, ensure you have the following Python packages installed:

```python
!pip install requests pg8000==1.31.2 qdrant-client==1.12.0 -q
```

## PGVector

We begin with deploying a `pgvector` service, defined as follows:

```python
!cat manifests/pgvector-deployment.yaml
```

```
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: postgres
  labels:
    app: postgres
spec:
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
      - name: postgres
        image: ankane/pgvector
        env:
        - name: POSTGRES_USER
          value: admin
        - name: POSTGRES_PASSWORD
          value: admin
        - name: POSTGRES_DB
          value: db
        - name: POSTGRES_HOST_AUTH_METHOD
          value: trust
        ports:
        - containerPort: 5432
        readinessProbe:
          exec:
            command: ["pg_isready", "-U", "admin", "-d", "db"]
          initialDelaySeconds: 5
          periodSeconds: 1
---
apiVersion: v1
kind: Service
metadata:
  name: pgvector-db
  labels:
    app: pgvector-db
spec:
  type: ClusterIP  # Specifies that this is an internal service
  ports:
    - port: 5432
      targetPort: 5432
  selector:
    app: postgres
```

To apply it to our local cluster, run the following command:

```python
!kubectl apply -f manifests/pgvector-deployment.yaml -n seldon
!kubectl rollout status deployment/postgres -n seldon --timeout=600s
```

```
deployment.apps/postgres created
service/pgvector-db created
Waiting for deployment "postgres" rollout to finish: 0 of 1 updated replicas are available...
deployment "postgres" successfully rolled out
```

We should now have the `pgvector` server up and running on our local cluster. To populate the database with the dummy data defined above, we need to forward `localhost:5432` to the container port `5432`.

To do so, open a new terminal and run the following command:

```bash
kubectl port-forward postgres-c9784f947-4msmr 5432:5432 -n seldon
```

Alternatively, set up the port-forward directly from a tool like k9s.

{% hint style="info" %}
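Note that the suffix of your pod name will differ from `c9784f947-4msmr`. Adjust the command above to match your pod, whose name you can find with `kubectl get pods -n seldon`, or target the deployment directly with `kubectl port-forward deployment/postgres 5432:5432 -n seldon`.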
{% endhint %}

To populate the database, run the following script:

```python
import pg8000

client = pg8000.connect(
    host='localhost',
    port=5432,
    database="db",
    user="admin",
    password="admin",
)

client.run("CREATE EXTENSION IF NOT EXISTS vector")
client.run(
    "CREATE TABLE IF NOT EXISTS embedding_table (id SERIAL PRIMARY KEY, "
    "embedding vector, text text, sentiment text)"
)

# Use loop variable names that do not shadow the `sentiment` list, which is reused later.
for text, label, embedding in zip(sentences, sentiment, embeddings):
    client.run(
        "INSERT INTO embedding_table(embedding, text, sentiment) VALUES (:embedding, :text, :sentiment)",
        embedding=str(embedding),
        text=text,
        sentiment=label,
    )

client.commit()
client.close()

```

At this point, the data should be in the `pgvector` database. We are now ready to deploy a `retrieval` model through which we will query the database.

We begin by defining the `model-settings.json` for the `pgvector` client as follows:

```python
!cat models/pgvector-client/model-settings.json
```

```
{
    "name": "pgvector-client",
    "implementation": "mlserver_vector_db.VectorDBRuntime",
    "parameters": {
        "extra": {
            "provider_id": "pgvector",
            "config": {
                "host": "pgvector-db",
                "port": 5432,
                "database": "db",
                "user": "admin",
                "password": "admin",
                "table": "embedding_table",
                "embedding_column": "embedding",
                "search_parameters": {
                    "columns": ["id", "text", "sentiment"],
                    "where": {
                        "AND": [
                            {"id": {"lt": 10}}, 
                            {"sentiment": "negative"}
                        ]
                    },
                    "limit": 1
                }
            }
        }
    }
}
```

Besides `provider_id` and the connection and authentication settings, we also include the `search_parameters` configuration. The runtime can perform a filtered search on the metadata stored alongside the text and the embedding vector. In this case, we look for a text with negative sentiment among the rows whose `id` is less than 10, and we return just one entry in the response. Feel free to modify the filtering options based on your needs.
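To make the effect of the `where` clause concrete, here is a minimal pure-Python sketch that evaluates the same filter against the dummy rows. It only illustrates the intended semantics and is not the runtime's implementation: the function `matches` is hypothetical, it covers only the `AND`, `lt`, and equality forms that appear in the configuration above, and the ids assume a freshly created table where `SERIAL` starts at 1.

```python
# Metadata of the rows inserted earlier (ids assume SERIAL starts at 1).
rows = [
    {"id": 1, "text": "The weather is lovely today.", "sentiment": "positive"},
    {"id": 2, "text": "It's so sunny outside!", "sentiment": "positive"},
    {"id": 3, "text": "He drove to the stadium.", "sentiment": "neutral"},
    {"id": 4, "text": "The weather is horrible today.", "sentiment": "negative"},
    {"id": 5, "text": "It's so cloudy outside!", "sentiment": "negative"},
    {"id": 6, "text": "She walked to the park.", "sentiment": "neutral"},
]

# The `where` clause from model-settings.json.
where = {"AND": [{"id": {"lt": 10}}, {"sentiment": "negative"}]}


def matches(row, clause):
    """Evaluate a `where` clause against one row (illustrative semantics only)."""
    if "AND" in clause:
        return all(matches(row, sub) for sub in clause["AND"])
    for column, condition in clause.items():
        if isinstance(condition, dict):
            # Operator form, e.g. {"id": {"lt": 10}}.
            for operator, reference in condition.items():
                if operator != "lt":
                    raise ValueError(f"operator not covered by this sketch: {operator}")
                if not row[column] < reference:
                    return False
        elif row[column] != condition:
            # Plain value form, e.g. {"sentiment": "negative"}, treated as equality.
            return False
    return True


matching_ids = [row["id"] for row in rows if matches(row, where)]
print(matching_ids)
```

Only the two negative rows pass the filter. The vector search then ranks these candidates by distance to the query embedding, and `limit` caps how many are returned.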

The associated manifest file is the following:

```python
!cat manifests/pgvector-client.yaml
```

```
apiVersion: mlops.seldon.io/v1alpha1
kind: Model
metadata:
  name: pgvector-client
spec:
  storageUri: "gs://seldon-models/llm-runtimes-settings/retrieval/example/models/pgvector-client"
  requirements:
  - vector-db
```

We can deploy our `retrieval` client by running the following command:

```python
!kubectl apply -f manifests/pgvector-client.yaml -n seldon
!kubectl wait --for condition=ready --timeout=600s model --all -n seldon
```

```
model.mlops.seldon.io/pgvector-client created
model.mlops.seldon.io/pgvector-client condition met
```

Once the `vector-db` client is up and running we are ready to send a request.

```python
import requests
from pprint import pprint

inference_request = {
    "inputs": [
        {
            "name": "embedding",
            "shape": [1, 3],
            "datatype": "FP64",
            "parameters": {"content_type": "np"},
            "data": [1.0, 0.0, 0.0]
        }
    ]
}


endpoint = f"http://{get_mesh_ip()}/v2/models/pgvector-client/infer"
response = requests.post(endpoint,json=inference_request)
pprint(response.json(), depth=4)
```

```
{'id': '95fb3848-1d6b-46b1-b0ff-f8a228a1b0ad',
 'model_name': 'pgvector-client_1',
 'outputs': [{'data': ['{"id": 4, "text": "The weather is horrible today.", '
                       '"sentiment": "negative", "distance": 0.0}'],
              'datatype': 'BYTES',
              'name': 'documents',
              'parameters': {'content_type': 'str'},
              'shape': [1, 1]}],
 'parameters': {}}
```

We can see that although our dummy embedding matches two entries in the vector database (the "lovely" and "horrible" weather sentences share the embedding `[1.0, 0.0, 0.0]`), only the one with negative sentiment is returned. This is because the filtering configuration restricts the search to negative entries.
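Each element of the `documents` output is a JSON-encoded string, so it usually needs to be decoded before use. The sketch below shows one way to do that, using the response printed above as a literal so it runs without a cluster. The helper name `extract_documents` is hypothetical.

```python
import json

# Response body copied from the output shown above.
response_json = {
    "id": "95fb3848-1d6b-46b1-b0ff-f8a228a1b0ad",
    "model_name": "pgvector-client_1",
    "outputs": [
        {
            "data": [
                '{"id": 4, "text": "The weather is horrible today.", '
                '"sentiment": "negative", "distance": 0.0}'
            ],
            "datatype": "BYTES",
            "name": "documents",
            "parameters": {"content_type": "str"},
            "shape": [1, 1],
        }
    ],
    "parameters": {},
}


def extract_documents(body, output_name="documents"):
    """Decode every JSON string in the named output into a dictionary."""
    for output in body["outputs"]:
        if output["name"] == output_name:
            return [json.loads(item) for item in output["data"]]
    return []


documents = extract_documents(response_json)
print(documents[0]["text"], documents[0]["distance"])
```

With a live deployment, the same helper can be applied to `response.json()`.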

If you want to override the search parameters without redeploying the model, you can do so by sending them in the request parameters. The following example modifies the number of entries to be returned (i.e., `limit=2`).

```python
import requests
from pprint import pprint

inference_request = {
    "inputs": [
        {
            "name": "embedding",
            "shape": [1, 3],
            "datatype": "FP64",
            "parameters": {"content_type": "np"},
            "data": [1.0, 0.0, 0.0]
        }
    ],
    "parameters": {
        "search_parameters": {
            "limit": 2,
        }
    }
}


endpoint = f"http://{get_mesh_ip()}/v2/models/pgvector-client/infer"
response = requests.post(endpoint,json=inference_request)
pprint(response.json(), depth=4)
```

```
{'id': '67f62b0f-f9f1-4236-9e6f-2d5bdfd3735b',
 'model_name': 'pgvector-client_1',
 'outputs': [{'data': ['{"id": 4, "text": "The weather is horrible today.", '
                       '"sentiment": "negative", "distance": 0.0}',
                       '{"id": 5, "text": "It\'s so cloudy outside!", '
                       '"sentiment": "negative", "distance": 1.0}'],
              'datatype': 'BYTES',
              'name': 'documents',
              'parameters': {'content_type': 'str'},
              'shape': [2, 1]}],
 'parameters': {}}
```

As we can see, the same request is now returning two entries instead of one.
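Note that both returned entries are still negative, which suggests that the configured `where` filter stays in effect and only `limit` is replaced. The runtime's actual merge logic is not shown in this example; the sketch below assumes a simple shallow merge in which request-level keys take precedence. The function `merge_search_parameters` is a hypothetical name used for illustration only.

```python
# Defaults taken from `search_parameters` in model-settings.json.
configured = {
    "columns": ["id", "text", "sentiment"],
    "where": {"AND": [{"id": {"lt": 10}}, {"sentiment": "negative"}]},
    "limit": 1,
}

# Overrides sent in the request `parameters`.
request_overrides = {"limit": 2}


def merge_search_parameters(defaults, overrides):
    """Shallow merge (assumed behavior): request keys win, other keys keep their defaults."""
    return {**defaults, **overrides}


effective = merge_search_parameters(configured, request_overrides)
print(effective["limit"])
print(effective["where"])
```

Under this assumption, a request that also sends its own `where` clause would replace the configured filter entirely rather than combine with it.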

To remove the client model and the `pgvector` deployment, run the following commands:

```python
!kubectl delete -f manifests/pgvector-client.yaml -n seldon
!kubectl delete -f manifests/pgvector-deployment.yaml -n seldon
```

```
model.mlops.seldon.io "pgvector-client" deleted
deployment.apps "postgres" deleted
service "pgvector-db" deleted
```

## Qdrant

We follow the same steps for `qdrant`. We begin by deploying the `qdrant` server on our k8s cluster:

```python
!kubectl apply -f manifests/qdrant-deployment.yaml -n seldon
!kubectl rollout status deployment/qdrant -n seldon --timeout=600s
```

```
deployment.apps/qdrant created
service/qdrant-db created
Waiting for deployment "qdrant" rollout to finish: 0 of 1 updated replicas are available...
deployment "qdrant" successfully rolled out
```

We can now populate the collection with our dummy dataset. We follow the same procedure as above, forwarding `localhost:6333` to the container port `6333` (for example, with `kubectl port-forward deployment/qdrant 6333:6333 -n seldon`).

To populate the database, run the following script:

```python
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams
from qdrant_client.models import PointStruct

qdrant = QdrantClient(location="http://localhost:6333")
collection_name = "collection"

if collection_name in [c.name for c in qdrant.get_collections().collections]:
    qdrant.delete_collection(collection_name)

qdrant.create_collection(
    collection_name=collection_name,
    vectors_config=VectorParams(size=3, distance=Distance.COSINE),
)

qdrant.upsert(
    collection_name=collection_name,
    points=[
        PointStruct(
            id=i, 
            vector=embeddings[i], 
            payload={"text": sentences[i], "sentiment": sentiment[i]}
        )
        for i in range(len(sentences))
    ],
)

qdrant.close()
```

Once the collection is populated, we can deploy the `retrieval` client with the following `model-settings.json`:

```python
!cat models/qdrant-client/model-settings.json
```

```
{
    "name": "qdrant-client",
    "implementation": "mlserver_vector_db.VectorDBRuntime",
    "parameters": {
        "extra": {
            "provider_id": "qdrant",
            "config": {
                "collection_name": "collection",
                "init_kwargs": {
                    "url": "qdrant-db",
                    "port": 6333
                },
                "search_parameters": {
                    "limit": 1
                }
            },
            "extract_utils": {
                "keys": ["text"],
                "keys_mapping": {"text": "document"}
            }
        }
    }
}
```

One thing to notice in the configuration is the `extract_utils` remapping option: the `retrieval` runtime lets you extract particular fields from each result, rename them, and return them in the response. This is useful when you integrate the `vector-db` runtime into a more complex pipeline. For example, in a `RAG` application, the prompt used by the LLM might expect the key `document` instead of `text`. This configuration renames the entry `text` to `document` so that the modules can be combined.
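The effect of `keys` and `keys_mapping` can be pictured with a few lines of Python. This is a sketch of the described behavior, not the runtime's code, and `extract_and_rename` is a hypothetical helper name.

```python
def extract_and_rename(payload, keys, keys_mapping):
    """Keep only `keys` from a payload and rename them according to `keys_mapping`."""
    return {
        keys_mapping.get(key, key): payload[key]
        for key in keys
        if key in payload
    }


# Payload stored in Qdrant for the first sentence.
payload = {"text": "The weather is lovely today.", "sentiment": "positive"}

# Values taken from `extract_utils` in model-settings.json.
result = extract_and_rename(payload, keys=["text"], keys_mapping={"text": "document"})
print(result)
```

The `sentiment` field is dropped because it is not listed in `keys`, and `text` is returned under the new name `document`.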

The associated manifest file is the following:

```python
!cat manifests/qdrant-client.yaml
```

```
apiVersion: mlops.seldon.io/v1alpha1
kind: Model
metadata:
  name: qdrant-client
spec:
  storageUri: "gs://seldon-models/llm-runtimes-settings/retrieval/example/models/qdrant-client"
  requirements:
  - vector-db
```

To deploy the `retrieval` client on k8s, run the following command:

```python
!kubectl apply -f manifests/qdrant-client.yaml -n seldon
!kubectl wait --for condition=ready --timeout=600s model --all -n seldon
```

```
model.mlops.seldon.io/qdrant-client created
model.mlops.seldon.io/qdrant-client condition met
```

Once deployed, we can send a request.

```python
import requests
from pprint import pprint

inference_request = {
    "inputs": [
        {
            "name": "embedding",
            "shape": [1, 3],
            "datatype": "FP64",
            "parameters": {"content_type": "np"},
            "data": [1.0, 0.0, 0.0]
        }
    ],
}


endpoint = f"http://{get_mesh_ip()}/v2/models/qdrant-client/infer"
response = requests.post(endpoint,json=inference_request)
pprint(response.json(), depth=4)
```

```
{'id': '54d27e02-1316-41d8-be04-ec03fe4a4240',
 'model_name': 'qdrant-client_1',
 'outputs': [{'data': ['{"document": "The weather is lovely today."}'],
              'datatype': 'BYTES',
              'name': 'documents',
              'parameters': {'content_type': 'str'},
              'shape': [1, 1]}],
 'parameters': {}}
```

As we can see in the response, we retrieve the appropriate sentence, and the key has been renamed to `document`. Note that the response differs from the `pgvector` one because we didn't apply any filtering on the metadata (i.e., on the sentence sentiment).

To clean up the model and the server, run the following commands:

```python
!kubectl delete -f manifests/qdrant-client.yaml -n seldon
!kubectl delete -f manifests/qdrant-deployment.yaml -n seldon
```

```
model.mlops.seldon.io "qdrant-client" deleted
deployment.apps "qdrant" deleted
service "qdrant-db" deleted
```

