Example
In this example we demonstrate how to deploy a vector DB client using our Retrieval Runtime with Seldon Core 2. To run this notebook, you will need Seldon Core 2 up and running on your local machine, with the Retrieval Runtime running on Core. Please check our installation tutorial to see how to do so.
Before deploying the Retrieval Runtime in our cluster, we need the database server up, running, and populated. We will use the following synthetic data to populate the database: a list of sentences, each with an associated sentiment (i.e. metadata stored alongside it) and an embedding. For demonstration purposes, we use the following dummy one-hot embeddings:
sentences = [
    "The weather is lovely today.",
    "It's so sunny outside!",
    "He drove to the stadium.",
    "The weather is horrible today.",
    "It's so cloudy outside!",
    "She walked to the park.",
]
sentiment = [
    "positive",
    "positive",
    "neutral",
    "negative",
    "negative",
    "neutral",
]
embeddings = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
Now that we have defined our dummy data, we need to populate the databases with it. In this example, we demonstrate how to populate pgvector and Qdrant databases.
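Because the dummy embeddings are one-hot vectors, each embedding is shared by exactly two sentences, so an exact-match query will always have two equally close candidates. A quick plain-Python sanity check:

```python
# Dummy data from above: six sentences with one-hot embeddings.
embeddings = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

# Group sentence indices by their embedding: each vector appears twice.
groups = {}
for i, emb in enumerate(embeddings):
    groups.setdefault(tuple(emb), []).append(i)

print(groups)
# Each one-hot vector maps to two sentence indices, so any query that
# exactly matches one of them will find two equally close entries.
```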
We also include a helper method to extract the mesh IP. This will be useful for sending requests to the retrieval model.
import subprocess
def get_mesh_ip():
    cmd = "kubectl get svc seldon-mesh -n seldon -o jsonpath='{.status.loadBalancer.ingress[0].ip}'"
    return subprocess.check_output(cmd, shell=True).decode('utf-8')
Before proceeding any further, ensure you have the following Python packages installed:
!pip install requests pg8000==1.31.2 qdrant-client==1.12.0 -q
PGVector
We begin with deploying a pgvector service, defined as follows:
!cat manifests/pgvector-deployment.yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: postgres
  labels:
    app: postgres
spec:
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: ankane/pgvector
          env:
            - name: POSTGRES_USER
              value: admin
            - name: POSTGRES_PASSWORD
              value: admin
            - name: POSTGRES_DB
              value: db
            - name: POSTGRES_HOST_AUTH_METHOD
              value: trust
          ports:
            - containerPort: 5432
          readinessProbe:
            exec:
              command: ["pg_isready", "-U", "admin", "-d", "db"]
            initialDelaySeconds: 5
            periodSeconds: 1
---
apiVersion: v1
kind: Service
metadata:
  name: pgvector-db
  labels:
    app: pgvector-db
spec:
  type: ClusterIP # Specifies that this is an internal service
  ports:
    - port: 5432
      targetPort: 5432
  selector:
    app: postgres
To apply it to our local cluster, run the following command:
!kubectl apply -f manifests/pgvector-deployment.yaml -n seldon
!kubectl rollout status deployment/postgres -n seldon --timeout=600s
deployment.apps/postgres created
service/pgvector-db created
Waiting for deployment "postgres" rollout to finish: 0 of 1 updated replicas are available...
deployment "postgres" successfully rolled outWe should now have the pgvector server up and running on our local cluster. To populate the database with the dummy data defined above, we need to do a port forwarding from localhost:5432 to the container port 5432.
To do so, open a new terminal and run the following command:
kubectl port-forward postgres-c9784f947-4msmr 5432:5432 -n seldon
Note that the pod name suffix will differ in your cluster. Alternatively, feel free to do the port-forwarding directly from a tool like k9s.
To populate the database, run the following script:
import pg8000

client = pg8000.connect(
    host='localhost',
    port=5432,
    database="db",
    user="admin",
    password="admin",
)
client.run("CREATE EXTENSION IF NOT EXISTS vector")
client.run(
    "CREATE TABLE IF NOT EXISTS embedding_table (id SERIAL PRIMARY KEY, "
    "embedding vector, text text, sentiment text)"
)
for sentence, label, embedding in zip(sentences, sentiment, embeddings):
    client.run(
        "INSERT INTO embedding_table(embedding, text, sentiment) VALUES (:embedding, :text, :sentiment)",
        embedding=str(embedding),
        text=sentence,
        sentiment=label,
    )
client.commit()
client.close()
At this point, the data should be in the pgvector database. We are now ready to deploy a retrieval model through which we will query the database.
We begin by defining the model-settings.json for the pgvector client as follows:
!cat models/pgvector-client/model-settings.json
{
  "name": "pgvector-client",
  "implementation": "mlserver_vector_db.VectorDBRuntime",
  "parameters": {
    "extra": {
      "provider_id": "pgvector",
      "config": {
        "host": "pgvector-db",
        "port": 5432,
        "database": "db",
        "user": "admin",
        "password": "admin",
        "table": "embedding_table",
        "embedding_column": "embedding",
        "search_parameters": {
          "columns": ["id", "text", "sentiment"],
          "where": {
            "AND": [
              {"id": {"lt": 10}},
              {"sentiment": "negative"}
            ]
          },
          "limit": 1
        }
      }
    }
  }
}
Besides settings such as provider_id, url, and authentication-related configuration, we also include the search parameters. Our runtime offers the option to perform filtered search on the metadata stored alongside the text and the embedding vector. In this case, we look for a text with negative sentiment among the first 10 entries of our table, and we return just one entry in the response. Feel free to modify the filtering options based on your needs.
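To make the filter semantics concrete, here is a purely illustrative sketch of how a where block like the one above could map to a SQL WHERE clause. The operator names ("AND", "lt") are taken from the configuration shown; the runtime's actual translation logic may differ.

```python
# Illustrative only: translate a nested "where" filter dict into a SQL
# WHERE clause string. Supports "AND" lists, comparison operators, and
# plain equality on a column.
OPS = {"lt": "<", "gt": ">", "lte": "<=", "gte": ">="}

def to_sql(where):
    if "AND" in where:
        # Join all sub-clauses with SQL AND.
        return " AND ".join(to_sql(clause) for clause in where["AND"])
    # A single {column: condition} pair.
    (column, cond), = where.items()
    if isinstance(cond, dict):
        # Comparison operator, e.g. {"lt": 10} -> "< 10".
        (op, value), = cond.items()
        return f"{column} {OPS[op]} {value}"
    # Plain equality on a string value.
    return f"{column} = '{cond}'"

where = {"AND": [{"id": {"lt": 10}}, {"sentiment": "negative"}]}
print(to_sql(where))  # id < 10 AND sentiment = 'negative'
```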
The associated manifest file is the following:
!cat manifests/pgvector-client.yaml
apiVersion: mlops.seldon.io/v1alpha1
kind: Model
metadata:
  name: pgvector-client
spec:
  storageUri: "gs://seldon-models/llm-runtimes-settings/retrieval/example/models/pgvector-client"
  requirements:
    - vector-db
We can deploy our retrieval client by running the following command:
!kubectl apply -f manifests/pgvector-client.yaml -n seldon
!kubectl wait --for condition=ready --timeout=600s model --all -n seldon
model.mlops.seldon.io/pgvector-client created
model.mlops.seldon.io/pgvector-client condition met
Once the vector-db client is up and running, we are ready to send a request.
import requests
from pprint import pprint
inference_request = {
    "inputs": [
        {
            "name": "embedding",
            "shape": [1, 3],
            "datatype": "FP64",
            "parameters": {"content_type": "np"},
            "data": [1.0, 0.0, 0.0]
        }
    ]
}
endpoint = f"http://{get_mesh_ip()}/v2/models/pgvector-client/infer"
response = requests.post(endpoint, json=inference_request)
pprint(response.json(), depth=4)
{'id': '95fb3848-1d6b-46b1-b0ff-f8a228a1b0ad',
 'model_name': 'pgvector-client_1',
 'outputs': [{'data': ['{"id": 4, "text": "The weather is horrible today.", '
                       '"sentiment": "negative", "distance": 0.0}'],
              'datatype': 'BYTES',
              'name': 'documents',
              'parameters': {'content_type': 'str'},
              'shape': [1, 1]}],
 'parameters': {}}
We can see that although our dummy embedding matched two entries in the vector database, we returned only the one associated with a negative sentiment. This is because we specified in our filtering configuration that we are only interested in negative entries.
If you want to override the search parameters without redeploying the model, you can send them in the request parameters. The following example modifies the number of entries to be returned (i.e., limit=2).
import requests
from pprint import pprint
inference_request = {
    "inputs": [
        {
            "name": "embedding",
            "shape": [1, 3],
            "datatype": "FP64",
            "parameters": {"content_type": "np"},
            "data": [1.0, 0.0, 0.0]
        }
    ],
    "parameters": {
        "search_parameters": {
            "limit": 2
        }
    }
}
endpoint = f"http://{get_mesh_ip()}/v2/models/pgvector-client/infer"
response = requests.post(endpoint, json=inference_request)
pprint(response.json(), depth=4)
{'id': '67f62b0f-f9f1-4236-9e6f-2d5bdfd3735b',
 'model_name': 'pgvector-client_1',
 'outputs': [{'data': ['{"id": 4, "text": "The weather is horrible today.", '
                       '"sentiment": "negative", "distance": 0.0}',
                       '{"id": 5, "text": "It\'s so cloudy outside!", '
                       '"sentiment": "negative", "distance": 1.0}'],
              'datatype': 'BYTES',
              'name': 'documents',
              'parameters': {'content_type': 'str'},
              'shape': [2, 1]}],
 'parameters': {}}
As we can see, the same request now returns two entries instead of one.
To remove the client model and the pgvector deployment, run the following commands:
!kubectl delete -f manifests/pgvector-client.yaml -n seldon
!kubectl delete -f manifests/pgvector-deployment.yaml -n seldon
model.mlops.seldon.io "pgvector-client" deleted
deployment.apps "postgres" deleted
service "pgvector-db" deleted
Qdrant
We follow the same steps for Qdrant. We begin by deploying the Qdrant server on our k8s cluster:
!kubectl apply -f manifests/qdrant-deployment.yaml -n seldon
!kubectl rollout status deployment/qdrant -n seldon --timeout=600s
deployment.apps/qdrant created
service/qdrant-db created
Waiting for deployment "qdrant" rollout to finish: 0 of 1 updated replicas are available...
deployment "qdrant" successfully rolled out
We can now populate the collection with our dummy dataset. We follow the same procedure as above, port-forwarding localhost:6333 to the container port 6333.
To populate the database, run the following script:
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams
from qdrant_client.models import PointStruct

qdrant = QdrantClient(location="http://localhost:6333")
collection_name = "collection"

if collection_name in [c.name for c in qdrant.get_collections().collections]:
    qdrant.delete_collection(collection_name)

qdrant.create_collection(
    collection_name=collection_name,
    vectors_config=VectorParams(size=3, distance=Distance.COSINE),
)
qdrant.upsert(
    collection_name=collection_name,
    points=[
        PointStruct(
            id=i,
            vector=embeddings[i],
            payload={"text": sentences[i], "sentiment": sentiment[i]}
        )
        for i in range(len(sentences))
    ],
)
qdrant.close()
Once the collection is populated, we can deploy the retrieval client with the following model-settings.json:
!cat models/qdrant-client/model-settings.json
{
  "name": "qdrant-client",
  "implementation": "mlserver_vector_db.VectorDBRuntime",
  "parameters": {
    "extra": {
      "provider_id": "qdrant",
      "config": {
        "collection_name": "collection",
        "init_kwargs": {
          "url": "qdrant-db",
          "port": 6333
        },
        "search_parameters": {
          "limit": 1
        }
      },
      "extract_utils": {
        "keys": ["text"],
        "keys_mapping": {"text": "document"}
      }
    }
  }
}
One thing to notice about the configuration settings is the remapping option: the retrieval runtime allows you to extract particular fields from each result, rename them, and return them in the response. This is quite useful when you integrate the vector-db runtime into a more complex pipeline. For example, in a RAG application, the prompt used by the LLM might expect the key document instead of text. This configuration renames the entry text to document so that the modules can be combined together.
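The extraction and renaming step can be sketched in a few lines of plain Python; this mirrors the configuration above and is illustrative only, not the runtime's actual implementation:

```python
# Illustrative sketch of the extract_utils behaviour: keep only the listed
# keys from each payload, then rename them according to keys_mapping.
def extract(payload, keys, keys_mapping):
    return {keys_mapping.get(k, k): payload[k] for k in keys}

payload = {"text": "The weather is lovely today.", "sentiment": "positive"}
print(extract(payload, keys=["text"], keys_mapping={"text": "document"}))
# {'document': 'The weather is lovely today.'}
```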
The associated manifest file is the following:
!cat manifests/qdrant-client.yaml
apiVersion: mlops.seldon.io/v1alpha1
kind: Model
metadata:
  name: qdrant-client
spec:
  storageUri: "gs://seldon-models/llm-runtimes-settings/retrieval/example/models/qdrant-client"
  requirements:
    - vector-db
To deploy the retrieval client on k8s, run the following command:
!kubectl apply -f manifests/qdrant-client.yaml -n seldon
!kubectl wait --for condition=ready --timeout=600s model --all -n seldon
model.mlops.seldon.io/qdrant-client created
model.mlops.seldon.io/qdrant-client condition met
Once deployed, we can send a request.
import requests
from pprint import pprint
inference_request = {
    "inputs": [
        {
            "name": "embedding",
            "shape": [1, 3],
            "datatype": "FP64",
            "parameters": {"content_type": "np"},
            "data": [1.0, 0.0, 0.0]
        }
    ],
}
endpoint = f"http://{get_mesh_ip()}/v2/models/qdrant-client/infer"
response = requests.post(endpoint, json=inference_request)
pprint(response.json(), depth=4)
{'id': '54d27e02-1316-41d8-be04-ec03fe4a4240',
 'model_name': 'qdrant-client_1',
 'outputs': [{'data': ['{"document": "The weather is lovely today."}'],
              'datatype': 'BYTES',
              'name': 'documents',
              'parameters': {'content_type': 'str'},
              'shape': [1, 1]}],
 'parameters': {}}
As we can see in the response, we retrieve the appropriate sentence, and we also observe that the key has been renamed to document. Note that the response differs from the pgvector one because we did not apply any filtering on the metadata (i.e., on the sentence sentiment).
To clean up the model and the server, run the following commands:
!kubectl delete -f manifests/qdrant-client.yaml -n seldon
!kubectl delete -f manifests/qdrant-deployment.yaml -n seldon
model.mlops.seldon.io "qdrant-client" deleted
deployment.apps "qdrant" deleted
service "qdrant-db" deleted