Installation
This guide assumes that we have provided you with access credentials for the LLM Module images in our artifact registry. If not, please reach out to us for a demo.
You can use your standard ways of accessing private artifact registries; we'll cover some of them below. Remember to replace credentials.json with the name of the access credentials file that we've sent to you.
Images
These are the available images for each runtime.
API
{{ registry-url }}/mlserver-llm-api:{{ current-version }}
Local
{{ registry-url }}/mlserver-llm-local:{{ current-version }}
Conversational Memory
{{ registry-url }}/mlserver-llm-memory:{{ current-version }}
VectorDB
{{ registry-url }}/mlserver-vector-db:{{ current-version }}
Local Embeddings
{{ registry-url }}/mlserver-local-embeddings:{{ current-version }}
Prompt
{{ registry-url }}/mlserver-prompt-utils:{{ current-version }}
Authenticating against the Seldon Artifact Registry
Note: change credentials.json to the filename under which you saved the credentials.
Docker
To authenticate with the Docker CLI, you can run the following command:
REGISTRY=europe-west2-docker.pkg.dev
cat credentials.json | docker login -u _json_key --password-stdin ${REGISTRY}
You'll now be able to pull the private image(s). For example, pull the whoami test image (used again in the Kubernetes validation step below) with the following command:
docker pull europe-west2-docker.pkg.dev/seldon-registry/llm/whoami:latest
Kubernetes
In order to pull images directly into a Kubernetes cluster, you'll need to create a Kubernetes secret. For example:
NAMESPACE=seldon
REGISTRY=europe-west2-docker.pkg.dev
CREDENTIALS=$(cat credentials.json)
SECRET=seldon-registry
kubectl create secret docker-registry ${SECRET} \
--docker-server="${REGISTRY}" \
--docker-username="_json_key" \
--docker-password="${CREDENTIALS}" \
--dry-run=client -o yaml | kubectl apply -n ${NAMESPACE} -f -
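As an optional sanity check, you can confirm that the secret was created with the expected type. This assumes the SECRET and NAMESPACE variables from the snippet above:
kubectl get secret ${SECRET} -n ${NAMESPACE} -o jsonpath='{.type}'
# should print: kubernetes.io/dockerconfigjson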
To validate the setup, apply the following manifest and ensure that it deploys successfully.
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: whoami
  labels:
    app: whoami
spec:
  replicas: 1
  selector:
    matchLabels:
      app: whoami
  template:
    metadata:
      labels:
        app: whoami
    spec:
      containers:
        - name: whoami
          image: europe-west2-docker.pkg.dev/seldon-registry/llm/whoami:latest
      imagePullSecrets:
        - name: seldon-registry
---
apiVersion: v1
kind: Service
metadata:
  name: whoami
spec:
  type: LoadBalancer
  selector:
    app: whoami
  ports:
    - protocol: TCP
      port: 8080
      targetPort: 80
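A minimal way to run this validation, assuming you save the manifest above as whoami.yaml (the filename is only illustrative) and use the seldon namespace:
kubectl apply -f whoami.yaml -n seldon
kubectl rollout status deployment/whoami -n seldon
kubectl get pods -n seldon -l app=whoami
# the pod should reach the Running state if the registry secret works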
Deploy the LLM Module Runtime Servers
Environment Variables Setup
If you are going to be using the API server, you need to set your OpenAI, Azure OpenAI, or Gemini API key as an environment variable. We will do this with a Kubernetes secret.
Example:
apiVersion: v1
kind: Secret
metadata:
  name: openai-api-key
type: Opaque
data:
  MLSERVER_MODEL_OPENAI_API_KEY: [base64 OpenAI or Azure OpenAI service API key]
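The value must be base64 encoded. As an illustration (assuming a POSIX shell and a placeholder key), you can produce the encoded value like this:
echo -n "<your API key>" | base64
# -n avoids encoding a trailing newline into the secret value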
Then apply the secret to the namespace in which you will be deploying the models:
kubectl apply -f openai-secret.yaml -n seldon
Similarly for Gemini:
apiVersion: v1
kind: Secret
metadata:
  name: gemini-api-key
type: Opaque
data:
  MLSERVER_MODEL_GEMINI_API_KEY: [base64 Gemini service API key]
Then apply the secret to the namespace in which you will be deploying the models:
kubectl apply -f gemini-secret.yaml -n seldon
If you are going to be deploying models directly from Hugging Face (HF), you will need to make your HF token available in the namespace where you will be deploying models. Create a Kubernetes secret containing your HF token in that namespace.
Example:
apiVersion: v1
kind: Secret
metadata:
  name: huggingface-token
type: Opaque
data:
  HF_TOKEN: [base64 Hugging Face API key]
Then apply the secret to the namespace in which you will be deploying the models:
kubectl apply -f hf-secret.yaml -n seldon
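As an optional check (assuming the seldon namespace used in the commands above), you can list the secrets to confirm that the ones you created are present:
kubectl get secrets -n seldon
# openai-api-key, gemini-api-key, and/or huggingface-token should appear, depending on which you created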
API Runtime
We will deploy the API Runtime server into the namespace where you will be running models. Create a manifest file called server-api.yaml and add the following configuration:
apiVersion: mlops.seldon.io/v1alpha1
kind: Server
metadata:
  name: mlserver-llm-api
spec:
  serverConfig: mlserver
  capabilities:
    - openai
    - gemini
  podSpec:
    imagePullSecrets:
      - name: seldon-registry
    containers:
      - name: mlserver
        image: europe-west2-docker.pkg.dev/seldon-registry/llm/mlserver-llm-api:0.7.0
        imagePullPolicy: Always
        env:
          - name: MLSERVER_MODEL_OPENAI_API_KEY
            valueFrom:
              secretKeyRef:
                name: openai-api-key
                key: MLSERVER_MODEL_OPENAI_API_KEY
          - name: MLSERVER_MODEL_GEMINI_API_KEY
            valueFrom:
              secretKeyRef:
                name: gemini-api-key
                key: MLSERVER_MODEL_GEMINI_API_KEY
Then apply the server-api.yaml file to the namespace in which you will be deploying the models:
kubectl apply -f server-api.yaml -n seldon
You should now see the pod running the API Runtime.
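To check that the server came up, you can query the Server resource and its pods; this sketch assumes the seldon namespace and the server name used above (the Server resource is a Seldon CRD, so adjust if your installation exposes it differently). The same check applies to the other runtimes deployed below:
kubectl get server mlserver-llm-api -n seldon
kubectl get pods -n seldon | grep mlserver-llm-api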
Conversational Memory Runtime
We will deploy the Conversational Memory Runtime server into the namespace where you will be running models. Create a manifest file called server-memory.yaml and add the following configuration:
apiVersion: mlops.seldon.io/v1alpha1
kind: Server
metadata:
  name: mlserver-llm-memory
  namespace: seldon
spec:
  serverConfig: mlserver
  capabilities:
    - memory
  podSpec:
    imagePullSecrets:
      - name: seldon-registry
    containers:
      - name: mlserver
        image: europe-west2-docker.pkg.dev/seldon-registry/llm/mlserver-llm-memory:0.7.0
        imagePullPolicy: Always
Then apply the server-memory.yaml file to the namespace in which you will be deploying the models:
kubectl apply -f server-memory.yaml -n seldon
You should now see the pod running the Conversational Memory Runtime.
Local Runtime with GPU
We will deploy the Local Runtime server with GPU support into the namespace where you will be running models. Add the following configuration to your server-local.yaml file. If needed, please change the resource requirements based on your use case.
apiVersion: mlops.seldon.io/v1alpha1
kind: Server
metadata:
  name: mlserver-llm-local
  namespace: seldon
spec:
  serverConfig: mlserver
  capabilities:
    - llm-local
  podSpec:
    imagePullSecrets:
      - name: seldon-registry
    containers:
      - name: mlserver
        image: europe-west2-docker.pkg.dev/seldon-registry/llm/mlserver-llm-local:0.7.0
        imagePullPolicy: Always
        env:
          - name: HF_TOKEN
            valueFrom:
              secretKeyRef:
                name: huggingface-token
                key: HF_TOKEN
        resources:
          requests:
            nvidia.com/gpu: 4
            memory: 4Gi
          limits:
            memory: 32Gi
            nvidia.com/gpu: 4
        volumeMounts:
          - mountPath: /dev/shm
            name: dshm
            readOnly: false
    volumes:
      - emptyDir:
          medium: Memory
          sizeLimit: 1Gi
        name: dshm
Then apply the server-local.yaml file to the namespace in which you will be deploying the models:
kubectl apply -f server-local.yaml -n seldon
You should now see the pod running the Local Runtime.
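If you also want to confirm that the GPUs were actually allocated to the pod, one way (assuming the NVIDIA device plugin is installed and that nvidia-smi is available inside the image) is:
POD=$(kubectl get pods -n seldon -o name | grep mlserver-llm-local | head -n 1)
kubectl exec -n seldon ${POD} -c mlserver -- nvidia-smi
# the requested number of GPUs should be listed; skip this step if the image does not ship nvidia-smi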
VectorDB Runtime
We will deploy the VectorDB Runtime into the namespace where you will be running models. Add the following configuration to your server-vector-db.yaml file.
apiVersion: mlops.seldon.io/v1alpha1
kind: Server
metadata:
  name: mlserver-vector-db
  namespace: seldon
spec:
  serverConfig: mlserver
  capabilities:
    - vector-db
  podSpec:
    imagePullSecrets:
      - name: seldon-registry
    containers:
      - name: mlserver
        image: europe-west2-docker.pkg.dev/seldon-registry/llm/mlserver-vector-db:0.7.0
        imagePullPolicy: Always
Then apply the server-vector-db.yaml file to the namespace in which you will be deploying the models:
kubectl apply -f server-vector-db.yaml -n seldon
You should now see the pod running the VectorDB Runtime.
LocalEmbeddings Runtime
We will deploy the LocalEmbeddings Runtime into the namespace where you will be running models. Add the following configuration to your server-local-embeddings.yaml file.
apiVersion: mlops.seldon.io/v1alpha1
kind: Server
metadata:
  name: mlserver-local-embeddings
  namespace: seldon
spec:
  serverConfig: mlserver
  capabilities:
    - local-embeddings
  podSpec:
    imagePullSecrets:
      - name: seldon-registry
    containers:
      - name: mlserver
        image: europe-west2-docker.pkg.dev/seldon-registry/llm/mlserver-local-embeddings:0.7.0
        imagePullPolicy: Always
        resources:
          requests:
            nvidia.com/gpu: 1
            memory: 4Gi
            cpu: "1"
          limits:
            memory: 32Gi
            nvidia.com/gpu: 1
            cpu: "8"
        volumeMounts:
          - mountPath: /dev/shm
            name: dshm
            readOnly: false
    volumes:
      - emptyDir:
          medium: Memory
          sizeLimit: 1Gi
        name: dshm
The CPU, GPU, and memory requirements above are for reference only and should be updated according to your setup.
Then apply the server-local-embeddings.yaml file to the namespace in which you will be deploying the models:
kubectl apply -f server-local-embeddings.yaml -n seldon
You should now see the pod running the LocalEmbeddings Runtime.
Prompt Runtime
We will deploy the Prompt Runtime into the namespace where you will be running models. Add the following configuration to your server-prompt.yaml file.
apiVersion: mlops.seldon.io/v1alpha1
kind: Server
metadata:
  name: mlserver-prompt-utils
  namespace: seldon
spec:
  serverConfig: mlserver
  capabilities:
    - prompt
  podSpec:
    imagePullSecrets:
      - name: seldon-registry
    containers:
      - name: mlserver
        image: europe-west2-docker.pkg.dev/seldon-registry/llm/mlserver-prompt-utils:0.7.0
        imagePullPolicy: Always
Then apply the server-prompt.yaml file to the namespace in which you will be deploying the models:
kubectl apply -f server-prompt.yaml -n seldon
You should now see the pod running the Prompt Runtime.
Key values from the server files
name: The name you want to give the server.
serverConfig: This refers to a ServerConfig within your Seldon installation. ServerConfigs are used as templates for the base layer of the LLM Module's MLServer pod.
imagePullSecrets: The secret used to authenticate against the private Docker registry when pulling the LLM Module images.
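If you are unsure which ServerConfig templates exist in your installation, you can list them; this sketch assumes the Seldon Core 2 CRDs are installed and that the plural resource name is serverconfigs:
kubectl get serverconfigs -A
# the value of serverConfig in the manifests above must match one of these names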
Cleaning up
To remove the servers, run the following commands:
kubectl delete -f server-api.yaml -n seldon
kubectl delete -f server-memory.yaml -n seldon
kubectl delete -f server-local.yaml -n seldon
kubectl delete -f server-vector-db.yaml -n seldon
kubectl delete -f server-local-embeddings.yaml -n seldon
kubectl delete -f server-prompt.yaml -n seldon
To remove the secrets, run the following commands:
kubectl delete -f openai-secret.yaml -n seldon
kubectl delete -f gemini-secret.yaml -n seldon
kubectl delete -f hf-secret.yaml -n seldon
At this point, the LLM Module should be removed from your cluster.
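To double-check that nothing is left behind (assuming everything was deployed into the seldon namespace and the Server CRD is exposed as servers), you can list the remaining Server resources and secrets:
kubectl get servers -n seldon
kubectl get secrets -n seldon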
Troubleshooting
If there is an issue with the image pull secret, patch the seldon-server service account so that it references the registry secret:
kubectl patch sa seldon-server -n seldon \
  -p '{"imagePullSecrets": [{"name": "seldon-registry"}]}'
Trouble setting the OpenAI secret
Another method you can use is to create the secret directly with kubectl.
Set your environment variables:
export MLSERVER_MODEL_OPENAI_API_KEY=[OpenAI or Azure OpenAI service API key]
export NAMESPACE=[Namespace where you are deploying models]
Use kubectl to create the secret:
kubectl delete secret openai-api-key -n ${NAMESPACE} || echo "openai-api-key secret does not exist - will create"
kubectl create secret generic openai-api-key --from-literal=MLSERVER_MODEL_OPENAI_API_KEY=${MLSERVER_MODEL_OPENAI_API_KEY} -n ${NAMESPACE}
The same workflow applies to Gemini and HF secrets.
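To confirm that the stored value is what you expect, you can decode it back; this assumes the secret was created with the MLSERVER_MODEL_OPENAI_API_KEY data key as shown above:
kubectl get secret openai-api-key -n ${NAMESPACE} -o jsonpath='{.data.MLSERVER_MODEL_OPENAI_API_KEY}' | base64 -d
# prints the raw API key, so avoid running this in shared terminals or logs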
Next Steps
Now that you're able to pull the images, you can try some of the examples.