Triton GPT2 Example
In this notebook, we will run an example of text generation using a GPT2 model exported from HuggingFace and deployed with Seldon's pre-packaged Triton server. The example also covers converting the model to ONNX format. The implementation below uses the greedy approach to next-token prediction. More info: https://huggingface.co/transformers/model_doc/gpt2.html?highlight=gpt2
After the model is deployed to Kubernetes, we will run a simple load test to evaluate its inference performance.
Steps:
Download the pretrained GPT2 model from HuggingFace
Convert the model to ONNX
Store it in a MinIO bucket
Set up Seldon-Core in your Kubernetes cluster
Deploy the ONNX model with Seldon's pre-packaged Triton server
Interact with the model and run a greedy algorithm example (generate a sentence completion)
Run load test using vegeta
Clean-up
Basic requirements
Helm v3.0.0+
A Kubernetes cluster running v1.13 or above (minikube / docker-for-windows work well if given enough RAM)
kubectl v1.14+
Python 3.6+
Export the pre-trained HuggingFace TFGPT2LMHeadModel and save it locally
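A minimal sketch of this step, assuming the `transformers` and `tensorflow` packages are installed; the local output directory name (`./tfgpt2model`) is an example choice:

```python
# Download the pre-trained GPT2 model and tokenizer and export a TensorFlow SavedModel.
from transformers import GPT2Tokenizer, TFGPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = TFGPT2LMHeadModel.from_pretrained("gpt2", pad_token_id=tokenizer.eos_token_id)

# saved_model=True also writes a SavedModel under ./tfgpt2model/saved_model/1
model.save_pretrained("./tfgpt2model", saved_model=True)
```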
Convert the TensorFlow saved model to ONNX
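A sketch of the conversion using the `tf2onnx` command-line tool (assumes `tf2onnx` is installed and the SavedModel was written by the previous step under `./tfgpt2model/saved_model/1`):

```python
# Convert the TensorFlow SavedModel to ONNX with tf2onnx.
import subprocess

subprocess.run(
    [
        "python", "-m", "tf2onnx.convert",
        "--saved-model", "./tfgpt2model/saved_model/1",
        "--opset", "11",
        "--output", "model.onnx",
    ],
    check=True,
)
```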
Copy your model to a local MinIO
Set up MinIO
Use the provided notebook to install MinIO in your cluster and configure the mc CLI tool. Instructions are also available online.
Note: You can use your preferred remote storage provider (Google Cloud Storage, AWS S3, etc.).
Create a Bucket and store your model
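A sketch of this step using the `minio` Python SDK, laid out in the model-repository structure Triton expects (`<model-name>/<version>/model.onnx`). The endpoint, credentials, and bucket name are example values for a local MinIO install and should be replaced with your own:

```python
# Create a bucket and upload the ONNX model in Triton's repository layout.
from minio import Minio

client = Minio(
    "localhost:9000",          # assumed MinIO endpoint (e.g. port-forwarded)
    access_key="minioadmin",   # example credentials
    secret_key="minioadmin",
    secure=False,
)

bucket = "onnx-gpt2"
if not client.bucket_exists(bucket):
    client.make_bucket(bucket)

# Triton expects <model-name>/<version>/<file> inside the model repository.
client.fput_object(bucket, "gpt2/1/model.onnx", "./model.onnx")
```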
Run Seldon in your Kubernetes cluster
Follow the Seldon-Core Setup notebook to set up a cluster with Ambassador Ingress or Istio and install Seldon Core
Deploy your model with Seldon pre-packaged Triton server
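A sketch of a SeldonDeployment manifest for the pre-packaged Triton server, applied with kubectl from Python. The namespace, deployment name, modelUri, and the secret holding the MinIO credentials are example values and must match your setup; on newer Seldon Core versions the protocol field may be `v2` rather than `kfserving`:

```python
# Apply an example SeldonDeployment that serves the ONNX model with Triton.
import subprocess

manifest = """
apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: gpt2
  namespace: default
spec:
  protocol: kfserving        # V2 inference protocol
  predictors:
  - name: default
    replicas: 1
    graph:
      name: gpt2
      implementation: TRITON_SERVER
      modelUri: s3://onnx-gpt2                     # bucket acting as the Triton model repository
      envSecretRefName: seldon-init-container-secret  # secret with MinIO endpoint + credentials
"""

subprocess.run(["kubectl", "apply", "-f", "-"], input=manifest.encode(), check=True)
```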
Interact with the model: get model metadata (a "test" request to make sure our model is available and loaded correctly)
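A sketch of the metadata request against the V2 REST endpoint exposed through the ingress. The ingress address and the path segments (namespace `default`, deployment `gpt2`, model name `gpt2`) are assumptions that should match your deployment:

```python
# Fetch model metadata to confirm the model is loaded and ready.
import requests

INGRESS = "http://localhost:8003"   # e.g. a port-forwarded Ambassador/Istio gateway
url = f"{INGRESS}/seldon/default/gpt2/v2/models/gpt2"

resp = requests.get(url)
resp.raise_for_status()
print(resp.json())   # model name, platform, and input/output tensor metadata
```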
Run a prediction test: generate a sentence completion using the GPT2 model (greedy approach)
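A sketch of greedy next-token generation against the deployed model using the V2 inference REST API. The tensor names (`input_ids:0`, `attention_mask:0`), the output layout, and the ingress address are assumptions; check them against the metadata returned in the previous step and adjust if your export differs:

```python
# Greedy decoding loop: repeatedly call the model and append the argmax token.
import numpy as np
import requests
from transformers import GPT2Tokenizer

INGRESS = "http://localhost:8003"                      # assumed ingress address
INFER_URL = f"{INGRESS}/seldon/default/gpt2/v2/models/gpt2/infer"

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
input_ids = tokenizer.encode("I enjoy working in")     # arbitrary example prompt

for _ in range(10):                                    # generate 10 tokens greedily
    payload = {
        "inputs": [
            {
                "name": "input_ids:0",                 # assumed input tensor name
                "datatype": "INT32",
                "shape": [1, len(input_ids)],
                "data": input_ids,
            },
            {
                "name": "attention_mask:0",            # assumed input tensor name
                "datatype": "INT32",
                "shape": [1, len(input_ids)],
                "data": [1] * len(input_ids),
            },
        ]
    }
    resp = requests.post(INFER_URL, json=payload)
    resp.raise_for_status()

    # Assumes the first output tensor holds the logits; data comes back as a
    # flat list, so reshape to (batch, seq_len, vocab_size) and take the argmax
    # over the vocabulary at the last position (the greedy choice).
    out = resp.json()["outputs"][0]
    logits = np.array(out["data"], dtype=np.float32).reshape(out["shape"])
    next_token = int(logits[0, -1].argmax())
    input_ids.append(next_token)

print(tokenizer.decode(input_ids))
```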
Run Load Test / Performance Test using vegeta
Install vegeta; for more details, see the official vegeta documentation
Generate a vegeta target file containing the POST request with the payload in the required structure
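A sketch of generating a target file in vegeta's JSON target format: one JSON object per line with the request body base64-encoded. The URL and tensor names follow the same assumptions as the prediction example above:

```python
# Write a vegeta target file pointing at the inference endpoint.
import base64
import json

from transformers import GPT2Tokenizer

INFER_URL = "http://localhost:8003/seldon/default/gpt2/v2/models/gpt2/infer"

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
input_ids = tokenizer.encode("I enjoy working in")     # arbitrary example prompt

body = {
    "inputs": [
        {"name": "input_ids:0", "datatype": "INT32",
         "shape": [1, len(input_ids)], "data": input_ids},
        {"name": "attention_mask:0", "datatype": "INT32",
         "shape": [1, len(input_ids)], "data": [1] * len(input_ids)},
    ]
}

target = {
    "method": "POST",
    "url": INFER_URL,
    "header": {"Content-Type": ["application/json"]},
    "body": base64.b64encode(json.dumps(body).encode()).decode(),
}

with open("vegeta_target.json", "w") as f:
    f.write(json.dumps(target) + "\n")

# Then run the attack, for example:
#   vegeta attack -targets=vegeta_target.json -format=json -rate=10 -duration=30s | vegeta report
```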
Clean-up
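A sketch of removing the example deployment created above; the name and namespace match the example manifest used earlier:

```python
# Delete the SeldonDeployment to clean up the cluster.
import subprocess

subprocess.run(["kubectl", "delete", "seldondeployment", "gpt2", "-n", "default"], check=True)
```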