Triton GPT2 Example
Steps:
Basic requirements
Export the pre-trained HuggingFace TFGPT2LMHeadModel and save it locally
Convert the TensorFlow saved model to ONNX
Copy your model to a local MinIO
Set up MinIO
Create a bucket and store your model
Run Seldon in your Kubernetes cluster
Deploy your model with the Seldon pre-packaged Triton server
Interact with the model: get model metadata (a "test" request to make sure our model is available and loaded correctly)
Run a prediction test: generate a sentence completion using the GPT2 model (greedy approach)
Run a load test / performance test using vegeta
Install vegeta; for more details, see the official vegeta documentation
Generate a vegeta target file containing a "post" command with the payload in the required structure
Clean-up
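The greedy approach named in the prediction step can be sketched in a few lines: at each step, take the token with the highest score and append it to the input, then repeat. The sketch below is a minimal illustration, not the example's actual code. The names greedy_complete and toy_logits are hypothetical, and toy_logits is a stand-in for a real call to the deployed model, which would return next-token logits over the GPT2 vocabulary.

```python
from typing import Callable, List, Optional, Sequence


def greedy_complete(
    next_token_logits: Callable[[Sequence[int]], Sequence[float]],
    input_ids: Sequence[int],
    max_new_tokens: int,
    eos_token_id: Optional[int] = None,
) -> List[int]:
    """Extend input_ids by repeatedly appending the highest-scoring next token."""
    ids = list(input_ids)
    for _ in range(max_new_tokens):
        # One score per vocabulary entry for the next position.
        logits = next_token_logits(ids)
        # Greedy choice: index of the largest logit (argmax).
        next_id = max(range(len(logits)), key=logits.__getitem__)
        ids.append(next_id)
        # Stop early if the model emits the end-of-sequence token.
        if eos_token_id is not None and next_id == eos_token_id:
            break
    return ids


def toy_logits(ids: Sequence[int]) -> List[float]:
    """Hypothetical stand-in for the model: favours (last token + 1) mod 5."""
    vocab_size = 5
    favoured = (ids[-1] + 1) % vocab_size
    return [1.0 if token == favoured else 0.0 for token in range(vocab_size)]


if __name__ == "__main__":
    print(greedy_complete(toy_logits, [0], max_new_tokens=3))
```

In a real run, the callable passed as next_token_logits would tokenize nothing itself; it would send the current token IDs to the inference endpoint and return the logits for the last position. Greedy decoding is deterministic and cheap, at the cost of sometimes producing repetitive completions.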