Triton GPT2 Example

In this notebook, we run a text generation example using a GPT2 model exported from HuggingFace and deployed with Seldon's pre-packaged Triton server. The example also covers converting the model to ONNX format. The implementation below uses the greedy approach to next-token prediction. More info: https://huggingface.co/transformers/model_doc/gpt2.html?highlight=gpt2

After the model is deployed to Kubernetes, we run a simple load test to evaluate the model's inference performance.

Steps:

  1. Download the pretrained GPT2 model from HuggingFace

  2. Convert the model to ONNX

  3. Store it in a MinIO bucket

  4. Set up Seldon Core in your Kubernetes cluster

  5. Deploy the ONNX model with Seldon’s prepackaged Triton server

  6. Interact with the model and run a greedy algorithm example (generate a sentence completion)

  7. Run a load test using vegeta

  8. Clean up

Basic requirements

  • Helm v3.0.0+

  • A Kubernetes cluster running v1.13 or above (minikube / docker-for-windows work well if they have enough RAM)

  • kubectl v1.14+

  • Python 3.6+

Export HuggingFace TFGPT2LMHeadModel pre-trained model and save it locally
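
A minimal sketch of this export step, assuming a `transformers` release that still ships the TensorFlow classes and that TensorFlow is installed. The output directory `./tfgpt2model` and the helper names are illustrative choices, not part of the original notebook text. With `saved_model=True`, `save_pretrained` also writes a TensorFlow SavedModel under `saved_model/1`, which is the directory the ONNX conversion needs.

```python
from pathlib import Path


def saved_model_path(output_dir: str) -> Path:
    """Directory where save_pretrained(..., saved_model=True) puts the SavedModel."""
    return Path(output_dir) / "saved_model" / "1"


def export_gpt2(output_dir: str = "./tfgpt2model") -> Path:
    """Download pretrained GPT2 and save it locally as a TensorFlow SavedModel."""
    # Imported lazily so the path helper works without transformers installed.
    from transformers import TFGPT2LMHeadModel

    model = TFGPT2LMHeadModel.from_pretrained("gpt2")
    # saved_model=True writes the SavedModel in addition to the usual weight files.
    model.save_pretrained(output_dir, saved_model=True)
    return saved_model_path(output_dir)
```

Calling `export_gpt2()` downloads the weights on first use, so it needs network access.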

Convert the TensorFlow saved model to ONNX
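
One way to run the conversion is through the `tf2onnx` command-line converter. The sketch below assumes `tf2onnx` is installed in the current Python environment; the SavedModel path, the output file name `model.onnx`, and opset 11 are assumptions chosen for illustration.

```python
import subprocess
import sys
from typing import List


def tf2onnx_command(saved_model_dir: str,
                    output_path: str = "model.onnx",
                    opset: int = 11) -> List[str]:
    """Build the tf2onnx CLI invocation for a TensorFlow SavedModel."""
    return [
        sys.executable, "-m", "tf2onnx.convert",
        "--saved-model", saved_model_dir,
        "--opset", str(opset),
        "--output", output_path,
    ]


def convert_to_onnx(saved_model_dir: str, output_path: str = "model.onnx") -> None:
    """Run the conversion; raises CalledProcessError if tf2onnx fails."""
    subprocess.run(tf2onnx_command(saved_model_dir, output_path), check=True)
```

For example, `convert_to_onnx("./tfgpt2model/saved_model/1")` would write `model.onnx` to the working directory.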

Copy your model to a local MinIO

Set up MinIO

Use the provided notebook to install MinIO in your cluster and configure the mc CLI tool. Instructions are also available online.

Note: You can use your preferred remote storage service instead (Google, AWS, etc.)

Create a bucket and store your model
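
Triton expects a model repository laid out as `<model-name>/<version>/model.onnx`. The sketch below builds the two `mc` commands for that layout and runs them. The alias `minio-seldon`, the bucket `onnx-gpt2`, and the model name `gpt2` are assumptions; substitute whatever your MinIO setup uses.

```python
import subprocess
from typing import List


def triton_model_dir(bucket: str, model_name: str, version: int = 1) -> str:
    """Object prefix Triton expects: <bucket>/<model-name>/<version>/."""
    return f"{bucket}/{model_name}/{version}/"


def mc_upload_commands(model_file: str = "./model.onnx",
                       alias: str = "minio-seldon",
                       bucket: str = "onnx-gpt2",
                       model_name: str = "gpt2") -> List[List[str]]:
    """Commands that create the bucket and copy the model into it."""
    return [
        # -p: do not fail if the bucket already exists
        ["mc", "mb", "-p", f"{alias}/{bucket}"],
        ["mc", "cp", model_file, f"{alias}/{triton_model_dir(bucket, model_name)}"],
    ]


def upload_model() -> None:
    """Run the commands in order, stopping at the first failure."""
    for cmd in mc_upload_commands():
        subprocess.run(cmd, check=True)
```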

Run Seldon in your Kubernetes cluster

Follow the Seldon Core Setup notebook to set up a cluster with Ambassador Ingress or Istio and install Seldon Core

Deploy your model with Seldon pre-packaged Triton server
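
A sketch of what the deployment manifest could look like, written as a Python dictionary and dumped to JSON (which `kubectl apply -f` accepts alongside YAML). The deployment name `gpt2`, the bucket URI `s3://onnx-gpt2`, the secret name `seldon-init-container-secret`, and the file name `gpt2-deployment.json` are all assumptions; the secret must hold the credentials for your storage service.

```python
import json


def gpt2_seldon_deployment(name: str = "gpt2",
                           model_uri: str = "s3://onnx-gpt2",
                           secret_name: str = "seldon-init-container-secret") -> dict:
    """SeldonDeployment that serves the model with the Triton pre-packaged server."""
    return {
        "apiVersion": "machinelearning.seldon.io/v1",
        "kind": "SeldonDeployment",
        "metadata": {"name": name},
        "spec": {
            # Triton is reached through the KFServing V2 inference protocol.
            "protocol": "kfserving",
            "predictors": [
                {
                    "name": "default",
                    "replicas": 1,
                    "graph": {
                        # Should match the model directory name inside the bucket.
                        "name": name,
                        "type": "MODEL",
                        "implementation": "TRITON_SERVER",
                        "modelUri": model_uri,
                        "envSecretRefName": secret_name,
                    },
                }
            ],
        },
    }


def write_manifest(path: str = "gpt2-deployment.json") -> None:
    """Write the manifest to disk so it can be applied with kubectl."""
    with open(path, "w") as f:
        json.dump(gpt2_seldon_deployment(), f, indent=2)
```

After `write_manifest()`, running `kubectl apply -f gpt2-deployment.json` would create the deployment in the current namespace.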

Interact with the model: get model metadata (a "test" request to make sure our model is available and loaded correctly)
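
With the V2 protocol, model metadata is served at `/v2/models/<model-name>` behind the Seldon ingress path. The sketch below assumes the ingress is reachable at `localhost:80` and that the deployment is named `gpt2` in the `default` namespace; adjust these to your cluster.

```python
import json
from urllib.request import urlopen


def metadata_url(host: str = "localhost:80",
                 namespace: str = "default",
                 deployment: str = "gpt2",
                 model: str = "gpt2") -> str:
    """URL of the V2 model-metadata endpoint behind the Seldon ingress."""
    return f"http://{host}/seldon/{namespace}/{deployment}/v2/models/{model}"


def fetch_metadata(url: str, timeout: float = 10.0) -> dict:
    """GET the metadata; an HTTP error here means the model is not ready."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

A successful response lists the model's input and output tensors, which is a quick way to confirm the names and datatypes to use in inference requests.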

Run a prediction test: generate a sentence completion with the GPT2 model using the greedy approach
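
Greedy decoding appends, at each step, the token with the highest logit at the last position of the sequence. The sketch below shows that loop in plain Python. The tensor names `input_ids:0` and `attention_mask:0` are assumptions about how the converted model names its inputs (check them against the metadata response), and `infer` is a hypothetical callable that sends the payload to the server and returns the logits flattened from shape `[1, seq_len, vocab_size]`.

```python
from typing import Callable, List, Sequence


def build_payload(input_ids: Sequence[int]) -> dict:
    """V2 inference request for a single sequence (batch size 1)."""
    shape = [1, len(input_ids)]
    return {
        "inputs": [
            {"name": "input_ids:0", "datatype": "INT32",
             "shape": shape, "data": list(input_ids)},
            {"name": "attention_mask:0", "datatype": "INT32",
             "shape": shape, "data": [1] * len(input_ids)},
        ]
    }


def last_token_argmax(flat_logits: Sequence[float], seq_len: int) -> int:
    """Index of the largest logit in the last sequence position."""
    vocab_size = len(flat_logits) // seq_len
    last = flat_logits[(seq_len - 1) * vocab_size:]
    return max(range(vocab_size), key=last.__getitem__)


def greedy_generate(input_ids: Sequence[int],
                    infer: Callable[[dict], Sequence[float]],
                    steps: int) -> List[int]:
    """Extend input_ids by `steps` tokens, always taking the top-scoring one."""
    ids = list(input_ids)
    for _ in range(steps):
        flat_logits = infer(build_payload(ids))
        ids.append(last_token_argmax(flat_logits, len(ids)))
    return ids
```

In practice `infer` would POST the payload to the model's `/v2/models/gpt2/infer` endpoint and pull the logits out of the `outputs` list; the token ids going in and out would be produced and decoded with the GPT2 tokenizer.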

Run Load Test / Performance Test using vegeta

Install vegeta. For more details, see the official vegeta documentation.

Generate a vegeta target file containing a POST request with the payload in the required structure
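
In vegeta's JSON target format, each target is a JSON object whose `body` field is base64-encoded. The sketch below builds one such target for an inference request; the URL and the file name `vegeta_target.json` are assumptions.

```python
import base64
import json


def vegeta_target(url: str, payload: dict) -> str:
    """One vegeta JSON-format target: a POST with a base64-encoded JSON body."""
    body = base64.b64encode(json.dumps(payload).encode("utf-8")).decode("ascii")
    target = {
        "method": "POST",
        "url": url,
        "header": {"Content-Type": ["application/json"]},
        "body": body,
    }
    return json.dumps(target)


def write_target_file(url: str, payload: dict,
                      path: str = "vegeta_target.json") -> None:
    """Write a single-target file for use with `vegeta attack -format=json`."""
    with open(path, "w") as f:
        f.write(vegeta_target(url, payload) + "\n")
```

The file can then be fed to something like `vegeta attack -targets=vegeta_target.json -format=json -rate=1 -duration=60s | vegeta report -type=text`, with rate and duration tuned to the cluster under test.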

Clean-up
