HuggingFace Server
Thanks to our collaboration with the HuggingFace team, you can now deploy models from the HuggingFace Hub with Seldon Core.
We also support the high-performance optimizations provided by the HuggingFace Optimum framework.
Pipeline parameters
You can configure the following parameters:
task: The transformer pipeline task, for example text-generation.
pretrained_model: The name of the pretrained model in the Hub.
pretrained_tokenizer: The name of the tokenizer in the Hub, if different from the one provided with the model.
optimum_model: Boolean that enables loading the model with the Optimum framework.
Simple Example
You can deploy a HuggingFace model by providing parameters to your pipeline.
apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: gpt2-model
spec:
  protocol: v2
  predictors:
  - graph:
      name: transformer
      implementation: HUGGINGFACE_SERVER
      parameters:
      - name: task
        type: STRING
        value: text-generation
      - name: pretrained_model
        type: STRING
        value: distilgpt2
    name: default
    replicas: 1
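Because the deployment sets protocol: v2, the model is queried with a V2 inference request. The following is a minimal sketch of building such a request in Python, not a definitive client. The input name "args", the ingress route, the host localhost:8080, and the namespace default are all assumptions that are not stated on this page; adjust them to match your cluster.

```python
import json


def build_infer_request(prompt: str) -> dict:
    """Build a V2 inference request body that carries one text prompt."""
    return {
        "inputs": [
            {
                # Assumption: the server accepts the prompt under the name "args".
                "name": "args",
                "shape": [1],
                "datatype": "BYTES",
                "data": [prompt],
            }
        ]
    }


def infer_url(host: str, namespace: str, deployment: str, model: str) -> str:
    """Compose the inference URL.

    Assumption: the ingress exposes the route
    /seldon/<namespace>/<deployment>/v2/models/<model>/infer.
    """
    return f"http://{host}/seldon/{namespace}/{deployment}/v2/models/{model}/infer"


if __name__ == "__main__":
    # "gpt2-model" and "transformer" come from the manifest above;
    # the host and namespace are placeholders.
    url = infer_url("localhost:8080", "default", "gpt2-model", "transformer")
    body = json.dumps(build_infer_request("Once upon a time"))
    print(url)
    print(body)
```

The printed URL and body can be sent with any HTTP client as a POST request with a Content-Type of application/json.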
Quantized & Optimized Models with Optimum
You can deploy a HuggingFace model loaded with the Optimum library by setting the optimum_model parameter to true.
apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: gpt2-model
spec:
  protocol: v2
  predictors:
  - graph:
      name: transformer
      implementation: HUGGINGFACE_SERVER
      parameters:
      - name: task
        type: STRING
        value: text-generation
      - name: pretrained_model
        type: STRING
        value: distilgpt2
      - name: optimum_model
        type: BOOL
        value: true
    name: default
    replicas: 1
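The Optimum manifest differs from the simple example only by one extra entry in the parameters list. As an illustration, the sketch below generates either manifest from a single function. The helper huggingface_deployment is hypothetical and not part of Seldon Core; it only mirrors the fields shown in the manifests on this page.

```python
import json


def huggingface_deployment(name, task, pretrained_model, optimum_model=False):
    """Return a SeldonDeployment manifest as a dict (hypothetical helper)."""
    parameters = [
        {"name": "task", "type": "STRING", "value": task},
        {"name": "pretrained_model", "type": "STRING", "value": pretrained_model},
    ]
    if optimum_model:
        # The only difference from the simple example: one extra BOOL parameter.
        parameters.append({"name": "optimum_model", "type": "BOOL", "value": True})
    return {
        "apiVersion": "machinelearning.seldon.io/v1alpha2",
        "kind": "SeldonDeployment",
        "metadata": {"name": name},
        "spec": {
            "protocol": "v2",
            "predictors": [
                {
                    "graph": {
                        "name": "transformer",
                        "implementation": "HUGGINGFACE_SERVER",
                        "parameters": parameters,
                    },
                    "name": "default",
                    "replicas": 1,
                }
            ],
        },
    }


if __name__ == "__main__":
    manifest = huggingface_deployment(
        "gpt2-model", "text-generation", "distilgpt2", optimum_model=True
    )
    print(json.dumps(manifest, indent=2))
```

The output is JSON rather than YAML. kubectl accepts JSON manifests too, so it can be saved to a file and applied in the same way as the YAML above.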
Custom Model Example
You can deploy a custom HuggingFace model by providing the location of the model artefacts in the modelUri field.
apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: custom-tiny-stories-model
spec:
  protocol: v2
  predictors:
  - graph:
      name: transformer
      implementation: HUGGINGFACE_SERVER
      modelUri: gs://seldon-models/v1.18.0/huggingface/text-gen-custom-tiny-stories
      parameters:
      - name: task
        type: STRING
        value: text-generation
    name: default
    replicas: 1