MLServer CLI
The MLServer package includes an mlserver CLI designed to help with common tasks in a model's lifecycle. You can see a high-level outline at any time via:
mlserver --help
Command-line interface to manage MLServer models.
mlserver [OPTIONS] COMMAND [ARGS]...

Options

--version  (Default: False) Show the version and exit.
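For example, to check which version of MLServer is installed:

mlserver --version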
build
Build a Docker image for a custom MLServer runtime.
mlserver build [OPTIONS] FOLDER

Options

-t, --tag <text>
--no-cache  (Default: False)
Arguments
FOLDER  Required argument
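For example, to build an image from the model folder in the current directory, tagging it with a name of your choosing (my-custom-runtime:0.1.0 below is just an illustrative tag):

mlserver build . -t my-custom-runtime:0.1.0

The --no-cache flag presumably forces a clean build by disabling Docker's layer cache.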
dockerfile
Generate a Dockerfile
mlserver dockerfile [OPTIONS] FOLDER

Options

-i, --include-dockerignore  (Default: False)
Arguments
FOLDER  Required argument
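For example, to generate a Dockerfile for the model in the current folder (going by its name, the -i flag should also emit a matching .dockerignore file):

mlserver dockerfile . -i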
infer
Deprecated: This experimental feature will be removed in future work. Execute batch inference requests against a V2 inference server.
mlserver infer [OPTIONS]

Options
--url, -u <text>  (Default: localhost:8080; Env: MLSERVER_INFER_URL) URL of the MLServer to send inference requests to. Should not contain http or https.
--model-name, -m <text>  (Required; Env: MLSERVER_INFER_MODEL_NAME) Name of the model to send inference requests to.
--input-data-path, -i <path>  (Required; Env: MLSERVER_INFER_INPUT_DATA_PATH) Local path to the input file containing inference requests to be processed.
--output-data-path, -o <path>  (Required; Env: MLSERVER_INFER_OUTPUT_DATA_PATH) Local path to the output file for the inference responses to be written to.
--workers, -w <integer>  (Default: 10; Env: MLSERVER_INFER_WORKERS)
--retries, -r <integer>  (Default: 3; Env: MLSERVER_INFER_RETRIES)
--batch-size, -s <integer>  (Default: 1; Env: MLSERVER_INFER_BATCH_SIZE) Send inference requests grouped together as micro-batches.
--binary-data, -b  (Default: False; Env: MLSERVER_INFER_BINARY_DATA) Send inference requests as binary data (not fully supported).
--verbose, -v  (Default: False; Env: MLSERVER_INFER_VERBOSE) Verbose mode.
--extra-verbose, -vv  (Default: False; Env: MLSERVER_INFER_EXTRA_VERBOSE) Extra verbose mode (shows detailed requests and responses).
--transport, -t <choice>  (Options: rest|grpc; Default: rest; Env: MLSERVER_INFER_TRANSPORT) Transport type to use to send inference requests. Can be 'rest' or 'grpc' (not yet supported).
--request-headers, -H <text>  (Env: MLSERVER_INFER_REQUEST_HEADERS) Headers to be set on each inference request sent to the server. Multiple options are allowed, e.g. -H 'Header1: Val1' -H 'Header2: Val2'. When set via the environment variable, provide them as 'Header1:Val1 Header2:Val2'.
--timeout <integer>  (Default: 60; Env: MLSERVER_INFER_CONNECTION_TIMEOUT) Connection timeout to be passed to tritonclient.
--batch-interval <float>  (Default: 0; Env: MLSERVER_INFER_BATCH_INTERVAL) Minimum time interval (in seconds) between requests made by each worker.
--batch-jitter <float>  (Default: 0; Env: MLSERVER_INFER_BATCH_JITTER) Maximum random jitter (in seconds) added to the batch interval between requests.
--use-ssl  (Default: False; Env: MLSERVER_INFER_USE_SSL) Use SSL in communications with the inference server.
--insecure  (Default: False; Env: MLSERVER_INFER_INSECURE) Disable SSL verification in communications. Use with caution.
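For example, assuming a model named my-model is deployed on a server at localhost:8080 and that input-requests.txt holds the V2 inference requests to send (the model name and file names here are purely illustrative):

mlserver infer -u localhost:8080 -m my-model -i input-requests.txt -o output-responses.txt -w 5

Responses are written to the output file; -w, -s and --transport control the number of workers, the micro-batch size and the protocol used.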
init
Generate a base project template
mlserver init [OPTIONS]

Options

-t, --template <text>  (Default: https://github.com/EthicalML/sml-security/)
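For example, to scaffold a new project from the default template:

mlserver init

To start from a different template repository, point -t at its URL.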
start
Start serving a machine learning model with MLServer.
mlserver start [OPTIONS] FOLDER

Arguments
FOLDER  Required argument
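For example, to serve a model whose configuration lives in the current folder (assuming it contains the usual MLServer settings files, e.g. model-settings.json):

mlserver start .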