# Changelog

## [0.7.0](https://github.com/SeldonIO/llm-runtimes/releases/tag/0.7.0) - 2025-06-06

## Main Features

* Added support for agentic workflows via the OpenAI API (e.g. tools, planning, reflection patterns).

## What's Changed

* Update CHANGELOG by [@github-actions](https://github.com/github-actions) in [#672](https://github.com/SeldonIO/llm-runtimes/pull/672)
* Bumped version to 0.6.0 in docs by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in [#673](https://github.com/SeldonIO/llm-runtimes/pull/673)
* Bumped version to 0.7.0.dev1 by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in [#674](https://github.com/SeldonIO/llm-runtimes/pull/674)
* Improved rag example by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in [#675](https://github.com/SeldonIO/llm-runtimes/pull/675)
* build(deps-dev): Bump pytest-asyncio from 0.25.3 to 0.26.0 in /prompt-utils by [@dependabot](https://github.com/dependabot) in [#677](https://github.com/SeldonIO/llm-runtimes/pull/677)
* build(deps-dev): Bump syrupy from 4.9.0 to 4.9.1 in /prompt-utils by [@dependabot](https://github.com/dependabot) in [#676](https://github.com/SeldonIO/llm-runtimes/pull/676)
* Updated changelog for 0.5.0 and 0.6.0 by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in [#681](https://github.com/SeldonIO/llm-runtimes/pull/681)
* Add files via upload by [@paulb-seldon](https://github.com/paulb-seldon) in [#683](https://github.com/SeldonIO/llm-runtimes/pull/683)
* Fixed async operation on the agentic workflow pipeline by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in [#684](https://github.com/SeldonIO/llm-runtimes/pull/684)
* Fix database sync for chatbot and agentic workflow by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in [#686](https://github.com/SeldonIO/llm-runtimes/pull/686)
* Included support for agents implemented with cyclic pipelines by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in [#685](https://github.com/SeldonIO/llm-runtimes/pull/685)
* Fixed tool args for openai and gemini by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in [#688](https://github.com/SeldonIO/llm-runtimes/pull/688)
* build(deps-dev): Bump wheel from 0.44.0 to 0.46.1 in /prompt-utils by [@dependabot](https://github.com/dependabot) in [#682](https://github.com/SeldonIO/llm-runtimes/pull/682)
* build(deps-dev): Bump flake8 from 7.1.2 to 7.2.0 in /prompt-utils by [@dependabot](https://github.com/dependabot) in [#680](https://github.com/SeldonIO/llm-runtimes/pull/680)
* build(deps-dev): Bump types-requests from 2.32.0.20250306 to 2.32.0.20250328 in /prompt-utils by [@dependabot](https://github.com/dependabot) in [#679](https://github.com/SeldonIO/llm-runtimes/pull/679)
* Planning agent pattern example by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in [#687](https://github.com/SeldonIO/llm-runtimes/pull/687)
* add agents docs changes by [@paulb-seldon](https://github.com/paulb-seldon) in [#689](https://github.com/SeldonIO/llm-runtimes/pull/689)
* Fix introduction links by [@paulb-seldon](https://github.com/paulb-seldon) in [#690](https://github.com/SeldonIO/llm-runtimes/pull/690)
* Change email to reach out to us by [@paulb-seldon](https://github.com/paulb-seldon) in [#691](https://github.com/SeldonIO/llm-runtimes/pull/691)
* Bumped MLServer to 1.7.0 by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in [#693](https://github.com/SeldonIO/llm-runtimes/pull/693)
* Bumped st<4.2.0 and included upper bound for optimum by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in [#694](https://github.com/SeldonIO/llm-runtimes/pull/694)
* build(deps-dev): Bump types-requests from 2.32.0.20250328 to 2.32.0.20250515 in /prompt-utils by [@dependabot](https://github.com/dependabot) in [#695](https://github.com/SeldonIO/llm-runtimes/pull/695)
* Set release large runner for local runtime by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in [#696](https://github.com/SeldonIO/llm-runtimes/pull/696)
* Bump MLServer to 1.7.1.rc1 and LLMIS to 0.3.3 by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in [#702](https://github.com/SeldonIO/llm-runtimes/pull/702)
* ci: Merge change for release 0.7.0 ([#702](https://github.com/SeldonIO/llm-runtimes/issues/702)) by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in [#703](https://github.com/SeldonIO/llm-runtimes/pull/703)
* Fixed docker file for local runtime by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in [#704](https://github.com/SeldonIO/llm-runtimes/pull/704)
* Fix docs for 0.7.0 by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in [#705](https://github.com/SeldonIO/llm-runtimes/pull/705)
* Bumped MLServer to 1.7.1 by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in [#706](https://github.com/SeldonIO/llm-runtimes/pull/706)
* ci: Merge change for release 0.7.0 \[2] by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in [#707](https://github.com/SeldonIO/llm-runtimes/pull/707)

**Full Changelog**: <https://github.com/SeldonIO/llm-runtimes/compare/0.6.0...0.7.0>

## [0.6.0](https://github.com/SeldonIO/llm-runtimes/releases/tag/0.6.0) - 2025-03-13

## Main Features

* New runtimes:
  * Sentence Transformers for text embeddings
  * Prompt Runtime - allows referencing the same locally deployed model with different prompts
* Support for conditional routing through Core 2 pipelines - this feature allows conditional data flows based on the output of the LLM, enabling support for agentic workflows

## What's Changed

* Replaced local images with the ones from artifacts registry by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in <https://github.com/SeldonIO/llm-runtimes/pull/606>
* Remove redundant server configs by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in <https://github.com/SeldonIO/llm-runtimes/pull/607>
* Update Models Overview by [@paulb-seldon](https://github.com/paulb-seldon) in <https://github.com/SeldonIO/llm-runtimes/pull/609>
* Quickstart example by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in <https://github.com/SeldonIO/llm-runtimes/pull/608>
* Fixed models and pipelines naming by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in <https://github.com/SeldonIO/llm-runtimes/pull/612>
* Included upperbound on httpx by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in <https://github.com/SeldonIO/llm-runtimes/pull/615>
* Clean docs by [@paulb-seldon](https://github.com/paulb-seldon) in <https://github.com/SeldonIO/llm-runtimes/pull/613>
* Remove PVC from docs examples by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in <https://github.com/SeldonIO/llm-runtimes/pull/616>
* Updated images tag to 0.5.0 by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in <https://github.com/SeldonIO/llm-runtimes/pull/618>
* Fixed installation test image path by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in <https://github.com/SeldonIO/llm-runtimes/pull/619>
* Cat the pipeline definition before deploying it by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in <https://github.com/SeldonIO/llm-runtimes/pull/621>
* Fixed typo in azure docs by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in <https://github.com/SeldonIO/llm-runtimes/pull/622>
* Fixed name of vector db env prefix by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in <https://github.com/SeldonIO/llm-runtimes/pull/628>
* Sentence transformers support by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in <https://github.com/SeldonIO/llm-runtimes/pull/626>
* Simplify content when content is a single text message by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in <https://github.com/SeldonIO/llm-runtimes/pull/630>
* Refactored extra settings init by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in <https://github.com/SeldonIO/llm-runtimes/pull/631>
* Passed database as string in get\_db\_interface by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in <https://github.com/SeldonIO/llm-runtimes/pull/633>
* Fixed memory psql db url by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in <https://github.com/SeldonIO/llm-runtimes/pull/634>
* Prompt runtime for local llms by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in <https://github.com/SeldonIO/llm-runtimes/pull/627>
* Docs for local-embedding runtime by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in <https://github.com/SeldonIO/llm-runtimes/pull/632>
* Renamed url to infer\_uri in PromptRuntime by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in <https://github.com/SeldonIO/llm-runtimes/pull/641>
* Included output postprocessing for conditional routing by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in <https://github.com/SeldonIO/llm-runtimes/pull/637>
* Wrote prompt runtime docs by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in <https://github.com/SeldonIO/llm-runtimes/pull/636>
* Wrote docs for routing by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in <https://github.com/SeldonIO/llm-runtimes/pull/643>
* Renamed routing to agents by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in <https://github.com/SeldonIO/llm-runtimes/pull/645>
* build(deps-dev): Bump black from 24.10.0 to 25.1.0 in /prompt-utils by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/644>
* build(deps-dev): Bump isort from 5.13.2 to 6.0.0 in /prompt-utils by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/640>
* build(deps-dev): Bump tox from 4.23.0 to 4.24.1 in /prompt-utils by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/639>
* build(deps-dev): Bump syrupy from 4.7.2 to 4.8.1 in /prompt-utils by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/635>
* build(deps-dev): Bump mypy from 1.13.0 to 1.14.1 in /prompt-utils by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/624>
* build(deps-dev): Bump pytest-asyncio from 0.24.0 to 0.25.3 in /prompt-utils by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/642>
* Implemented routing for prompt runtime by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in <https://github.com/SeldonIO/llm-runtimes/pull/647>
* Included installation steps for prompt runtime by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in <https://github.com/SeldonIO/llm-runtimes/pull/649>
* Guardrails docs example by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in <https://github.com/SeldonIO/llm-runtimes/pull/646>
* Bumped mlserver to 1.7.0.rc1 by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in <https://github.com/SeldonIO/llm-runtimes/pull/650>
* bump LLMIS to 0.3.0 by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in <https://github.com/SeldonIO/llm-runtimes/pull/651>
* Workflow for prompt-utils runtime by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in <https://github.com/SeldonIO/llm-runtimes/pull/652>
* Workflow for prompt-utils runtime (#652) by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in <https://github.com/SeldonIO/llm-runtimes/pull/653>
* Fixed capabilities for prompt utils server by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in <https://github.com/SeldonIO/llm-runtimes/pull/654>
* Docs for http sse streaming by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in <https://github.com/SeldonIO/llm-runtimes/pull/660>
* Bumped LLMIS to 0.3.1 by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in <https://github.com/SeldonIO/llm-runtimes/pull/666>
* build(deps-dev): Bump flake8 from 7.1.1 to 7.1.2 in /prompt-utils by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/657>
* build(deps-dev): Bump isort from 6.0.0 to 6.0.1 in /prompt-utils by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/659>
* build(deps-dev): Bump pytest from 8.3.3 to 8.3.5 in /prompt-utils by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/662>
* build(deps-dev): Bump types-requests from 2.32.0.20241016 to 2.32.0.20250306 in /prompt-utils by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/663>
* build(deps-dev): Bump tox from 4.24.1 to 4.24.2 in /prompt-utils by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/664>
* build(deps-dev): Bump syrupy from 4.8.1 to 4.9.0 in /prompt-utils by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/665>
* ci: Merge change for release 0.6.0 by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in <https://github.com/SeldonIO/llm-runtimes/pull/667>
* Fix docs for 0.6.0.rc3 by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in <https://github.com/SeldonIO/llm-runtimes/pull/668>
* Fix docs for 0.6.0.rc3 (#668) by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in <https://github.com/SeldonIO/llm-runtimes/pull/669>
* Bump mlserver to 1.7.0.rc2 by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in <https://github.com/SeldonIO/llm-runtimes/pull/670>
* Bump mlserver to 1.7.0.rc2 (#670) by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in <https://github.com/SeldonIO/llm-runtimes/pull/671>

**Full Changelog**: <https://github.com/SeldonIO/llm-runtimes/compare/0.5.0...0.6.0>

## [0.5.0](https://github.com/SeldonIO/llm-runtimes/releases/tag/0.5.0) - 2024-12-04

## Main Features

* New runtimes:
  * Gemini
  * VectorDB with support for [Qdrant](https://qdrant.tech/) and [PGVector](https://github.com/pgvector/pgvector)
* Streaming support for OpenAI and Gemini
* Standardized IO across all runtimes (i.e. prompting)
* RAG support for the Local Runtime

## What's Changed

* Update CHANGELOG by [@github-actions](https://github.com/github-actions) in <https://github.com/SeldonIO/llm-runtimes/pull/403>
* Bump openai 1.35.3 by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in <https://github.com/SeldonIO/llm-runtimes/pull/402>
* Included streaming support for OpenAI by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in <https://github.com/SeldonIO/llm-runtimes/pull/404>
* build(deps-dev): Bump tox from 4.15.1 to 4.16.0 in /api by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/409>
* build(deps-dev): Bump tox from 4.15.1 to 4.16.0 in /memory by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/408>
* build(deps-dev): Bump tox from 4.15.1 to 4.16.0 in /local by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/407>
* build(deps): Bump orjson from 3.10.4 to 3.10.6 in /memory by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/406>
* build(deps-dev): Bump mypy from 1.10.0 to 1.10.1 in /api by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/398>
* build(deps-dev): Bump mypy from 1.10.0 to 1.10.1 in /local by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/399>
* build(deps-dev): Bump mypy from 1.10.0 to 1.10.1 in /memory by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/397>
* build(deps-dev): Bump types-requests from 2.32.0.20240602 to 2.32.0.20240622 in /memory by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/396>
* build(deps-dev): Bump types-requests from 2.32.0.20240602 to 2.32.0.20240622 in /api by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/395>
* build(deps-dev): Bump types-requests from 2.32.0.20240602 to 2.32.0.20240622 in /local by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/394>
* build(deps-dev): Bump flake8 from 7.0.0 to 7.1.0 in /local by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/387>
* build(deps-dev): Bump flake8 from 7.0.0 to 7.1.0 in /api by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/386>
* build(deps-dev): Bump flake8 from 7.0.0 to 7.1.0 in /memory by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/385>
* build(deps): Bump urllib3 from 2.2.1 to 2.2.2 in /docs by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/384>
* build(deps): Bump urllib3 from 2.2.1 to 2.2.2 in /local/requirements by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/383>
* Add a `DISABLE_CYTHON` environment variable for development purposes by [@jklaise](https://github.com/jklaise) in <https://github.com/SeldonIO/llm-runtimes/pull/411>
* Do not exclude packages when installing in dev mode without Cython by [@jklaise](https://github.com/jklaise) in <https://github.com/SeldonIO/llm-runtimes/pull/412>
* added the create the servers pieces to the documentation by [@joshsgoldstein](https://github.com/joshsgoldstein) in <https://github.com/SeldonIO/llm-runtimes/pull/416>
* Feat/gemini api runtime by [@jklaise](https://github.com/jklaise) in <https://github.com/SeldonIO/llm-runtimes/pull/420>
* Migration to GitBook and General Doc Improvements by [@ramonpzg](https://github.com/ramonpzg) in <https://github.com/SeldonIO/llm-runtimes/pull/312>
* Update dependabot.yml to remove docs scan by [@sakoush](https://github.com/sakoush) in <https://github.com/SeldonIO/llm-runtimes/pull/427>
* fix: CI failing because of a change of permission by [@sakoush](https://github.com/sakoush) in <https://github.com/SeldonIO/llm-runtimes/pull/429>
* fix: remove docs in version script by [@sakoush](https://github.com/sakoush) in <https://github.com/SeldonIO/llm-runtimes/pull/430>
* bump versions from `0.3.0` to `0.5.0.dev` by [@mauicv](https://github.com/mauicv) in <https://github.com/SeldonIO/llm-runtimes/pull/428>
* Fix/gemini grpc bytes decoding by [@jklaise](https://github.com/jklaise) in <https://github.com/SeldonIO/llm-runtimes/pull/434>
* Add types tensor to Gemini output by [@jklaise](https://github.com/jklaise) in <https://github.com/SeldonIO/llm-runtimes/pull/435>
* fix: add security workflow for llm-runtimes by [@sakoush](https://github.com/sakoush) in <https://github.com/SeldonIO/llm-runtimes/pull/442>
* fix: move workflow to workflows directory by [@sakoush](https://github.com/sakoush) in <https://github.com/SeldonIO/llm-runtimes/pull/443>
* fix: use one big security scan task by [@sakoush](https://github.com/sakoush) in <https://github.com/SeldonIO/llm-runtimes/pull/444>
* fix: add build base to local docker image build step by [@sakoush](https://github.com/sakoush) in <https://github.com/SeldonIO/llm-runtimes/pull/458>
* fix: Update security.yml to exclude our runtimes from 3rd party deps by [@sakoush](https://github.com/sakoush) in <https://github.com/SeldonIO/llm-runtimes/pull/465>
* Feat/gemini multi modal input by [@jklaise](https://github.com/jklaise) in <https://github.com/SeldonIO/llm-runtimes/pull/459>
* Streaming support for Gemini by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in <https://github.com/SeldonIO/llm-runtimes/pull/466>
* Fix setting stream flag for other tasks than completion and chat by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in <https://github.com/SeldonIO/llm-runtimes/pull/474>
* build(deps-dev): Bump black from 24.4.2 to 24.8.0 in /api by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/439>
* build(deps-dev): Bump black from 24.4.2 to 24.8.0 in /memory by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/440>
* build(deps-dev): Bump black from 24.4.2 to 24.8.0 in /local by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/441>
* build(deps-dev): Bump flake8 from 7.1.0 to 7.1.1 in /api by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/449>
* Included VectorDB runtime by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in <https://github.com/SeldonIO/llm-runtimes/pull/480>
* build(deps-dev): Bump flake8 from 7.1.0 to 7.1.1 in /memory by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/445>
* build(deps-dev): Bump wheel from 0.43.0 to 0.44.0 in /api by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/450>
* build(deps-dev): Bump flake8 from 7.1.0 to 7.1.1 in /local by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/454>
* build(deps): Bump orjson from 3.10.6 to 3.10.7 in /memory by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/460>
* build(deps-dev): Bump mypy from 1.10.1 to 1.11.2 in /api by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/473>
* build(deps-dev): Bump wheel from 0.43.0 to 0.44.0 in /memory by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/446>
* build(deps-dev): Bump black from 24.4.2 to 24.8.0 in /vector-db by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/499>
* build(deps-dev): Bump flake8 from 7.1.0 to 7.1.1 in /vector-db by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/497>
* build(deps-dev): Bump tox from 4.16.0 to 4.18.1 in /api by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/481>
* build(deps-dev): Bump tox from 4.16.0 to 4.18.1 in /vector-db by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/498>
* build(deps-dev): Bump types-requests from 2.32.0.20240622 to 2.32.0.20240907 in /memory by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/500>
* build(deps): Update google-generativeai requirement from ==0.7.\* to >=0.7,<0.9 in /api by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/489>
* build(deps-dev): Bump tox from 4.16.0 to 4.18.1 in /local by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/485>
* build(deps-dev): Bump pytest-postgresql from 6.0.0 to 6.1.1 in /memory by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/479>
* build(deps): Bump sqlalchemy from 2.0.31 to 2.0.34 in /memory by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/476>
* build(deps-dev): Bump wheel from 0.43.0 to 0.44.0 in /local by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/452>
* build(deps-dev): Bump types-requests from 2.32.0.20240622 to 2.32.0.20240907 in /local by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/484>
* build(deps-dev): Bump types-requests from 2.32.0.20240622 to 2.32.0.20240907 in /vector-db by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/494>
* build(deps-dev): Bump mypy from 1.10.1 to 1.11.2 in /memory by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/472>
* build(deps-dev): Bump mypy from 1.10.1 to 1.11.2 in /vector-db by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/495>
* build(deps-dev): Bump wheel from 0.43.0 to 0.44.0 in /vector-db by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/493>
* build(deps-dev): Bump tox from 4.16.0 to 4.18.1 in /memory by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/483>
* build(deps-dev): Bump types-requests from 2.32.0.20240622 to 2.32.0.20240907 in /api by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/482>
* Update security.yml to include vector-db for code scan by [@sakoush](https://github.com/sakoush) in <https://github.com/SeldonIO/llm-runtimes/pull/502>
* build(deps-dev): Bump syrupy from 4.6.1 to 4.7.1 in /local by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/470>
* build(deps-dev): Bump mypy from 1.10.1 to 1.11.2 in /local by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/471>
* Bumped MLServer to 1.6.1 by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in <https://github.com/SeldonIO/llm-runtimes/pull/501>
* Bumped llmis to 0.2.0 by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in <https://github.com/SeldonIO/llm-runtimes/pull/503>
* Included missing files for qdrant by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in <https://github.com/SeldonIO/llm-runtimes/pull/506>
* Included pgvector support by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in <https://github.com/SeldonIO/llm-runtimes/pull/505>
* build(deps-dev): Bump tox from 4.18.1 to 4.19.0 in /local by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/516>
* build(deps-dev): Bump tox from 4.18.1 to 4.19.0 in /api by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/515>
* build(deps-dev): Bump tox from 4.18.1 to 4.19.0 in /memory by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/514>
* build(deps-dev): Bump tox from 4.18.1 to 4.19.0 in /vector-db by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/513>
* build(deps-dev): Bump types-requests from 2.32.0.20240907 to 2.32.0.20240914 in /memory by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/511>
* build(deps-dev): Bump types-requests from 2.32.0.20240907 to 2.32.0.20240914 in /api by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/510>
* build(deps-dev): Bump types-requests from 2.32.0.20240907 to 2.32.0.20240914 in /vector-db by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/508>
* build(deps-dev): Bump types-requests from 2.32.0.20240907 to 2.32.0.20240914 in /local by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/509>
* build(deps): Bump sqlalchemy from 2.0.34 to 2.0.35 in /memory by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/512>
* build(deps-dev): Bump pytest from 7.3.1 to 8.3.3 in /memory by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/490>
* build(deps-dev): Bump pytest from 7.3.1 to 8.3.3 in /api by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/488>
* build(deps-dev): Bump pytest from 7.3.1 to 8.3.3 in /local by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/487>
* build(deps-dev): Bump pytest-asyncio from 0.23.7 to 0.24.0 in /api by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/469>
* build(deps-dev): Bump pytest-asyncio from 0.23.7 to 0.24.0 in /memory by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/468>
* build(deps-dev): Bump pytest-asyncio from 0.23.7 to 0.24.0 in /local by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/517>
* Prototype/add prompt and preprocessing by [@mauicv](https://github.com/mauicv) in <https://github.com/SeldonIO/llm-runtimes/pull/461>
* Add gitbook docs to master branch by [@sakoush](https://github.com/sakoush) in <https://github.com/SeldonIO/llm-runtimes/pull/548>
* Integrate 461 in memory by [@mauicv](https://github.com/mauicv) in <https://github.com/SeldonIO/llm-runtimes/pull/507>
* Moved prompt-utils in requirements.txt by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in <https://github.com/SeldonIO/llm-runtimes/pull/549>
* Integrate #461 in API runtime by [@mauicv](https://github.com/mauicv) in <https://github.com/SeldonIO/llm-runtimes/pull/486>
* Prototype integrate 461 in local by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in <https://github.com/SeldonIO/llm-runtimes/pull/557>
* Prototype integrate 461 in vector db by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in <https://github.com/SeldonIO/llm-runtimes/pull/564>
* build(deps-dev): Bump pytest from 7.3.1 to 8.3.3 in /prompt-utils by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/527>
* build(deps-dev): Bump pytest-asyncio from 0.23.7 to 0.24.0 in /prompt-utils by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/526>
* build(deps-dev): Bump pytest-cases from 3.8.5 to 3.8.6 in /vector-db by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/528>
* build(deps-dev): Bump pytest-cases from 3.8.5 to 3.8.6 in /memory by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/529>
* build(deps-dev): Bump pytest-cases from 3.8.5 to 3.8.6 in /prompt-utils by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/531>
* build(deps-dev): Bump pytest-cases from 3.8.5 to 3.8.6 in /api by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/532>
* build(deps-dev): Bump tox from 4.19.0 to 4.21.2 in /memory by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/544>
* build(deps-dev): Bump pytest-cases from 3.8.5 to 3.8.6 in /local by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/530>
* build(deps-dev): Bump tox from 4.19.0 to 4.21.2 in /api by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/545>
* build(deps-dev): Bump types-requests from 2.32.0.20240914 to 2.32.0.20241016 in /vector-db by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/565>
* build(deps-dev): Bump syrupy from 4.7.1 to 4.7.2 in /prompt-utils by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/555>
* build(deps-dev): Bump types-requests from 2.32.0.20240907 to 2.32.0.20241016 in /prompt-utils by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/566>
* Rag support for local runtime by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in <https://github.com/SeldonIO/llm-runtimes/pull/570>
* build(deps-dev): Bump tox from 4.19.0 to 4.21.2 in /local by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/546>
* build(deps-dev): Bump black from 24.8.0 to 24.10.0 in /local by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/551>
* build(deps-dev): Bump black from 24.8.0 to 24.10.0 in /api by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/553>
* build(deps-dev): Bump tox from 4.19.0 to 4.23.0 in /vector-db by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/567>
* Fix memory system prompt by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in <https://github.com/SeldonIO/llm-runtimes/pull/569>
* Runtime settings consistency by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in <https://github.com/SeldonIO/llm-runtimes/pull/572>
* build(deps-dev): Bump black from 24.8.0 to 24.10.0 in /vector-db by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/550>
* build(deps-dev): Bump syrupy from 4.7.1 to 4.7.2 in /local by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/552>
* build(deps-dev): Bump black from 24.8.0 to 24.10.0 in /prompt-utils by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/554>
* build(deps): Update qdrant-client requirement from <1.12.0 to <1.13.0 in /vector-db by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/556>
* build(deps-dev): Bump mypy from 1.11.2 to 1.12.0 in /vector-db by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/558>
* build(deps-dev): Bump mypy from 1.11.2 to 1.12.0 in /local by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/559>
* build(deps-dev): Bump mypy from 1.11.2 to 1.12.0 in /api by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/561>
* build(deps-dev): Bump tox from 4.18.1 to 4.23.0 in /prompt-utils by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/568>
* build(deps-dev): Bump mypy from 1.10.1 to 1.13.0 in /prompt-utils by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/574>
* Default model type initialization in prompt utils by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in <https://github.com/SeldonIO/llm-runtimes/pull/573>
* Renamed mlserver-llm-openai to mlserver-llm-api by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in <https://github.com/SeldonIO/llm-runtimes/pull/577>
* Fix openai setting config by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in <https://github.com/SeldonIO/llm-runtimes/pull/578>
* Updated openai docs with IO by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in <https://github.com/SeldonIO/llm-runtimes/pull/579>
* OpenAI function calling docs by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in <https://github.com/SeldonIO/llm-runtimes/pull/580>
* IO gemini docs by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in <https://github.com/SeldonIO/llm-runtimes/pull/581>
* Refactored the azure docs by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in <https://github.com/SeldonIO/llm-runtimes/pull/584>
* IO memory docs by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in <https://github.com/SeldonIO/llm-runtimes/pull/586>
* IO vector-db docs by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in <https://github.com/SeldonIO/llm-runtimes/pull/587>
* IO local docs - chat model by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in <https://github.com/SeldonIO/llm-runtimes/pull/588>
* IO local mms docs by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in <https://github.com/SeldonIO/llm-runtimes/pull/589>
* IO local chat template by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in <https://github.com/SeldonIO/llm-runtimes/pull/591>
* fix(docs): pull changes from docs-master by [@sakoush](https://github.com/sakoush) in <https://github.com/SeldonIO/llm-runtimes/pull/583>
* IO local quantization docs by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in <https://github.com/SeldonIO/llm-runtimes/pull/590>
* IO chat bot by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in <https://github.com/SeldonIO/llm-runtimes/pull/592>
* Fixed tensor names for vector-db example by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in <https://github.com/SeldonIO/llm-runtimes/pull/597>
* Updated azure model settings and image by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in <https://github.com/SeldonIO/llm-runtimes/pull/598>
* Improved local server manifest by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in <https://github.com/SeldonIO/llm-runtimes/pull/599>
* Included wait ready for models, pipelines, and deployments by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in <https://github.com/SeldonIO/llm-runtimes/pull/600>
* Updated and added docs refs by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in <https://github.com/SeldonIO/llm-runtimes/pull/596>
* Updated installation docs by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in <https://github.com/SeldonIO/llm-runtimes/pull/601>
* Removed k8s example by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in <https://github.com/SeldonIO/llm-runtimes/pull/602>
* Docs for monitoring by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in <https://github.com/SeldonIO/llm-runtimes/pull/593>
* Updated runtimes docs intro by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in <https://github.com/SeldonIO/llm-runtimes/pull/603>
* Docs restructure by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in <https://github.com/SeldonIO/llm-runtimes/pull/604>
* ci: Merge change for release 0.5.0 by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in <https://github.com/SeldonIO/llm-runtimes/pull/605>

**Full Changelog**: <https://github.com/SeldonIO/llm-runtimes/commits/0.5.0>

## [0.4.0](https://github.com/SeldonIO/llm-runtimes/releases/tag/0.4.0) - 27 Jun 2024

## Main Features

* Code transpilation using `cython`, which allows code obfuscation
* Output streaming support for local runtime, which reduces the time to first token returned to the user
* SQL backend for memory runtime, which allows the use of databases such as PostgreSQL or MySQL to store chat messages
* Support for MMS, which allows the use of multiple models on the same GPU (limited support)

## What's Changed

* build(ci): Use a correct example by [@jesse-c](https://github.com/jesse-c) in <https://github.com/SeldonIO/llm-runtimes/pull/259>
* docs: Remove link to GitHub repository by [@jesse-c](https://github.com/jesse-c) in <https://github.com/SeldonIO/llm-runtimes/pull/260>
* build: Add license by [@jesse-c](https://github.com/jesse-c) in <https://github.com/SeldonIO/llm-runtimes/pull/261>
* feat(local): Bump LLMIS to `0.0.1.rc2` by [@jesse-c](https://github.com/jesse-c) in <https://github.com/SeldonIO/llm-runtimes/pull/263>
* build(deps): bump idna from 3.6 to 3.7 in /docs by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/265>
* docs: Correct link to memory examples by [@jesse-c](https://github.com/jesse-c) in <https://github.com/SeldonIO/llm-runtimes/pull/262>
* build(deps-dev): bump black from 24.3.0 to 24.4.0 in /api by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/266>
* build(deps-dev): bump black from 24.3.0 to 24.4.0 in /deepspeed by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/267>
* build(deps-dev): bump black from 24.3.0 to 24.4.0 in /memory by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/269>
* build(deps-dev): bump black from 24.3.0 to 24.4.0 in /local by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/268>
* build(deps): bump sphinx-autobuild from 2024.2.4 to 2024.4.13 in /docs by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/270>
* Remove duplicate docs examples by [@mauicv](https://github.com/mauicv) in <https://github.com/SeldonIO/llm-runtimes/pull/271>
* docs(local): Add Core 2 model examples by [@jesse-c](https://github.com/jesse-c) in <https://github.com/SeldonIO/llm-runtimes/pull/242>
* docs: Add changelog by [@github-actions](https://github.com/github-actions) in <https://github.com/SeldonIO/llm-runtimes/pull/272>
* release: placeholder for promotion action by [@RafalSkolasinski](https://github.com/RafalSkolasinski) in <https://github.com/SeldonIO/llm-runtimes/pull/273>
* build(deps): bump sphinx-autobuild from 2024.4.13 to 2024.4.16 in /docs by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/275>
* release: promote images from dev to production registry by [@RafalSkolasinski](https://github.com/RafalSkolasinski) in <https://github.com/SeldonIO/llm-runtimes/pull/274>
* feat(docs): Add e2e example by [@mauicv](https://github.com/mauicv) in <https://github.com/SeldonIO/llm-runtimes/pull/126>
* Bugfix/add docs pr changes by [@mauicv](https://github.com/mauicv) in <https://github.com/SeldonIO/llm-runtimes/pull/277>
* build(deps): bump notebook from 7.1.2 to 7.1.3 in /docs by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/278>
* fix(local): Remove dupe `kwargs` setting by [@jesse-c](https://github.com/jesse-c) in <https://github.com/SeldonIO/llm-runtimes/pull/279>
* Fix links and MakeFile by [@mauicv](https://github.com/mauicv) in <https://github.com/SeldonIO/llm-runtimes/pull/280>
* Add minor note on MinIO to chatbot example by [@mauicv](https://github.com/mauicv) in <https://github.com/SeldonIO/llm-runtimes/pull/282>
* Update the image in the chatbot example doc by [@mauicv](https://github.com/mauicv) in <https://github.com/SeldonIO/llm-runtimes/pull/284>
* build(deps): bump myst-parser from 2.0.0 to 3.0.0 in /docs by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/285>
* fix(deps): Use LLMIS 0.0.1 by [@sakoush](https://github.com/sakoush) in <https://github.com/SeldonIO/llm-runtimes/pull/296>
* docs: Add installation by [@jesse-c](https://github.com/jesse-c) in <https://github.com/SeldonIO/llm-runtimes/pull/281>
* build(deps-dev): bump black from 24.4.0 to 24.4.1 in /deepspeed by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/294>
* build(deps-dev): bump mypy from 1.9.0 to 1.10.0 in /memory by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/287>
* build(deps-dev): bump black from 24.4.0 to 24.4.1 in /local by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/289>
* build(deps-dev): bump black from 24.4.0 to 24.4.2 in /memory by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/297>
* build(deps-dev): bump mypy from 1.9.0 to 1.10.0 in /api by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/291>
* build(deps-dev): bump mypy from 1.9.0 to 1.10.0 in /deepspeed by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/295>
* build(deps-dev): bump mypy from 1.9.0 to 1.10.0 in /local by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/290>
* build(deps): update torch requirement from <2.3.0,>=2.0.1 to >=2.0.1,<2.4.0 in /deepspeed by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/293>
* build(deps-dev): bump black from 24.4.0 to 24.4.2 in /api by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/299>
* Update CHANGELOG by [@github-actions](https://github.com/github-actions) in <https://github.com/SeldonIO/llm-runtimes/pull/300>
* Bump version to 0.3.0 by [@jesse-c](https://github.com/jesse-c) in <https://github.com/SeldonIO/llm-runtimes/pull/301>
* docs(api): Fix header level by [@jesse-c](https://github.com/jesse-c) in <https://github.com/SeldonIO/llm-runtimes/pull/302>
* Add section at end of e2e example for changing api to local rt by [@mauicv](https://github.com/mauicv) in <https://github.com/SeldonIO/llm-runtimes/pull/283>
* Wholesale docs edits pb by [@paulb-seldon](https://github.com/paulb-seldon) in <https://github.com/SeldonIO/llm-runtimes/pull/298>
* run CI tests only for version of Python that is used within the Docker image by [@RafalSkolasinski](https://github.com/RafalSkolasinski) in <https://github.com/SeldonIO/llm-runtimes/pull/303>
* deprecate and archive deepspeed by [@RafalSkolasinski](https://github.com/RafalSkolasinski) in <https://github.com/SeldonIO/llm-runtimes/pull/304>
* fix: Remove references to removed runtime by [@jesse-c](https://github.com/jesse-c) in <https://github.com/SeldonIO/llm-runtimes/pull/313>
* Update API examples for docker access and kubernetes deployment by [@mauicv](https://github.com/mauicv) in <https://github.com/SeldonIO/llm-runtimes/pull/321>
* Included local runtime limitations section. by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in <https://github.com/SeldonIO/llm-runtimes/pull/320>
* use {{ current-version }} in example manifest by [@RafalSkolasinski](https://github.com/RafalSkolasinski) in <https://github.com/SeldonIO/llm-runtimes/pull/319>
* Feature/add prompt note by [@mauicv](https://github.com/mauicv) in <https://github.com/SeldonIO/llm-runtimes/pull/324>
* Docs example for quantization and tensor parallelism by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in <https://github.com/SeldonIO/llm-runtimes/pull/323>
* Add local model loading from settings relative path by [@mauicv](https://github.com/mauicv) in <https://github.com/SeldonIO/llm-runtimes/pull/329>
* docs: Update docs for azure openai by [@sakoush](https://github.com/sakoush) in <https://github.com/SeldonIO/llm-runtimes/pull/326>
* build(deps): bump notebook from 7.1.3 to 7.2.0 in /docs by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/330>
* build(deps): bump jinja2 from 3.1.3 to 3.1.4 in /docs by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/327>
* build(deps): bump myst-parser from 3.0.0 to 3.0.1 in /docs by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/318>
* build(deps-dev): bump pytest from 8.1.1 to 8.2.0 in /memory by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/314>
* build(deps-dev): bump pytest from 8.1.1 to 8.2.0 in /api by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/315>
* build(deps-dev): bump pytest from 8.1.1 to 8.2.0 in /local by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/316>
* build(deps-dev): bump tox from 4.14.2 to 4.15.0 in /memory by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/310>
* build(deps-dev): bump tox from 4.14.2 to 4.15.0 in /api by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/306>
* build(deps-dev): bump black from 24.4.1 to 24.4.2 in /local by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/308>
* build(deps-dev): bump tox from 4.14.2 to 4.15.0 in /local by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/307>
* build(deps): bump requests from 2.31.0 to 2.32.0 in /docs by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/337>
* build(deps-dev): bump pytest-asyncio from 0.23.6 to 0.23.7 in /local by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/336>
* build(deps-dev): bump pytest-asyncio from 0.23.6 to 0.23.7 in /memory by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/334>
* build(deps-dev): bump pytest-asyncio from 0.23.6 to 0.23.7 in /api by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/332>
* build(deps-dev): bump pytest from 8.2.0 to 8.2.1 in /api by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/331>
* build(deps-dev): bump pytest from 8.2.0 to 8.2.1 in /memory by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/333>
* build(deps-dev): bump pytest from 8.2.0 to 8.2.1 in /local by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/335>
* build(deps-dev): Bump types-requests from 2.31.0.20240406 to 2.32.0.20240521 in /api by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/339>
* build(deps-dev): Bump types-requests from 2.31.0.20240406 to 2.32.0.20240521 in /memory by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/340>
* build(deps-dev): Bump types-requests from 2.31.0.20240406 to 2.32.0.20240521 in /local by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/338>
* feat(memory): Add postgres backend by [@mauicv](https://github.com/mauicv) in <https://github.com/SeldonIO/llm-runtimes/pull/134>
* Included documentation and examples for mms by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in <https://github.com/SeldonIO/llm-runtimes/pull/347>
* build(deps-dev): Bump types-requests from 2.32.0.20240521 to 2.32.0.20240523 in /memory by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/344>
* build(deps-dev): Bump types-requests from 2.32.0.20240521 to 2.32.0.20240523 in /api by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/346>
* build(deps): Bump sphinx-design from 0.5.0 to 0.6.0 in /docs by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/342>
* Pin open telemetry sdk package to fix ci by [@mauicv](https://github.com/mauicv) in <https://github.com/SeldonIO/llm-runtimes/pull/353>
* Ensure db portability in memory rt by [@mauicv](https://github.com/mauicv) in <https://github.com/SeldonIO/llm-runtimes/pull/352>
* build(deps-dev): Bump types-requests from 2.32.0.20240523 to 2.32.0.20240602 in /api by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/359>
* build(deps-dev): Bump types-requests from 2.32.0.20240521 to 2.32.0.20240602 in /local by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/360>
* build(deps-dev): Bump types-requests from 2.32.0.20240523 to 2.32.0.20240602 in /memory by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/362>
* Update incorrect prompt settings in model-settings.json cell by [@mauicv](https://github.com/mauicv) in <https://github.com/SeldonIO/llm-runtimes/pull/357>
* build(deps): Bump opentelemetry-sdk from 1.24.0 to 1.25.0 in /api by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/355>
* build(deps): Bump opentelemetry-sdk from 1.24.0 to 1.25.0 in /memory by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/356>
* Revert "build(deps): Bump opentelemetry-sdk from 1.24.0 to 1.25.0 in /memory" by [@jesse-c](https://github.com/jesse-c) in <https://github.com/SeldonIO/llm-runtimes/pull/363>
* Revert "build(deps): Bump opentelemetry-sdk from 1.24.0 to 1.25.0 in /api" by [@jesse-c](https://github.com/jesse-c) in <https://github.com/SeldonIO/llm-runtimes/pull/364>
* Remove pinned opentelemetry dep for memory, api and local rt by [@mauicv](https://github.com/mauicv) in <https://github.com/SeldonIO/llm-runtimes/pull/365>
* build(deps-dev): Bump pytest from 8.2.1 to 8.2.2 in /memory by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/368>
* build(deps-dev): Bump pytest from 8.2.1 to 8.2.2 in /api by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/367>
* build(deps-dev): Bump pytest from 8.2.1 to 8.2.2 in /local by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/366>
* SQL backend docs by [@mauicv](https://github.com/mauicv) in <https://github.com/SeldonIO/llm-runtimes/pull/358>
* docs(local): Add output for example by [@jesse-c](https://github.com/jesse-c) in <https://github.com/SeldonIO/llm-runtimes/pull/369>
* docs(local): Fix dupe `image` field by [@jesse-c](https://github.com/jesse-c) in <https://github.com/SeldonIO/llm-runtimes/pull/370>
* Migrate to pydantic v2 by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in <https://github.com/SeldonIO/llm-runtimes/pull/349>
* Bumped mlserver to 1.6.0.dev2 by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in <https://github.com/SeldonIO/llm-runtimes/pull/380>
* Cython support for llm-runtimes by [@sakoush](https://github.com/sakoush) in <https://github.com/SeldonIO/llm-runtimes/pull/354>
* Streaming support for local runtime by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in <https://github.com/SeldonIO/llm-runtimes/pull/351>
* build(deps-dev): Bump tox from 4.15.0 to 4.15.1 in /api by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/375>
* build(deps-dev): Bump tox from 4.15.0 to 4.15.1 in /local by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/373>
* build(deps-dev): Bump tox from 4.15.0 to 4.15.1 in /memory by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/372>
* build(deps): Bump orjson from 3.10.3 to 3.10.4 in /memory by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/381>
* build(deps): Bump tornado from 6.4 to 6.4.1 in /docs by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/374>
* build(deps): Bump notebook from 7.2.0 to 7.2.1 in /docs by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/377>
* Rename db option by [@mauicv](https://github.com/mauicv) in <https://github.com/SeldonIO/llm-runtimes/pull/371>
* Bumped llmis to 0.1.1 by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in <https://github.com/SeldonIO/llm-runtimes/pull/389>
* use mlserver 1.6.0.rc1 by [@sakoush](https://github.com/sakoush) in <https://github.com/SeldonIO/llm-runtimes/pull/390>
* build(deps): Bump sqlalchemy from 2.0.30 to 2.0.31 in /memory by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/388>
* ci: Merge changes from master to release branch by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in <https://github.com/SeldonIO/llm-runtimes/pull/391>
* Update tests.yml by [@sakoush](https://github.com/sakoush) in <https://github.com/SeldonIO/llm-runtimes/pull/392>
* ci: Merge changes from master to release branch by [@sakoush](https://github.com/sakoush) in <https://github.com/SeldonIO/llm-runtimes/pull/393>
* use mlserver 1.6.0 by [@sakoush](https://github.com/sakoush) in <https://github.com/SeldonIO/llm-runtimes/pull/400>
* ci: Merge changes from master to release branch by [@sakoush](https://github.com/sakoush) in <https://github.com/SeldonIO/llm-runtimes/pull/401>

## New Contributors

* [@paulb-seldon](https://github.com/paulb-seldon) made their first contribution in <https://github.com/SeldonIO/llm-runtimes/pull/298>

**Full Changelog**: <https://github.com/SeldonIO/llm-runtimes/commits/0.4.0>

[Changes](https://github.com/SeldonIO/llm-runtimes/compare/0.3.0...0.4.0)

## [0.3.0](https://github.com/SeldonIO/llm-runtimes/releases/tag/0.3.0) - 26 Apr 2024

## Overview

**Content**

This is the initial release of Seldon’s LLM Module. The LLM Module provides a set of MLServer runtimes for serving and deploying large language model (and other GenAI) applications using Seldon Core v2 models and pipelines. The initial set of runtimes includes an API gateway to access LLM solutions hosted by third parties, a Local runtime for self-hosted, on-prem LLM deployment, and Conversational Memory to build stateful LLM applications.

**API**

Many companies provide access to LLMs as a service, each with its own API. This runtime provides a unified way to access them, starting with OpenAI. It requires egress access to OpenAI’s endpoints.

**Local**

You can use open foundation or fine-tuned models, such as those from Mistral and Meta, or your own custom models trained from scratch. The different ways of running these models have different performance characteristics. We provide a unified way to access leading backends, including Transformers, vLLM, and DeepSpeed.

**Conversational Memory**

This runtime facilitates building stateful chatbots that save conversational history. With it, you can store conversations long-term, so the context is kept and can be used by the different models through the **API** and **Local** runtimes.

We’ve carefully benchmarked each runtime to ensure that there is minimal overhead and, in some cases, have made our own improvements on top of the supported backends.

## What's Changed

* Add tox tests and github workflow by [@sakoush](https://github.com/sakoush) in <https://github.com/SeldonIO/llm-runtimes/pull/2>
* Updates to allow runtime to be built for MLServer by [@sakoush](https://github.com/sakoush) in <https://github.com/SeldonIO/llm-runtimes/pull/4>
* Refactor openai runtime directory name by [@sakoush](https://github.com/sakoush) in <https://github.com/SeldonIO/llm-runtimes/pull/5>
* Deepspeed runtime by [@sakoush](https://github.com/sakoush) in <https://github.com/SeldonIO/llm-runtimes/pull/8>
* Initial commit for Microsoft Guidance Runtime by [@ukclivecox](https://github.com/ukclivecox) in <https://github.com/SeldonIO/llm-runtimes/pull/9>
* Update workflow to add linting for Guidance by [@sakoush](https://github.com/sakoush) in <https://github.com/SeldonIO/llm-runtimes/pull/12>
* Update notebook and readme by [@sakoush](https://github.com/sakoush) in <https://github.com/SeldonIO/llm-runtimes/pull/13>
* Update notebook (2) by [@sakoush](https://github.com/sakoush) in <https://github.com/SeldonIO/llm-runtimes/pull/14>
* Add mms notebook for deepspeed by [@sakoush](https://github.com/sakoush) in <https://github.com/SeldonIO/llm-runtimes/pull/15>
* Change title of MMS notebook by [@sakoush](https://github.com/sakoush) in <https://github.com/SeldonIO/llm-runtimes/pull/16>
* Update deepspeed mii to 0.0.6 by [@sakoush](https://github.com/sakoush) in <https://github.com/SeldonIO/llm-runtimes/pull/21>
* Return root error if failing to call openai endpoints by [@sakoush](https://github.com/sakoush) in <https://github.com/SeldonIO/llm-runtimes/pull/25>
* Add MS azure openai support by [@sakoush](https://github.com/sakoush) in <https://github.com/SeldonIO/llm-runtimes/pull/23>
* Update packages (mlserver 1.4.rc6 and mii 0.0.8) by [@sakoush](https://github.com/sakoush) in <https://github.com/SeldonIO/llm-runtimes/pull/29>
* Add parsing json string for HF params by [@sakoush](https://github.com/sakoush) in <https://github.com/SeldonIO/llm-runtimes/pull/31>
* Add extra args in inference config to be processed by [@sakoush](https://github.com/sakoush) in <https://github.com/SeldonIO/llm-runtimes/pull/35>
* Remove reference to apache2 in codebase by [@sakoush](https://github.com/sakoush) in <https://github.com/SeldonIO/llm-runtimes/pull/36>
* Openai images generation model fix + other small improvements by [@sakoush](https://github.com/sakoush) in <https://github.com/SeldonIO/llm-runtimes/pull/38>
* CI for building docker images by [@sakoush](https://github.com/sakoush) in <https://github.com/SeldonIO/llm-runtimes/pull/40>
* Add ability to specify version for deepspeed image by [@sakoush](https://github.com/sakoush) in <https://github.com/SeldonIO/llm-runtimes/pull/41>
* Fix bytes payload in openai runtime by [@sakoush](https://github.com/sakoush) in <https://github.com/SeldonIO/llm-runtimes/pull/43>
* upgrade deepspeed mii to `0.1.0` by [@sakoush](https://github.com/sakoush) in <https://github.com/SeldonIO/llm-runtimes/pull/46>
* Remove guidance by [@sakoush](https://github.com/sakoush) in <https://github.com/SeldonIO/llm-runtimes/pull/50>
* Release workflow by [@sakoush](https://github.com/sakoush) in <https://github.com/SeldonIO/llm-runtimes/pull/51>
* CI fix using envar by [@sakoush](https://github.com/sakoush) in <https://github.com/SeldonIO/llm-runtimes/pull/52>
* Adjust envar CI by [@sakoush](https://github.com/sakoush) in <https://github.com/SeldonIO/llm-runtimes/pull/53>
* CI fix: add env to name for action by [@sakoush](https://github.com/sakoush) in <https://github.com/SeldonIO/llm-runtimes/pull/54>
* Feature/add periodic ci and dependabot by [@mauicv](https://github.com/mauicv) in <https://github.com/SeldonIO/llm-runtimes/pull/58>
* Bump types-requests from 2.28.11.5 to 2.31.0.20240125 in /llm-api by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/80>
* Bump types-requests from 2.28.11.5 to 2.31.0.20240125 in /deepspeed by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/79>
* Bump pytest-mock from 3.10.0 to 3.12.0 in /llm-api by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/75>
* Bump pytest-cases from 3.6.14 to 3.8.2 in /deepspeed by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/71>
* Bump pytest-asyncio from 0.21.0 to 0.23.3 in /deepspeed by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/67>
* Bump deepspeed-mii from 0.1.0 to 0.2.0 in /deepspeed by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/66>
* Bump mypy from 1.2.0 to 1.8.0 in /deepspeed by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/62>
* Bump flake8 from 6.0.0 to 7.0.0 in /deepspeed by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/64>
* Bump flake8 from 6.0.0 to 7.0.0 in /llm-api by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/74>
* Update dependabot.yml to include dockerfiles by [@sakoush](https://github.com/sakoush) in <https://github.com/SeldonIO/llm-runtimes/pull/85>
* Bump pytest-asyncio from 0.23.3 to 0.23.4 in /deepspeed by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/88>
* Update torch requirement from \~=2.0.1 to >=2.0.1,<2.2.0 in /deepspeed by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/86>
* Bump pytest from 7.3.1 to 8.0.0 in /llm-api by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/83>
* Bump pytest-mock from 3.10.0 to 3.12.0 in /deepspeed by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/60>
* Bump mypy from 1.2.0 to 1.8.0 in /llm-api by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/76>
* Bump pytest-cases from 3.6.14 to 3.8.2 in /llm-api by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/69>
* docs(readme): Correct spelling mistake by [@jesse-c](https://github.com/jesse-c) in <https://github.com/SeldonIO/llm-runtimes/pull/96>
* \[File System] Conversational Buffer Memory Runtime by [@mauicv](https://github.com/mauicv) in <https://github.com/SeldonIO/llm-runtimes/pull/56>
* Compatibility changes to openai-runtime for memory runtime requirements by [@mauicv](https://github.com/mauicv) in <https://github.com/SeldonIO/llm-runtimes/pull/55>
* fix: Use consistent package naming by [@jesse-c](https://github.com/jesse-c) in <https://github.com/SeldonIO/llm-runtimes/pull/103>
* fix(build): Missed folder name for prefix change by [@jesse-c](https://github.com/jesse-c) in <https://github.com/SeldonIO/llm-runtimes/pull/104>
* Bump pytest-asyncio from 0.23.4 to 0.23.5 in /deepspeed by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/99>
* Add pydantic validation for api base runtime by [@mauicv](https://github.com/mauicv) in <https://github.com/SeldonIO/llm-runtimes/pull/112>
* build(ci): Migrate from Node 16 actions to at least Node 20 by [@jesse-c](https://github.com/jesse-c) in <https://github.com/SeldonIO/llm-runtimes/pull/113>
* build(ci): Correct disk usage maximising values by [@jesse-c](https://github.com/jesse-c) in <https://github.com/SeldonIO/llm-runtimes/pull/114>
* Bump types-requests from 2.31.0.20240125 to 2.31.0.20240218 in /deepspeed by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/110>
* Bump black from 23.3.0 to 24.2.0 in /deepspeed by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/101>
* Bump pytest from 7.3.1 to 8.0.1 in /deepspeed by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/108>
* Update torch requirement from <2.2.0,>=2.0.1 to >=2.0.1,<2.3.0 in /deepspeed by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/93>
* Bump deepspeed-mii from 0.2.0 to 0.2.2 in /deepspeed by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/106>
* build(dependabot): Correct the config and update DeepSpeed README by [@jesse-c](https://github.com/jesse-c) in <https://github.com/SeldonIO/llm-runtimes/pull/116>
* fix(api): Use latest formatting by [@jesse-c](https://github.com/jesse-c) in <https://github.com/SeldonIO/llm-runtimes/pull/115>
* Bump pytest-asyncio from 0.21.0 to 0.23.5 in /api by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/120>
* Bump pytest from 8.0.0 to 8.0.1 in /api by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/124>
* Bump types-requests from 2.31.0.20240125 to 2.31.0.20240218 in /api by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/123>
* Remove deprecated edits endpoint by [@mauicv](https://github.com/mauicv) in <https://github.com/SeldonIO/llm-runtimes/pull/125>
* Fix incorrect window size on bulk upload of messages by [@mauicv](https://github.com/mauicv) in <https://github.com/SeldonIO/llm-runtimes/pull/129>
* Bump seldonio/mlserver from 1.4.0.rc5-slim to 1.4.0-slim in /deepspeed by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/131>
* Bump seldonio/mlserver from 1.4.0.rc5-slim to 1.4.0-slim in /api by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/130>
* Bump pytest from 8.0.1 to 8.0.2 in /api by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/128>
* Bump pytest from 8.0.1 to 8.0.2 in /deepspeed by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/127>
* build: Bump to latest MLServer release by [@jesse-c](https://github.com/jesse-c) in <https://github.com/SeldonIO/llm-runtimes/pull/133>
* Bump seldonio/mlserver from 1.4.0-slim to 1.5.0-slim in /deepspeed by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/137>
* Bump seldonio/mlserver from 1.4.0-slim to 1.5.0-slim in /api by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/136>
* Add memory docs by [@mauicv](https://github.com/mauicv) in <https://github.com/SeldonIO/llm-runtimes/pull/107>
* Bump opentelemetry-instrumentation from 0.39b0 to 0.41b0 in /docs by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/140>
* Bump starlette from 0.27.0 to 0.36.2 in /docs by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/141>
* Bump pytest-asyncio from 0.23.5 to 0.23.5.post1 in /api by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/143>
* Bump pytest-cases from 3.8.2 to 3.8.3 in /api by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/144>
* Bump mypy from 1.8.0 to 1.9.0 in /api by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/145>
* Bump pytest-cases from 3.8.2 to 3.8.3 in /deepspeed by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/146>
* Bump pytest-asyncio from 0.23.5 to 0.23.5.post1 in /deepspeed by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/147>
* Bump deepspeed-mii from 0.2.2 to 0.2.3 in /deepspeed by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/149>
* Bump fastapi from 0.97.0 to 0.109.1 in /docs by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/142>
* Bump mypy from 1.8.0 to 1.9.0 in /deepspeed by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/148>
* feat(local): Initial support by [@jesse-c](https://github.com/jesse-c) in <https://github.com/SeldonIO/llm-runtimes/pull/100>
* feat(memory): Refactor filesys interface by [@mauicv](https://github.com/mauicv) in <https://github.com/SeldonIO/llm-runtimes/pull/154>
* feat(docs): Add OpenAI runtime docs by [@mauicv](https://github.com/mauicv) in <https://github.com/SeldonIO/llm-runtimes/pull/109>
* feat(local): Add prompting by [@jesse-c](https://github.com/jesse-c) in <https://github.com/SeldonIO/llm-runtimes/pull/155>
* feat(local): Add text generation by [@jesse-c](https://github.com/jesse-c) in <https://github.com/SeldonIO/llm-runtimes/pull/156>
* fix(local): Remove unneeded files by [@jesse-c](https://github.com/jesse-c) in <https://github.com/SeldonIO/llm-runtimes/pull/159>
* build(ci): Fix testing by [@jesse-c](https://github.com/jesse-c) in <https://github.com/SeldonIO/llm-runtimes/pull/158>
* build(deps-dev): bump black from 24.2.0 to 24.3.0 in /deepspeed by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/163>
* fix(local): Correctly pass parameters and check for lengths by [@jesse-c](https://github.com/jesse-c) in <https://github.com/SeldonIO/llm-runtimes/pull/161>
* build(deps-dev): bump black from 24.2.0 to 24.3.0 in /api by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/162>
* Bump types-requests from 2.31.0.20240218 to 2.31.0.20240311 in /deepspeed by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/150>
* Bump types-requests from 2.31.0.20240218 to 2.31.0.20240311 in /api by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/153>
* refactor(local): Support choosing LLMIS runtimes by [@jesse-c](https://github.com/jesse-c) in <https://github.com/SeldonIO/llm-runtimes/pull/165>
* refactor(local): Use async generation by [@jesse-c](https://github.com/jesse-c) in <https://github.com/SeldonIO/llm-runtimes/pull/166>
* build(ci): Add missing images by [@jesse-c](https://github.com/jesse-c) in <https://github.com/SeldonIO/llm-runtimes/pull/168>
* fix(deps): Add dependabot workflow + upgrade mlserver to 1.5 by [@sakoush](https://github.com/sakoush) in <https://github.com/SeldonIO/llm-runtimes/pull/171>
* build(deps-dev): bump tox from 4.4.12 to 4.14.1 in /local by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/189>
* build(deps-dev): bump mypy from 1.8.0 to 1.9.0 in /local by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/183>
* build(deps-dev): Bump to latest for local by [@jesse-c](https://github.com/jesse-c) in <https://github.com/SeldonIO/llm-runtimes/pull/190>
* feat(local): Add vLLM and DeepSpeed examples and tests by [@jesse-c](https://github.com/jesse-c) in <https://github.com/SeldonIO/llm-runtimes/pull/191>
* refactor(local): Don't require the model ready to unload by [@jesse-c](https://github.com/jesse-c) in <https://github.com/SeldonIO/llm-runtimes/pull/192>
* fix(local): Remove unused function stub by [@jesse-c](https://github.com/jesse-c) in <https://github.com/SeldonIO/llm-runtimes/pull/193>
* docs: Correct install target by [@jesse-c](https://github.com/jesse-c) in <https://github.com/SeldonIO/llm-runtimes/pull/198>
* docs(local): Update README with newer examples by [@jesse-c](https://github.com/jesse-c) in <https://github.com/SeldonIO/llm-runtimes/pull/199>
* fix(local): Pin LLMIS commit by [@jesse-c](https://github.com/jesse-c) in <https://github.com/SeldonIO/llm-runtimes/pull/200>
* refactor(local): Share response collection by [@jesse-c](https://github.com/jesse-c) in <https://github.com/SeldonIO/llm-runtimes/pull/194>
* refactor(local): Return correct error types by [@jesse-c](https://github.com/jesse-c) in <https://github.com/SeldonIO/llm-runtimes/pull/195>
* refactor(local): Appropriately log for debugging by [@jesse-c](https://github.com/jesse-c) in <https://github.com/SeldonIO/llm-runtimes/pull/196>
* feat(local): relative prompt path in local rt model-settings by [@mauicv](https://github.com/mauicv) in <https://github.com/SeldonIO/llm-runtimes/pull/203>
* Rename memory image by [@mauicv](https://github.com/mauicv) in <https://github.com/SeldonIO/llm-runtimes/pull/208>
* fix(ci): fix GH worker space issue by [@sakoush](https://github.com/sakoush) in <https://github.com/SeldonIO/llm-runtimes/pull/209>
* fix(build): fix failing build for local runtime by [@RafalSkolasinski](https://github.com/RafalSkolasinski) in <https://github.com/SeldonIO/llm-runtimes/pull/202>
* fix: upgrade image to cuda 12.1 by [@sakoush](https://github.com/sakoush) in <https://github.com/SeldonIO/llm-runtimes/pull/214>
* build(deps-dev): bump pytest-mock from 3.12.0 to 3.14.0 in /local by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/205>
* build(deps-dev): bump tox from 4.14.1 to 4.14.2 in /local by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/213>
* build(deps-dev): bump mypy from 1.2.0 to 1.9.0 in /memory by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/179>
* Bump pytest from 8.0.2 to 8.1.1 in /deepspeed by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/151>
* build(deps-dev): bump tox from 4.4.12 to 4.14.2 in /api by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/212>
* Bump pytest from 8.0.2 to 8.1.1 in /api by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/152>
* build(deps-dev): bump pytest-asyncio from 0.23.5.post1 to 0.23.6 in /deepspeed by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/172>
* build(deps-dev): bump pytest-cases from 3.6.14 to 3.8.4 in /memory by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/178>
* build(deps-dev): bump pytest-asyncio from 0.23.5.post1 to 0.23.6 in /api by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/180>
* build(deps-dev): bump black from 23.3.0 to 24.3.0 in /memory/requirements by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/201>
* build(deps-dev): bump pytest-cases from 3.8.3 to 3.8.4 in /deepspeed by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/169>
* build(deps-dev): bump pytest-asyncio from 0.21.0 to 0.23.6 in /memory by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/182>
* build(deps-dev): bump pytest-mock from 3.12.0 to 3.14.0 in /api by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/204>
* build(deps-dev): bump pytest from 7.3.1 to 8.1.1 in /memory by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/177>
* build(deps-dev): bump pytest-mock from 3.12.0 to 3.14.0 in /deepspeed by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/207>
* build(deps-dev): bump flake8 from 6.0.0 to 7.0.0 in /memory by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/174>
* build(deps-dev): bump pytest-cases from 3.8.3 to 3.8.4 in /api by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/170>
* build(deps-dev): bump tox from 4.4.12 to 4.14.2 in /deepspeed by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/210>
* build(deps-dev): bump types-requests from 2.28.11.5 to 2.31.0.20240311 in /memory by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/181>
* build(deps-dev): bump tox from 4.4.12 to 4.14.2 in /memory by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/211>
* build(deps-dev): bump pytest-mock from 3.10.0 to 3.14.0 in /memory by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/206>
* build(local): Remove Pytorch dev dependency by [@jesse-c](https://github.com/jesse-c) in <https://github.com/SeldonIO/llm-runtimes/pull/217>
* build(docs): Add Dependabot by [@jesse-c](https://github.com/jesse-c) in <https://github.com/SeldonIO/llm-runtimes/pull/218>
* docs: New structure and Local content by [@jesse-c](https://github.com/jesse-c) in <https://github.com/SeldonIO/llm-runtimes/pull/216>
* build(deps): bump notebook from 7.1.1 to 7.1.2 in /docs by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/220>
* build: Use wheel for distribution by [@jesse-c](https://github.com/jesse-c) in <https://github.com/SeldonIO/llm-runtimes/pull/222>
* build(docs): Ignore common virtualenv directories by [@jesse-c](https://github.com/jesse-c) in <https://github.com/SeldonIO/llm-runtimes/pull/223>
* docs: Continue restructure and add reference for Local by [@jesse-c](https://github.com/jesse-c) in <https://github.com/SeldonIO/llm-runtimes/pull/224>
* docs(local): Add inference requests and backends' model settings by [@jesse-c](https://github.com/jesse-c) in <https://github.com/SeldonIO/llm-runtimes/pull/225>
* docs: Add analytics, fix quoted sidebar, and add versions by [@jesse-c](https://github.com/jesse-c) in <https://github.com/SeldonIO/llm-runtimes/pull/227>
* fix(local): Allow Pydantic models' validators reuse by [@jesse-c](https://github.com/jesse-c) in <https://github.com/SeldonIO/llm-runtimes/pull/228>
* docs: Add ReadTheDocs config by [@jesse-c](https://github.com/jesse-c) in <https://github.com/SeldonIO/llm-runtimes/pull/233>
* build(deps-dev): bump types-requests from 2.31.0.20240311 to 2.31.0.20240403 in /api by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/234>
* build(deps-dev): bump types-requests from 2.31.0.20240311 to 2.31.0.20240403 in /local by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/235>
* build(deps-dev): bump types-requests from 2.31.0.20240311 to 2.31.0.20240403 in /deepspeed by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/236>
* build(deps-dev): bump types-requests from 2.31.0.20240311 to 2.31.0.20240403 in /memory by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/237>
* refactor: Align versioning by [@jesse-c](https://github.com/jesse-c) in <https://github.com/SeldonIO/llm-runtimes/pull/238>
* fix: upgrade to cuda 12.1.1 by [@sakoush](https://github.com/sakoush) in <https://github.com/SeldonIO/llm-runtimes/pull/240>
* OpenAI api default generation kwargs by [@RobertSamoilescu](https://github.com/RobertSamoilescu) in <https://github.com/SeldonIO/llm-runtimes/pull/239>
* fix(local): Don't delete `parameters.extra` by [@jesse-c](https://github.com/jesse-c) in <https://github.com/SeldonIO/llm-runtimes/pull/241>
* build(deps-dev): bump pytest-cases from 3.8.4 to 3.8.5 in /local by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/243>
* build(deps-dev): bump pytest-cases from 3.8.4 to 3.8.5 in /api by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/244>
* build(deps-dev): bump pytest-cases from 3.8.4 to 3.8.5 in /memory by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/245>
* build(deps-dev): bump pytest-cases from 3.8.4 to 3.8.5 in /deepspeed by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/246>
* feat(local): Allow `kwargs` as serialised JSON string by [@jesse-c](https://github.com/jesse-c) in <https://github.com/SeldonIO/llm-runtimes/pull/247>
* test(local): Use parameterisation by [@jesse-c](https://github.com/jesse-c) in <https://github.com/SeldonIO/llm-runtimes/pull/248>
* feat(api): Allow `llm_parameters` as serialised JSON string by [@jesse-c](https://github.com/jesse-c) in <https://github.com/SeldonIO/llm-runtimes/pull/249>
* build(local): Bump LLMIS version to `0.0.1rc1` by [@jesse-c](https://github.com/jesse-c) in <https://github.com/SeldonIO/llm-runtimes/pull/226>
* build(ci): Align version check by [@jesse-c](https://github.com/jesse-c) in <https://github.com/SeldonIO/llm-runtimes/pull/251>
* build(ci): Improve space saving when building images by [@jesse-c](https://github.com/jesse-c) in <https://github.com/SeldonIO/llm-runtimes/pull/253>
* build(ci): Use correct names and SSH agent by [@jesse-c](https://github.com/jesse-c) in <https://github.com/SeldonIO/llm-runtimes/pull/254>
* build(deps-dev): bump types-requests from 2.31.0.20240403 to 2.31.0.20240406 in /memory by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/258>
* build(deps-dev): bump types-requests from 2.31.0.20240403 to 2.31.0.20240406 in /api by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/255>
* build(deps-dev): bump types-requests from 2.31.0.20240403 to 2.31.0.20240406 in /local by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/256>
* build(deps-dev): bump types-requests from 2.31.0.20240403 to 2.31.0.20240406 in /deepspeed by [@dependabot](https://github.com/dependabot) in <https://github.com/SeldonIO/llm-runtimes/pull/257>
* build(ci): Migrate from Node 16 by [@jesse-c](https://github.com/jesse-c) in <https://github.com/SeldonIO/llm-runtimes/pull/252>

**Full Changelog**: <https://github.com/SeldonIO/llm-runtimes/commits/0.3.0>

[Changes](https://github.com/SeldonIO/llm-runtimes/tree/0.3.0)

