Changelog

0.7.0 - 2025-06-06

Main Features

Added support for agentic workflows via the OpenAI API (e.g. tools, planning, and reflection patterns).

What's Changed

Update CHANGELOG by @github-actions in #672
Bumped version to 0.6.0 in docs by @RobertSamoilescu in #673
Bumped version to 0.7.0.dev1 by @RobertSamoilescu in #674
Improved rag example by @RobertSamoilescu in #675
build(deps-dev): Bump pytest-asyncio from 0.25.3 to 0.26.0 in /prompt-utils by @dependabot in #677
build(deps-dev): Bump syrupy from 4.9.0 to 4.9.1 in /prompt-utils by @dependabot in #676
Updated changelog for 0.5.0 and 0.6.0 by @RobertSamoilescu in #681
Add files via upload by @paulb-seldon in #683
Fixed async operation on the agentic workflow pipeline by @RobertSamoilescu in #684
Fix database sync for chatbot and agentic workflow by @RobertSamoilescu in #686
Included support for agents implemented with cyclic pipelines by @RobertSamoilescu in #685
Fixed tool args for openai and gemini by @RobertSamoilescu in #688
build(deps-dev): Bump wheel from 0.44.0 to 0.46.1 in /prompt-utils by @dependabot in #682
build(deps-dev): Bump flake8 from 7.1.2 to 7.2.0 in /prompt-utils by @dependabot in #680
build(deps-dev): Bump types-requests from 2.32.0.20250306 to 2.32.0.20250328 in /prompt-utils by @dependabot in #679
Planning agent pattern example by @RobertSamoilescu in #687
add agents docs changes by @paulb-seldon in #689
Fix introduction links by @paulb-seldon in #690
Change email to reach out to us by @paulb-seldon in #691
Bumped MLServer to 1.7.0 by @RobertSamoilescu in #693
Bumped st<4.2.0 and included upper bound for optimum by @RobertSamoilescu in #694
build(deps-dev): Bump types-requests from 2.32.0.20250328 to 2.32.0.20250515 in /prompt-utils by @dependabot in #695
Set release large runner for local runtime by @RobertSamoilescu in #696
Bump MLServer to 1.7.1.rc1 and LLMIS to 0.3.3 by @RobertSamoilescu in #702
ci: Merge change for release 0.7.0 (#702) by @RobertSamoilescu in #703
Fixed docker file for local runtime by @RobertSamoilescu in #704
Fix docs for 0.7.0 by @RobertSamoilescu in #705
Bumped MLServer to 1.7.1 by @RobertSamoilescu in #706
ci: Merge change for release 0.7.0 [2] by @RobertSamoilescu in #707

Full Changelog: https://github.com/SeldonIO/llm-runtimes/compare/0.6.0...0.7.0

0.6.0 - 2025-03-13

Main Features

New runtimes:
- Sentence Transformers for text embeddings
- Prompt Runtime - allows referencing the same locally deployed model with different prompts
Support for conditional routing through Core 2 pipelines - this feature allows conditional data flows based on the output of the LLM, enabling support for agentic workflows

What's Changed

Replaced local images with the ones from artifacts registry by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/606
Remove redundant server configs by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/607
Update Models Overview by @paulb-seldon in https://github.com/SeldonIO/llm-runtimes/pull/609
Quickstart example by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/608
Fixed models and pipelines naming by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/612
Included upperbound on httpx by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/615
Clean docs by @paulb-seldon in https://github.com/SeldonIO/llm-runtimes/pull/613
Remove PVC from docs examples by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/616
Updated images tag to 0.5.0 by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/618
Fixed installation test image path by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/619
Cat the pipeline definition before deploying it by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/621
Fixed typo in azure docs by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/622
Fixed name of vector db env prefix by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/628
Sentence transformers support by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/626
Simplify content when content is a single text message by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/630
Refactored extra settings init by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/631
Passed database as string in get_db_interface by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/633
Fixed memory psql db url by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/634
Prompt runtime for local llms by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/627
Docs for local-embedding runtime by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/632
Renamed url to infer_uri in PromptRuntime by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/641
Included output postprocessing for conditional routing by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/637
Wrote prompt runtime docs by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/636
Wrote docs for routing by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/643
Renamed routing to agents by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/645
build(deps-dev): Bump black from 24.10.0 to 25.1.0 in /prompt-utils by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/644
build(deps-dev): Bump isort from 5.13.2 to 6.0.0 in /prompt-utils by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/640
build(deps-dev): Bump tox from 4.23.0 to 4.24.1 in /prompt-utils by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/639
build(deps-dev): Bump syrupy from 4.7.2 to 4.8.1 in /prompt-utils by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/635
build(deps-dev): Bump mypy from 1.13.0 to 1.14.1 in /prompt-utils by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/624
build(deps-dev): Bump pytest-asyncio from 0.24.0 to 0.25.3 in /prompt-utils by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/642
Implemented routing for prompt runtime by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/647
Included installation steps for prompt runtime by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/649
Guardrails docs example by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/646
Bumped mlserver to 1.7.0.rc1 by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/650
bump LLMIS to 0.3.0 by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/651
Workflow for prompt-utils runtime by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/652
Workflow for prompt-utils runtime (#652) by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/653
Fixed capabilities for prompt utils server by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/654
Docs for http sse streaming by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/660
Bumped LLMIS to 0.3.1 by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/666
build(deps-dev): Bump flake8 from 7.1.1 to 7.1.2 in /prompt-utils by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/657
build(deps-dev): Bump isort from 6.0.0 to 6.0.1 in /prompt-utils by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/659
build(deps-dev): Bump pytest from 8.3.3 to 8.3.5 in /prompt-utils by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/662
build(deps-dev): Bump types-requests from 2.32.0.20241016 to 2.32.0.20250306 in /prompt-utils by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/663
build(deps-dev): Bump tox from 4.24.1 to 4.24.2 in /prompt-utils by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/664
build(deps-dev): Bump syrupy from 4.8.1 to 4.9.0 in /prompt-utils by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/665
ci: Merge change for release 0.6.0 by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/667
Fix docs for 0.6.0.rc3 by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/668
Fix docs for 0.6.0.rc3 (#668) by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/669
Bump mlserver to 1.7.0.rc2 by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/670
Bump mlserver to 1.7.0.rc2 (#670) by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/671

Full Changelog: https://github.com/SeldonIO/llm-runtimes/compare/0.5.0...0.6.0

0.5.0 - 2024-12-04

Main Features

New runtimes:
- Gemini
- VectorDB with support for Qdrant and PGVector
Streaming support for OpenAI and Gemini
Standardized IO across all runtimes (i.e. prompting)
RAG support for the Local Runtime

What's Changed

Update CHANGELOG by @github-actions in https://github.com/SeldonIO/llm-runtimes/pull/403
Bump openai 1.35.3 by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/402
Included streaming support for OpenAI by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/404
build(deps-dev): Bump tox from 4.15.1 to 4.16.0 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/409
build(deps-dev): Bump tox from 4.15.1 to 4.16.0 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/408
build(deps-dev): Bump tox from 4.15.1 to 4.16.0 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/407
build(deps): Bump orjson from 3.10.4 to 3.10.6 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/406
build(deps-dev): Bump mypy from 1.10.0 to 1.10.1 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/398
build(deps-dev): Bump mypy from 1.10.0 to 1.10.1 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/399
build(deps-dev): Bump mypy from 1.10.0 to 1.10.1 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/397
build(deps-dev): Bump types-requests from 2.32.0.20240602 to 2.32.0.20240622 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/396
build(deps-dev): Bump types-requests from 2.32.0.20240602 to 2.32.0.20240622 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/395
build(deps-dev): Bump types-requests from 2.32.0.20240602 to 2.32.0.20240622 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/394
build(deps-dev): Bump flake8 from 7.0.0 to 7.1.0 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/387
build(deps-dev): Bump flake8 from 7.0.0 to 7.1.0 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/386
build(deps-dev): Bump flake8 from 7.0.0 to 7.1.0 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/385
build(deps): Bump urllib3 from 2.2.1 to 2.2.2 in /docs by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/384
build(deps): Bump urllib3 from 2.2.1 to 2.2.2 in /local/requirements by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/383
Add a DISABLE_CYTHON environment variable for development purposes by @jklaise in https://github.com/SeldonIO/llm-runtimes/pull/411
Do not exclude packages when installing in dev mode without Cython by @jklaise in https://github.com/SeldonIO/llm-runtimes/pull/412
added the create the servers pieces to the documentation by @joshsgoldstein in https://github.com/SeldonIO/llm-runtimes/pull/416
Feat/gemini api runtime by @jklaise in https://github.com/SeldonIO/llm-runtimes/pull/420
Migration to GitBook and General Doc Improvements by @ramonpzg in https://github.com/SeldonIO/llm-runtimes/pull/312
Update dependabot.yml to remove docs scan by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/427
fix: CI failing because of a change of permission by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/429
fix: remove docs in version script by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/430
bump versions from 0.3.0 to 0.5.0.dev by @mauicv in https://github.com/SeldonIO/llm-runtimes/pull/428
Fix/gemini grpc bytes decoding by @jklaise in https://github.com/SeldonIO/llm-runtimes/pull/434
Add types tensor to Gemini output by @jklaise in https://github.com/SeldonIO/llm-runtimes/pull/435
fix: add security workflow for llm-runtimes by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/442
fix: move workflow to workflows directory by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/443
fix: use one big security scan task by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/444
fix: add build base to local docker image build step by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/458
fix: Update security.yml to exclude our runtimes from 3rd party deps by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/465
Feat/gemini multi modal input by @jklaise in https://github.com/SeldonIO/llm-runtimes/pull/459
Streaming support for Gemini by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/466
Fix setting stream flag for other tasks than completion and chat by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/474
build(deps-dev): Bump black from 24.4.2 to 24.8.0 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/439
build(deps-dev): Bump black from 24.4.2 to 24.8.0 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/440
build(deps-dev): Bump black from 24.4.2 to 24.8.0 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/441
build(deps-dev): Bump flake8 from 7.1.0 to 7.1.1 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/449
Included VectorDB runtime by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/480
build(deps-dev): Bump flake8 from 7.1.0 to 7.1.1 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/445
build(deps-dev): Bump wheel from 0.43.0 to 0.44.0 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/450
build(deps-dev): Bump flake8 from 7.1.0 to 7.1.1 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/454
build(deps): Bump orjson from 3.10.6 to 3.10.7 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/460
build(deps-dev): Bump mypy from 1.10.1 to 1.11.2 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/473
build(deps-dev): Bump wheel from 0.43.0 to 0.44.0 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/446
build(deps-dev): Bump black from 24.4.2 to 24.8.0 in /vector-db by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/499
build(deps-dev): Bump flake8 from 7.1.0 to 7.1.1 in /vector-db by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/497
build(deps-dev): Bump tox from 4.16.0 to 4.18.1 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/481
build(deps-dev): Bump tox from 4.16.0 to 4.18.1 in /vector-db by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/498
build(deps-dev): Bump types-requests from 2.32.0.20240622 to 2.32.0.20240907 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/500
build(deps): Update google-generativeai requirement from ==0.7.* to >=0.7,<0.9 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/489
build(deps-dev): Bump tox from 4.16.0 to 4.18.1 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/485
build(deps-dev): Bump pytest-postgresql from 6.0.0 to 6.1.1 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/479
build(deps): Bump sqlalchemy from 2.0.31 to 2.0.34 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/476
build(deps-dev): Bump wheel from 0.43.0 to 0.44.0 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/452
build(deps-dev): Bump types-requests from 2.32.0.20240622 to 2.32.0.20240907 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/484
build(deps-dev): Bump types-requests from 2.32.0.20240622 to 2.32.0.20240907 in /vector-db by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/494
build(deps-dev): Bump mypy from 1.10.1 to 1.11.2 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/472
build(deps-dev): Bump mypy from 1.10.1 to 1.11.2 in /vector-db by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/495
build(deps-dev): Bump wheel from 0.43.0 to 0.44.0 in /vector-db by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/493
build(deps-dev): Bump tox from 4.16.0 to 4.18.1 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/483
build(deps-dev): Bump types-requests from 2.32.0.20240622 to 2.32.0.20240907 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/482
Update security.yml to include vector-db for code scan by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/502
build(deps-dev): Bump syrupy from 4.6.1 to 4.7.1 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/470
build(deps-dev): Bump mypy from 1.10.1 to 1.11.2 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/471
Bumped MLServer to 1.6.1 by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/501
Bumped llmis to 0.2.0 by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/503
Included missing files for qdrant by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/506
Included pgvector support by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/505
build(deps-dev): Bump tox from 4.18.1 to 4.19.0 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/516
build(deps-dev): Bump tox from 4.18.1 to 4.19.0 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/515
build(deps-dev): Bump tox from 4.18.1 to 4.19.0 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/514
build(deps-dev): Bump tox from 4.18.1 to 4.19.0 in /vector-db by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/513
build(deps-dev): Bump types-requests from 2.32.0.20240907 to 2.32.0.20240914 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/511
build(deps-dev): Bump types-requests from 2.32.0.20240907 to 2.32.0.20240914 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/510
build(deps-dev): Bump types-requests from 2.32.0.20240907 to 2.32.0.20240914 in /vector-db by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/508
build(deps-dev): Bump types-requests from 2.32.0.20240907 to 2.32.0.20240914 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/509
build(deps): Bump sqlalchemy from 2.0.34 to 2.0.35 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/512
build(deps-dev): Bump pytest from 7.3.1 to 8.3.3 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/490
build(deps-dev): Bump pytest from 7.3.1 to 8.3.3 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/488
build(deps-dev): Bump pytest from 7.3.1 to 8.3.3 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/487
build(deps-dev): Bump pytest-asyncio from 0.23.7 to 0.24.0 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/469
build(deps-dev): Bump pytest-asyncio from 0.23.7 to 0.24.0 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/468
build(deps-dev): Bump pytest-asyncio from 0.23.7 to 0.24.0 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/517
Prototype/add prompt and preprocessing by @mauicv in https://github.com/SeldonIO/llm-runtimes/pull/461
Add gitbook docs to master branch by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/548
Integrate 461 in memory by @mauicv in https://github.com/SeldonIO/llm-runtimes/pull/507
Moved prompt-utils in requirements.txt by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/549
Integrate #461 in API runtime by @mauicv in https://github.com/SeldonIO/llm-runtimes/pull/486
Prototype integrate 461 in local by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/557
Prototype integrate 461 in vector db by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/564
build(deps-dev): Bump pytest from 7.3.1 to 8.3.3 in /prompt-utils by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/527
build(deps-dev): Bump pytest-asyncio from 0.23.7 to 0.24.0 in /prompt-utils by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/526
build(deps-dev): Bump pytest-cases from 3.8.5 to 3.8.6 in /vector-db by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/528
build(deps-dev): Bump pytest-cases from 3.8.5 to 3.8.6 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/529
build(deps-dev): Bump pytest-cases from 3.8.5 to 3.8.6 in /prompt-utils by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/531
build(deps-dev): Bump pytest-cases from 3.8.5 to 3.8.6 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/532
build(deps-dev): Bump tox from 4.19.0 to 4.21.2 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/544
build(deps-dev): Bump pytest-cases from 3.8.5 to 3.8.6 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/530
build(deps-dev): Bump tox from 4.19.0 to 4.21.2 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/545
build(deps-dev): Bump types-requests from 2.32.0.20240914 to 2.32.0.20241016 in /vector-db by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/565
build(deps-dev): Bump syrupy from 4.7.1 to 4.7.2 in /prompt-utils by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/555
build(deps-dev): Bump types-requests from 2.32.0.20240907 to 2.32.0.20241016 in /prompt-utils by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/566
Rag support for local runtime by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/570
build(deps-dev): Bump tox from 4.19.0 to 4.21.2 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/546
build(deps-dev): Bump black from 24.8.0 to 24.10.0 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/551
build(deps-dev): Bump black from 24.8.0 to 24.10.0 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/553
build(deps-dev): Bump tox from 4.19.0 to 4.23.0 in /vector-db by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/567
Fix memory system prompt by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/569
Runtime settings consistency by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/572
build(deps-dev): Bump black from 24.8.0 to 24.10.0 in /vector-db by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/550
build(deps-dev): Bump syrupy from 4.7.1 to 4.7.2 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/552
build(deps-dev): Bump black from 24.8.0 to 24.10.0 in /prompt-utils by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/554
build(deps): Update qdrant-client requirement from <1.12.0 to <1.13.0 in /vector-db by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/556
build(deps-dev): Bump mypy from 1.11.2 to 1.12.0 in /vector-db by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/558
build(deps-dev): Bump mypy from 1.11.2 to 1.12.0 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/559
build(deps-dev): Bump mypy from 1.11.2 to 1.12.0 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/561
build(deps-dev): Bump tox from 4.18.1 to 4.23.0 in /prompt-utils by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/568
build(deps-dev): Bump mypy from 1.10.1 to 1.13.0 in /prompt-utils by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/574
Default model type initialization in prompt utils by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/573
Renamed mlserver-llm-openai to mlserver-llm-api by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/577
Fix openai setting config by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/578
Updated openai docs with IO by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/579
OpenAI function calling docs by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/580
IO gemini docs by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/581
Refactored the azure docs by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/584
IO memory docs by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/586
IO vector-db docs by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/587
IO local docs - chat model by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/588
IO local mms docs by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/589
IO local chat template by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/591
fix(docs): pull changes from docs-master by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/583
IO local quantization docs by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/590
IO chat bot by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/592
Fixed tensor names for vector-db example by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/597
Updated azure model settings and image by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/598
Improved local server manifest by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/599
Included wait ready for models, pipelines, and deployments by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/600
Updated and added docs refs by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/596
Updated installation docs by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/601
Removed k8s example by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/602
Docs for monitoring by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/593
Updated runtimes docs intro by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/603
Docs restructure by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/604
ci: Merge change for release 0.5.0 by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/605

Full Changelog: https://github.com/SeldonIO/llm-runtimes/commits/0.5.0

0.4.0 - 2024-06-27

Main Features

Code transpilation using Cython, which allows code obfuscation
Output streaming support for the local runtime, which reduces the time to first token for the user
SQL backend for the memory runtime, which allows databases such as PostgreSQL or MySQL to store chat messages
Support for multi-model serving (MMS), which allows the use of multiple models on the same GPU (limited support)

What's Changed

build(ci): Use a correct example by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/259
docs: Remove link to GitHub repository by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/260
build: Add license by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/261
feat(local): Bump LLMIS to 0.0.1.rc2 by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/263
build(deps): bump idna from 3.6 to 3.7 in /docs by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/265
docs: Correct link to memory examples by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/262
build(deps-dev): bump black from 24.3.0 to 24.4.0 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/266
build(deps-dev): bump black from 24.3.0 to 24.4.0 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/267
build(deps-dev): bump black from 24.3.0 to 24.4.0 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/269
build(deps-dev): bump black from 24.3.0 to 24.4.0 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/268
build(deps): bump sphinx-autobuild from 2024.2.4 to 2024.4.13 in /docs by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/270
Remove duplicate docs examples by @mauicv in https://github.com/SeldonIO/llm-runtimes/pull/271
docs(local): Add Core 2 model examples by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/242
docs: Add changelog by @github-actions in https://github.com/SeldonIO/llm-runtimes/pull/272
release: placeholder for promotion action by @RafalSkolasinski in https://github.com/SeldonIO/llm-runtimes/pull/273
build(deps): bump sphinx-autobuild from 2024.4.13 to 2024.4.16 in /docs by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/275
release: promote images from dev to production registry by @RafalSkolasinski in https://github.com/SeldonIO/llm-runtimes/pull/274
feat(docs): Add e2e example by @mauicv in https://github.com/SeldonIO/llm-runtimes/pull/126
Bugfix/add docs pr changes by @mauicv in https://github.com/SeldonIO/llm-runtimes/pull/277
build(deps): bump notebook from 7.1.2 to 7.1.3 in /docs by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/278
fix(local): Remove dupe kwargs setting by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/279
Fix links and MakeFile by @mauicv in https://github.com/SeldonIO/llm-runtimes/pull/280
Add minor note on MinIO to chatbot example by @mauicv in https://github.com/SeldonIO/llm-runtimes/pull/282
Update the image in the chatbot example doc by @mauicv in https://github.com/SeldonIO/llm-runtimes/pull/284
build(deps): bump myst-parser from 2.0.0 to 3.0.0 in /docs by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/285
fix(deps): Use LLMIS 0.0.1 by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/296
docs: Add installation by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/281
build(deps-dev): bump black from 24.4.0 to 24.4.1 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/294
build(deps-dev): bump mypy from 1.9.0 to 1.10.0 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/287
build(deps-dev): bump black from 24.4.0 to 24.4.1 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/289
build(deps-dev): bump black from 24.4.0 to 24.4.2 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/297
build(deps-dev): bump mypy from 1.9.0 to 1.10.0 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/291
build(deps-dev): bump mypy from 1.9.0 to 1.10.0 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/295
build(deps-dev): bump mypy from 1.9.0 to 1.10.0 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/290
build(deps): update torch requirement from <2.3.0,>=2.0.1 to >=2.0.1,<2.4.0 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/293
build(deps-dev): bump black from 24.4.0 to 24.4.2 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/299
Update CHANGELOG by @github-actions in https://github.com/SeldonIO/llm-runtimes/pull/300
Bump version to 0.3.0 by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/301
docs(api): Fix header level by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/302
Add section at end of e2e example for changing api to local rt by @mauicv in https://github.com/SeldonIO/llm-runtimes/pull/283
Wholesale docs edits pb by @paulb-seldon in https://github.com/SeldonIO/llm-runtimes/pull/298
run CI tests only for version of Python that is used within the Docker image by @RafalSkolasinski in https://github.com/SeldonIO/llm-runtimes/pull/303
deprecate and archive deepspeed by @RafalSkolasinski in https://github.com/SeldonIO/llm-runtimes/pull/304
fix: Remove references to removed runtime by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/313
Update API examples for docker access and kubernetes deployment by @mauicv in https://github.com/SeldonIO/llm-runtimes/pull/321
Included local runtime limitations section. by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/320
use {{ current-version }} in example manifest by @RafalSkolasinski in https://github.com/SeldonIO/llm-runtimes/pull/319
Feature/add prompt note by @mauicv in https://github.com/SeldonIO/llm-runtimes/pull/324
Docs example for quantization and tensor parallelism by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/323
Add local model loading from settings relative path by @mauicv in https://github.com/SeldonIO/llm-runtimes/pull/329
docs: Update docs for azure openai by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/326
build(deps): bump notebook from 7.1.3 to 7.2.0 in /docs by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/330
build(deps): bump jinja2 from 3.1.3 to 3.1.4 in /docs by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/327
build(deps): bump myst-parser from 3.0.0 to 3.0.1 in /docs by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/318
build(deps-dev): bump pytest from 8.1.1 to 8.2.0 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/314
build(deps-dev): bump pytest from 8.1.1 to 8.2.0 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/315
build(deps-dev): bump pytest from 8.1.1 to 8.2.0 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/316
build(deps-dev): bump tox from 4.14.2 to 4.15.0 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/310
build(deps-dev): bump tox from 4.14.2 to 4.15.0 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/306
build(deps-dev): bump black from 24.4.1 to 24.4.2 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/308
build(deps-dev): bump tox from 4.14.2 to 4.15.0 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/307
build(deps): bump requests from 2.31.0 to 2.32.0 in /docs by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/337
build(deps-dev): bump pytest-asyncio from 0.23.6 to 0.23.7 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/336
build(deps-dev): bump pytest-asyncio from 0.23.6 to 0.23.7 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/334
build(deps-dev): bump pytest-asyncio from 0.23.6 to 0.23.7 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/332
build(deps-dev): bump pytest from 8.2.0 to 8.2.1 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/331
build(deps-dev): bump pytest from 8.2.0 to 8.2.1 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/333
build(deps-dev): bump pytest from 8.2.0 to 8.2.1 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/335
build(deps-dev): Bump types-requests from 2.31.0.20240406 to 2.32.0.20240521 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/339
build(deps-dev): Bump types-requests from 2.31.0.20240406 to 2.32.0.20240521 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/340
build(deps-dev): Bump types-requests from 2.31.0.20240406 to 2.32.0.20240521 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/338
feat(memory): Add postgres backend by @mauicv in https://github.com/SeldonIO/llm-runtimes/pull/134
Included documentation and examples for mms by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/347
build(deps-dev): Bump types-requests from 2.32.0.20240521 to 2.32.0.20240523 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/344
build(deps-dev): Bump types-requests from 2.32.0.20240521 to 2.32.0.20240523 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/346
build(deps): Bump sphinx-design from 0.5.0 to 0.6.0 in /docs by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/342
Pin open telemetry sdk package to fix ci by @mauicv in https://github.com/SeldonIO/llm-runtimes/pull/353
Ensure db portability in memory rt by @mauicv in https://github.com/SeldonIO/llm-runtimes/pull/352
build(deps-dev): Bump types-requests from 2.32.0.20240523 to 2.32.0.20240602 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/359
build(deps-dev): Bump types-requests from 2.32.0.20240521 to 2.32.0.20240602 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/360
build(deps-dev): Bump types-requests from 2.32.0.20240523 to 2.32.0.20240602 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/362
Update incorrect prompt settings in model-settings.json cell by @mauicv in https://github.com/SeldonIO/llm-runtimes/pull/357
build(deps): Bump opentelemetry-sdk from 1.24.0 to 1.25.0 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/355
build(deps): Bump opentelemetry-sdk from 1.24.0 to 1.25.0 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/356
Revert "build(deps): Bump opentelemetry-sdk from 1.24.0 to 1.25.0 in /memory" by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/363
Revert "build(deps): Bump opentelemetry-sdk from 1.24.0 to 1.25.0 in /api" by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/364
Remove pinned opentelemetry dep for memory, api and local rt by @mauicv in https://github.com/SeldonIO/llm-runtimes/pull/365
build(deps-dev): Bump pytest from 8.2.1 to 8.2.2 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/368
build(deps-dev): Bump pytest from 8.2.1 to 8.2.2 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/367
build(deps-dev): Bump pytest from 8.2.1 to 8.2.2 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/366
SQL backend docs by @mauicv in https://github.com/SeldonIO/llm-runtimes/pull/358
docs(local): Add output for example by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/369
docs(local): Fix dupe image field by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/370
Migrate to pydantic v2 by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/349
Bumped mlserver to 1.6.0.dev2 by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/380
Cython support for llm-runtimes by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/354
Streaming support for local runtime by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/351
build(deps-dev): Bump tox from 4.15.0 to 4.15.1 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/375
build(deps-dev): Bump tox from 4.15.0 to 4.15.1 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/373
build(deps-dev): Bump tox from 4.15.0 to 4.15.1 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/372
build(deps): Bump orjson from 3.10.3 to 3.10.4 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/381
build(deps): Bump tornado from 6.4 to 6.4.1 in /docs by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/374
build(deps): Bump notebook from 7.2.0 to 7.2.1 in /docs by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/377
Rename db option by @mauicv in https://github.com/SeldonIO/llm-runtimes/pull/371
Bumped llmis to 0.1.1 by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/389
use mlserver 1.6.0.rc1 by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/390
build(deps): Bump sqlalchemy from 2.0.30 to 2.0.31 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/388
ci: Merge changes from master to release branch by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/391
Update tests.yml by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/392
ci: Merge changes from master to release branch by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/393
use mlserver 1.6.0 by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/400
ci: Merge changes from master to release branch by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/401

New Contributors

@paulb-seldon made their first contribution in https://github.com/SeldonIO/llm-runtimes/pull/298

Full Changelog: https://github.com/SeldonIO/llm-runtimes/commits/0.4.0

Changes

0.3.0 - 26 Apr 2024

Overview

Content

This is the initial release of Seldon’s LLM Module. The LLM Module provides a set of MLServer runtimes for serving and deploying large language model (and other GenAI) applications using Seldon Core v2 models and pipelines. The initial set of runtimes includes an API runtime, which acts as a gateway to LLMs hosted by third parties, a Local runtime for self-hosted, on-prem LLM deployment, and a Conversational Memory runtime for building stateful LLM applications.

API

Many companies provide access to LLMs as a service, and their APIs all differ. This runtime provides a unified way to access them, starting with OpenAI. It requires egress access to OpenAI’s endpoints.

Local

You can use open foundation models or fine-tuned models, such as those from Mistral and Meta, or your own custom models trained from scratch. The different ways of running these models have different performance characteristics. We provide a unified way to access leading backends, including Transformers, vLLM, and DeepSpeed.

Conversational Memory

This runtime facilitates building stateful chatbots that save conversational history. With it, you can store conversations long-term, so the context is kept and can be used by the different models served through the API and Local runtimes.
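As a rough illustration of the idea of keeping conversational context, the sketch below stores the most recent messages per conversation in a bounded window. It is not the Conversational Memory runtime's API: the class `WindowedMemory`, its methods, and the `window_size` parameter are hypothetical names chosen for this example, and the store lives in process memory rather than on a file system or database.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List


class WindowedMemory:
    """Hypothetical in-process store, NOT the runtime's actual interface."""

    def __init__(self, window_size: int = 4) -> None:
        # Each conversation keeps only its last `window_size` messages.
        self._store: Dict[str, Deque[dict]] = defaultdict(
            lambda: deque(maxlen=window_size)
        )

    def append(self, conversation_id: str, role: str, content: str) -> None:
        # Appending beyond the window silently drops the oldest message.
        self._store[conversation_id].append({"role": role, "content": content})

    def history(self, conversation_id: str) -> List[dict]:
        # Unknown conversations yield an empty history without being created.
        return list(self._store.get(conversation_id, ()))


memory = WindowedMemory(window_size=2)
memory.append("chat-1", "user", "Hi")
memory.append("chat-1", "assistant", "Hello!")
memory.append("chat-1", "user", "What did I say first?")

# Only the two most recent messages remain as context for the next model call.
print(memory.history("chat-1"))
```

A bounded window keeps the context passed to a model small; a persistent backend would be needed for the long-term storage described above.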

We’ve carefully benchmarked each runtime to ensure that there is minimal overhead, and in some circumstances, have made our own improvements on top of the supported backends.

What's Changed

Add tox tests and github workflow by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/2
Updates to allow runtime to be built for MLServer by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/4
Refactor openai runtime directory name by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/5
Deepspeed runtime by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/8
Initial commit for Microsoft Guidance Runtime by @ukclivecox in https://github.com/SeldonIO/llm-runtimes/pull/9
Update workflow to add linting for Guidance by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/12
Update notebook and readme by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/13
Update notebook (2) by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/14
Add mms notebook for deepspeed by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/15
Change title of MMS notebook by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/16
Update deepspeed mii to 0.0.6 by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/21
Return root error if failing to call openai endpoints by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/25
Add MS azure openai support by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/23
Update packages (mlserver 1.4.rc6 and mii 0.0.8) by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/29
Add parsing json string for HF params by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/31
Add extra args in inference config to be processed by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/35
Remove reference to apache2 in codebase by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/36
Openai images generation model fix + other small improvements by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/38
CI for building docker images by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/40
Add ability to specify version for deepspeed image by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/41
Fix bytes payload in openai runtime by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/43
upgrade deepspeed mii to 0.1.0 by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/46
Remove guidance by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/50
Release workflow by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/51
CI fix using envar by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/52
Adjust envar CI by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/53
CI fix: add env to name for action by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/54
Feature/add periodic ci and dependabot by @mauicv in https://github.com/SeldonIO/llm-runtimes/pull/58
Bump types-requests from 2.28.11.5 to 2.31.0.20240125 in /llm-api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/80
Bump types-requests from 2.28.11.5 to 2.31.0.20240125 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/79
Bump pytest-mock from 3.10.0 to 3.12.0 in /llm-api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/75
Bump pytest-cases from 3.6.14 to 3.8.2 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/71
Bump pytest-asyncio from 0.21.0 to 0.23.3 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/67
Bump deepspeed-mii from 0.1.0 to 0.2.0 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/66
Bump mypy from 1.2.0 to 1.8.0 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/62
Bump flake8 from 6.0.0 to 7.0.0 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/64
Bump flake8 from 6.0.0 to 7.0.0 in /llm-api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/74
Update dependabot.yml to include dockerfiles by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/85
Bump pytest-asyncio from 0.23.3 to 0.23.4 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/88
Update torch requirement from ~=2.0.1 to >=2.0.1,<2.2.0 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/86
Bump pytest from 7.3.1 to 8.0.0 in /llm-api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/83
Bump pytest-mock from 3.10.0 to 3.12.0 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/60
Bump mypy from 1.2.0 to 1.8.0 in /llm-api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/76
Bump pytest-cases from 3.6.14 to 3.8.2 in /llm-api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/69
docs(readme): Correct spelling mistake by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/96
[File System] Conversational Buffer Memory Runtime by @mauicv in https://github.com/SeldonIO/llm-runtimes/pull/56
Compatibility changes to openai-runtime for memory runtime requirements by @mauicv in https://github.com/SeldonIO/llm-runtimes/pull/55
fix: Use consistent package naming by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/103
fix(build): Missed folder name for prefix change by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/104
Bump pytest-asyncio from 0.23.4 to 0.23.5 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/99
Add pydantic validation for api base runtime by @mauicv in https://github.com/SeldonIO/llm-runtimes/pull/112
build(ci): Migrate from Node 16 actions to at least Node 20 by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/113
build(ci): Correct disk usage maximising values by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/114
Bump types-requests from 2.31.0.20240125 to 2.31.0.20240218 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/110
Bump black from 23.3.0 to 24.2.0 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/101
Bump pytest from 7.3.1 to 8.0.1 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/108
Update torch requirement from <2.2.0,>=2.0.1 to >=2.0.1,<2.3.0 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/93
Bump deepspeed-mii from 0.2.0 to 0.2.2 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/106
build(dependabot): Correct the config and update DeepSeed README by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/116
fix(api): Use latest formatting by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/115
Bump pytest-asyncio from 0.21.0 to 0.23.5 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/120
Bump pytest from 8.0.0 to 8.0.1 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/124
Bump types-requests from 2.31.0.20240125 to 2.31.0.20240218 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/123
Remove depreciated edits endpoint by @mauicv in https://github.com/SeldonIO/llm-runtimes/pull/125
Fix incorrect window size on bulk upload of messages by @mauicv in https://github.com/SeldonIO/llm-runtimes/pull/129
Bump seldonio/mlserver from 1.4.0.rc5-slim to 1.4.0-slim in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/131
Bump seldonio/mlserver from 1.4.0.rc5-slim to 1.4.0-slim in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/130
Bump pytest from 8.0.1 to 8.0.2 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/128
Bump pytest from 8.0.1 to 8.0.2 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/127
build: Bump to latest MLServer release by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/133
Bump seldonio/mlserver from 1.4.0-slim to 1.5.0-slim in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/137
Bump seldonio/mlserver from 1.4.0-slim to 1.5.0-slim in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/136
Add memory docs by @mauicv in https://github.com/SeldonIO/llm-runtimes/pull/107
Bump opentelemetry-instrumentation from 0.39b0 to 0.41b0 in /docs by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/140
Bump starlette from 0.27.0 to 0.36.2 in /docs by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/141
Bump pytest-asyncio from 0.23.5 to 0.23.5.post1 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/143
Bump pytest-cases from 3.8.2 to 3.8.3 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/144
Bump mypy from 1.8.0 to 1.9.0 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/145
Bump pytest-cases from 3.8.2 to 3.8.3 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/146
Bump pytest-asyncio from 0.23.5 to 0.23.5.post1 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/147
Bump deepspeed-mii from 0.2.2 to 0.2.3 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/149
Bump fastapi from 0.97.0 to 0.109.1 in /docs by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/142
Bump mypy from 1.8.0 to 1.9.0 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/148
feat(local): Initial support by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/100
feat(memory):Refactor filesys interface by @mauicv in https://github.com/SeldonIO/llm-runtimes/pull/154
feat(docs): Add OpenAI runtime docs by @mauicv in https://github.com/SeldonIO/llm-runtimes/pull/109
feat(local): Add prompting by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/155
feat(local): Add text generation by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/156
fix(local): Remove uneeded files by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/159
build(ci): Fix testing by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/158
build(deps-dev): bump black from 24.2.0 to 24.3.0 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/163
fix(local): Correctly pass parameters and check for lengths by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/161
build(deps-dev): bump black from 24.2.0 to 24.3.0 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/162
Bump types-requests from 2.31.0.20240218 to 2.31.0.20240311 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/150
Bump types-requests from 2.31.0.20240218 to 2.31.0.20240311 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/153
refactor(local): Support choosing LLMIS runtimes by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/165
refactor(local): Use async generation by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/166
build(ci): Add missing images by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/168
fix(deps): Add dependabot workflow + upgrade mlserver to 1.5 by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/171
build(deps-dev): bump tox from 4.4.12 to 4.14.1 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/189
build(deps-dev): bump mypy from 1.8.0 to 1.9.0 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/183
build(deps-dev): Bump to latest for local by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/190
feat(local): Add vLLM and DeepSpeed examples and tests by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/191
refactor(local): Don't require the model ready to unload by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/192
fix(local): Remove unused function stub by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/193
docs: Correct install target by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/198
docs(local): Update README with newer examples by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/199
fix(local): Pin LLMIS commit by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/200
refactor(local): Share response collection by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/194
refactor(local): Return correct error types by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/195
refactor(local): Appropriately log for debugging by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/196
feat(local): relative prompt path in local rt model-settings by @mauicv in https://github.com/SeldonIO/llm-runtimes/pull/203
Rename memory image by @mauicv in https://github.com/SeldonIO/llm-runtimes/pull/208
fix(ci): fix GH worker space issue by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/209
fix(build): fix failing build for local runtime by @RafalSkolasinski in https://github.com/SeldonIO/llm-runtimes/pull/202
fix: upgrade image to cuda 12.1 by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/214
build(deps-dev): bump pytest-mock from 3.12.0 to 3.14.0 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/205
build(deps-dev): bump tox from 4.14.1 to 4.14.2 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/213
build(deps-dev): bump mypy from 1.2.0 to 1.9.0 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/179
Bump pytest from 8.0.2 to 8.1.1 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/151
build(deps-dev): bump tox from 4.4.12 to 4.14.2 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/212
Bump pytest from 8.0.2 to 8.1.1 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/152
build(deps-dev): bump pytest-asyncio from 0.23.5.post1 to 0.23.6 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/172
build(deps-dev): bump pytest-cases from 3.6.14 to 3.8.4 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/178
build(deps-dev): bump pytest-asyncio from 0.23.5.post1 to 0.23.6 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/180
build(deps-dev): bump black from 23.3.0 to 24.3.0 in /memory/requirements by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/201
build(deps-dev): bump pytest-cases from 3.8.3 to 3.8.4 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/169
build(deps-dev): bump pytest-asyncio from 0.21.0 to 0.23.6 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/182
build(deps-dev): bump pytest-mock from 3.12.0 to 3.14.0 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/204
build(deps-dev): bump pytest from 7.3.1 to 8.1.1 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/177
build(deps-dev): bump pytest-mock from 3.12.0 to 3.14.0 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/207
build(deps-dev): bump flake8 from 6.0.0 to 7.0.0 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/174
build(deps-dev): bump pytest-cases from 3.8.3 to 3.8.4 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/170
build(deps-dev): bump tox from 4.4.12 to 4.14.2 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/210
build(deps-dev): bump types-requests from 2.28.11.5 to 2.31.0.20240311 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/181
build(deps-dev): bump tox from 4.4.12 to 4.14.2 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/211
build(deps-dev): bump pytest-mock from 3.10.0 to 3.14.0 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/206
build(local): Remove Pytorch dev dependency by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/217
build(docs): Add Dependabot by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/218
docs: New structure and Local content by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/216
build(deps): bump notebook from 7.1.1 to 7.1.2 in /docs by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/220
build: Use wheel for distribution by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/222
build(docs): Ignore common virtualenv directories by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/223
docs: Continue restructure and add reference for Local by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/224
docs(local): Add inference requests and backends' model settings by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/225
docs: Add analytics, fix quoted sidebar, and add versions by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/227
fix(local): Allow Pydantic models' validators reuse by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/228
docs: Add ReadTheDocs config by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/233
build(deps-dev): bump types-requests from 2.31.0.20240311 to 2.31.0.20240403 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/234
build(deps-dev): bump types-requests from 2.31.0.20240311 to 2.31.0.20240403 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/235
build(deps-dev): bump types-requests from 2.31.0.20240311 to 2.31.0.20240403 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/236
build(deps-dev): bump types-requests from 2.31.0.20240311 to 2.31.0.20240403 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/237
refactor: Align versioning by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/238
fix: upgrade to cuda 12.1.1 by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/240
OpenAI api default generation kwargs by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/239
fix(local): Don't delete parameters.extra by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/241
build(deps-dev): bump pytest-cases from 3.8.4 to 3.8.5 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/243
build(deps-dev): bump pytest-cases from 3.8.4 to 3.8.5 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/244
build(deps-dev): bump pytest-cases from 3.8.4 to 3.8.5 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/245
build(deps-dev): bump pytest-cases from 3.8.4 to 3.8.5 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/246
feat(local): Allow kwargs as serialised JSON string by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/247
test(local): Use parameterisation by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/248
feat(api): Allow llm_parameters as serialised JSON string by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/249
build(local): Bump LLMIS version to 0.0.1rc1 by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/226
build(ci): Align version check by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/251
build(ci): Improve space saving when build images by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/253
build(ci): Use correct names and SSH agent by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/254
build(deps-dev): bump types-requests from 2.31.0.20240403 to 2.31.0.20240406 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/258
build(deps-dev): bump types-requests from 2.31.0.20240403 to 2.31.0.20240406 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/255
build(deps-dev): bump types-requests from 2.31.0.20240403 to 2.31.0.20240406 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/256
build(deps-dev): bump types-requests from 2.31.0.20240403 to 2.31.0.20240406 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/257
build(ci): Migrate from Node 16 by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/252

Full Changelog: https://github.com/SeldonIO/llm-runtimes/commits/0.3.0

Changes
