
Changelog

0.7.0 - 2025-06-06

Features

  • Added support for agentic workflows via the OpenAI API (e.g. tools, planning, and reflection patterns).
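As a rough illustration of the "tools" pattern named above, the sketch below declares a tool in the JSON shape used by the OpenAI Chat Completions `tools` parameter and dispatches a tool call locally. It is not the llm-runtimes API and makes no network request. The `get_weather` function, the `REGISTRY` mapping, the `dispatch` helper, and the hard-coded `example_call` are all hypothetical names introduced for this example.

```python
import json

# Hypothetical tool: stands in for any function the model may ask to call.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"

# Tool declaration in the shape used by the OpenAI Chat Completions `tools` parameter.
TOOL_SPECS = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Return the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

# Hypothetical registry mapping tool names to local Python callables.
REGISTRY = {"get_weather": get_weather}

def dispatch(tool_call: dict) -> dict:
    """Run one tool call and wrap the result as a `tool` role message."""
    fn = REGISTRY[tool_call["function"]["name"]]
    # The model returns the arguments as a JSON-encoded string.
    args = json.loads(tool_call["function"]["arguments"])
    return {
        "role": "tool",
        "tool_call_id": tool_call["id"],
        "content": fn(**args),
    }

# Hard-coded stand-in for a tool call taken from a model response (assumption).
example_call = {
    "id": "call_1",
    "type": "function",
    "function": {
        "name": "get_weather",
        "arguments": json.dumps({"city": "London"}),
    },
}

result = dispatch(example_call)
print(result)
```

In an agentic loop, the returned `tool` message would be appended to the conversation and sent back to the model, which then either answers or requests another tool call.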

What's Changed

Full Changelog: https://github.com/SeldonIO/llm-runtimes/compare/0.6.0...0.7.0

0.6.0 - 2025-03-13

Main Features

  • New runtimes:

    • Sentence Transformers for text embeddings

    • Prompt Runtime: lets you reference the same locally deployed model with different prompts

  • Support for conditional routing through Core 2 pipelines. This allows conditional data flows based on the output of the LLM, enabling support for agentic workflows

What's Changed

Full Changelog: https://github.com/SeldonIO/llm-runtimes/compare/0.5.0...0.6.0

0.5.0 - 2024-12-04

Main Features

  • New runtimes:

  • Streaming support for OpenAI and Gemini

  • Standardized IO across all runtimes (i.e. prompting)

  • RAG support for the Local Runtime

What's Changed

  • Update CHANGELOG by @github-actions in https://github.com/SeldonIO/llm-runtimes/pull/403

  • Bump openai 1.35.3 by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/402

  • Included streaming support for OpenAI by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/404

  • build(deps-dev): Bump tox from 4.15.1 to 4.16.0 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/409

  • build(deps-dev): Bump tox from 4.15.1 to 4.16.0 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/408

  • build(deps-dev): Bump tox from 4.15.1 to 4.16.0 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/407

  • build(deps): Bump orjson from 3.10.4 to 3.10.6 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/406

  • build(deps-dev): Bump mypy from 1.10.0 to 1.10.1 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/398

  • build(deps-dev): Bump mypy from 1.10.0 to 1.10.1 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/399

  • build(deps-dev): Bump mypy from 1.10.0 to 1.10.1 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/397

  • build(deps-dev): Bump types-requests from 2.32.0.20240602 to 2.32.0.20240622 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/396

  • build(deps-dev): Bump types-requests from 2.32.0.20240602 to 2.32.0.20240622 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/395

  • build(deps-dev): Bump types-requests from 2.32.0.20240602 to 2.32.0.20240622 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/394

  • build(deps-dev): Bump flake8 from 7.0.0 to 7.1.0 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/387

  • build(deps-dev): Bump flake8 from 7.0.0 to 7.1.0 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/386

  • build(deps-dev): Bump flake8 from 7.0.0 to 7.1.0 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/385

  • build(deps): Bump urllib3 from 2.2.1 to 2.2.2 in /docs by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/384

  • build(deps): Bump urllib3 from 2.2.1 to 2.2.2 in /local/requirements by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/383

  • Add a DISABLE_CYTHON environment variable for development purposes by @jklaise in https://github.com/SeldonIO/llm-runtimes/pull/411

  • Do not exclude packages when installing in dev mode without Cython by @jklaise in https://github.com/SeldonIO/llm-runtimes/pull/412

  • added the create the servers pieces to the documentation by @joshsgoldstein in https://github.com/SeldonIO/llm-runtimes/pull/416

  • Feat/gemini api runtime by @jklaise in https://github.com/SeldonIO/llm-runtimes/pull/420

  • Migration to GitBook and General Doc Improvements by @ramonpzg in https://github.com/SeldonIO/llm-runtimes/pull/312

  • Update dependabot.yml to remove docs scan by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/427

  • fix: CI failing because of a change of permission by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/429

  • fix: remove docs in version script by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/430

  • bump versions from 0.3.0 to 0.5.0.dev by @mauicv in https://github.com/SeldonIO/llm-runtimes/pull/428

  • Fix/gemini grpc bytes decoding by @jklaise in https://github.com/SeldonIO/llm-runtimes/pull/434

  • Add types tensor to Gemini output by @jklaise in https://github.com/SeldonIO/llm-runtimes/pull/435

  • fix: add security workflow for llm-runtimes by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/442

  • fix: move workflow to workflows directory by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/443

  • fix: use one big security scan task by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/444

  • fix: add build base to local docker image build step by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/458

  • fix: Update security.yml to exclude our runtimes from 3rd party deps by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/465

  • Feat/gemini multi modal input by @jklaise in https://github.com/SeldonIO/llm-runtimes/pull/459

  • Streaming support for Gemini by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/466

  • Fix setting stream flag for other tasks than completion and chat by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/474

  • build(deps-dev): Bump black from 24.4.2 to 24.8.0 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/439

  • build(deps-dev): Bump black from 24.4.2 to 24.8.0 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/440

  • build(deps-dev): Bump black from 24.4.2 to 24.8.0 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/441

  • build(deps-dev): Bump flake8 from 7.1.0 to 7.1.1 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/449

  • Included VectorDB runtime by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/480

  • build(deps-dev): Bump flake8 from 7.1.0 to 7.1.1 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/445

  • build(deps-dev): Bump wheel from 0.43.0 to 0.44.0 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/450

  • build(deps-dev): Bump flake8 from 7.1.0 to 7.1.1 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/454

  • build(deps): Bump orjson from 3.10.6 to 3.10.7 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/460

  • build(deps-dev): Bump mypy from 1.10.1 to 1.11.2 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/473

  • build(deps-dev): Bump wheel from 0.43.0 to 0.44.0 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/446

  • build(deps-dev): Bump black from 24.4.2 to 24.8.0 in /vector-db by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/499

  • build(deps-dev): Bump flake8 from 7.1.0 to 7.1.1 in /vector-db by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/497

  • build(deps-dev): Bump tox from 4.16.0 to 4.18.1 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/481

  • build(deps-dev): Bump tox from 4.16.0 to 4.18.1 in /vector-db by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/498

  • build(deps-dev): Bump types-requests from 2.32.0.20240622 to 2.32.0.20240907 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/500

  • build(deps): Update google-generativeai requirement from ==0.7.* to >=0.7,<0.9 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/489

  • build(deps-dev): Bump tox from 4.16.0 to 4.18.1 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/485

  • build(deps-dev): Bump pytest-postgresql from 6.0.0 to 6.1.1 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/479

  • build(deps): Bump sqlalchemy from 2.0.31 to 2.0.34 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/476

  • build(deps-dev): Bump wheel from 0.43.0 to 0.44.0 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/452

  • build(deps-dev): Bump types-requests from 2.32.0.20240622 to 2.32.0.20240907 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/484

  • build(deps-dev): Bump types-requests from 2.32.0.20240622 to 2.32.0.20240907 in /vector-db by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/494

  • build(deps-dev): Bump mypy from 1.10.1 to 1.11.2 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/472

  • build(deps-dev): Bump mypy from 1.10.1 to 1.11.2 in /vector-db by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/495

  • build(deps-dev): Bump wheel from 0.43.0 to 0.44.0 in /vector-db by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/493

  • build(deps-dev): Bump tox from 4.16.0 to 4.18.1 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/483

  • build(deps-dev): Bump types-requests from 2.32.0.20240622 to 2.32.0.20240907 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/482

  • Update security.yml to include vector-db for code scan by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/502

  • build(deps-dev): Bump syrupy from 4.6.1 to 4.7.1 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/470

  • build(deps-dev): Bump mypy from 1.10.1 to 1.11.2 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/471

  • Bumped MLServer to 1.6.1 by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/501

  • Bumped llmis to 0.2.0 by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/503

  • Included missing files for qdrant by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/506

  • Included pgvector support by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/505

  • build(deps-dev): Bump tox from 4.18.1 to 4.19.0 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/516

  • build(deps-dev): Bump tox from 4.18.1 to 4.19.0 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/515

  • build(deps-dev): Bump tox from 4.18.1 to 4.19.0 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/514

  • build(deps-dev): Bump tox from 4.18.1 to 4.19.0 in /vector-db by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/513

  • build(deps-dev): Bump types-requests from 2.32.0.20240907 to 2.32.0.20240914 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/511

  • build(deps-dev): Bump types-requests from 2.32.0.20240907 to 2.32.0.20240914 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/510

  • build(deps-dev): Bump types-requests from 2.32.0.20240907 to 2.32.0.20240914 in /vector-db by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/508

  • build(deps-dev): Bump types-requests from 2.32.0.20240907 to 2.32.0.20240914 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/509

  • build(deps): Bump sqlalchemy from 2.0.34 to 2.0.35 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/512

  • build(deps-dev): Bump pytest from 7.3.1 to 8.3.3 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/490

  • build(deps-dev): Bump pytest from 7.3.1 to 8.3.3 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/488

  • build(deps-dev): Bump pytest from 7.3.1 to 8.3.3 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/487

  • build(deps-dev): Bump pytest-asyncio from 0.23.7 to 0.24.0 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/469

  • build(deps-dev): Bump pytest-asyncio from 0.23.7 to 0.24.0 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/468

  • build(deps-dev): Bump pytest-asyncio from 0.23.7 to 0.24.0 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/517

  • Prototype/add prompt and preprocessing by @mauicv in https://github.com/SeldonIO/llm-runtimes/pull/461

  • Add gitbook docs to master branch by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/548

  • Integrate 461 in memory by @mauicv in https://github.com/SeldonIO/llm-runtimes/pull/507

  • Moved prompt-utils in requirements.txt by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/549

  • Integrate #461 in API runtime by @mauicv in https://github.com/SeldonIO/llm-runtimes/pull/486

  • Prototype integrate 461 in local by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/557

  • Prototype integrate 461 in vector db by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/564

  • build(deps-dev): Bump pytest from 7.3.1 to 8.3.3 in /prompt-utils by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/527

  • build(deps-dev): Bump pytest-asyncio from 0.23.7 to 0.24.0 in /prompt-utils by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/526

  • build(deps-dev): Bump pytest-cases from 3.8.5 to 3.8.6 in /vector-db by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/528

  • build(deps-dev): Bump pytest-cases from 3.8.5 to 3.8.6 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/529

  • build(deps-dev): Bump pytest-cases from 3.8.5 to 3.8.6 in /prompt-utils by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/531

  • build(deps-dev): Bump pytest-cases from 3.8.5 to 3.8.6 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/532

  • build(deps-dev): Bump tox from 4.19.0 to 4.21.2 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/544

  • build(deps-dev): Bump pytest-cases from 3.8.5 to 3.8.6 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/530

  • build(deps-dev): Bump tox from 4.19.0 to 4.21.2 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/545

  • build(deps-dev): Bump types-requests from 2.32.0.20240914 to 2.32.0.20241016 in /vector-db by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/565

  • build(deps-dev): Bump syrupy from 4.7.1 to 4.7.2 in /prompt-utils by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/555

  • build(deps-dev): Bump types-requests from 2.32.0.20240907 to 2.32.0.20241016 in /prompt-utils by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/566

  • Rag support for local runtime by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/570

  • build(deps-dev): Bump tox from 4.19.0 to 4.21.2 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/546

  • build(deps-dev): Bump black from 24.8.0 to 24.10.0 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/551

  • build(deps-dev): Bump black from 24.8.0 to 24.10.0 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/553

  • build(deps-dev): Bump tox from 4.19.0 to 4.23.0 in /vector-db by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/567

  • Fix memory system prompt by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/569

  • Runtime settings consistency by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/572

  • build(deps-dev): Bump black from 24.8.0 to 24.10.0 in /vector-db by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/550

  • build(deps-dev): Bump syrupy from 4.7.1 to 4.7.2 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/552

  • build(deps-dev): Bump black from 24.8.0 to 24.10.0 in /prompt-utils by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/554

  • build(deps): Update qdrant-client requirement from <1.12.0 to <1.13.0 in /vector-db by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/556

  • build(deps-dev): Bump mypy from 1.11.2 to 1.12.0 in /vector-db by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/558

  • build(deps-dev): Bump mypy from 1.11.2 to 1.12.0 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/559

  • build(deps-dev): Bump mypy from 1.11.2 to 1.12.0 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/561

  • build(deps-dev): Bump tox from 4.18.1 to 4.23.0 in /prompt-utils by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/568

  • build(deps-dev): Bump mypy from 1.10.1 to 1.13.0 in /prompt-utils by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/574

  • Default model type initialization in prompt utils by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/573

  • Renamed mlserver-llm-openai to mlserver-llm-api by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/577

  • Fix openai setting config by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/578

  • Updated openai docs with IO by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/579

  • OpenAI function calling docs by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/580

  • IO gemini docs by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/581

  • Refactored the azure docs by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/584

  • IO memory docs by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/586

  • IO vector-db docs by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/587

  • IO local docs - chat model by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/588

  • IO local mms docs by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/589

  • IO local chat template by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/591

  • fix(docs): pull changes from docs-master by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/583

  • IO local quantization docs by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/590

  • IO chat bot by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/592

  • Fixed tensor names for vector-db example by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/597

  • Updated azure model settings and image by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/598

  • Improved local server manifest by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/599

  • Included wait ready for models, pipelines, and deployments by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/600

  • Updated and added docs refs by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/596

  • Updated installation docs by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/601

  • Removed k8s example by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/602

  • Docs for monitoring by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/593

  • Updated runtimes docs intro by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/603

  • Docs restructure by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/604

  • ci: Merge change for release 0.5.0 by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/605

Full Changelog: https://github.com/SeldonIO/llm-runtimes/commits/0.5.0

0.4.0 - 2024-06-27

Main Features

  • Code transpilation using Cython, which allows code obfuscation

  • Output streaming support for the local runtime, which reduces the time to first token returned to the user

  • SQL backend for the memory runtime, which allows databases such as PostgreSQL or MySQL to be used to store chat messages

  • Support for MMS, which allows the use of multiple models on the same GPU (limited support)
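To make the time-to-first-token point above concrete, the sketch below simulates a token stream and measures when the first token arrives compared with the full response. It does not call the local runtime: `generate_tokens`, `measure`, and the per-token delay are hypothetical stand-ins for a real streaming endpoint.

```python
import time
from typing import Iterator, List, Optional, Tuple

def generate_tokens(tokens: List[str], delay: float = 0.01) -> Iterator[str]:
    """Simulated model: each token takes `delay` seconds to produce."""
    for tok in tokens:
        time.sleep(delay)
        yield tok

def measure(tokens: List[str]) -> Tuple[Optional[float], float, str]:
    """Consume the stream, recording time to first token and total time."""
    start = time.perf_counter()
    first_token_at: Optional[float] = None
    pieces: List[str] = []
    for tok in generate_tokens(tokens):
        if first_token_at is None:
            # With streaming, the client can show this token immediately.
            first_token_at = time.perf_counter() - start
        pieces.append(tok)
    total = time.perf_counter() - start
    return first_token_at, total, "".join(pieces)

ttft, total, text = measure(["Hello", ",", " ", "world", "!"])
print(f"first token after {ttft:.3f}s, full response after {total:.3f}s: {text}")
```

Without streaming, the user would wait roughly the full response time before seeing any output; with streaming, the wait is closer to the time taken to produce the first token.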

What's Changed

  • build(ci): Use a correct example by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/259

  • docs: Remove link to GitHub repository by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/260

  • build: Add license by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/261

  • feat(local): Bump LLMIS to 0.0.1.rc2 by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/263

  • build(deps): bump idna from 3.6 to 3.7 in /docs by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/265

  • docs: Correct link to memory examples by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/262

  • build(deps-dev): bump black from 24.3.0 to 24.4.0 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/266

  • build(deps-dev): bump black from 24.3.0 to 24.4.0 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/267

  • build(deps-dev): bump black from 24.3.0 to 24.4.0 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/269

  • build(deps-dev): bump black from 24.3.0 to 24.4.0 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/268

  • build(deps): bump sphinx-autobuild from 2024.2.4 to 2024.4.13 in /docs by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/270

  • Remove duplicate docs examples by @mauicv in https://github.com/SeldonIO/llm-runtimes/pull/271

  • docs(local): Add Core 2 model examples by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/242

  • docs: Add changelog by @github-actions in https://github.com/SeldonIO/llm-runtimes/pull/272

  • release: placeholder for promotion action by @RafalSkolasinski in https://github.com/SeldonIO/llm-runtimes/pull/273

  • build(deps): bump sphinx-autobuild from 2024.4.13 to 2024.4.16 in /docs by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/275

  • release: promote images from dev to production registry by @RafalSkolasinski in https://github.com/SeldonIO/llm-runtimes/pull/274

  • feat(docs): Add e2e example by @mauicv in https://github.com/SeldonIO/llm-runtimes/pull/126

  • Bugfix/add docs pr changes by @mauicv in https://github.com/SeldonIO/llm-runtimes/pull/277

  • build(deps): bump notebook from 7.1.2 to 7.1.3 in /docs by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/278

  • fix(local): Remove dupe kwargs setting by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/279

  • Fix links and MakeFile by @mauicv in https://github.com/SeldonIO/llm-runtimes/pull/280

  • Add minor note on MinIO to chatbot example by @mauicv in https://github.com/SeldonIO/llm-runtimes/pull/282

  • Update the image in the chatbot example doc by @mauicv in https://github.com/SeldonIO/llm-runtimes/pull/284

  • build(deps): bump myst-parser from 2.0.0 to 3.0.0 in /docs by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/285

  • fix(deps): Use LLMIS 0.0.1 by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/296

  • docs: Add installation by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/281

  • build(deps-dev): bump black from 24.4.0 to 24.4.1 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/294

  • build(deps-dev): bump mypy from 1.9.0 to 1.10.0 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/287

  • build(deps-dev): bump black from 24.4.0 to 24.4.1 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/289

  • build(deps-dev): bump black from 24.4.0 to 24.4.2 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/297

  • build(deps-dev): bump mypy from 1.9.0 to 1.10.0 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/291

  • build(deps-dev): bump mypy from 1.9.0 to 1.10.0 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/295

  • build(deps-dev): bump mypy from 1.9.0 to 1.10.0 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/290

  • build(deps): update torch requirement from <2.3.0,>=2.0.1 to >=2.0.1,<2.4.0 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/293

  • build(deps-dev): bump black from 24.4.0 to 24.4.2 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/299

  • Update CHANGELOG by @github-actions in https://github.com/SeldonIO/llm-runtimes/pull/300

  • Bump version to 0.3.0 by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/301

  • docs(api): Fix header level by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/302

  • Add section at end of e2e example for changing api to local rt by @mauicv in https://github.com/SeldonIO/llm-runtimes/pull/283

  • Wholesale docs edits pb by @paulb-seldon in https://github.com/SeldonIO/llm-runtimes/pull/298

  • run CI tests only for version of Python that is used within the Docker image by @RafalSkolasinski in https://github.com/SeldonIO/llm-runtimes/pull/303

  • deprecate and archive deepspeed by @RafalSkolasinski in https://github.com/SeldonIO/llm-runtimes/pull/304

  • fix: Remove references to removed runtime by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/313

  • Update API examples for docker access and kubernetes deployment by @mauicv in https://github.com/SeldonIO/llm-runtimes/pull/321

  • Included local runtime limitations section. by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/320

  • use {{ current-version }} in example manifest by @RafalSkolasinski in https://github.com/SeldonIO/llm-runtimes/pull/319

  • Feature/add prompt note by @mauicv in https://github.com/SeldonIO/llm-runtimes/pull/324

  • Docs example for quantization and tensor parallelism by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/323

  • Add local model loading from settings relative path by @mauicv in https://github.com/SeldonIO/llm-runtimes/pull/329

  • docs: Update docs for azure openai by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/326

  • build(deps): bump notebook from 7.1.3 to 7.2.0 in /docs by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/330

  • build(deps): bump jinja2 from 3.1.3 to 3.1.4 in /docs by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/327

  • build(deps): bump myst-parser from 3.0.0 to 3.0.1 in /docs by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/318

  • build(deps-dev): bump pytest from 8.1.1 to 8.2.0 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/314

  • build(deps-dev): bump pytest from 8.1.1 to 8.2.0 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/315

  • build(deps-dev): bump pytest from 8.1.1 to 8.2.0 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/316

  • build(deps-dev): bump tox from 4.14.2 to 4.15.0 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/310

  • build(deps-dev): bump tox from 4.14.2 to 4.15.0 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/306

  • build(deps-dev): bump black from 24.4.1 to 24.4.2 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/308

  • build(deps-dev): bump tox from 4.14.2 to 4.15.0 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/307

  • build(deps): bump requests from 2.31.0 to 2.32.0 in /docs by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/337

  • build(deps-dev): bump pytest-asyncio from 0.23.6 to 0.23.7 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/336

  • build(deps-dev): bump pytest-asyncio from 0.23.6 to 0.23.7 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/334

  • build(deps-dev): bump pytest-asyncio from 0.23.6 to 0.23.7 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/332

  • build(deps-dev): bump pytest from 8.2.0 to 8.2.1 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/331

  • build(deps-dev): bump pytest from 8.2.0 to 8.2.1 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/333

  • build(deps-dev): bump pytest from 8.2.0 to 8.2.1 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/335

  • build(deps-dev): Bump types-requests from 2.31.0.20240406 to 2.32.0.20240521 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/339

  • build(deps-dev): Bump types-requests from 2.31.0.20240406 to 2.32.0.20240521 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/340

  • build(deps-dev): Bump types-requests from 2.31.0.20240406 to 2.32.0.20240521 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/338

  • feat(memory): Add postgres backend by @mauicv in https://github.com/SeldonIO/llm-runtimes/pull/134

  • Included documentation and examples for mms by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/347

  • build(deps-dev): Bump types-requests from 2.32.0.20240521 to 2.32.0.20240523 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/344

  • build(deps-dev): Bump types-requests from 2.32.0.20240521 to 2.32.0.20240523 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/346

  • build(deps): Bump sphinx-design from 0.5.0 to 0.6.0 in /docs by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/342

  • Pin open telemetry sdk package to fix ci by @mauicv in https://github.com/SeldonIO/llm-runtimes/pull/353

  • Ensure db portability in memory rt by @mauicv in https://github.com/SeldonIO/llm-runtimes/pull/352

  • build(deps-dev): Bump types-requests from 2.32.0.20240523 to 2.32.0.20240602 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/359

  • build(deps-dev): Bump types-requests from 2.32.0.20240521 to 2.32.0.20240602 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/360

  • build(deps-dev): Bump types-requests from 2.32.0.20240523 to 2.32.0.20240602 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/362

  • Update incorrect prompt settings in model-settings.json cell by @mauicv in https://github.com/SeldonIO/llm-runtimes/pull/357

  • build(deps): Bump opentelemetry-sdk from 1.24.0 to 1.25.0 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/355

  • build(deps): Bump opentelemetry-sdk from 1.24.0 to 1.25.0 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/356

  • Revert "build(deps): Bump opentelemetry-sdk from 1.24.0 to 1.25.0 in /memory" by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/363

  • Revert "build(deps): Bump opentelemetry-sdk from 1.24.0 to 1.25.0 in /api" by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/364

  • Remove pinned opentelemetry dep for memory, api and local rt by @mauicv in https://github.com/SeldonIO/llm-runtimes/pull/365

  • build(deps-dev): Bump pytest from 8.2.1 to 8.2.2 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/368

  • build(deps-dev): Bump pytest from 8.2.1 to 8.2.2 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/367

  • build(deps-dev): Bump pytest from 8.2.1 to 8.2.2 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/366

  • SQL backend docs by @mauicv in https://github.com/SeldonIO/llm-runtimes/pull/358

  • docs(local): Add output for example by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/369

  • docs(local): Fix dupe image field by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/370

  • Migrate to pydantic v2 by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/349

  • Bumped mlserver to 1.6.0.dev2 by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/380

  • Cython support for llm-runtimes by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/354

  • Streaming support for local runtime by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/351

  • build(deps-dev): Bump tox from 4.15.0 to 4.15.1 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/375

  • build(deps-dev): Bump tox from 4.15.0 to 4.15.1 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/373

  • build(deps-dev): Bump tox from 4.15.0 to 4.15.1 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/372

  • build(deps): Bump orjson from 3.10.3 to 3.10.4 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/381

  • build(deps): Bump tornado from 6.4 to 6.4.1 in /docs by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/374

  • build(deps): Bump notebook from 7.2.0 to 7.2.1 in /docs by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/377

  • Rename db option by @mauicv in https://github.com/SeldonIO/llm-runtimes/pull/371

  • Bumped llmis to 0.1.1 by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/389

  • use mlserver 1.6.0.rc1 by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/390

  • build(deps): Bump sqlalchemy from 2.0.30 to 2.0.31 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/388

  • ci: Merge changes from master to release branch by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/391

  • Update tests.yml by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/392

  • ci: Merge changes from master to release branch by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/393

  • use mlserver 1.6.0 by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/400

  • ci: Merge changes from master to release branch by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/401


Full Changelog: https://github.com/SeldonIO/llm-runtimes/commits/0.4.0


0.3.0 - 26 Apr 2024

Overview

Content

This is the initial release of Seldon’s LLM Module. The LLM Module provides a set of MLServer runtimes for serving and deploying large language model (and other GenAI) applications using Seldon Core v2 models and pipelines. The initial set of runtimes includes an API runtime that acts as a gateway to LLMs hosted by third parties, a Local runtime for self-hosted, on-prem LLM deployment, and a Conversational Memory runtime for building stateful LLM applications.

API

Many companies provide access to LLMs as a service, each with its own API. This runtime provides a unified way to access them, starting with OpenAI. It requires egress access to OpenAI’s endpoints.

Local

You can use open foundation models or fine-tuned models, such as those from Mistral and Meta, or your own custom models trained from scratch. The different ways of running these models have very different performance characteristics. We provide a unified way to access leading backends, including Transformers, vLLM, and DeepSpeed.

Conversational Memory

This runtime facilitates building stateful chatbots that save conversational history. With it, you can memorise (that is, store) conversations long-term, so the context is kept and can be used by the different models served through the API and Local runtimes.

We’ve carefully benchmarked each runtime to ensure minimal overhead and, in some cases, have made our own improvements on top of the supported backends.

What's Changed

  • Add tox tests and github workflow by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/2

  • Updates to allow runtime to be built for MLServer by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/4

  • Refactor openai runtime directory name by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/5

  • Deepspeed runtime by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/8

  • Initial commit for Microsoft Guidance Runtime by @ukclivecox in https://github.com/SeldonIO/llm-runtimes/pull/9

  • Update workflow to add linting for Guidance by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/12

  • Update notebook and readme by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/13

  • Update notebook (2) by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/14

  • Add mms notebook for deepspeed by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/15

  • Change title of MMS notebook by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/16

  • Update deepspeed mii to 0.0.6 by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/21

  • Return root error if failing to call openai endpoints by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/25

  • Add MS azure openai support by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/23

  • Update packages (mlserver 1.4.rc6 and mii 0.0.8) by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/29

  • Add parsing json string for HF params by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/31

  • Add extra args in inference config to be processed by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/35

  • Remove reference to apache2 in codebase by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/36

  • Openai images generation model fix + other small improvements by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/38

  • CI for building docker images by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/40

  • Add ability to specify version for deepspeed image by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/41

  • Fix bytes payload in openai runtime by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/43

  • upgrade deepspeed mii to 0.1.0 by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/46

  • Remove guidance by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/50

  • Release workflow by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/51

  • CI fix using envar by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/52

  • Adjust envar CI by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/53

  • CI fix: add env to name for action by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/54

  • Feature/add periodic ci and dependabot by @mauicv in https://github.com/SeldonIO/llm-runtimes/pull/58

  • Bump types-requests from 2.28.11.5 to 2.31.0.20240125 in /llm-api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/80

  • Bump types-requests from 2.28.11.5 to 2.31.0.20240125 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/79

  • Bump pytest-mock from 3.10.0 to 3.12.0 in /llm-api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/75

  • Bump pytest-cases from 3.6.14 to 3.8.2 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/71

  • Bump pytest-asyncio from 0.21.0 to 0.23.3 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/67

  • Bump deepspeed-mii from 0.1.0 to 0.2.0 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/66

  • Bump mypy from 1.2.0 to 1.8.0 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/62

  • Bump flake8 from 6.0.0 to 7.0.0 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/64

  • Bump flake8 from 6.0.0 to 7.0.0 in /llm-api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/74

  • Update dependabot.yml to include dockerfiles by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/85

  • Bump pytest-asyncio from 0.23.3 to 0.23.4 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/88

  • Update torch requirement from ~=2.0.1 to >=2.0.1,<2.2.0 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/86

  • Bump pytest from 7.3.1 to 8.0.0 in /llm-api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/83

  • Bump pytest-mock from 3.10.0 to 3.12.0 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/60

  • Bump mypy from 1.2.0 to 1.8.0 in /llm-api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/76

  • Bump pytest-cases from 3.6.14 to 3.8.2 in /llm-api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/69

  • docs(readme): Correct spelling mistake by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/96

  • [File System] Conversational Buffer Memory Runtime by @mauicv in https://github.com/SeldonIO/llm-runtimes/pull/56

  • Compatibility changes to openai-runtime for memory runtime requirements by @mauicv in https://github.com/SeldonIO/llm-runtimes/pull/55

  • fix: Use consistent package naming by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/103

  • fix(build): Missed folder name for prefix change by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/104

  • Bump pytest-asyncio from 0.23.4 to 0.23.5 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/99

  • Add pydantic validation for api base runtime by @mauicv in https://github.com/SeldonIO/llm-runtimes/pull/112

  • build(ci): Migrate from Node 16 actions to at least Node 20 by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/113

  • build(ci): Correct disk usage maximising values by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/114

  • Bump types-requests from 2.31.0.20240125 to 2.31.0.20240218 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/110

  • Bump black from 23.3.0 to 24.2.0 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/101

  • Bump pytest from 7.3.1 to 8.0.1 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/108

  • Update torch requirement from <2.2.0,>=2.0.1 to >=2.0.1,<2.3.0 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/93

  • Bump deepspeed-mii from 0.2.0 to 0.2.2 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/106

  • build(dependabot): Correct the config and update DeepSeed README by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/116

  • fix(api): Use latest formatting by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/115

  • Bump pytest-asyncio from 0.21.0 to 0.23.5 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/120

  • Bump pytest from 8.0.0 to 8.0.1 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/124

  • Bump types-requests from 2.31.0.20240125 to 2.31.0.20240218 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/123

  • Remove depreciated edits endpoint by @mauicv in https://github.com/SeldonIO/llm-runtimes/pull/125

  • Fix incorrect window size on bulk upload of messages by @mauicv in https://github.com/SeldonIO/llm-runtimes/pull/129

  • Bump seldonio/mlserver from 1.4.0.rc5-slim to 1.4.0-slim in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/131

  • Bump seldonio/mlserver from 1.4.0.rc5-slim to 1.4.0-slim in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/130

  • Bump pytest from 8.0.1 to 8.0.2 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/128

  • Bump pytest from 8.0.1 to 8.0.2 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/127

  • build: Bump to latest MLServer release by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/133

  • Bump seldonio/mlserver from 1.4.0-slim to 1.5.0-slim in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/137

  • Bump seldonio/mlserver from 1.4.0-slim to 1.5.0-slim in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/136

  • Add memory docs by @mauicv in https://github.com/SeldonIO/llm-runtimes/pull/107

  • Bump opentelemetry-instrumentation from 0.39b0 to 0.41b0 in /docs by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/140

  • Bump starlette from 0.27.0 to 0.36.2 in /docs by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/141

  • Bump pytest-asyncio from 0.23.5 to 0.23.5.post1 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/143

  • Bump pytest-cases from 3.8.2 to 3.8.3 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/144

  • Bump mypy from 1.8.0 to 1.9.0 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/145

  • Bump pytest-cases from 3.8.2 to 3.8.3 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/146

  • Bump pytest-asyncio from 0.23.5 to 0.23.5.post1 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/147

  • Bump deepspeed-mii from 0.2.2 to 0.2.3 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/149

  • Bump fastapi from 0.97.0 to 0.109.1 in /docs by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/142

  • Bump mypy from 1.8.0 to 1.9.0 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/148

  • feat(local): Initial support by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/100

  • feat(memory):Refactor filesys interface by @mauicv in https://github.com/SeldonIO/llm-runtimes/pull/154

  • feat(docs): Add OpenAI runtime docs by @mauicv in https://github.com/SeldonIO/llm-runtimes/pull/109

  • feat(local): Add prompting by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/155

  • feat(local): Add text generation by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/156

  • fix(local): Remove uneeded files by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/159

  • build(ci): Fix testing by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/158

  • build(deps-dev): bump black from 24.2.0 to 24.3.0 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/163

  • fix(local): Correctly pass parameters and check for lengths by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/161

  • build(deps-dev): bump black from 24.2.0 to 24.3.0 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/162

  • Bump types-requests from 2.31.0.20240218 to 2.31.0.20240311 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/150

  • Bump types-requests from 2.31.0.20240218 to 2.31.0.20240311 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/153

  • refactor(local): Support choosing LLMIS runtimes by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/165

  • refactor(local): Use async generation by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/166

  • build(ci): Add missing images by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/168

  • fix(deps): Add dependabot workflow + upgrade mlserver to 1.5 by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/171

  • build(deps-dev): bump tox from 4.4.12 to 4.14.1 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/189

  • build(deps-dev): bump mypy from 1.8.0 to 1.9.0 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/183

  • build(deps-dev): Bump to latest for local by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/190

  • feat(local): Add vLLM and DeepSpeed examples and tests by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/191

  • refactor(local): Don't require the model ready to unload by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/192

  • fix(local): Remove unused function stub by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/193

  • docs: Correct install target by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/198

  • docs(local): Update README with newer examples by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/199

  • fix(local): Pin LLMIS commit by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/200

  • refactor(local): Share response collection by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/194

  • refactor(local): Return correct error types by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/195

  • refactor(local): Appropriately log for debugging by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/196

  • feat(local): relative prompt path in local rt model-settings by @mauicv in https://github.com/SeldonIO/llm-runtimes/pull/203

  • Rename memory image by @mauicv in https://github.com/SeldonIO/llm-runtimes/pull/208

  • fix(ci): fix GH worker space issue by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/209

  • fix(build): fix failing build for local runtime by @RafalSkolasinski in https://github.com/SeldonIO/llm-runtimes/pull/202

  • fix: upgrade image to cuda 12.1 by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/214

  • build(deps-dev): bump pytest-mock from 3.12.0 to 3.14.0 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/205

  • build(deps-dev): bump tox from 4.14.1 to 4.14.2 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/213

  • build(deps-dev): bump mypy from 1.2.0 to 1.9.0 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/179

  • Bump pytest from 8.0.2 to 8.1.1 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/151

  • build(deps-dev): bump tox from 4.4.12 to 4.14.2 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/212

  • Bump pytest from 8.0.2 to 8.1.1 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/152

  • build(deps-dev): bump pytest-asyncio from 0.23.5.post1 to 0.23.6 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/172

  • build(deps-dev): bump pytest-cases from 3.6.14 to 3.8.4 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/178

  • build(deps-dev): bump pytest-asyncio from 0.23.5.post1 to 0.23.6 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/180

  • build(deps-dev): bump black from 23.3.0 to 24.3.0 in /memory/requirements by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/201

  • build(deps-dev): bump pytest-cases from 3.8.3 to 3.8.4 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/169

  • build(deps-dev): bump pytest-asyncio from 0.21.0 to 0.23.6 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/182

  • build(deps-dev): bump pytest-mock from 3.12.0 to 3.14.0 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/204

  • build(deps-dev): bump pytest from 7.3.1 to 8.1.1 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/177

  • build(deps-dev): bump pytest-mock from 3.12.0 to 3.14.0 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/207

  • build(deps-dev): bump flake8 from 6.0.0 to 7.0.0 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/174

  • build(deps-dev): bump pytest-cases from 3.8.3 to 3.8.4 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/170

  • build(deps-dev): bump tox from 4.4.12 to 4.14.2 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/210

  • build(deps-dev): bump types-requests from 2.28.11.5 to 2.31.0.20240311 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/181

  • build(deps-dev): bump tox from 4.4.12 to 4.14.2 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/211

  • build(deps-dev): bump pytest-mock from 3.10.0 to 3.14.0 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/206

  • build(local): Remove Pytorch dev dependency by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/217

  • build(docs): Add Dependabot by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/218

  • docs: New structure and Local content by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/216

  • build(deps): bump notebook from 7.1.1 to 7.1.2 in /docs by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/220

  • build: Use wheel for distribution by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/222

  • build(docs): Ignore common virtualenv directories by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/223

  • docs: Continue restructure and add reference for Local by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/224

  • docs(local): Add inference requests and backends' model settings by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/225

  • docs: Add analytics, fix quoted sidebar, and add versions by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/227

  • fix(local): Allow Pydantic models' validators reuse by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/228

  • docs: Add ReadTheDocs config by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/233

  • build(deps-dev): bump types-requests from 2.31.0.20240311 to 2.31.0.20240403 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/234

  • build(deps-dev): bump types-requests from 2.31.0.20240311 to 2.31.0.20240403 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/235

  • build(deps-dev): bump types-requests from 2.31.0.20240311 to 2.31.0.20240403 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/236

  • build(deps-dev): bump types-requests from 2.31.0.20240311 to 2.31.0.20240403 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/237

  • refactor: Align versioning by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/238

  • fix: upgrade to cuda 12.1.1 by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/240

  • OpenAI api default generation kwargs by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/239

  • fix(local): Don't delete parameters.extra by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/241

  • build(deps-dev): bump pytest-cases from 3.8.4 to 3.8.5 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/243

  • build(deps-dev): bump pytest-cases from 3.8.4 to 3.8.5 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/244

  • build(deps-dev): bump pytest-cases from 3.8.4 to 3.8.5 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/245

  • build(deps-dev): bump pytest-cases from 3.8.4 to 3.8.5 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/246

  • feat(local): Allow kwargs as serialised JSON string by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/247

  • test(local): Use parameterisation by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/248

  • feat(api): Allow llm_parameters as serialised JSON string by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/249

  • build(local): Bump LLMIS version to 0.0.1rc1 by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/226

  • build(ci): Align version check by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/251

  • build(ci): Improve space saving when build images by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/253

  • build(ci): Use correct names and SSH agent by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/254

  • build(deps-dev): bump types-requests from 2.31.0.20240403 to 2.31.0.20240406 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/258

  • build(deps-dev): bump types-requests from 2.31.0.20240403 to 2.31.0.20240406 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/255

  • build(deps-dev): bump types-requests from 2.31.0.20240403 to 2.31.0.20240406 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/256

  • build(deps-dev): bump types-requests from 2.31.0.20240403 to 2.31.0.20240406 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/257

  • build(ci): Migrate from Node 16 by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/252

Full Changelog: https://github.com/SeldonIO/llm-runtimes/commits/0.3.0
