Changelog

0.7.0 - Jun 6, 2025

Features

  • Added support for agentic workflows via the OpenAI API (e.g. tools, planning, and reflection patterns).

What's Changed

Full Changelog: https://github.com/SeldonIO/llm-runtimes/compare/0.6.0...0.7.0

0.6.0 - Mar 13, 2025

Main Features

  • New runtimes:

    • Sentence Transformers for text embeddings

    • Prompt Runtime - allows referencing the same locally deployed model with different prompts

  • Support for conditional routing through Core 2 pipelines - data flows can branch based on the output of the LLM, which enables agentic workflows

What's Changed

  • Replaced local images with the ones from artifacts registry by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/606

  • Remove redundant server configs by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/607

  • Update Models Overview by @paulb-seldon in https://github.com/SeldonIO/llm-runtimes/pull/609

  • Quickstart example by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/608

  • Fixed models and pipelines naming by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/612

  • Included upperbound on httpx by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/615

  • Clean docs by @paulb-seldon in https://github.com/SeldonIO/llm-runtimes/pull/613

  • Remove PVC from docs examples by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/616

  • Updated images tag to 0.5.0 by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/618

  • Fixed installation test image path by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/619

  • Cat the pipeline definition before deploying it by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/621

  • Fixed typo in azure docs by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/622

  • Fixed name of vector db env prefix by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/628

  • Sentence transformers support by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/626

  • Simplify content when content is a single text message by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/630

  • Refactored extra settings init by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/631

  • Passed database as string in get_db_interface by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/633

  • Fixed memory psql db url by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/634

  • Prompt runtime for local llms by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/627

  • Docs for local-embedding runtime by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/632

  • Renamed url to infer_uri in PromptRuntime by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/641

  • Included output postprocessing for conditional routing by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/637

  • Wrote prompt runtime docs by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/636

  • Wrote docs for routing by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/643

  • Renamed routing to agents by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/645

  • build(deps-dev): Bump black from 24.10.0 to 25.1.0 in /prompt-utils by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/644

  • build(deps-dev): Bump isort from 5.13.2 to 6.0.0 in /prompt-utils by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/640

  • build(deps-dev): Bump tox from 4.23.0 to 4.24.1 in /prompt-utils by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/639

  • build(deps-dev): Bump syrupy from 4.7.2 to 4.8.1 in /prompt-utils by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/635

  • build(deps-dev): Bump mypy from 1.13.0 to 1.14.1 in /prompt-utils by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/624

  • build(deps-dev): Bump pytest-asyncio from 0.24.0 to 0.25.3 in /prompt-utils by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/642

  • Implemented routing for prompt runtime by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/647

  • Included installation steps for prompt runtime by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/649

  • Guardrails docs example by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/646

  • Bumped mlserver to 1.7.0.rc1 by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/650

  • bump LLMIS to 0.3.0 by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/651

  • Workflow for prompt-utils runtime by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/652

  • Workflow for prompt-utils runtime (#652) by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/653

  • Fixed capabilities for prompt utils server by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/654

  • Docs for http sse streaming by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/660

  • Bumped LLMIS to 0.3.1 by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/666

  • build(deps-dev): Bump flake8 from 7.1.1 to 7.1.2 in /prompt-utils by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/657

  • build(deps-dev): Bump isort from 6.0.0 to 6.0.1 in /prompt-utils by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/659

  • build(deps-dev): Bump pytest from 8.3.3 to 8.3.5 in /prompt-utils by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/662

  • build(deps-dev): Bump types-requests from 2.32.0.20241016 to 2.32.0.20250306 in /prompt-utils by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/663

  • build(deps-dev): Bump tox from 4.24.1 to 4.24.2 in /prompt-utils by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/664

  • build(deps-dev): Bump syrupy from 4.8.1 to 4.9.0 in /prompt-utils by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/665

  • ci: Merge change for release 0.6.0 by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/667

  • Fix docs for 0.6.0.rc3 by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/668

  • Fix docs for 0.6.0.rc3 (#668) by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/669

  • Bump mlserver to 1.7.0.rc2 by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/670

  • Bump mlserver to 1.7.0.rc2 (#670) by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/671

Full Changelog: https://github.com/SeldonIO/llm-runtimes/compare/0.5.0...0.6.0

0.5.0 - Dec 4, 2024

Main Features

  • New runtimes:

  • Streaming support for OpenAI and Gemini

  • Standardized IO across all runtimes (i.e. prompting)

  • RAG support for the Local Runtime

What's Changed

  • Update CHANGELOG by @github-actions in https://github.com/SeldonIO/llm-runtimes/pull/403

  • Bump openai 1.35.3 by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/402

  • Included streaming support for OpenAI by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/404

  • build(deps-dev): Bump tox from 4.15.1 to 4.16.0 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/409

  • build(deps-dev): Bump tox from 4.15.1 to 4.16.0 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/408

  • build(deps-dev): Bump tox from 4.15.1 to 4.16.0 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/407

  • build(deps): Bump orjson from 3.10.4 to 3.10.6 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/406

  • build(deps-dev): Bump mypy from 1.10.0 to 1.10.1 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/398

  • build(deps-dev): Bump mypy from 1.10.0 to 1.10.1 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/399

  • build(deps-dev): Bump mypy from 1.10.0 to 1.10.1 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/397

  • build(deps-dev): Bump types-requests from 2.32.0.20240602 to 2.32.0.20240622 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/396

  • build(deps-dev): Bump types-requests from 2.32.0.20240602 to 2.32.0.20240622 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/395

  • build(deps-dev): Bump types-requests from 2.32.0.20240602 to 2.32.0.20240622 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/394

  • build(deps-dev): Bump flake8 from 7.0.0 to 7.1.0 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/387

  • build(deps-dev): Bump flake8 from 7.0.0 to 7.1.0 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/386

  • build(deps-dev): Bump flake8 from 7.0.0 to 7.1.0 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/385

  • build(deps): Bump urllib3 from 2.2.1 to 2.2.2 in /docs by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/384

  • build(deps): Bump urllib3 from 2.2.1 to 2.2.2 in /local/requirements by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/383

  • Add a DISABLE_CYTHON environment variable for development purposes by @jklaise in https://github.com/SeldonIO/llm-runtimes/pull/411

  • Do not exclude packages when installing in dev mode without Cython by @jklaise in https://github.com/SeldonIO/llm-runtimes/pull/412

  • added the create the servers pieces to the documentation by @joshsgoldstein in https://github.com/SeldonIO/llm-runtimes/pull/416

  • Feat/gemini api runtime by @jklaise in https://github.com/SeldonIO/llm-runtimes/pull/420

  • Migration to GitBook and General Doc Improvements by @ramonpzg in https://github.com/SeldonIO/llm-runtimes/pull/312

  • Update dependabot.yml to remove docs scan by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/427

  • fix: CI failing because of a change of permission by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/429

  • fix: remove docs in version script by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/430

  • bump versions from 0.3.0 to 0.5.0.dev by @mauicv in https://github.com/SeldonIO/llm-runtimes/pull/428

  • Fix/gemini grpc bytes decoding by @jklaise in https://github.com/SeldonIO/llm-runtimes/pull/434

  • Add types tensor to Gemini output by @jklaise in https://github.com/SeldonIO/llm-runtimes/pull/435

  • fix: add security workflow for llm-runtimes by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/442

  • fix: move workflow to workflows directory by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/443

  • fix: use one big security scan task by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/444

  • fix: add build base to local docker image build step by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/458

  • fix: Update security.yml to exclude our runtimes from 3rd party deps by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/465

  • Feat/gemini multi modal input by @jklaise in https://github.com/SeldonIO/llm-runtimes/pull/459

  • Streaming support for Gemini by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/466

  • Fix setting stream flag for other tasks than completion and chat by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/474

  • build(deps-dev): Bump black from 24.4.2 to 24.8.0 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/439

  • build(deps-dev): Bump black from 24.4.2 to 24.8.0 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/440

  • build(deps-dev): Bump black from 24.4.2 to 24.8.0 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/441

  • build(deps-dev): Bump flake8 from 7.1.0 to 7.1.1 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/449

  • Included VectorDB runtime by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/480

  • build(deps-dev): Bump flake8 from 7.1.0 to 7.1.1 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/445

  • build(deps-dev): Bump wheel from 0.43.0 to 0.44.0 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/450

  • build(deps-dev): Bump flake8 from 7.1.0 to 7.1.1 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/454

  • build(deps): Bump orjson from 3.10.6 to 3.10.7 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/460

  • build(deps-dev): Bump mypy from 1.10.1 to 1.11.2 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/473

  • build(deps-dev): Bump wheel from 0.43.0 to 0.44.0 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/446

  • build(deps-dev): Bump black from 24.4.2 to 24.8.0 in /vector-db by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/499

  • build(deps-dev): Bump flake8 from 7.1.0 to 7.1.1 in /vector-db by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/497

  • build(deps-dev): Bump tox from 4.16.0 to 4.18.1 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/481

  • build(deps-dev): Bump tox from 4.16.0 to 4.18.1 in /vector-db by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/498

  • build(deps-dev): Bump types-requests from 2.32.0.20240622 to 2.32.0.20240907 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/500

  • build(deps): Update google-generativeai requirement from ==0.7.* to >=0.7,<0.9 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/489

  • build(deps-dev): Bump tox from 4.16.0 to 4.18.1 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/485

  • build(deps-dev): Bump pytest-postgresql from 6.0.0 to 6.1.1 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/479

  • build(deps): Bump sqlalchemy from 2.0.31 to 2.0.34 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/476

  • build(deps-dev): Bump wheel from 0.43.0 to 0.44.0 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/452

  • build(deps-dev): Bump types-requests from 2.32.0.20240622 to 2.32.0.20240907 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/484

  • build(deps-dev): Bump types-requests from 2.32.0.20240622 to 2.32.0.20240907 in /vector-db by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/494

  • build(deps-dev): Bump mypy from 1.10.1 to 1.11.2 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/472

  • build(deps-dev): Bump mypy from 1.10.1 to 1.11.2 in /vector-db by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/495

  • build(deps-dev): Bump wheel from 0.43.0 to 0.44.0 in /vector-db by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/493

  • build(deps-dev): Bump tox from 4.16.0 to 4.18.1 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/483

  • build(deps-dev): Bump types-requests from 2.32.0.20240622 to 2.32.0.20240907 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/482

  • Update security.yml to include vector-db for code scan by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/502

  • build(deps-dev): Bump syrupy from 4.6.1 to 4.7.1 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/470

  • build(deps-dev): Bump mypy from 1.10.1 to 1.11.2 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/471

  • Bumped MLServer to 1.6.1 by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/501

  • Bumped llmis to 0.2.0 by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/503

  • Included missing files for qdrant by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/506

  • Included pgvector support by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/505

  • build(deps-dev): Bump tox from 4.18.1 to 4.19.0 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/516

  • build(deps-dev): Bump tox from 4.18.1 to 4.19.0 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/515

  • build(deps-dev): Bump tox from 4.18.1 to 4.19.0 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/514

  • build(deps-dev): Bump tox from 4.18.1 to 4.19.0 in /vector-db by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/513

  • build(deps-dev): Bump types-requests from 2.32.0.20240907 to 2.32.0.20240914 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/511

  • build(deps-dev): Bump types-requests from 2.32.0.20240907 to 2.32.0.20240914 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/510

  • build(deps-dev): Bump types-requests from 2.32.0.20240907 to 2.32.0.20240914 in /vector-db by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/508

  • build(deps-dev): Bump types-requests from 2.32.0.20240907 to 2.32.0.20240914 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/509

  • build(deps): Bump sqlalchemy from 2.0.34 to 2.0.35 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/512

  • build(deps-dev): Bump pytest from 7.3.1 to 8.3.3 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/490

  • build(deps-dev): Bump pytest from 7.3.1 to 8.3.3 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/488

  • build(deps-dev): Bump pytest from 7.3.1 to 8.3.3 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/487

  • build(deps-dev): Bump pytest-asyncio from 0.23.7 to 0.24.0 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/469

  • build(deps-dev): Bump pytest-asyncio from 0.23.7 to 0.24.0 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/468

  • build(deps-dev): Bump pytest-asyncio from 0.23.7 to 0.24.0 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/517

  • Prototype/add prompt and preprocessing by @mauicv in https://github.com/SeldonIO/llm-runtimes/pull/461

  • Add gitbook docs to master branch by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/548

  • Integrate 461 in memory by @mauicv in https://github.com/SeldonIO/llm-runtimes/pull/507

  • Moved prompt-utils in requirements.txt by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/549

  • Integrate #461 in API runtime by @mauicv in https://github.com/SeldonIO/llm-runtimes/pull/486

  • Prototype integrate 461 in local by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/557

  • Prototype integrate 461 in vector db by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/564

  • build(deps-dev): Bump pytest from 7.3.1 to 8.3.3 in /prompt-utils by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/527

  • build(deps-dev): Bump pytest-asyncio from 0.23.7 to 0.24.0 in /prompt-utils by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/526

  • build(deps-dev): Bump pytest-cases from 3.8.5 to 3.8.6 in /vector-db by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/528

  • build(deps-dev): Bump pytest-cases from 3.8.5 to 3.8.6 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/529

  • build(deps-dev): Bump pytest-cases from 3.8.5 to 3.8.6 in /prompt-utils by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/531

  • build(deps-dev): Bump pytest-cases from 3.8.5 to 3.8.6 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/532

  • build(deps-dev): Bump tox from 4.19.0 to 4.21.2 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/544

  • build(deps-dev): Bump pytest-cases from 3.8.5 to 3.8.6 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/530

  • build(deps-dev): Bump tox from 4.19.0 to 4.21.2 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/545

  • build(deps-dev): Bump types-requests from 2.32.0.20240914 to 2.32.0.20241016 in /vector-db by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/565

  • build(deps-dev): Bump syrupy from 4.7.1 to 4.7.2 in /prompt-utils by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/555

  • build(deps-dev): Bump types-requests from 2.32.0.20240907 to 2.32.0.20241016 in /prompt-utils by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/566

  • Rag support for local runtime by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/570

  • build(deps-dev): Bump tox from 4.19.0 to 4.21.2 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/546

  • build(deps-dev): Bump black from 24.8.0 to 24.10.0 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/551

  • build(deps-dev): Bump black from 24.8.0 to 24.10.0 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/553

  • build(deps-dev): Bump tox from 4.19.0 to 4.23.0 in /vector-db by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/567

  • Fix memory system prompt by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/569

  • Runtime settings consistency by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/572

  • build(deps-dev): Bump black from 24.8.0 to 24.10.0 in /vector-db by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/550

  • build(deps-dev): Bump syrupy from 4.7.1 to 4.7.2 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/552

  • build(deps-dev): Bump black from 24.8.0 to 24.10.0 in /prompt-utils by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/554

  • build(deps): Update qdrant-client requirement from <1.12.0 to <1.13.0 in /vector-db by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/556

  • build(deps-dev): Bump mypy from 1.11.2 to 1.12.0 in /vector-db by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/558

  • build(deps-dev): Bump mypy from 1.11.2 to 1.12.0 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/559

  • build(deps-dev): Bump mypy from 1.11.2 to 1.12.0 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/561

  • build(deps-dev): Bump tox from 4.18.1 to 4.23.0 in /prompt-utils by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/568

  • build(deps-dev): Bump mypy from 1.10.1 to 1.13.0 in /prompt-utils by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/574

  • Default model type initialization in prompt utils by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/573

  • Renamed mlserver-llm-openai to mlserver-llm-api by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/577

  • Fix openai setting config by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/578

  • Updated openai docs with IO by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/579

  • OpenAI function calling docs by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/580

  • IO gemini docs by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/581

  • Refactored the azure docs by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/584

  • IO memory docs by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/586

  • IO vector-db docs by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/587

  • IO local docs - chat model by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/588

  • IO local mms docs by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/589

  • IO local chat template by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/591

  • fix(docs): pull changes from docs-master by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/583

  • IO local quantization docs by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/590

  • IO chat bot by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/592

  • Fixed tensor names for vector-db example by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/597

  • Updated azure model settings and image by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/598

  • Improved local server manifest by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/599

  • Included wait ready for models, pipelines, and deployments by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/600

  • Updated and added docs refs by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/596

  • Updated installation docs by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/601

  • Removed k8s example by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/602

  • Docs for monitoring by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/593

  • Updated runtimes docs intro by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/603

  • Docs restructure by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/604

  • ci: Merge change for release 0.5.0 by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/605

Full Changelog: https://github.com/SeldonIO/llm-runtimes/commits/0.5.0

0.4.0 - Jun 27, 2024

Main Features

  • Code transpilation using Cython, which allows code obfuscation

  • Output streaming support for the local runtime, which reduces the time to first token for the user

  • SQL backend for the memory runtime, which allows the use of databases such as PostgreSQL or MySQL to store chat messages

  • Support for MMS, which allows the use of multiple models on the same GPU (limited support)

What's Changed

  • build(ci): Use a correct example by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/259

  • docs: Remove link to GitHub repository by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/260

  • build: Add license by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/261

  • feat(local): Bump LLMIS to 0.0.1.rc2 by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/263

  • build(deps): bump idna from 3.6 to 3.7 in /docs by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/265

  • docs: Correct link to memory examples by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/262

  • build(deps-dev): bump black from 24.3.0 to 24.4.0 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/266

  • build(deps-dev): bump black from 24.3.0 to 24.4.0 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/267

  • build(deps-dev): bump black from 24.3.0 to 24.4.0 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/269

  • build(deps-dev): bump black from 24.3.0 to 24.4.0 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/268

  • build(deps): bump sphinx-autobuild from 2024.2.4 to 2024.4.13 in /docs by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/270

  • Remove duplicate docs examples by @mauicv in https://github.com/SeldonIO/llm-runtimes/pull/271

  • docs(local): Add Core 2 model examples by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/242

  • docs: Add changelog by @github-actions in https://github.com/SeldonIO/llm-runtimes/pull/272

  • release: placeholder for promotion action by @RafalSkolasinski in https://github.com/SeldonIO/llm-runtimes/pull/273

  • build(deps): bump sphinx-autobuild from 2024.4.13 to 2024.4.16 in /docs by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/275

  • release: promote images from dev to production registry by @RafalSkolasinski in https://github.com/SeldonIO/llm-runtimes/pull/274

  • feat(docs): Add e2e example by @mauicv in https://github.com/SeldonIO/llm-runtimes/pull/126

  • Bugfix/add docs pr changes by @mauicv in https://github.com/SeldonIO/llm-runtimes/pull/277

  • build(deps): bump notebook from 7.1.2 to 7.1.3 in /docs by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/278

  • fix(local): Remove dupe kwargs setting by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/279

  • Fix links and MakeFile by @mauicv in https://github.com/SeldonIO/llm-runtimes/pull/280

  • Add minor note on MinIO to chatbot example by @mauicv in https://github.com/SeldonIO/llm-runtimes/pull/282

  • Update the image in the chatbot example doc by @mauicv in https://github.com/SeldonIO/llm-runtimes/pull/284

  • build(deps): bump myst-parser from 2.0.0 to 3.0.0 in /docs by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/285

  • fix(deps): Use LLMIS 0.0.1 by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/296

  • docs: Add installation by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/281

  • build(deps-dev): bump black from 24.4.0 to 24.4.1 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/294

  • build(deps-dev): bump mypy from 1.9.0 to 1.10.0 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/287

  • build(deps-dev): bump black from 24.4.0 to 24.4.1 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/289

  • build(deps-dev): bump black from 24.4.0 to 24.4.2 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/297

  • build(deps-dev): bump mypy from 1.9.0 to 1.10.0 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/291

  • build(deps-dev): bump mypy from 1.9.0 to 1.10.0 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/295

  • build(deps-dev): bump mypy from 1.9.0 to 1.10.0 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/290

  • build(deps): update torch requirement from <2.3.0,>=2.0.1 to >=2.0.1,<2.4.0 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/293

  • build(deps-dev): bump black from 24.4.0 to 24.4.2 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/299

  • Update CHANGELOG by @github-actions in https://github.com/SeldonIO/llm-runtimes/pull/300

  • Bump version to 0.3.0 by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/301

  • docs(api): Fix header level by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/302

  • Add section at end of e2e example for changing api to local rt by @mauicv in https://github.com/SeldonIO/llm-runtimes/pull/283

  • Wholesale docs edits pb by @paulb-seldon in https://github.com/SeldonIO/llm-runtimes/pull/298

  • run CI tests only for version of Python that is used within the Docker image by @RafalSkolasinski in https://github.com/SeldonIO/llm-runtimes/pull/303

  • deprecate and archive deepspeed by @RafalSkolasinski in https://github.com/SeldonIO/llm-runtimes/pull/304

  • fix: Remove references to removed runtime by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/313

  • Update API examples for docker access and kubernetes deployment by @mauicv in https://github.com/SeldonIO/llm-runtimes/pull/321

  • Included local runtime limitations section. by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/320

  • use {{ current-version }} in example manifest by @RafalSkolasinski in https://github.com/SeldonIO/llm-runtimes/pull/319

  • Feature/add prompt note by @mauicv in https://github.com/SeldonIO/llm-runtimes/pull/324

  • Docs example for quantization and tensor parallelism by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/323

  • Add local model loading from settings relative path by @mauicv in https://github.com/SeldonIO/llm-runtimes/pull/329

  • docs: Update docs for azure openai by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/326

  • build(deps): bump notebook from 7.1.3 to 7.2.0 in /docs by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/330

  • build(deps): bump jinja2 from 3.1.3 to 3.1.4 in /docs by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/327

  • build(deps): bump myst-parser from 3.0.0 to 3.0.1 in /docs by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/318

  • build(deps-dev): bump pytest from 8.1.1 to 8.2.0 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/314

  • build(deps-dev): bump pytest from 8.1.1 to 8.2.0 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/315

  • build(deps-dev): bump pytest from 8.1.1 to 8.2.0 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/316

  • build(deps-dev): bump tox from 4.14.2 to 4.15.0 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/310

  • build(deps-dev): bump tox from 4.14.2 to 4.15.0 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/306

  • build(deps-dev): bump black from 24.4.1 to 24.4.2 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/308

  • build(deps-dev): bump tox from 4.14.2 to 4.15.0 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/307

  • build(deps): bump requests from 2.31.0 to 2.32.0 in /docs by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/337

  • build(deps-dev): bump pytest-asyncio from 0.23.6 to 0.23.7 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/336

  • build(deps-dev): bump pytest-asyncio from 0.23.6 to 0.23.7 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/334

  • build(deps-dev): bump pytest-asyncio from 0.23.6 to 0.23.7 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/332

  • build(deps-dev): bump pytest from 8.2.0 to 8.2.1 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/331

  • build(deps-dev): bump pytest from 8.2.0 to 8.2.1 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/333

  • build(deps-dev): bump pytest from 8.2.0 to 8.2.1 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/335

  • build(deps-dev): Bump types-requests from 2.31.0.20240406 to 2.32.0.20240521 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/339

  • build(deps-dev): Bump types-requests from 2.31.0.20240406 to 2.32.0.20240521 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/340

  • build(deps-dev): Bump types-requests from 2.31.0.20240406 to 2.32.0.20240521 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/338

  • feat(memory): Add postgres backend by @mauicv in https://github.com/SeldonIO/llm-runtimes/pull/134

  • Included documentation and examples for mms by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/347

  • build(deps-dev): Bump types-requests from 2.32.0.20240521 to 2.32.0.20240523 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/344

  • build(deps-dev): Bump types-requests from 2.32.0.20240521 to 2.32.0.20240523 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/346

  • build(deps): Bump sphinx-design from 0.5.0 to 0.6.0 in /docs by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/342

  • Pin open telemetry sdk package to fix ci by @mauicv in https://github.com/SeldonIO/llm-runtimes/pull/353

  • Ensure db portability in memory rt by @mauicv in https://github.com/SeldonIO/llm-runtimes/pull/352

  • build(deps-dev): Bump types-requests from 2.32.0.20240523 to 2.32.0.20240602 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/359

  • build(deps-dev): Bump types-requests from 2.32.0.20240521 to 2.32.0.20240602 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/360

  • build(deps-dev): Bump types-requests from 2.32.0.20240523 to 2.32.0.20240602 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/362

  • Update incorrect prompt settings in model-settings.json cell by @mauicv in https://github.com/SeldonIO/llm-runtimes/pull/357

  • build(deps): Bump opentelemetry-sdk from 1.24.0 to 1.25.0 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/355

  • build(deps): Bump opentelemetry-sdk from 1.24.0 to 1.25.0 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/356

  • Revert "build(deps): Bump opentelemetry-sdk from 1.24.0 to 1.25.0 in /memory" by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/363

  • Revert "build(deps): Bump opentelemetry-sdk from 1.24.0 to 1.25.0 in /api" by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/364

  • Remove pinned opentelemetry dep for memory, api and local rt by @mauicv in https://github.com/SeldonIO/llm-runtimes/pull/365

  • build(deps-dev): Bump pytest from 8.2.1 to 8.2.2 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/368

  • build(deps-dev): Bump pytest from 8.2.1 to 8.2.2 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/367

  • build(deps-dev): Bump pytest from 8.2.1 to 8.2.2 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/366

  • SQL backend docs by @mauicv in https://github.com/SeldonIO/llm-runtimes/pull/358

  • docs(local): Add output for example by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/369

  • docs(local): Fix dupe image field by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/370

  • Migrate to pydantic v2 by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/349

  • Bumped mlserver to 1.6.0.dev2 by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/380

  • Cython support for llm-runtimes by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/354

  • Streaming support for local runtime by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/351

  • build(deps-dev): Bump tox from 4.15.0 to 4.15.1 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/375

  • build(deps-dev): Bump tox from 4.15.0 to 4.15.1 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/373

  • build(deps-dev): Bump tox from 4.15.0 to 4.15.1 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/372

  • build(deps): Bump orjson from 3.10.3 to 3.10.4 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/381

  • build(deps): Bump tornado from 6.4 to 6.4.1 in /docs by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/374

  • build(deps): Bump notebook from 7.2.0 to 7.2.1 in /docs by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/377

  • Rename db option by @mauicv in https://github.com/SeldonIO/llm-runtimes/pull/371

  • Bumped llmis to 0.1.1 by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/389

  • use mlserver 1.6.0.rc1 by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/390

  • build(deps): Bump sqlalchemy from 2.0.30 to 2.0.31 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/388

  • ci: Merge changes from master to release branch by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/391

  • Update tests.yml by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/392

  • ci: Merge changes from master to release branch by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/393

  • use mlserver 1.6.0 by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/400

  • ci: Merge changes from master to release branch by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/401

New Contributors

  • @paulb-seldon made their first contribution in https://github.com/SeldonIO/llm-runtimes/pull/298

Full Changelog: https://github.com/SeldonIO/llm-runtimes/commits/0.4.0

0.3.0 - 26 Apr 2024

Overview

Content

This is the initial release of Seldon’s LLM Module. The LLM Module provides a set of MLServer runtimes for serving and deploying large language model (and other GenAI) applications using Seldon Core v2 models and pipelines. The initial set of runtimes includes an API gateway for accessing LLM solutions hosted by third parties, a Local runtime for self-hosted, on-prem LLM deployment, and Conversational Memory for building stateful LLM applications.

API

Many companies provide access to LLMs as a service, each with its own API. This runtime provides a unified way to access them, starting with OpenAI. It requires egress access to OpenAI’s endpoints.
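As a rough illustration of what a unified request could look like, the sketch below builds an Open Inference Protocol (V2) request body of the kind MLServer runtimes accept on `POST /v2/models/{model_name}/infer`. The input names `role` and `content` and the helper `build_chat_request` are assumptions made for this example, not names confirmed by this changelog; check the runtime documentation for the fields a given deployment expects.

```python
import json
from typing import Any


def build_chat_request(role: str, content: str) -> dict[str, Any]:
    """Build a V2 inference request body carrying a single chat message.

    Hypothetical helper: the tensor names "role" and "content" are assumed.
    """

    def bytes_input(name: str, value: str) -> dict[str, Any]:
        # Each string is sent as a one-element BYTES tensor.
        return {"name": name, "shape": [1], "datatype": "BYTES", "data": [value]}

    return {"inputs": [bytes_input("role", role), bytes_input("content", content)]}


payload = build_chat_request("user", "What is MLServer?")
body = json.dumps(payload)  # JSON string to send as the HTTP request body
```

Because the request shape is independent of the provider, the same body could in principle be sent to a model backed by a different provider without changing client code.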

Local

You can use open foundation or fine-tuned models, such as those from Mistral and Meta, or your own custom models trained from scratch. The different ways of running these models have different performance characteristics. We provide a unified way to access leading backends, including Transformers, vLLM, and DeepSpeed.

Conversational Memory

This runtime makes it easier to build stateful chatbots that save conversational history. Conversations can be stored long-term, so their context is preserved and can be used by the different models served through the API and Local runtimes.
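The idea of a windowed conversation history can be sketched as follows. This is a minimal in-process illustration, not the runtime's implementation: `Message`, `WindowMemory`, and `window_size` are hypothetical names, and a real deployment would persist messages in a backend (such as the file system or SQL stores mentioned in this changelog) rather than in memory.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Iterable, List


@dataclass
class Message:
    role: str
    content: str


@dataclass
class WindowMemory:
    """Keep only the most recent `window_size` messages of a conversation."""

    window_size: int
    _messages: Deque[Message] = field(init=False, repr=False)

    def __post_init__(self) -> None:
        # A bounded deque drops the oldest entries once it is full.
        self._messages = deque(maxlen=self.window_size)

    def add(self, messages: Iterable[Message]) -> None:
        # Bulk uploads are trimmed to the window as well.
        self._messages.extend(messages)

    def history(self) -> List[Message]:
        return list(self._messages)


memory = WindowMemory(window_size=2)
memory.add([
    Message("user", "Hi"),
    Message("assistant", "Hello! How can I help?"),
    Message("user", "Summarise our chat."),
])
recent = memory.history()  # the last two messages, oldest first
```

The returned history is what would be prepended to the next prompt so that the model sees the recent context.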

We’ve carefully benchmarked each runtime to ensure that it adds minimal overhead and, in some cases, have made our own improvements on top of the supported backends.

What's Changed

  • Add tox tests and github workflow by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/2

  • Updates to allow runtime to be built for MLServer by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/4

  • Refactor openai runtime directory name by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/5

  • Deepspeed runtime by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/8

  • Initial commit for Microsoft Guidance Runtime by @ukclivecox in https://github.com/SeldonIO/llm-runtimes/pull/9

  • Update workflow to add linting for Guidance by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/12

  • Update notebook and readme by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/13

  • Update notebook (2) by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/14

  • Add mms notebook for deepspeed by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/15

  • Change title of MMS notebook by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/16

  • Update deepspeed mii to 0.0.6 by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/21

  • Return root error if failing to call openai endpoints by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/25

  • Add MS azure openai support by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/23

  • Update packages (mlserver 1.4.rc6 and mii 0.0.8) by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/29

  • Add parsing json string for HF params by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/31

  • Add extra args in inference config to be processed by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/35

  • Remove reference to apache2 in codebase by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/36

  • Openai images generation model fix + other small improvements by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/38

  • CI for building docker images by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/40

  • Add ability to specify version for deepspeed image by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/41

  • Fix bytes payload in openai runtime by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/43

  • upgrade deepspeed mii to 0.1.0 by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/46

  • Remove guidance by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/50

  • Release workflow by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/51

  • CI fix using envar by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/52

  • Adjust envar CI by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/53

  • CI fix: add env to name for action by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/54

  • Feature/add periodic ci and dependabot by @mauicv in https://github.com/SeldonIO/llm-runtimes/pull/58

  • Bump types-requests from 2.28.11.5 to 2.31.0.20240125 in /llm-api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/80

  • Bump types-requests from 2.28.11.5 to 2.31.0.20240125 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/79

  • Bump pytest-mock from 3.10.0 to 3.12.0 in /llm-api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/75

  • Bump pytest-cases from 3.6.14 to 3.8.2 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/71

  • Bump pytest-asyncio from 0.21.0 to 0.23.3 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/67

  • Bump deepspeed-mii from 0.1.0 to 0.2.0 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/66

  • Bump mypy from 1.2.0 to 1.8.0 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/62

  • Bump flake8 from 6.0.0 to 7.0.0 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/64

  • Bump flake8 from 6.0.0 to 7.0.0 in /llm-api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/74

  • Update dependabot.yml to include dockerfiles by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/85

  • Bump pytest-asyncio from 0.23.3 to 0.23.4 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/88

  • Update torch requirement from ~=2.0.1 to >=2.0.1,<2.2.0 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/86

  • Bump pytest from 7.3.1 to 8.0.0 in /llm-api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/83

  • Bump pytest-mock from 3.10.0 to 3.12.0 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/60

  • Bump mypy from 1.2.0 to 1.8.0 in /llm-api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/76

  • Bump pytest-cases from 3.6.14 to 3.8.2 in /llm-api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/69

  • docs(readme): Correct spelling mistake by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/96

  • [File System] Conversational Buffer Memory Runtime by @mauicv in https://github.com/SeldonIO/llm-runtimes/pull/56

  • Compatibility changes to openai-runtime for memory runtime requirements by @mauicv in https://github.com/SeldonIO/llm-runtimes/pull/55

  • fix: Use consistent package naming by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/103

  • fix(build): Missed folder name for prefix change by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/104

  • Bump pytest-asyncio from 0.23.4 to 0.23.5 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/99

  • Add pydantic validation for api base runtime by @mauicv in https://github.com/SeldonIO/llm-runtimes/pull/112

  • build(ci): Migrate from Node 16 actions to at least Node 20 by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/113

  • build(ci): Correct disk usage maximising values by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/114

  • Bump types-requests from 2.31.0.20240125 to 2.31.0.20240218 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/110

  • Bump black from 23.3.0 to 24.2.0 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/101

  • Bump pytest from 7.3.1 to 8.0.1 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/108

  • Update torch requirement from <2.2.0,>=2.0.1 to >=2.0.1,<2.3.0 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/93

  • Bump deepspeed-mii from 0.2.0 to 0.2.2 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/106

  • build(dependabot): Correct the config and update DeepSeed README by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/116

  • fix(api): Use latest formatting by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/115

  • Bump pytest-asyncio from 0.21.0 to 0.23.5 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/120

  • Bump pytest from 8.0.0 to 8.0.1 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/124

  • Bump types-requests from 2.31.0.20240125 to 2.31.0.20240218 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/123

  • Remove depreciated edits endpoint by @mauicv in https://github.com/SeldonIO/llm-runtimes/pull/125

  • Fix incorrect window size on bulk upload of messages by @mauicv in https://github.com/SeldonIO/llm-runtimes/pull/129

  • Bump seldonio/mlserver from 1.4.0.rc5-slim to 1.4.0-slim in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/131

  • Bump seldonio/mlserver from 1.4.0.rc5-slim to 1.4.0-slim in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/130

  • Bump pytest from 8.0.1 to 8.0.2 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/128

  • Bump pytest from 8.0.1 to 8.0.2 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/127

  • build: Bump to latest MLServer release by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/133

  • Bump seldonio/mlserver from 1.4.0-slim to 1.5.0-slim in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/137

  • Bump seldonio/mlserver from 1.4.0-slim to 1.5.0-slim in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/136

  • Add memory docs by @mauicv in https://github.com/SeldonIO/llm-runtimes/pull/107

  • Bump opentelemetry-instrumentation from 0.39b0 to 0.41b0 in /docs by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/140

  • Bump starlette from 0.27.0 to 0.36.2 in /docs by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/141

  • Bump pytest-asyncio from 0.23.5 to 0.23.5.post1 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/143

  • Bump pytest-cases from 3.8.2 to 3.8.3 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/144

  • Bump mypy from 1.8.0 to 1.9.0 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/145

  • Bump pytest-cases from 3.8.2 to 3.8.3 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/146

  • Bump pytest-asyncio from 0.23.5 to 0.23.5.post1 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/147

  • Bump deepspeed-mii from 0.2.2 to 0.2.3 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/149

  • Bump fastapi from 0.97.0 to 0.109.1 in /docs by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/142

  • Bump mypy from 1.8.0 to 1.9.0 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/148

  • feat(local): Initial support by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/100

  • feat(memory):Refactor filesys interface by @mauicv in https://github.com/SeldonIO/llm-runtimes/pull/154

  • feat(docs): Add OpenAI runtime docs by @mauicv in https://github.com/SeldonIO/llm-runtimes/pull/109

  • feat(local): Add prompting by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/155

  • feat(local): Add text generation by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/156

  • fix(local): Remove uneeded files by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/159

  • build(ci): Fix testing by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/158

  • build(deps-dev): bump black from 24.2.0 to 24.3.0 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/163

  • fix(local): Correctly pass parameters and check for lengths by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/161

  • build(deps-dev): bump black from 24.2.0 to 24.3.0 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/162

  • Bump types-requests from 2.31.0.20240218 to 2.31.0.20240311 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/150

  • Bump types-requests from 2.31.0.20240218 to 2.31.0.20240311 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/153

  • refactor(local): Support choosing LLMIS runtimes by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/165

  • refactor(local): Use async generation by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/166

  • build(ci): Add missing images by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/168

  • fix(deps): Add dependabot workflow + upgrade mlserver to 1.5 by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/171

  • build(deps-dev): bump tox from 4.4.12 to 4.14.1 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/189

  • build(deps-dev): bump mypy from 1.8.0 to 1.9.0 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/183

  • build(deps-dev): Bump to latest for local by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/190

  • feat(local): Add vLLM and DeepSpeed examples and tests by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/191

  • refactor(local): Don't require the model ready to unload by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/192

  • fix(local): Remove unused function stub by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/193

  • docs: Correct install target by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/198

  • docs(local): Update README with newer examples by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/199

  • fix(local): Pin LLMIS commit by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/200

  • refactor(local): Share response collection by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/194

  • refactor(local): Return correct error types by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/195

  • refactor(local): Appropriately log for debugging by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/196

  • feat(local): relative prompt path in local rt model-settings by @mauicv in https://github.com/SeldonIO/llm-runtimes/pull/203

  • Rename memory image by @mauicv in https://github.com/SeldonIO/llm-runtimes/pull/208

  • fix(ci): fix GH worker space issue by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/209

  • fix(build): fix failing build for local runtime by @RafalSkolasinski in https://github.com/SeldonIO/llm-runtimes/pull/202

  • fix: upgrade image to cuda 12.1 by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/214

  • build(deps-dev): bump pytest-mock from 3.12.0 to 3.14.0 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/205

  • build(deps-dev): bump tox from 4.14.1 to 4.14.2 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/213

  • build(deps-dev): bump mypy from 1.2.0 to 1.9.0 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/179

  • Bump pytest from 8.0.2 to 8.1.1 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/151

  • build(deps-dev): bump tox from 4.4.12 to 4.14.2 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/212

  • Bump pytest from 8.0.2 to 8.1.1 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/152

  • build(deps-dev): bump pytest-asyncio from 0.23.5.post1 to 0.23.6 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/172

  • build(deps-dev): bump pytest-cases from 3.6.14 to 3.8.4 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/178

  • build(deps-dev): bump pytest-asyncio from 0.23.5.post1 to 0.23.6 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/180

  • build(deps-dev): bump black from 23.3.0 to 24.3.0 in /memory/requirements by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/201

  • build(deps-dev): bump pytest-cases from 3.8.3 to 3.8.4 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/169

  • build(deps-dev): bump pytest-asyncio from 0.21.0 to 0.23.6 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/182

  • build(deps-dev): bump pytest-mock from 3.12.0 to 3.14.0 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/204

  • build(deps-dev): bump pytest from 7.3.1 to 8.1.1 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/177

  • build(deps-dev): bump pytest-mock from 3.12.0 to 3.14.0 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/207

  • build(deps-dev): bump flake8 from 6.0.0 to 7.0.0 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/174

  • build(deps-dev): bump pytest-cases from 3.8.3 to 3.8.4 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/170

  • build(deps-dev): bump tox from 4.4.12 to 4.14.2 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/210

  • build(deps-dev): bump types-requests from 2.28.11.5 to 2.31.0.20240311 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/181

  • build(deps-dev): bump tox from 4.4.12 to 4.14.2 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/211

  • build(deps-dev): bump pytest-mock from 3.10.0 to 3.14.0 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/206

  • build(local): Remove Pytorch dev dependency by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/217

  • build(docs): Add Dependabot by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/218

  • docs: New structure and Local content by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/216

  • build(deps): bump notebook from 7.1.1 to 7.1.2 in /docs by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/220

  • build: Use wheel for distribution by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/222

  • build(docs): Ignore common virtualenv directories by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/223

  • docs: Continue restructure and add reference for Local by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/224

  • docs(local): Add inference requests and backends' model settings by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/225

  • docs: Add analytics, fix quoted sidebar, and add versions by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/227

  • fix(local): Allow Pydantic models' validators reuse by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/228

  • docs: Add ReadTheDocs config by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/233

  • build(deps-dev): bump types-requests from 2.31.0.20240311 to 2.31.0.20240403 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/234

  • build(deps-dev): bump types-requests from 2.31.0.20240311 to 2.31.0.20240403 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/235

  • build(deps-dev): bump types-requests from 2.31.0.20240311 to 2.31.0.20240403 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/236

  • build(deps-dev): bump types-requests from 2.31.0.20240311 to 2.31.0.20240403 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/237

  • refactor: Align versioning by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/238

  • fix: upgrade to cuda 12.1.1 by @sakoush in https://github.com/SeldonIO/llm-runtimes/pull/240

  • OpenAI api default generation kwargs by @RobertSamoilescu in https://github.com/SeldonIO/llm-runtimes/pull/239

  • fix(local): Don't delete parameters.extra by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/241

  • build(deps-dev): bump pytest-cases from 3.8.4 to 3.8.5 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/243

  • build(deps-dev): bump pytest-cases from 3.8.4 to 3.8.5 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/244

  • build(deps-dev): bump pytest-cases from 3.8.4 to 3.8.5 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/245

  • build(deps-dev): bump pytest-cases from 3.8.4 to 3.8.5 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/246

  • feat(local): Allow kwargs as serialised JSON string by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/247

  • test(local): Use parameterisation by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/248

  • feat(api): Allow llm_parameters as serialised JSON string by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/249

  • build(local): Bump LLMIS version to 0.0.1rc1 by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/226

  • build(ci): Align version check by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/251

  • build(ci): Improve space saving when build images by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/253

  • build(ci): Use correct names and SSH agent by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/254

  • build(deps-dev): bump types-requests from 2.31.0.20240403 to 2.31.0.20240406 in /memory by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/258

  • build(deps-dev): bump types-requests from 2.31.0.20240403 to 2.31.0.20240406 in /api by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/255

  • build(deps-dev): bump types-requests from 2.31.0.20240403 to 2.31.0.20240406 in /local by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/256

  • build(deps-dev): bump types-requests from 2.31.0.20240403 to 2.31.0.20240406 in /deepspeed by @dependabot in https://github.com/SeldonIO/llm-runtimes/pull/257

  • build(ci): Migrate from Node 16 by @jesse-c in https://github.com/SeldonIO/llm-runtimes/pull/252

Full Changelog: https://github.com/SeldonIO/llm-runtimes/commits/0.3.0
