# Embeddings

The local embeddings runtime provides a single interface for deploying embedding models across multiple backends. It initially supports SentenceTransformers and is geared toward users who want to run their own large language models (LLMs) on customizable serving infrastructure. Developers can therefore tailor the deployment environment to their performance, scaling, and integration requirements.

The runtime can be used on its own for embedding generation and retrieval, or as a component of a larger system. In Retrieval-Augmented Generation (RAG), for example, an embedding model drives the retrieval step, which helps the generator produce more accurate and contextually relevant responses. In semantic caching, embeddings are used to store and retrieve data by semantic similarity, which can improve query response times and system efficiency. Other use cases include document similarity search and recommendation systems.
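To make the semantic caching idea concrete, here is a minimal sketch in plain Python. It does not use the runtime's API. The `embed` function is a hypothetical toy stand-in (a bag-of-words count vector) for a real embedding model, and `SemanticCache`, `cosine_similarity`, and the `threshold` parameter are names invented for this illustration. In practice, `embed` would call an embedding model such as one served by this runtime.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model (assumption, not the runtime's API).

    Returns a sparse bag-of-words count vector. A real model would return
    a dense vector that also captures paraphrases and synonyms.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class SemanticCache:
    """Hypothetical cache that returns a stored response for similar queries."""

    def __init__(self, threshold: float = 0.8):
        self.threshold = threshold
        self.entries = []  # list of (query embedding, cached response)

    def put(self, query: str, response: str) -> None:
        self.entries.append((embed(query), response))

    def get(self, query: str):
        """Return the best-matching response, or None on a cache miss."""
        query_vec = embed(query)
        best_score, best_response = 0.0, None
        for vec, response in self.entries:
            score = cosine_similarity(query_vec, vec)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None


cache = SemanticCache(threshold=0.8)
cache.put("What is the capital of France?", "Paris")

hit = cache.get("what is the capital of france")  # same words, different form
miss = cache.get("How do I bake bread?")          # unrelated query
```

The linear scan over `entries` keeps the sketch short. A production cache would typically use a vector index for lookups, and the similarity threshold would need tuning for the embedding model in use.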
