# Prompting

The Prompt Runtime lets you deploy models whose job is to generate prompts. It takes input tensors, compiles a prompt from a Jinja template, and forwards the compiled prompt to a local large language model (LLM) for completion.

Compiling prompts directly inside the model deployment is generally more efficient, but it becomes restrictive when the same model has to be reused across different tasks or within a pipeline. Redeploying a model for every new prompt is impractical, particularly given the resource demands of large language models. The Prompt Runtime avoids this by letting multiple prompts share the same local LLM, at the cost of only one extra inference request.

The Prompt Runtime is intended to complement the Local Runtime as follows:

* A local LLM is deployed using the default chat template included in its `config.json` (see [HuggingFace Models](https://huggingface.co/models)).
* Multiple prompts can then be deployed by referencing the desired LLM for completion in the `model-settings.json`.
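
The pattern described above, several prompt deployments sharing one LLM, can be illustrated with a minimal Python sketch. This is not the Prompt Runtime's actual API: `PromptStage` and `fake_local_llm` are hypothetical names, the LLM is a stub that echoes its input, and Python's standard-library `string.Template` stands in for Jinja so the sketch has no dependencies.

```python
from string import Template
from typing import Callable, Dict


def fake_local_llm(prompt: str) -> str:
    """Hypothetical stand-in for the single, shared local LLM deployment."""
    return f"<completion for: {prompt}>"


class PromptStage:
    """Illustrative prompt stage: compile a template, then forward it to an LLM.

    Assumption: string.Template replaces Jinja here purely for illustration.
    """

    def __init__(self, template: str, llm: Callable[[str], str]) -> None:
        self.template = Template(template)
        self.llm = llm

    def compile(self, inputs: Dict[str, str]) -> str:
        # Fill the template placeholders with the request inputs.
        return self.template.substitute(inputs)

    def predict(self, inputs: Dict[str, str]) -> str:
        # The one extra hop: the compiled prompt goes to the shared LLM.
        return self.llm(self.compile(inputs))


# Two different prompts reuse the same LLM; the LLM is never redeployed.
summarize = PromptStage("Summarize in one sentence: $text", fake_local_llm)
translate = PromptStage("Translate to $language: $text", fake_local_llm)

summary_prompt = summarize.compile({"text": "Seldon serves models."})
translation = translate.predict({"language": "French", "text": "Hello"})
```

Adding a new task in this sketch means creating another `PromptStage` with a different template, which mirrors the idea of deploying a new prompt that references an already running LLM.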

