# Core 2 Configuration

When tuning performance for pipelines, reducing the overhead of Core 2 components responsible for data-processing within a pipeline is another aspect to consider. In Core 2, four components influence this overhead:

* `pipelinegateway` which handles pipeline requests
* `modelgateway` which sends requests to model inference servers
* `dataflow-engine` runs Kafka KStream topologies to manage data streamed between models in a pipeline
* the Kafka cluster

Given that Core 2 uses Kafka as the messaging system to communicate data between models, lowering the network latency between Core 2 and Kafka (especially when the Kafka installation is in a separate cluster to Core) will improve pipeline performance.

Additionally, the number of Kafka partitions per topic (which must be fixed across all models in a pipeline) significantly influences

1. Kafka’s maximum throughput, and
2. the effective number of replicas for `pipelinegateway`, `dataflow-engine` and `modelgateway`

As a baseline for serving a high inferencing RPS across multiple pipelines, we recommend using as many replicas of `pipelinegateway` and `dataflow-engine` as you have Kafka topic partitions in order to leverage the balanced distribution of inference traffic. In this case, each `dataflow-engine` will process the data from one partition, across all pipelines. Increasing the number of dataflow-engine replicas further starts sharding the pipelines across the available replicas, with each pipeline being processed by `maxShardCountMultiplier` replicas (see detailed pipeline scalability docs for configuration details)

Similarly, `modelgateway` can handle more throughput if its number of workers and number of replicas is increased. `modelgateway` has two scalability parameters that can be set via environment variables:

* `MODELGATEWAY_NUM_WORKERS`
* `MODELGATEWAY_MAX_NUM_CONSUMERS`

Each model within a Kubernetes namespace is consistently assigned to one modelgateway consumer (based on their index in a hash table of size `MODELGATEWAY_MAX_NUM_CONSUMERS`); The size of the hash table influences how many models will share the same consumer.

For each consumer, a `MODELGATEWAY_NUM_WORKERS` number of lightweight inference workers (goroutines) are created to forward requests to the inference servers and wait for responses.

Increasing these parameters (starting with an increase in the number of workers) will improve throughput if the `modelgateway` pod has enough resources to support more workers and consumers.


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.seldon.ai/seldon-core-2/user-guide/performance-tuning/pipelines/core-2-configuration.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
