# Help and Support

> Identify and debug issues within the Seldon ecosystem

If the information in this guide does not help you to resolve your issue, you can contact the Seldon team through your customer representative. If you do so, provide as much detail as you can about your particular installation of Seldon Enterprise Platform.

The **About** page, in the top-right corner of the UI, displays the Seldon Enterprise Platform version details, license details, and your browser version. For more details about your specific browser, visit <https://www.whatsmybrowser.org/> and share the resulting URL with us. It gives us additional information about your browser setup, such as its resolution.

![AboutPage](/files/qOdieHmQmQZ8kdZFVoz2)

## Browser Requirements

Seldon Enterprise Platform supports all modern browsers, including Chrome, Firefox, Safari, and Microsoft Edge. We recommend using the latest browser version available to your operating system. See your browser’s documentation to learn more about checking and updating your version.

Here are some recommendations for using the Seldon Enterprise Platform UI effectively:

* Cookies - cookies must be enabled in your browser, per our Cookie Policy. Blocking cookies will interfere with your interactions with the Enterprise Platform UI.
* JavaScript - JavaScript must be enabled to interact with the Enterprise Platform UI. Avoid programs that block JavaScript, like anti-virus software, or add exceptions for Seldon Enterprise Platform.
* Browser add-ons or extensions - browser add-ons might interfere with your use of the Enterprise Platform UI. While disabling them isn't always required, we may ask you to disable them when helping you troubleshoot.
* Browser window sizes - your computer's screen size determines the maximum browser window resolution. For the best experience, use a browser window at least 1280 pixels wide and 768 pixels tall.

## Errors from Seldon Enterprise Platform

Errors in Seldon Enterprise Platform are often caused by faults or misconfigurations in other parts of the system. However, it is still useful to check what Seldon Enterprise Platform reports about the problem, as this may offer insights into what is happening.

If you experience Seldon Enterprise Platform crashing or returning an error, the best first steps are to:

1. Open the Network tab in your browser's Developer Tools, ensure that network activity is being recorded, and reproduce the issue. Find the failed calls and inspect their full messages.
2. Find the Seldon Enterprise Platform pod (usually in the `seldon-system` namespace) and inspect its logs. You can filter for messages at level `warn` or `error`.
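
The second step can be sketched on the command line as follows. This is a minimal sketch rather than an official tool: the helper names `filter_warn_error` and `platform_errors` are hypothetical, the pod name must be looked up for your installation, and the filter assumes that the log level appears as the literal word `warn` or `error` in each log line.

```bash
# Keep only log lines that mention a warn- or error-level message.
# Assumption: the level appears as the word "warn", "warning", or "error"
# (any case) somewhere in the line; adjust the pattern to your log format.
filter_warn_error() {
  grep -iE '(warn|error)'
}

# Print warn/error lines from a pod's logs.
# $1 = pod name, $2 = namespace (defaults to seldon-system).
platform_errors() {
  kubectl logs "$1" -n "${2:-seldon-system}" | filter_warn_error
}

# Usage (pod names vary per installation, so list them first):
#   kubectl get pods -n seldon-system
#   platform_errors <pod-name>
```

Because the filter is a plain text match, it can also surface informational lines that merely contain the word "error", so treat its output as a starting point.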

This should help to determine whether the cause is within Seldon Enterprise Platform or another component. If the issue is with Seldon Enterprise Platform itself, report the bug directly to Seldon. Likewise, if the issue appears to be with Seldon Core, either report the bug or [raise an issue on GitHub](https://github.com/SeldonIO/seldon-core/issues). Otherwise, check the sections below or refer to the documentation and information available online for that component. If you still cannot resolve the issue, report it to Seldon.

## Auth

See the [auth section](/seldon-enterprise-platform/production-environment/auth.md#debugging) for debugging tips.

## Knative

See the [Knative install section](/seldon-enterprise-platform/production-environment/knative.md) for how to verify Knative.

## Argo and Batch

See the [Argo section](/seldon-enterprise-platform/production-environment/argo-workflows.md) for debugging batch and the [MinIO section](/seldon-enterprise-platform/production-environment/minio.md) for MinIO.

## Prometheus

See the [Monitoring and Alerting](/seldon-enterprise-platform/production-environment/observability-alerting.md) section for debugging Prometheus.

## Serving Engines

To debug Seldon Core, see the Seldon Core documentation.

In our demos, we load models from Google Cloud Storage buckets. The model logs sometimes show this message:

```
Compute Engine Metadata server unavailable onattempt
```

This is a [known Google Storage issue](https://github.com/googleapis/google-cloud-python/issues/2995) but does not cause failures. Treat this as a warning.

## Request Logs Entries Missing

Sometimes requests fail to appear in the request logs. Often this is a problem with the request logger setup. If so, see the [request logging docs](/seldon-enterprise-platform/production-environment/request-logging.md).

Sometimes we see this error in the request logger logs:

```
RequestError(400, 'illegal_argument_exception', 'mapper [request.instance] cannot be changed from type [long] to [float]')
```

This happens because Elasticsearch infers the type of each field in the model's index from the first request it receives. If a later request sends a different type for the same field (here, a float where a long was inferred), or the type was inferred incorrectly, the mapping has to be fixed manually.

The best thing to do here is to delete the index. Note that this also removes the request logs stored in it.

First, port-forward Elasticsearch. For example, if OpenSearch (the successor to Open Distro for Elasticsearch) is used, run:

```bash
kubectl port-forward -n seldon-logs svc/opensearch-cluster-master 9200
```

To delete the index we need to know its name. The pattern is:

```
inference-log-<seldon/kfserving>-<namespace>-<modelname>-<endpoint>-<modelid>
```

Usually `endpoint` is `default` unless there's a canary, and `modelid` is usually `<modelname>-container` if created via the Seldon Enterprise Platform UI.
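
To make the naming pattern concrete, the following sketch composes an index name from its parts. The function name `inference_index_name` is hypothetical, and the defaults simply encode the "usual" values described above, so they may not match every installation.

```bash
# Compose a request-log index name from its parts, following the pattern above.
# $1 = seldon or kfserving
# $2 = namespace
# $3 = model name
# $4 = endpoint (defaults to "default", the usual value without a canary)
# $5 = model ID (defaults to "<modelname>-container", the usual value for
#      models created via the UI)
inference_index_name() {
  printf 'inference-log-%s-%s-%s-%s-%s\n' \
    "$1" "$2" "$3" "${4:-default}" "${5:-$3-container}"
}

# Usage:
#   inference_index_name seldon seldon income
```

Always confirm the computed name against the list of indices that Elasticsearch actually reports before acting on it.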

You may use the [Elasticsearch API](https://www.elastic.co/guide/en/elasticsearch/reference/6.8/cat-indices.html) to list the available indices in Elasticsearch with:

```bash
curl -k --request GET 'https://localhost:9200/_cat/indices/inference-log-*'
```

which should return something like:

```
yellow open inference-log-seldon-seldon-income-default-income-container                       xWEBE-xXQA65cEIitp6SHw 1 1 9646    0   3.4mb   3.4mb
yellow open inference-log-seldon-seldon-income-classifier-default-income-classifier-container wUeX7fiISpSIGFCoHI05Cg 1 1   61   60 151.1kb 151.1kb
yellow open inference-log-seldon-seldon-batch-demo-default-batch-demo-container               3oYtAwL1Q-2Gi3og91iGdg 1 1 1848 1093   1.3mb   1.3mb
```

Then delete the index with a `curl` command, using the full index name as it appears in the listing. For example, if the credentials are `admin/admin` and there is a `cifar10` Seldon model, created via the UI, in a namespace also called `seldon`, the command is:

```bash
curl -k -v -XDELETE https://admin:admin@localhost:9200/inference-log-seldon-seldon-cifar10-default-cifar10-container
```

{% hint style="info" %}
**Note**: Optionally, for easier debugging, you may also install [Kibana](/seldon-enterprise-platform/production-environment/elasticsearch.md#kibana-optional) or its equivalent [OpenSearch Dashboards](/seldon-enterprise-platform/production-environment/opensearch.md) to visualize and inspect indices in Elasticsearch. You may wish to expose these via ingress routing rules, such as Istio `VirtualServices` or standard Kubernetes `Ingress` resources.
{% endhint %}

## Insufficient ephemeral storage in EKS clusters

When using `eksctl`, the volume size for each node is 20 GB by default. However, with large images this may not be enough. This is discussed at length in [this thread](https://github.com/eksctl-io/eksctl/issues/780) in the `eksctl` repository.

When this happens, pods usually start to get evicted. If you run `kubectl describe` on any of these pods, you should see errors about `not enough ephemeral storage`. You should also see some `DiskPressure` events in the output of `kubectl describe nodes`.
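
The checks described above can be sketched as follows. The helper name `list_evicted` is hypothetical, and it assumes the default column layout of `kubectl get pods --all-namespaces`, where `STATUS` is the fourth column.

```bash
# Read `kubectl get pods --all-namespaces` output on stdin and print
# "<namespace>/<pod>" for every pod whose STATUS column is "Evicted".
# The header row is skipped naturally because its fourth field is "STATUS".
list_evicted() {
  awk '$4 == "Evicted" { print $1 "/" $2 }'
}

# Usage:
#   kubectl get pods --all-namespaces | list_evicted
#   kubectl describe pod <pod> -n <namespace>       # look for ephemeral storage errors
#   kubectl describe nodes | grep -i DiskPressure   # look for DiskPressure on the nodes
```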

To fix this, it should be enough to increase the available space. With `eksctl`, you can do so by adding the `volumeSize` and `volumeType` keys to the `nodeGroups` config. For instance, to change the volume size to 100 GB, you could use the following in your `ClusterConfig` spec:

```yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

...

nodeGroups:
  - volumeSize: 100
    volumeType: gp2
    ...
```

## Elastic Queue Capacity

If request logging is used with high throughput, the logger may hit a `rejected execution of processing` error, accompanied by a `queue capacity` message. To address this, increase `thread_pool.write.queue_size`. For example, with the Elasticsearch Helm chart this could be:

```yaml
esConfig:
  elasticsearch.yml: |
    thread_pool.write.queue_size: 2000
```
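
To check whether writes are being rejected, before or after changing the queue size, you can inspect the write thread pool through the `_cat/thread_pool` API. The sketch below assumes that Elasticsearch is port-forwarded to `localhost:9200` as in the request logging section; the helper name `total_rejected` is hypothetical.

```bash
# Sum the "rejected" column from `_cat/thread_pool/write` output read on stdin.
# Expects exactly the four columns requested in the usage example below:
# node_name, name, queue, rejected.
total_rejected() {
  awk '{ sum += $4 } END { print sum + 0 }'
}

# Usage (add credentials if your cluster requires them):
#   curl -k -s 'https://localhost:9200/_cat/thread_pool/write?h=node_name,name,queue,rejected' | total_rejected
```

The counter is cumulative per node since the node started, so compare the total across two runs rather than reading a single value in isolation.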


