We will assume that the Ambassador (or Istio) ingress is port-forwarded to localhost:8003
!kubectl create namespace cifar10 || true
namespace/cifar10 created
Setup MinIO
Use the provided notebook to install MinIO in your cluster.
We will assume that the MinIO service is port-forwarded to localhost:8090
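Both port-forwards are assumed rather than verified by the steps above. A minimal sketch for checking that something is listening on the two local ports, using only the standard library; the helper name `port_is_open` is hypothetical and not part of Seldon Core or MinIO:

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Ports taken from the assumptions stated above.
for name, port in [("ingress", 8003), ("MinIO", 8090)]:
    status = "reachable" if port_is_open("localhost", port) else "not reachable"
    print(f"{name} on localhost:{port}: {status}")
```

A successful TCP connection only shows that the port-forward is up; it does not confirm which service answers on it.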
Poetry
We will use poetry.lock to fully define the detector environment. Install Poetry following the official documentation. Usually this comes down to
Train Outlier Detector
Prepare Training Environment
We are going to use the pyproject.toml and poetry.lock files from Alibi Detect Server. This allows us to create an environment that matches the runtime one.
Currently, the server's pyproject.toml is structured so that it uses a locally present copy of the seldon-core source code.
Please make sure that the source code you obtain matches the version of the alibi-detect-server you are using.
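One way to sanity-check the match once the environment exists is to print the versions installed in it. This is a minimal sketch using the standard-library `importlib.metadata`; the distribution names `seldon-core` and `alibi-detect` are assumed to be the names used in the environment, and `installed_version` is a hypothetical helper:

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


for name in ("seldon-core", "alibi-detect"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

Run it with the interpreter of the environment you created so that it reports that environment's packages rather than those of the notebook kernel.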
Prepare Training Script
Deploy Cifar10 model and Outlier Detector
Note: this requires Knative. Follow the Knative documentation to install it.
Deploy Event Display
Deploy Model
Create Knative Broker, Trigger and Kservice
Test it!
In a terminal, follow the logs of the event-display deployment with, for example,
Now we will send two requests, one containing a normal image and one containing an outlier.
Note: it may take a moment for the kservice to become available
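Because the kservice may not be ready immediately, a request sent too early can fail. A minimal sketch of a polling helper that retries a check until it succeeds or a deadline passes; `wait_until` and its parameters are hypothetical and not part of Knative or Seldon Core:

```python
import time
from typing import Callable


def wait_until(check: Callable[[], bool], timeout: float = 60.0, interval: float = 2.0) -> bool:
    """Call check() repeatedly until it returns True or timeout seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Here `check` could, for example, wrap a readiness query or a test request and return `True` once it succeeds.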
%%writefile train.py
import logging
import os
import tensorflow as tf
import numpy as np
from tensorflow.keras.layers import Conv2D, Conv2DTranspose, Dense
from tensorflow.keras.layers import Flatten, Layer, Reshape, InputLayer
from tensorflow.keras.regularizers import l1
from alibi_detect.od import OutlierVAE
from alibi_detect.utils.fetching import fetch_detector
from alibi_detect.utils.perturbation import apply_mask
from alibi_detect.utils.saving import save_detector, load_detector
logger = tf.get_logger()
logger.setLevel(logging.ERROR)
(X_train, y_train), (X_test, y_test) = tf.keras.datasets.cifar10.load_data()
X_train = X_train.astype('float32') / 255
X_test = X_test.astype('float32') / 255
y_train = y_train.astype('int64').reshape(-1,)
y_test = y_test.astype('int64').reshape(-1,)
print('Train: ', X_train.shape, y_train.shape)
print('Test: ', X_test.shape, y_test.shape)
detector_type = 'outlier'
dataset = 'cifar10'
detector_name = 'OutlierVAE'
# define encoder and decoder networks
latent_dim = 1024
encoder_net = tf.keras.Sequential(
    [
        InputLayer(input_shape=(32, 32, 3)),
        Conv2D(64, 4, strides=2, padding='same', activation=tf.nn.relu),
        Conv2D(128, 4, strides=2, padding='same', activation=tf.nn.relu),
        Conv2D(512, 4, strides=2, padding='same', activation=tf.nn.relu)
    ]
)
decoder_net = tf.keras.Sequential(
    [
        InputLayer(input_shape=(latent_dim,)),
        Dense(4*4*128),
        Reshape(target_shape=(4, 4, 128)),
        Conv2DTranspose(256, 4, strides=2, padding='same', activation=tf.nn.relu),
        Conv2DTranspose(64, 4, strides=2, padding='same', activation=tf.nn.relu),
        Conv2DTranspose(3, 4, strides=2, padding='same', activation='sigmoid')
    ]
)
# initialize outlier detector
od = OutlierVAE(
    threshold=.015,  # threshold for outlier score
    encoder_net=encoder_net,  # can also pass VAE model instead
    decoder_net=decoder_net,  # of separate encoder and decoder
    latent_dim=latent_dim
)
# train
od.fit(X_train, epochs=50, verbose=True)
# save the trained outlier detector
save_detector(od, './outlier-detector')
Overwriting train.py
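The decoder has to return images with the same spatial size as the encoder's input. As a quick check of the shapes in the script, the sketch below applies the output-size rule for `padding='same'` (a strided convolution yields `ceil(n / stride)`, a transposed convolution yields `n * stride`); the helper names are hypothetical and the code does not need TensorFlow:

```python
import math


def conv_same(size: int, stride: int) -> int:
    """Spatial output size of Conv2D with padding='same'."""
    return math.ceil(size / stride)


def conv_transpose_same(size: int, stride: int) -> int:
    """Spatial output size of Conv2DTranspose with padding='same'."""
    return size * stride


# Encoder: three stride-2 convolutions on a 32x32 input.
encoder_sizes = [32]
for _ in range(3):
    encoder_sizes.append(conv_same(encoder_sizes[-1], 2))

# Decoder: Dense(4*4*128) reshaped to 4x4, then three stride-2 transposed convolutions.
decoder_sizes = [4]
for _ in range(3):
    decoder_sizes.append(conv_transpose_same(decoder_sizes[-1], 2))

print(encoder_sizes)  # [32, 16, 8, 4]
print(decoder_sizes)  # [4, 8, 16, 32]
```

The final transposed convolution has 3 filters, so the reconstruction has the same 32x32x3 shape as a CIFAR10 image.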
%%time
!./venv/bin/python3 train.py
2022-04-14 13:41:54.199577: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:939] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-04-14 13:41:54.217938: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-04-14 13:41:54.849923: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 2644 MB memory: -> device: 0, name: NVIDIA GeForce GTX 1650, pci bus id: 0000:01:00.0, compute capability: 7.5
Importing matplotlib failed. Plotting will not work.
Importing plotly failed. Interactive plots will not work.
Train: (50000, 32, 32, 3) (50000,)
Test: (10000, 32, 32, 3) (10000,)
2022-04-14 13:41:58.818151: W tensorflow/core/framework/cpu_allocator_impl.cc:82] Allocation of 614400000 exceeds 10% of free system memory.
2022-04-14 13:41:59.371330: W tensorflow/core/framework/cpu_allocator_impl.cc:82] Allocation of 614400000 exceeds 10% of free system memory.
2022-04-14 13:41:59.782408: W tensorflow/core/framework/cpu_allocator_impl.cc:82] Allocation of 614400000 exceeds 10% of free system memory.
2022-04-14 13:42:01.148966: I tensorflow/stream_executor/cuda/cuda_dnn.cc:366] Loaded cuDNN version 8101
782/782 [=] - 55s 64ms/step - loss_ma: 8927.7510
2022-04-14 13:42:54.773587: W tensorflow/core/framework/cpu_allocator_impl.cc:82] Allocation of 614400000 exceeds 10% of free system memory.
782/782 [=] - 99s 126ms/step - loss_ma: -2284.2741
2022-04-14 13:44:34.044742: W tensorflow/core/framework/cpu_allocator_impl.cc:82] Allocation of 614400000 exceeds 10% of free system memory.
782/782 [=] - 65s 82ms/step - loss_ma: -3521.0513
782/782 [=] - 69s 87ms/step - loss_ma: -4055.0235
782/782 [=] - 66s 83ms/step - loss_ma: -4369.6132
782/782 [=] - 65s 82ms/step - loss_ma: -4575.9023
782/782 [=] - 68s 86ms/step - loss_ma: -4773.8706
782/782 [=] - 67s 84ms/step - loss_ma: -4934.9222
782/782 [=] - 71s 90ms/step - loss_ma: -5055.4330
782/782 [=] - 59s 75ms/step - loss_ma: -5155.0893
782/782 [=] - 56s 71ms/step - loss_ma: -5198.4920
782/782 [=] - 57s 72ms/step - loss_ma: -5314.9975
782/782 [=] - 57s 72ms/step - loss_ma: -5366.3326
782/782 [=] - 57s 71ms/step - loss_ma: -5434.8929
782/782 [=] - 57s 72ms/step - loss_ma: -5468.0532
782/782 [=] - 56s 71ms/step - loss_ma: -5504.1972
782/782 [=] - 61s 76ms/step - loss_ma: -5549.6413
782/782 [=] - 73s 93ms/step - loss_ma: -5579.2789
782/782 [=] - 65s 82ms/step - loss_ma: -5593.0464
782/782 [=] - 65s 82ms/step - loss_ma: -5639.0633
782/782 [=] - 65s 82ms/step - loss_ma: -5658.0385
782/782 [=] - 67s 85ms/step - loss_ma: -5656.6797
782/782 [=] - 67s 85ms/step - loss_ma: -5701.8011
782/782 [=] - 63s 80ms/step - loss_ma: -5723.2239
782/782 [=] - 66s 83ms/step - loss_ma: -5740.9575
782/782 [=] - 65s 82ms/step - loss_ma: -5758.7366
782/782 [=] - 65s 82ms/step - loss_ma: -5781.2925
782/782 [=] - 67s 85ms/step - loss_ma: -5796.3319
782/782 [=] - 63s 80ms/step - loss_ma: -5815.1920
782/782 [=] - 64s 81ms/step - loss_ma: -5830.7356
782/782 [=] - 66s 84ms/step - loss_ma: -5842.1293
782/782 [=] - 63s 79ms/step - loss_ma: -5847.8182
782/782 [=] - 64s 81ms/step - loss_ma: -5866.5971
782/782 [=] - 69s 87ms/step - loss_ma: -5878.1151
782/782 [=] - 70s 89ms/step - loss_ma: -5893.1399
782/782 [=] - 65s 83ms/step - loss_ma: -5893.4249
782/782 [=] - 70s 88ms/step - loss_ma: -5909.6713
782/782 [=] - 58s 73ms/step - loss_ma: -5916.4036
782/782 [=] - 58s 74ms/step - loss_ma: -5921.7595
782/782 [=] - 58s 73ms/step - loss_ma: -5924.8622
782/782 [=] - 58s 73ms/step - loss_ma: -5935.0705
782/782 [=] - 58s 74ms/step - loss_ma: -5943.9454
782/782 [=] - 59s 75ms/step - loss_ma: -5948.6081
782/782 [=] - 58s 74ms/step - loss_ma: -5960.5511
782/782 [=] - 59s 75ms/step - loss_ma: -5970.2687
782/782 [=] - 59s 74ms/step - loss_ma: -5970.9040
782/782 [=] - 59s 75ms/step - loss_ma: -5980.7978
782/782 [=] - 58s 74ms/step - loss_ma: -5980.5145
782/782 [=] - 58s 73ms/step - loss_ma: -5986.5828
782/782 [=] - 54s 68ms/step - loss_ma: -5989.7350
CPU times: user 2min 14s, sys: 1min 2s, total: 3min 17s
Wall time: 52min 52s
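The `782/782` in each progress line is the number of batches per epoch. It is consistent with the 50,000 training images being split into batches of 64, which is assumed here to be the batch size used by `od.fit` since the script does not set one. The log also allows a rough per-epoch time estimate:

```python
import math

n_train = 50_000  # from the printed X_train shape
batch_size = 64   # assumption: not set explicitly in train.py
epochs = 50       # from od.fit(X_train, epochs=50, ...)

steps_per_epoch = math.ceil(n_train / batch_size)
print(steps_per_epoch)  # 782

wall_seconds = 52 * 60 + 52  # "Wall time: 52min 52s"
seconds_per_epoch = round(wall_seconds / epochs, 1)
print(seconds_per_epoch)  # 63.4
```

The estimate covers the whole script run, including data loading and saving the detector, so it slightly overstates the pure training time per epoch.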
trigger.eventing.knative.dev/vaeoutlier-trigger created
kubectl logs event-display-7f5f8647fb-t227z -f
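The events shown by event-display carry the detector's response as JSON. A minimal sketch of reading the outlier flag from such a payload, assuming it follows the layout returned by alibi-detect's `predict` (a `data` object containing `is_outlier`); the sample payload below is made up for illustration and is not output from this run:

```python
import json

# Hypothetical payload for illustration only; real events come from the logs.
sample_event_data = """
{
  "data": {"instance_score": null, "feature_score": null, "is_outlier": [1]},
  "meta": {"name": "OutlierVAE"}
}
"""


def outlier_flags(event_data: str) -> list:
    """Return the is_outlier entries of a detector response as booleans."""
    payload = json.loads(event_data)
    return [bool(flag) for flag in payload["data"]["is_outlier"]]


print(outlier_flags(sample_event_data))  # [True]
```

Each entry corresponds to one instance in the request, so a batch of several images would yield a list of the same length.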
%%bash
deployment=$(kubectl get deploy -n cifar10 -l seldon-deployment-id=cifar10 -o jsonpath='{.items[0].metadata.name}')
kubectl rollout status deploy/${deployment} -n cifar10
deployment "cifar10-default-0-cifar10-container" successfully rolled out
!kubectl get broker,trigger,kservice -n cifar10
NAME                                  URL                                                                        AGE   READY   REASON
broker.eventing.knative.dev/default   http://broker-ingress.knative-eventing.svc.cluster.local/cifar10/default   5m    True

NAME                                              BROKER    SUBSCRIBER_URI                                 AGE     READY   REASON
trigger.eventing.knative.dev/vaeoutlier-trigger   default   http://vae-outlier.cifar10.svc.cluster.local   4m59s   True

NAME                                      URL                                      LATESTCREATED       LATESTREADY         READY   REASON
service.serving.knative.dev/vae-outlier   http://vae-outlier.cifar10.example.com   vae-outlier-00001   vae-outlier-00001   True
import json
import matplotlib.pyplot as plt
with open("images/cifar10_image.json") as f:
    data = json.load(f)
plt.imshow(data["instances"][0])
plt.show()
with open("images/outlier_image.json") as f:
    data = json.load(f)
plt.imshow(data["instances"][0])
plt.show();
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
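To send one of these images to the deployed model, the JSON content can be posted as a request body, since the files already use the `instances` layout. A minimal sketch using only the standard library; the URL path below is a placeholder assumption, not the deployment's confirmed endpoint, and `build_predict_request` is a hypothetical helper:

```python
import json
import urllib.request

# Assumption: the ingress is port-forwarded to localhost:8003 as stated earlier.
# The path after the host is a placeholder and must match your deployment.
PREDICT_URL = "http://localhost:8003/REPLACE/WITH/PREDICT/PATH"


def build_predict_request(url: str, instances: list) -> urllib.request.Request:
    """Build a JSON POST request carrying a batch of images as 'instances'."""
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Tiny stand-in for an image batch: one image with a single RGB pixel.
request = build_predict_request(PREDICT_URL, [[[[0.1, 0.2, 0.3]]]])
print(request.get_method(), len(request.data), "bytes")
```

Once the placeholder path is replaced, `urllib.request.urlopen(request)` sends the request; with the real files, pass `data["instances"]` as the `instances` argument.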