
Integrated gradients for MNIST

In this notebook we apply the integrated gradients method to a convolutional network trained on the MNIST dataset. Integrated gradients defines an attribution value for each feature of the input instance (in this case for each pixel in the image) by integrating the model's gradients with respect to the input along a straight path from a baseline instance $x^\prime$ to the input instance $x.$
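For reference, the definition can be written out as below. The symbols $F$ (the model output for the class being explained), $i$ (the feature index) and $\alpha$ (the position along the path) are notation introduced here for illustration and do not appear in the notebook itself.

$$
IG_i(x) = (x_i - x^\prime_i) \int_0^1 \frac{\partial F\left(x^\prime + \alpha (x - x^\prime)\right)}{\partial x_i} \, d\alpha
$$

One consequence of this definition is that the attributions sum to the change in model output between the baseline and the input, $\sum_i IG_i(x) = F(x) - F(x^\prime)$.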

A more detailed description of the method can be found here. Integrated gradients was originally proposed in Sundararajan et al., "Axiomatic Attribution for Deep Networks".

Note

To enable support for IntegratedGradients, you may need to run

pip install alibi[tensorflow]

import numpy as np
import os
import tensorflow as tf
from tensorflow.keras.layers import Activation, Conv2D, Dense, Dropout
from tensorflow.keras.layers import Flatten, Input, Reshape, MaxPooling2D
from tensorflow.keras.models import Model
from tensorflow.keras.utils import to_categorical
from alibi.explainers import IntegratedGradients
import matplotlib.pyplot as plt
print('TF version: ', tf.__version__)
print('Eager execution enabled: ', tf.executing_eagerly()) # True

TF version:  2.5.0
Eager execution enabled:  True

Load data

Load and prepare the MNIST dataset.
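The preparation step is not shown on this page. A minimal sketch of what it might look like is given below, assuming the images arrive as 28x28 uint8 arrays and the labels as integers, as returned by tf.keras.datasets.mnist.load_data(). The helper name prepare_mnist is hypothetical, and np.eye stands in for to_categorical so that the sketch needs only NumPy.

```python
import numpy as np


def prepare_mnist(images, labels, num_classes=10):
    """Scale uint8 images to [0, 1], add a channel axis, one-hot encode labels."""
    # Hypothetical helper: reshape to (N, 28, 28, 1) and rescale pixel values.
    X = images.reshape(-1, 28, 28, 1).astype("float64") / 255
    # Row k of the identity matrix is the one-hot vector for class k.
    y = np.eye(num_classes)[labels]
    return X, y


# Possible usage (downloads the dataset on first call):
# (X_train, y_train), (X_test, y_test) = tf.keras.datasets.mnist.load_data()
# X_train, y_train = prepare_mnist(X_train, y_train)
# X_test, y_test = prepare_mnist(X_test, y_test)
```

Scaling to [0, 1] means a black pixel is exactly zero, which matters later when an all-zero image is used as the baseline.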

Train model

Train a convolutional neural network on the MNIST dataset. The model has two convolutional layers and reaches a test accuracy of 0.98. If save_model = True, a local folder ./model_mnist is created and the trained model is saved there. A previously saved model can be loaded by setting load_mnist_model = True.
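The model definition is not shown on this page. The sketch below is one way a network with two convolutional layers could be built with the Keras functional API. The function name build_mnist_cnn, the filter counts, dropout rates and dense layer size are all illustrative assumptions, not the notebook's confirmed architecture.

```python
def build_mnist_cnn(input_shape=(28, 28, 1), num_classes=10):
    """Small CNN with two convolutional layers; all sizes are illustrative."""
    # Imported here so the function can be defined before TensorFlow is loaded.
    from tensorflow.keras.layers import (
        Conv2D, Dense, Dropout, Flatten, Input, MaxPooling2D,
    )
    from tensorflow.keras.models import Model

    inputs = Input(shape=input_shape)
    # First convolutional block.
    x = Conv2D(64, 2, padding="same", activation="relu")(inputs)
    x = MaxPooling2D(pool_size=2)(x)
    x = Dropout(0.3)(x)
    # Second convolutional block.
    x = Conv2D(32, 2, padding="same", activation="relu")(x)
    x = MaxPooling2D(pool_size=2)(x)
    x = Dropout(0.3)(x)
    # Classifier head producing class probabilities.
    x = Flatten()(x)
    x = Dense(256, activation="relu")(x)
    x = Dropout(0.5)(x)
    outputs = Dense(num_classes, activation="softmax")(x)

    model = Model(inputs=inputs, outputs=outputs)
    model.compile(loss="categorical_crossentropy", optimizer="adam",
                  metrics=["accuracy"])
    return model


# Possible usage (epochs and batch size are placeholders):
# model = build_mnist_cnn()
# model.fit(X_train, y_train, epochs=6, batch_size=256,
#           validation_data=(X_test, y_test))
```

The softmax output matches one-hot encoded labels and the categorical cross-entropy loss. Other architectures of similar size would serve equally well for the attribution steps that follow.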

Calculate integrated gradients

The IntegratedGradients class implements the integrated gradients attribution method. A description of the method can be found here.
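To make the method concrete, the sketch below computes integrated gradients from scratch for a toy function with a known gradient, using Gauss-Legendre quadrature to approximate the path integral. It is independent of alibi and is not the library's implementation. The function integrated_gradients, the toy model F and the weights w are all made up for illustration.

```python
import numpy as np


def integrated_gradients(grad_fn, x, baseline, n_steps=50):
    """Approximate integrated gradients with Gauss-Legendre quadrature.

    grad_fn(z) must return dF/dz with the same shape as z.
    """
    # Nodes and weights on [-1, 1], rescaled to the path parameter alpha in [0, 1].
    nodes, weights = np.polynomial.legendre.leggauss(n_steps)
    alphas = 0.5 * (nodes + 1.0)
    weights = 0.5 * weights

    avg_grad = np.zeros_like(x, dtype=float)
    for alpha, weight in zip(alphas, weights):
        point = baseline + alpha * (x - baseline)  # point on the straight path
        avg_grad += weight * grad_fn(point)

    # Scale the path-averaged gradient by the input-baseline difference.
    return (x - baseline) * avg_grad


# Toy differentiable "model" F(z) = tanh(w . z) with an analytic gradient.
w = np.array([0.5, -1.0, 2.0, 0.3])


def F(z):
    return np.tanh(w @ z)


def grad_F(z):
    return (1.0 - np.tanh(w @ z) ** 2) * w


x = np.array([1.0, 0.5, 0.0, 2.0])  # third feature equals its baseline value
baseline = np.zeros_like(x)         # all-zero ("black") baseline
attributions = integrated_gradients(grad_F, x, baseline)
```

Two properties can be checked on the result. The third feature, which equals its baseline value, receives an attribution of exactly zero. The attributions also sum to F(x) - F(baseline) up to numerical precision.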

In the following example, the baselines (i.e. the starting points of the path integral) are black images, with all pixel values set to zero. Because each attribution is scaled by the difference between the input and the baseline, pixels that are black in the input always receive zero attribution. The path is a straight line from the baseline to the input image, and the integral along it is approximated with 50 steps using the Gauss-Legendre method.
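The explanation step is not shown on this page. A minimal sketch of the setup described above might look as follows, assuming model is a trained Keras classifier and X is a batch of preprocessed images. The wrapper name explain_with_zero_baseline and the internal_batch_size value are illustrative choices. Passing baselines=None relies on alibi's default of an all-zero baseline.

```python
def explain_with_zero_baseline(model, X, n_steps=50, internal_batch_size=100):
    """Integrated gradients for a Keras classifier, using alibi's explainer."""
    # Imported lazily so the helper can be defined without alibi installed.
    from alibi.explainers import IntegratedGradients

    ig = IntegratedGradients(
        model,
        layer=None,              # attribute to the model inputs
        method="gausslegendre",  # quadrature rule for the path integral
        n_steps=n_steps,
        internal_batch_size=internal_batch_size,
    )
    # Explain each instance with respect to its predicted class.
    predictions = model(X).numpy().argmax(axis=1)
    # baselines=None: alibi falls back to all-zero (black) baselines.
    explanation = ig.explain(X, baselines=None, target=predictions)
    # attributions is a list with one array per model input.
    return explanation.attributions[0], predictions


# Possible usage on a few test images:
# attrs, predictions = explain_with_zero_baseline(model, X_test[:10])
```

The returned attribution array has the same shape as X, so each pixel of each image gets one attribution value.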

Visualize attributions

Sample images from the test dataset and their attributions.

  • The first column shows the original image.

  • The second column shows the values of the attributions.

  • The third column shows the positive-valued attributions.

  • The fourth column shows the negative-valued attributions.

The attributions are calculated using the black image as a baseline for all samples.
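The four-column layout described above could be produced along the following lines. This is a sketch only. The helper names, the PiYG colour map and the figure size are assumptions. It expects images and attrs to be arrays of shape (N, 28, 28, 1).

```python
import numpy as np


def split_attributions(attr):
    """Return the positive and negative parts of an attribution map."""
    return np.clip(attr, 0, None), np.clip(attr, None, 0)


def plot_attributions(images, attrs, cmap="PiYG"):
    """Draw one row per image: original, attributions, positive, negative."""
    import matplotlib.pyplot as plt  # imported here so the helper above needs only NumPy

    bound = np.abs(attrs).max()  # symmetric colour scale shared by all panels
    n = len(images)
    fig, ax = plt.subplots(nrows=n, ncols=4, figsize=(10, 2.5 * n), squeeze=False)
    titles = ["Original", "Attributions", "Positive", "Negative"]
    for row in range(n):
        pos, neg = split_attributions(attrs[row])
        ax[row, 0].imshow(images[row].squeeze(), cmap="gray")
        for col, panel in enumerate([attrs[row], pos, neg], start=1):
            ax[row, col].imshow(panel.squeeze(), vmin=-bound, vmax=bound, cmap=cmap)
        for col in range(4):
            ax[row, col].set_title(titles[col])
            ax[row, col].axis("off")
    return fig, ax


# Possible usage:
# fig, ax = plot_attributions(X_test[:3], attrs[:3])
# plt.show()
```

Using the same symmetric colour range in every attribution panel keeps the positive and negative maps directly comparable with the combined one.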
