Integrated gradients for transformer models

In this example, we apply the integrated gradients method to two different sentiment analysis models. The first is a pretrained sentiment analysis model from the transformers library. The second is a combination of a pretrained (distil)BERT model and a simple feed-forward network. The entire model, (distil)BERT together with the feed-forward network, is trained on the IMDB reviews dataset.

In text classification models, the integrated gradients (IG) method defines an attribution value for each word in the input sentence. The attributions are calculated by considering the integral of the model gradients with respect to the word embedding layer along a straight path from a baseline instance $x^\prime$ to the input instance $x$. A description of the method can be found in the alibi documentation. Integrated gradients was originally proposed in Sundararajan et al., "Axiomatic Attribution for Deep Networks".

Note

To enable support for IntegratedGradients, you may need to run

pip install alibi[tensorflow]

import re
import os
import numpy as np
import matplotlib as mpl
import matplotlib.cm

from tqdm import tqdm
from typing import Optional, Union, List, Dict, Tuple
from IPython.display import HTML

import tensorflow as tf
import tensorflow.keras as keras
from tensorflow.keras.datasets import imdb
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.preprocessing import sequence
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.losses import SparseCategoricalCrossentropy

from transformers.optimization_tf import WarmUp
from transformers import TFAutoModelForSequenceClassification, AutoTokenizer, PreTrainedTokenizer

from alibi.explainers import IntegratedGradients

Here we define some functions needed to process the data and visualize the results, sketched below. For consistency with other text examples in alibi, we use the IMDB reviews dataset provided by Keras. Since the dataset consists of reviews that are already tokenized, we need to decode each sentence and re-convert it into tokens using the (distil)BERT tokenizer.
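A minimal sketch of such helpers is given below; the names decode_sentence, hlstr and colorize are illustrative, not part of alibi's API, and the colormap choice is arbitrary.

# Build a reverse word index for the Keras IMDB dataset.
index = imdb.get_word_index()
reverse_index = {value: key for (key, value) in index.items()}

def decode_sentence(x: List[int], reverse_index: Dict[int, str], unk_token: str = '[UNK]') -> str:
    """Map a list of Keras IMDB token ids back to a plain-text sentence."""
    # Keras reserves indices 0-2 (padding/start/out-of-vocabulary), hence the offset of 3
    return " ".join([reverse_index.get(i - 3, unk_token) for i in x])

def hlstr(string: str, color: str = 'white') -> str:
    """Wrap a token in an HTML <mark> tag with the given background color."""
    return f"<mark style=background-color:{color}>{string} </mark>"

def colorize(attrs: np.ndarray, cmap: str = 'PiYG') -> List[str]:
    """Map attribution values to hex colors via a diverging colormap centered at zero."""
    cmap_bound = np.abs(attrs).max()
    norm = mpl.colors.Normalize(vmin=-cmap_bound, vmax=cmap_bound)
    colormap = mpl.cm.get_cmap(cmap)
    return [mpl.colors.rgb2hex(colormap(norm(a))) for a in attrs]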

Automodel

In this section, we will use the TensorFlow auto model for sequence classification provided by the transformers library.

The model is pretrained on the Stanford Sentiment Treebank (SST) dataset. The Stanford Sentiment Treebank is the first corpus with fully labeled parse trees, which allows for a complete analysis of the compositional effects of sentiment in language.

Each phrase is labeled as negative, somewhat negative, neutral, somewhat positive, or positive. The corpus with all 5 labels is referred to as SST-5 or SST fine-grained. Binary classification experiments on full sentences (negative or somewhat negative vs. somewhat positive or positive, with neutral sentences discarded) refer to the dataset as SST-2 or SST binary. In this example, we use a text classifier pretrained on the SST-2 dataset.

The auto_model_distilbert output is a custom object containing the output logits. We use a wrapper to transform the output into a tensor and apply a softmax function to the logits.
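A minimal sketch of such a wrapper, assuming the SST-2 distilBERT checkpoint from the transformers model hub (the class name is illustrative):

auto_model_distilbert = TFAutoModelForSequenceClassification.from_pretrained(
    'distilbert-base-uncased-finetuned-sst-2-english')

class AutoModelWrapper(keras.Model):
    """Turns the transformers output object into a plain probability tensor."""

    def __init__(self, transformer: keras.Model, **kwargs):
        super().__init__(**kwargs)
        self.transformer = transformer

    def call(self, input_ids: Union[np.ndarray, tf.Tensor], training: bool = False) -> tf.Tensor:
        # extract the logits from the custom output object and normalize them
        out = self.transformer(input_ids, training=training)
        return tf.nn.softmax(out.logits, axis=-1)

auto_model = AutoModelWrapper(auto_model_distilbert)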

Calculate integrated gradients

The auto model consists of a main distilBERT layer (layer 0) followed by two dense layers.

We will proceed with the embedding layer from distilBERT. We calculate attributions to the outputs of the embedding layer, for which we can easily construct an appropriate baseline for IG by replacing the regular tokens with the [PAD] token (i.e., a neutral token) and keeping the other special tokens (e.g., [CLS], [SEP], [UNK], [PAD]) unchanged. By keeping special tokens such as [CLS], [SEP], and [UNK] fixed, we ensure that their attributions are 0 when we use the embedding layer: the integral along the degenerate path $[x, x]$ is 0. Note that if we considered a hidden layer instead, we would inevitably capture higher-order interactions between the input tokens. Moreover, the embedding layer is our first choice because we cannot compute attributions for the raw input due to its discrete structure (i.e., we cannot differentiate the model output with respect to the discrete input representation). That being said, you can use any other layer and compute attributions to its outputs instead.
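A minimal sketch of the explainer construction, assuming the wrapped model from above (the attribute path to the embedding layer is specific to distilBERT checkpoints):

ig = IntegratedGradients(auto_model,
                         layer=auto_model.transformer.distilbert.embeddings,
                         method='gausslegendre',
                         n_steps=50,
                         internal_batch_size=100)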

Here we consider some simple sentences such as "I love you, I like you" and "I love you, I like you, but I also kind of dislike you".
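Encoding the sentences and constructing the [PAD] baselines might look as follows (variable names are illustrative):

tokenizer = AutoTokenizer.from_pretrained('distilbert-base-uncased-finetuned-sst-2-english')

text_samples = ['I love you, I like you',
                'I love you, I like you, but I also kind of dislike you']
x_test_sample = tokenizer(text_samples, padding=True, return_tensors='np')['input_ids']

# baseline: [PAD] everywhere, except at the special-token positions which are kept
mask = np.isin(x_test_sample, tokenizer.all_special_ids)
baselines = np.where(mask, x_test_sample, tokenizer.pad_token_id)

# explain the predicted class of each sentence
predictions = auto_model(x_test_sample).numpy().argmax(axis=1)
explanation = ig.explain(x_test_sample, baselines=baselines, target=predictions)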

Let's check the attributions' shapes.
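For example (the first dimension is the number of sentences):

print(explanation.attributions[0].shape)  # e.g. (2, seq_len, 768)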

As you can see, the attribution of each token corresponds to a tensor of 768 elements. We compress all this information into a single number by summing up the 768 components. The nice thing about this is that we remain consistent with the Completeness Axiom, which states that the attributions add up to the difference between the output of the model for the given instance and the output of the model for the given baseline.
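A sketch of the compression and visualization step, reusing the illustrative helpers defined earlier:

attrs = explanation.attributions[0]  # shape: (batch_size, seq_len, 768)
attrs_sum = attrs.sum(axis=2)        # one scalar attribution per token

# visualize the second, mixed-sentiment sentence
i = 1
tokens = tokenizer.convert_ids_to_tokens(x_test_sample[i])
HTML("".join(map(hlstr, tokens, colorize(attrs_sum[i]))))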

[CLS] i love you , i like you , but i also kind of dislike you [SEP]

Note that since the sentence is classified as negative, words like dislike contribute positively to the score while words like love contribute negatively.

Sentiment analysis on IMDB with a fine-tuned model head

Load and process data

Load model and corresponding tokenizer

Now we have to load the model and the corresponding tokenizer. You can choose between the BERT model and the distilBERT model. Note that we will be fine-tuning these models, which requires access to a GPU. In our experiments, we trained distilBERT on a single Quadro RTX 5000, which requires around 5GB of memory; the entire training took around 5-6 minutes. We recommend using distilBERT as it is lighter and we did not notice a big difference in performance between the two models after fine-tuning.
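A minimal sketch of this step; the checkpoint name is an illustrative choice, and TFAutoModel (the bare transformer, without a classification head) is loaded because we attach our own head below:

from transformers import TFAutoModel

model_name = 'distilbert-base-uncased'  # or e.g. 'bert-base-uncased' for BERT
tokenizer = AutoTokenizer.from_pretrained(model_name)
transformer = TFAutoModel.from_pretrained(model_name)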

We decode each sentence in the Keras IMDB tokenized dataset to obtain the corresponding plain text. The dataset is already in pretty good shape, so no extra preprocessing is needed. The only thing we do is replace the unknown tokens with the tokenizer's own unknown token (i.e., [UNK]).
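Sketched below using the illustrative decode_sentence helper from earlier; the vocabulary size is an arbitrary choice:

max_features = 10000  # vocabulary size
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)

# out-of-vocabulary ids fall outside `reverse_index` and map to the tokenizer's [UNK]
X_train = [decode_sentence(x, reverse_index, unk_token=tokenizer.unk_token) for x in x_train]
X_test = [decode_sentence(x, reverse_index, unk_token=tokenizer.unk_token) for x in x_test]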

We then retokenize the plain text using the (distil)BERT tokenizer.
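For example (the maximum sequence length is an illustrative choice):

max_len = 256
train_enc = tokenizer(X_train, padding='max_length', truncation=True,
                      max_length=max_len, return_tensors='np')
test_enc = tokenizer(X_test, padding='max_length', truncation=True,
                     max_length=max_len, return_tensors='np')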

Construct the TensorFlow datasets for training and testing.
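A sketch of the dataset construction (the batch size is an illustrative choice):

batch_size = 32
train_ds = tf.data.Dataset.from_tensor_slices((dict(train_enc), y_train)) \
                          .shuffle(10000).batch(batch_size)
test_ds = tf.data.Dataset.from_tensor_slices((dict(test_enc), y_test)).batch(batch_size)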

Train model

Here we train a classification model by leveraging the pretrained (distil)BERT transformer.
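A minimal sketch of such a model and its training; the head architecture, learning-rate schedule, and epoch count are illustrative choices rather than the exact configuration used in the example:

class TextClassifier(keras.Model):
    """Pretrained (distil)BERT transformer followed by a small feed-forward head."""

    def __init__(self, transformer: keras.Model, num_classes: int = 2, **kwargs):
        super().__init__(**kwargs)
        self.transformer = transformer
        self.dropout = keras.layers.Dropout(0.2)
        self.dense = keras.layers.Dense(64, activation='relu')
        self.classifier = keras.layers.Dense(num_classes)  # logits output

    def call(self, inputs, training: bool = False) -> tf.Tensor:
        # use the hidden state of the [CLS] token as the sentence representation
        hidden = self.transformer(inputs, training=training).last_hidden_state
        x = self.dense(self.dropout(hidden[:, 0, :], training=training))
        return self.classifier(x)

model = TextClassifier(transformer)

epochs = 2
num_train_steps = len(train_ds) * epochs
lr_schedule = keras.optimizers.schedules.PolynomialDecay(
    initial_learning_rate=2e-5, decay_steps=num_train_steps, end_learning_rate=0.)
optimizer = Adam(learning_rate=WarmUp(initial_learning_rate=2e-5,
                                      decay_schedule_fn=lr_schedule,
                                      warmup_steps=int(0.1 * num_train_steps)))

model.compile(optimizer=optimizer,
              loss=SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
model.fit(train_ds, validation_data=test_ds, epochs=epochs)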

Calculate integrated gradients

We pick the first 10 sentences from the test set as examples. You can easily add some text of your own too, as we demonstrate below.
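For instance (variable names are illustrative; the last sentence is a custom example of our own):

z_test = X_test[:10] + ["best movie i've ever seen nothing bad to say about it"]
enc = tokenizer(z_test, padding=True, truncation=True, max_length=max_len,
                return_tensors='np')
x_explain = enc['input_ids']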

We calculate the attributions with respect to the first embedding layer of (distil)BERT. You can choose any other layer.
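A sketch mirroring the automodel section; the attribute path below is specific to distilBERT (for BERT it would be model.transformer.bert.embeddings):

ig = IntegratedGradients(model,
                         layer=model.transformer.distilbert.embeddings,
                         method='gausslegendre',
                         n_steps=50,
                         internal_batch_size=100)

# [PAD] baseline, again keeping the special tokens in place
mask = np.isin(x_explain, tokenizer.all_special_ids)
baselines = np.where(mask, x_explain, tokenizer.pad_token_id)

predictions = model(x_explain).numpy().argmax(axis=1)
explanation = ig.explain(x_explain, baselines=baselines, target=predictions)
attrs_sum = explanation.attributions[0].sum(axis=2)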

Check attributions for our example

[CLS] best movie i ' ve ever seen nothing bad to say about it [SEP]

Check attribution for some test examples

[CLS] please give this one a miss br br [UNK] [UNK] and the rest of the cast rendered terrible performances the show is flat flat flat br br i don ' t know how michael madison could have allowed this one on his plate he almost seemed to know this wasn ' t going to work out and his performance was quite [UNK] so all you madison fans give this a miss [SEP]

[CLS] this film requires a lot of patience because it focuses on mood and character development the plot is very simple and many of the scenes take place on the same set in frances [UNK] the sandy dennis character apartment but the film builds to a disturbing climax br br the characters create an atmosphere [UNK] with sexual tension and psychological [UNK] it ' s very interesting that robert alt ##man directed this considering the style and structure of his other films still the trademark alt ##man audio style is evident here and there i think what really makes this film work is the brilliant performance by sandy dennis it ' s definitely one of her darker characters but she plays it so perfectly and convincing ##ly that it ' s scary michael burns does a good job as the mute young man regular alt ##man player michael murphy has a small part the [UNK] moody set fits the content of the story very well in short this movie is a powerful study of loneliness sexual [UNK] and desperation be patient [UNK] up the atmosphere and pay attention to the wonderful ##ly written script br br i praise robert alt ##man this is one of his many films that deals with unconventional fascinating subject matter this film is disturbing but it ' s sincere and it ' s sure to [UNK] a strong emotional response from the viewer if you want to see an unusual film some might even say bizarre this is worth the [SEP]
