Counterfactual explanations with one-hot encoded categorical variables
Real-world machine learning applications often handle data with categorical variables. Explanation methods that rely on perturbations of the input features need to make sure those perturbations are meaningful and capture the underlying structure of the data. This becomes tricky for categorical features: random perturbations across the possible categories, or a ranking between categories based on their frequency of occurrence in the training data, do not capture this structure. Our method captures the relation between the categories of a variable numerically, using the context provided by the other features in the data and/or the predictions made by the model. It first computes the pairwise distances between categories and then applies multi-dimensional scaling. More details about the method can be found in the documentation. The example notebook illustrates this approach on the adult dataset, which contains a mixture of categorical and numerical features used to predict whether a person's income is above or below $50k.
Note
To enable support for CounterfactualProto, you may need to run
pip install alibi[tensorflow]

import tensorflow as tf
tf.get_logger().setLevel(40) # suppress deprecation messages
tf.compat.v1.disable_v2_behavior() # disable TF2 behaviour as alibi code still relies on TF1 constructs
from tensorflow.keras.layers import Dense, Dropout, Input
from tensorflow.keras.models import Model
from tensorflow.keras.utils import to_categorical
%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import os
from sklearn.preprocessing import OneHotEncoder
from time import time
from alibi.datasets import fetch_adult
from alibi.explainers import CounterfactualProto
from alibi.utils import ohe_to_ord, ord_to_ohe
print('TF version: ', tf.__version__)
print('Eager execution enabled: ', tf.executing_eagerly()) # False

Load adult dataset
The fetch_adult function returns a Bunch object containing the features, the targets, the feature names and a mapping of the categories in each categorical variable.
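For instance (the variable names below are only illustrative):
adult = fetch_adult()
data = adult.data                    # integer-encoded feature matrix
target = adult.target                # binary income label
feature_names = adult.feature_names  # list of feature names
category_map = adult.category_map    # maps categorical column index to its list of category names
target_names = adult.target_names    # names of the two income classes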
Define shuffled training and test set:
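A possible shuffle and split point (the split index is arbitrary here):
np.random.seed(0)
data_perm = np.random.permutation(np.c_[data, target])
X = data_perm[:, :-1]
y = data_perm[:, -1]
idx = 30000  # first 30k instances for training, the rest for testing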
Reorganize data so categorical features come first:
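Assuming the usual layout of the adult dataset where columns 1-7 and 11 are categorical and columns 0 and 8-10 are numerical, the reordering could look like this:
X = np.c_[X[:, 1:8], X[:, 11], X[:, 0], X[:, 8:11]]  # 8 categorical columns first, 4 numerical columns last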
Adjust feature_names and category_map as well:
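For example, re-keying category_map so its keys match the new column positions:
feature_names = feature_names[1:8] + feature_names[11:12] + feature_names[0:1] + feature_names[8:11]
category_map = {i: v for i, (_, v) in enumerate(category_map.items())}
print(feature_names)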
Create a dictionary whose keys are the categorical columns and whose values are the number of categories for each variable in the dataset:
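A sketch, counting the unique values in each categorical column:
cat_vars_ord = {}
for i in range(len(category_map)):
    cat_vars_ord[i] = len(np.unique(X[:, i]))
print(cat_vars_ord)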
Since we will apply one-hot encoding (OHE) to the categorical variables, we convert cat_vars_ord from the ordinal to OHE format. alibi.utils contains utility functions to do this. The keys in cat_vars_ohe now represent the first column index for each one-hot encoded categorical variable. This dictionary will later be used in the counterfactual explanation.
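ord_to_ohe returns both the one-hot encoded data and the updated mapping; only the mapping is needed here:
cat_vars_ohe = ord_to_ohe(X, cat_vars_ord)[1]
print(cat_vars_ohe)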
Preprocess data
Scale numerical features between -1 and 1:
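A simple min-max rescaling of the 4 numerical columns, which sit at the end after the reordering:
X_num = X[:, -4:].astype(np.float32, copy=False)
xmin, xmax = X_num.min(axis=0), X_num.max(axis=0)
rng = (-1., 1.)
X_num_scaled = (X_num - xmin) / (xmax - xmin) * (rng[1] - rng[0]) + rng[0]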
Apply OHE to categorical variables:
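For instance with scikit-learn's OneHotEncoder (on scikit-learn >= 1.2 use sparse_output=False instead of sparse=False):
X_cat = X[:, :-4].copy()
ohe = OneHotEncoder(categories='auto', sparse=False)
X_cat_ohe = ohe.fit_transform(X_cat)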
Combine numerical and categorical data:
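The one-hot encoded categorical columns come first, followed by the scaled numerical columns; the labels are one-hot encoded as well before splitting:
X = np.c_[X_cat_ohe, X_num_scaled].astype(np.float32, copy=False)
y = to_categorical(y)
X_train, y_train = X[:idx], y[:idx]
X_test, y_test = X[idx:], y[idx:]
print(X_train.shape, X_test.shape)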
Train neural net
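A small feed-forward classifier is sufficient; the architecture and training settings below are only indicative:
def nn_ohe():
    x_in = Input(shape=(X_train.shape[1],))
    x = Dense(60, activation='relu')(x_in)
    x = Dropout(.2)(x)
    x = Dense(60, activation='relu')(x)
    x = Dropout(.2)(x)
    x_out = Dense(2, activation='softmax')(x)
    nn = Model(inputs=x_in, outputs=x_out)
    nn.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return nn

nn = nn_ohe()
nn.fit(X_train, y_train, batch_size=128, epochs=30, verbose=0)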
Generate counterfactual
Original instance:
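We take the first test instance and keep a batch dimension:
X = X_test[0].reshape((1,) + X_test[0].shape)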
Initialize counterfactual parameters. The feature perturbations are applied in the numerical feature space, after transforming the categorical variables to numerical features. As a result, the dimensionality and values of feature_range are defined in the numerical space.
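The hyperparameter values below are only indicative; feature_range is built in the ordinal (numerical) space of the original features:
shape = X.shape          # instance shape in the one-hot encoded space
beta = .01               # weight of the L1 loss term
c_init, c_steps = 1., 5  # initial weight and number of updates of the prediction loss term
max_iterations = 500
rng = (-1., 1.)          # scaled range of the numerical features
rng_shape = (1,) + data.shape[1:]
feature_range = ((np.ones(rng_shape) * rng[0]).astype(np.float32),
                 (np.ones(rng_shape) * rng[1]).astype(np.float32))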
Initialize explainer:
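The cat_vars_ohe mapping and the ohe flag tell the explainer that the categorical variables are one-hot encoded:
cf = CounterfactualProto(nn,
                         shape,
                         beta=beta,
                         cat_vars=cat_vars_ohe,
                         ohe=True,  # categorical variables are one-hot encoded
                         max_iterations=max_iterations,
                         feature_range=feature_range,
                         c_init=c_init,
                         c_steps=c_steps)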
Fit explainer. d_type refers to the distance metric used to convert the categorical to numerical values. Valid options are abdm, mvdm and abdm-mvdm. abdm infers the distance between categories of the same variable from the context provided by the other variables. This requires binning of the numerical features as well. mvdm computes the distance using the model predictions, and abdm-mvdm combines both methods. More info on both distance measures can be found in the documentation.
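Fitting with abdm; disc_perc sets the percentiles used to bin the numerical features:
cf.fit(X_train, d_type='abdm', disc_perc=[25, 50, 75])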
We can now visualize the transformation from the categorical to numerical values for each category. The example below shows that the Education feature is ordered from High School Dropout to having obtained a Doctorate degree. As a result, if we perturb an instance representing a person that has obtained a Bachelors degree, the nearest perturbations will result in a counterfactual instance with either a Masters or an Associates degree.

Explain instance:
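The explain call returns an Explanation object holding the counterfactual and metadata:
explanation = cf.explain(X)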
Helper function to more clearly describe explanations:
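A possible helper which maps both the original and the counterfactual instance back to ordinal format with ohe_to_ord and only prints the features that changed (the attribute names follow the Explanation object returned by CounterfactualProto):
def describe_instance(X, explanation, eps=1e-2):
    if not explanation.cf:
        print('No counterfactual found.')
        return
    print('Original instance: {}  -- proba: {}'.format(
        target_names[explanation.orig_class], explanation.orig_proba[0]))
    print('Counterfactual instance: {}  -- proba: {}'.format(
        target_names[explanation.cf['class']], explanation.cf['proba'][0]))
    # map the one-hot encoded instances back to ordinal format
    X_orig_ord = ohe_to_ord(X, cat_vars_ohe)[0]
    X_cf_ord = ohe_to_ord(explanation.cf['X'], cat_vars_ohe)[0]
    print('\nCategorical perturbations:')
    for i, v in category_map.items():
        cat_orig, cat_cf = v[int(X_orig_ord[0, i])], v[int(X_cf_ord[0, i])]
        if cat_orig != cat_cf:
            print('{}: {}  -->  {}'.format(feature_names[i], cat_orig, cat_cf))
    print('\nNumerical perturbations (scaled values):')
    n_cat = len(cat_vars_ord)
    delta_num = X_cf_ord[0, n_cat:] - X_orig_ord[0, n_cat:]
    for i in range(delta_num.shape[0]):
        if np.abs(delta_num[i]) > eps:
            print('{}: {:.2f}  -->  {:.2f}'.format(
                feature_names[i + n_cat], X_orig_ord[0, i + n_cat], X_cf_ord[0, i + n_cat]))

describe_instance(X, explanation)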
By obtaining a higher level of education, the income is predicted to be above $50k.
Change the categorical distance metric
Instead of abdm, we now use mvdm as our distance metric.
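The explainer can simply be refitted with the new distance metric:
cf.fit(X_train, d_type='mvdm')
explanation = cf.explain(X)
describe_instance(X, explanation)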
The same conclusion holds when using a different distance metric.
Use k-d trees to build prototypes
We can also use k-d trees to build class prototypes to guide the counterfactual to nearby instances in the counterfactual class as described in Interpretable Counterfactual Explanations Guided by Prototypes.
Initialize, fit and explain instance:
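A sketch with use_kdtree enabled; the value of theta, which weighs the prototype loss term, is only indicative:
cf = CounterfactualProto(nn,
                         shape,
                         beta=beta,
                         cat_vars=cat_vars_ohe,
                         ohe=True,
                         use_kdtree=True,  # build class prototypes with k-d trees
                         theta=10.,        # weight of the prototype loss term
                         max_iterations=max_iterations,
                         feature_range=feature_range,
                         c_init=c_init,
                         c_steps=c_steps)
cf.fit(X_train, d_type='abdm', disc_perc=[25, 50, 75])
explanation = cf.explain(X)
describe_instance(X, explanation)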
By slightly increasing the person's age, the income would be predicted to be above $50k.
Use an autoencoder to build prototypes
Another option is to use an autoencoder to guide the perturbed instance to the counterfactual class. We define and train the autoencoder:
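An indicative architecture; the encoder output will later define the prototypes:
def ae_model():
    # encoder
    x_in = Input(shape=(X_train.shape[1],))
    x = Dense(60, activation='relu')(x_in)
    x = Dense(30, activation='relu')(x)
    encoded = Dense(10, activation=None)(x)
    encoder = Model(x_in, encoded)
    # decoder
    dec_in = Input(shape=(10,))
    x = Dense(30, activation='relu')(dec_in)
    x = Dense(60, activation='relu')(x)
    decoded = Dense(X_train.shape[1], activation=None)(x)
    decoder = Model(dec_in, decoded)
    # autoencoder = encoder + decoder
    x_out = decoder(encoder(x_in))
    autoencoder = Model(x_in, x_out)
    autoencoder.compile(optimizer='adam', loss='mse')
    return autoencoder, encoder, decoder

ae, enc, dec = ae_model()
ae.fit(X_train, X_train, batch_size=128, epochs=100, validation_data=(X_test, X_test), verbose=0)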
Weights for the autoencoder and prototype loss terms:
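For example (values indicative):
beta = .1    # L1 loss term
gamma = 10.  # autoencoder reconstruction loss term
theta = .1   # prototype loss term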
Initialize, fit and explain instance:
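The trained autoencoder and encoder are passed in via ae_model and enc_model:
cf = CounterfactualProto(nn,
                         shape,
                         beta=beta,
                         cat_vars=cat_vars_ohe,
                         ohe=True,
                         ae_model=ae,    # autoencoder for the reconstruction loss
                         enc_model=enc,  # encoder defining the prototypes
                         gamma=gamma,
                         theta=theta,
                         max_iterations=max_iterations,
                         feature_range=feature_range,
                         c_init=c_init,
                         c_steps=c_steps)
cf.fit(X_train, d_type='abdm', disc_perc=[25, 50, 75])
explanation = cf.explain(X)
describe_instance(X, explanation)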
Black box model with k-d trees
Now we assume that we only have access to the model's prediction function and treat it as a black box. The k-d trees are again used to define the prototypes.
Initialize, fit and explain instance:
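Passing a plain prediction function instead of the model makes the explainer estimate gradients of the prediction function numerically; the values of theta and eps below are only indicative:
predict_fn = lambda x: nn.predict(x)  # black-box prediction function
cf = CounterfactualProto(predict_fn,
                         shape,
                         beta=beta,
                         cat_vars=cat_vars_ohe,
                         ohe=True,
                         use_kdtree=True,
                         theta=10.,
                         max_iterations=max_iterations,
                         feature_range=feature_range,
                         c_init=c_init,
                         c_steps=c_steps,
                         eps=(.01, .01))  # perturbation sizes for the numerical gradients
cf.fit(X_train, d_type='abdm', disc_perc=[25, 50, 75])
explanation = cf.explain(X)
describe_instance(X, explanation)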
If the person were younger and worked less, he or she would have a predicted income below $50k.