Linearity measure applied to Iris

General definition

The model linearity module in alibi provides a metric to measure how linear an ML model is. Linearity is defined based on how much the linear superposition of the model's outputs differs from the output of the same linear superposition of the inputs.

Given $N$ input vectors $v_i$, $N$ real coefficients $\alpha_i$ and a predict function $M(v_i)$, the linearity of the predict function is defined as

$$L = \Big\|\sum_i \alpha_i M(v_i) - M\Big(\sum_i \alpha_i v_i\Big)\Big\| \quad \quad \text{if } M \text{ is a regressor}$$

$$L = \Big\|\sum_i \alpha_i \log \circ M(v_i) - \log \circ M\Big(\sum_i \alpha_i v_i\Big)\Big\| \quad \quad \text{if } M \text{ is a classifier}$$

Note that a lower value of $L$ means that the model $M$ is more linear.
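As a rough illustration of the definition above, the following NumPy sketch computes $L$ for an arbitrary predict function. The helper name `linearity_gap`, its arguments, and the small constant added before taking the logarithm are assumptions made for this example; it is not alibi's implementation.

```python
import numpy as np

def linearity_gap(predict_fn, vectors, alphas, classifier=False):
    """Hypothetical helper: L = || sum_i a_i f(v_i) - f(sum_i a_i v_i) ||,
    where f = M for a regressor and f = log∘M for a classifier."""
    vectors = np.asarray(vectors, dtype=float)  # shape (N, n_features)
    alphas = np.asarray(alphas, dtype=float)    # shape (N,)

    def f(x):
        out = np.asarray(predict_fn(x), dtype=float)
        # Assumption: a tiny constant avoids log(0) for zero probabilities.
        return np.log(out + 1e-10) if classifier else out

    # Linear superposition of the model outputs.
    superposed_outputs = np.tensordot(alphas, f(vectors), axes=1)
    # Model output at the linear superposition of the inputs.
    superposed_input = alphas @ vectors
    output_of_superposition = f(superposed_input[None, :])[0]

    diff = np.atleast_1d(superposed_outputs - output_of_superposition)
    return float(np.linalg.norm(diff))
```

For a purely linear map such as `lambda X: X @ W` the value is zero up to floating point error. For `lambda X: (X ** 2).sum(axis=1)` with `vectors=[[1, 0], [0, 1]]` and `alphas=[0.5, 0.5]` it is 0.5.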

Alibi implementation

  • Based on the general definition above, alibi calculates the linearity of a model in the neighborhood of a given instance $v_0$.
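To make the idea of a local measure concrete, here is a minimal sketch that evaluates the classifier form of $L$ on points sampled around $v_0$. It assumes uniform sampling in a box whose half-width is a fraction `epsilon` of each feature's range, and equal coefficients $\alpha_i = 1/N$. The function `local_linearity` and its parameters are hypothetical and chosen for illustration; alibi's own sampling strategies may differ.

```python
import numpy as np

def local_linearity(predict_proba, v0, feature_range,
                    epsilon=0.04, nb_samples=10, seed=0):
    """Hypothetical helper: classifier linearity measure around v0."""
    rng = np.random.default_rng(seed)
    v0 = np.asarray(v0, dtype=float)
    feature_range = np.asarray(feature_range, dtype=float)  # (n_features, 2)

    # Assumption: the neighborhood is a box scaled by each feature's range.
    half_width = epsilon * (feature_range[:, 1] - feature_range[:, 0])
    offsets = rng.uniform(-1.0, 1.0, size=(nb_samples, v0.size))
    neighbours = v0 + offsets * half_width

    # Assumption: equal coefficients that sum to one.
    alphas = np.full(nb_samples, 1.0 / nb_samples)

    def log_m(x):
        # Small constant guards against log(0).
        return np.log(np.asarray(predict_proba(x), dtype=float) + 1e-10)

    superposed_outputs = alphas @ log_m(neighbours)
    output_of_superposition = log_m((alphas @ neighbours)[None, :])[0]
    return float(np.linalg.norm(superposed_outputs - output_of_superposition))
```

With a model whose probabilities jump at a decision boundary, this value is close to zero for an instance far from the boundary and clearly positive for an instance sitting on it.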

Iris Data set

  • As an example, we will visualize the decision boundaries and the values of the linearity measure for various classifiers on the iris dataset. Only 2 features are included for visualization purposes.

This example uses the xgboost library, which can be installed with:

!pip install xgboost
import pandas as pd
import numpy as np
import matplotlib
%matplotlib inline
import matplotlib.pyplot as plt

from sklearn.datasets import load_iris

from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from xgboost import XGBClassifier

from itertools import product
from alibi.confidence import linearity_measure, LinearityMeasure

Dataset
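A minimal sketch of loading the data with scikit-learn's `load_iris`. Keeping the first two features is an assumption; the text only states that two features are used. The variable names are illustrative.

```python
from sklearn.datasets import load_iris

ds = load_iris()

# Assumption: keep the first two features (sepal length and sepal width)
# so that the feature space can be drawn in two dimensions.
X_train, y_train = ds.data[:, :2], ds.target
feature_names = ds.feature_names[:2]

print(X_train.shape, feature_names)
```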

Models

We will experiment with 5 different classifiers:

  • A logistic regression model, which is expected to be highly linear.

  • A random forest classifier, which is expected to be highly non-linear.

  • An xgboost classifier.

  • A support vector machine classifier.

  • A feed-forward neural network.
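The five classifiers listed above could be set up roughly as follows. All hyperparameters here (number of trees, kernel, hidden layer sizes, iteration limits, random seeds) are assumptions for the sake of a runnable sketch, not necessarily the settings behind the figures. The SVM is fitted with `probability=True` so that every model exposes `predict_proba`, which the classifier form of the linearity measure needs.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

try:  # xgboost is a separate install (see the pip command above)
    from xgboost import XGBClassifier
except ImportError:
    XGBClassifier = None

ds = load_iris()
X_train, y_train = ds.data[:, :2], ds.target  # assumption: first two features

# Assumed hyperparameters, chosen only to make the sketch runnable.
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "svm": SVC(kernel="rbf", gamma=0.1, probability=True, random_state=0),
    "neural_network": MLPClassifier(hidden_layer_sizes=(100, 50),
                                    max_iter=1000, random_state=0),
}
if XGBClassifier is not None:
    models["xgboost"] = XGBClassifier(n_estimators=100)

for name, clf in models.items():
    clf.fit(X_train, y_train)
```

Each fitted model's `predict_proba` can then serve as the predict function $M$ when measuring linearity.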

Decision boundaries and linearity

Logistic regression

[Figure: decision boundaries and linearity measure for the logistic regression model]

Random forest

[Figure: decision boundaries and linearity measure for the random forest model]

Xgboost

[Figure: decision boundaries and linearity measure for the xgboost model]

SVM

[Figure: decision boundaries and linearity measure for the SVM model]

NN

[Figure: decision boundaries and linearity measure for the neural network model]

Average linearity over the whole feature space

[Figure: average linearity over the whole feature space]
