What is the correct way to obtain explanations for predictions using Shap?

2024/9/19 9:43:49

I'm new to using shap, so I'm still trying to get my head around it. Basically, I have a simple sklearn.ensemble.RandomForestClassifier fit using model.fit(X_train,y_train), and so on. After training, I'd like to obtain the Shap values to explain predictions on unseen data. Based on the docs and other tutorials, this seems to be the way to go:

explainer = shap.Explainer(model.predict, X_train)
shap_values = explainer.shap_values(X_test)

However, this takes a long time to run (about 18 hours for my data). If I replace model.predict with just model in the first line, i.e.:

explainer = shap.Explainer(model, X_train)
shap_values = explainer.shap_values(X_test)

It significantly reduces the runtime (down to about 40 minutes). So that leaves me to wonder what I'm actually getting in the second case?

To reiterate, I just want to be able to explain new predictions, and it seems strange to me that it would be this expensive - so I'm sure I'm doing something wrong.

Answer

I think your question already contains a hint:

explainer = shap.Explainer(model.predict, X_train)
shap_values = explainer.shap_values(X_test)

is expensive, most probably because it runs a model-agnostic (exact or permutation-based) algorithm that computes Shapley values by calling the function many times.

explainer = shap.Explainer(model, X_train)
shap_values = explainer.shap_values(X_test)

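works directly from the trained model's internal structure (for a random forest, the values already stored in its trees) rather than calling it as a black box.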
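
To illustrate why explaining a bare function is costly, here is a minimal sketch of an exact Shapley computation in plain Python. It is not SHAP's implementation: exact_shapley, toy_model, and the tiny background set are hypothetical names and data made up for illustration, and absent features are simply filled in from background rows.

```python
from itertools import combinations
from math import factorial


def exact_shapley(f, x, background):
    """Exact Shapley values of f at x; absent features come from background rows."""
    n = len(x)
    calls = 0

    def value(coalition):
        # Expected output of f when only the features in `coalition` are known.
        nonlocal calls
        total = 0.0
        for row in background:
            mixed = [x[i] if i in coalition else row[i] for i in range(n)]
            total += f(mixed)
            calls += 1
        return total / len(background)

    # Evaluate every one of the 2**n coalitions once.
    values = {}
    for size in range(n + 1):
        for subset in combinations(range(n), size):
            values[frozenset(subset)] = value(frozenset(subset))

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (values[s | {i}] - values[s])
    return phi, calls


def toy_model(row):
    # Hypothetical black-box "predict" function with one interaction term.
    return 2.0 * row[0] + row[1] * row[2]


background = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
x = [1.0, 2.0, 3.0]
phi, calls = exact_shapley(toy_model, x, background)
print(phi, calls)  # 2**3 coalitions x 2 background rows = 16 model calls
```

With n features there are 2**n coalitions, each evaluated once per background row, so the number of model calls grows exponentially with the number of features.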

To prove the first claim (the second is a matter of fact), let's study the source code of the Explainer class.

Class definition:

class Explainer(Serializable):
    """ Uses Shapley values to explain any machine learning model or python function.

    This is the primary explainer interface for the SHAP library. It takes any combination
    of a model and masker and returns a callable subclass object that implements
    the particular estimation algorithm that was chosen.
    """

    def __init__(self, model, masker=None, link=links.identity, algorithm="auto", output_names=None, feature_names=None, linearize_link=True,
                 seed=None, **kwargs):
        """ Build a new explainer for the passed model.

        Parameters
        ----------
        model : object or function
            User supplied function or model object that takes a dataset of samples and
            computes the output of the model for those samples.

So either a model object or a function can be passed as the first argument.

If a pandas DataFrame (or a 2D NumPy array or sparse matrix) is supplied as the masker, it is wrapped in a masker object:

        if safe_isinstance(masker, "pandas.core.frame.DataFrame") or \
                ((safe_isinstance(masker, "numpy.ndarray") or sp.sparse.issparse(masker)) and len(masker.shape) == 2):
            if algorithm == "partition":
                self.masker = maskers.Partition(masker)
            else:
                self.masker = maskers.Independent(masker)

Finally, if a callable is supplied as the model:

                elif callable(self.model):
                    if issubclass(type(self.masker), maskers.Independent):
                        if self.masker.shape[1] <= 10:
                            algorithm = "exact"
                        else:
                            algorithm = "permutation"

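Hopefully you can now see why the first form is slow: with a plain callable, SHAP falls back to a model-agnostic algorithm (exact for up to 10 features, permutation-based beyond that), and both need many calls to model.predict for every explained row.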
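
For more than 10 features the quoted code selects the permutation algorithm. Below is a rough sketch of that idea, assuming a simplified single-pass estimator; SHAP's own permutation explainer differs in detail, and permutation_shapley and toy_model are hypothetical names used only for illustration.

```python
import random


def permutation_shapley(f, x, background, n_permutations=20, seed=0):
    """Estimate Shapley values of f at x by revealing features in random order."""
    rng = random.Random(seed)
    n = len(x)
    phi = [0.0] * n
    calls = 0
    for _ in range(n_permutations):
        order = list(range(n))
        rng.shuffle(order)
        for row in background:
            current = list(row)  # start with every feature taken from the background
            prev = f(current)
            calls += 1
            for i in order:
                current[i] = x[i]  # reveal feature i of the explained row
                out = f(current)
                calls += 1
                phi[i] += out - prev  # marginal contribution of feature i
                prev = out
    scale = n_permutations * len(background)
    return [p / scale for p in phi], calls


def toy_model(row):
    # Hypothetical black-box function over 12 features with one interaction.
    return sum(row) + row[0] * row[1]


x = [1.0] * 12
background = [[0.0] * 12, [0.5] * 12]
phi, calls = permutation_shapley(toy_model, x, background)
print(sum(phi), calls)
```

In this sketch the cost per explained row is n_permutations x background rows x (features + 1) model calls, which adds up quickly when the full X_train is used as the background. Passing a smaller background sample is a common way to cut the runtime.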

Now to your question(s):

What is the correct way to obtain explanations for predictions using Shap?

and

So that leaves me to wonder what I'm actually getting in the second case?

If you have a model (tree, linear, whatever) that SHAP supports, use:

explainer = shap.Explainer(model, X_train)
shap_values = explainer.shap_values(X_test)

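These are SHAP values computed from the model itself by a model-specific algorithm, which is the use case SHAP was built for.

If the model is not supported, use the first form.

Both should give similar results, provided they explain the same output. Note that model.predict returns class labels, whereas the model-based form for a classifier typically explains class probabilities, so for a like-for-like comparison pass model.predict_proba in the first form.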
