Efficient way to generate Lime explanations for full dataset

2024/5/20 7:31:26

I am working on a binary classification problem with 1000 rows and 15 features.

I am currently using LIME to explain the prediction for each instance.

I use the code below to generate explanations for the full test dataframe:

test_indx_list = X_test.index.tolist()
test_dict = {}
for n in test_indx_list:
    exp = explainer.explain_instance(X_test.loc[n].values, model.predict_proba, num_features=5)
    a = exp.as_list()
    test_dict[n] = a

But this is not efficient. Is there an alternative approach to generate the explanations / get the feature contributions more quickly?

Answer

From what the docs show, there is currently no option to run explain_instance in batches, although there are plans to add one, which should help a lot with speed in future versions.

For now, the change most likely to improve speed is decreasing the number of samples used to fit the local linear model:

explainer.explain_instance(..., num_features=5, num_samples=2500)

The default value for num_samples is 5000, which can be far more than you need depending on your model, and it is currently the argument that most affects the explainer's speed.
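Applied to the loop from the question, the change looks roughly like this (2500 is only an illustrative value; lower it gradually and check that the explanations stay stable for your model):

test_dict = {}
for n in X_test.index:
    # Fewer perturbed samples per instance -> faster, at some cost in stability
    exp = explainer.explain_instance(
        X_test.loc[n].values,
        model.predict_proba,
        num_features=5,
        num_samples=2500,
    )
    test_dict[n] = exp.as_list()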

Another approach is to parallelize the snippet: explain several instances at the same time and gather the results at the end. It is a more involved solution; the link below covers the general idea, and a rough sketch of one way to apply it here follows.

https://en.xdnf.cn/q/72717.html
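As an illustration only (this is my own sketch, not something taken from the linked page), one way to parallelize the loop is with joblib. It assumes explainer, model and X_test are already defined and picklable, which is usually, but not always, the case with LIME explainers:

from joblib import Parallel, delayed

def explain_one(idx):
    # Explain a single row and return (index, list of feature contributions)
    exp = explainer.explain_instance(
        X_test.loc[idx].values,
        model.predict_proba,
        num_features=5,
        num_samples=2500,
    )
    return idx, exp.as_list()

# n_jobs=-1 uses all available CPU cores; each call runs in a separate worker
results = Parallel(n_jobs=-1)(delayed(explain_one)(idx) for idx in X_test.index)
test_dict = dict(results)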
