sklearn pipeline transform ValueError: expected value is not equal to trained value

2024/10/8 6:19:20

Can you please help me with the following function? It raises ValueError: Column ordering must be equal for fit and for transform when using the remainder keyword

(The function loads a pickled sklearn pipeline that I saved in GCP Storage.)

Error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-192-c6a8bc0ab221> in <module>
----> 1 safety_project_lite(request)

<ipython-input-190-24c565131f14> in safety_project_lite(request)
     31 
     32     df_resp = pd.DataFrame(data=request_data)
---> 33     response = loaded_model.predict(df_resp)
     34 
     35     output = {"Safety Rating": response[0]}

~/.local/lib/python3.5/site-packages/sklearn/utils/metaestimators.py in <lambda>(*args, **kwargs)
    114 
    115         # lambda, but not partial, allows help() to work with update_wrapper
--> 116         out = lambda *args, **kwargs: self.fn(obj, *args, **kwargs)
    117         # update the docstring of the returned function
    118         update_wrapper(out, self.fn)

~/.local/lib/python3.5/site-packages/sklearn/pipeline.py in predict(self, X, **predict_params)
    417         Xt = X
    418         for _, name, transform in self._iter(with_final=False):
--> 419             Xt = transform.transform(Xt)
    420         return self.steps[-1][-1].predict(Xt, **predict_params)
    421 

~/.local/lib/python3.5/site-packages/sklearn/compose/_column_transformer.py in transform(self, X)
    581             if (n_cols_transform >= n_cols_fit and
    582                     any(X.columns[:n_cols_fit] != self._df_columns)):
--> 583                 raise ValueError('Column ordering must be equal for fit '
    584                                  'and for transform when using the '
    585                                  'remainder keyword')

ValueError: Column ordering must be equal for fit and for transform when using the remainder keyword

Code:

from io import BytesIO
import pickle

import pandas as pd
from google.cloud import storage


def safety_project_lite_beta(request):
    # Download the pickled pipeline from GCP Storage into memory
    client = storage.Client(request.GCP_Project)
    bucket = client.get_bucket(request.GCP_Bucket)
    blob = bucket.blob(request.GCP_Path)
    model_file = BytesIO()
    blob.download_to_file(model_file)
    loaded_model = pickle.loads(model_file.getvalue())

    # Build a one-row DataFrame from the request fields
    request_data = {'A': [request.A],
                    'B': [request.B],
                    'C': [request.C],
                    'D': [request.D],
                    'E': [request.E],
                    'F': [request.F]}
    df_resp = pd.DataFrame(data=request_data)

    response = loaded_model.predict(df_resp)
    output = {"Rating": response[0]}
    return output
Answer

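The model can only predict if the data you feed it has the same structure as the data it was trained on. Because the pipeline's ColumnTransformer uses the remainder keyword, that includes the column order, not just the column names. On Python 3.5, which the traceback shows, a plain dict does not preserve insertion order, so pd.DataFrame(data=request_data) is not guaranteed to lay the columns out in the same order as X_train.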
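The comparison that fails is visible in the traceback (lines 581-582 of _column_transformer.py): the first n columns of the incoming DataFrame are compared position by position with the column names stored at fit time. A minimal standalone sketch of that comparison in plain pandas; `column_order_matches` is a hypothetical helper that mirrors the traceback's check, not scikit-learn's own code.

```python
import pandas as pd


def column_order_matches(fit_columns, X):
    """Mirror the positional column check from the traceback (hypothetical helper).

    fit_columns: column names seen when the pipeline was fitted.
    X: DataFrame passed to predict/transform.
    """
    n_cols_fit = len(fit_columns)
    if len(X.columns) < n_cols_fit:
        # too few columns takes a different path in sklearn; treat it as a mismatch here
        return False
    # compare names position by position, like X.columns[:n_cols_fit] != self._df_columns
    return list(X.columns[:n_cols_fit]) == list(fit_columns)


fit_columns = ["A", "B", "C"]
# on Python 3.7+ the dict's insertion order sets the column order
same_order = pd.DataFrame({"A": [1], "B": [2], "C": [3]})
shuffled = pd.DataFrame({"C": [3], "A": [1], "B": [2]})

print(column_order_matches(fit_columns, same_order))  # True
print(column_order_matches(fit_columns, shuffled))    # False
```

The same names in a different order are enough to fail the check, which is why building the frame from an unordered dict can break prediction.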

To guarantee that df_resp has the same columns as X_train, in the same order, pass X_train's column list when building the DataFrame:

df_resp = pd.DataFrame(request_data, columns=X_train.columns)
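A small pandas-only sketch of what the columns argument does; the column names and values are made up for illustration. When data is a dict and columns is given, pandas lays the columns out in the order of the list regardless of the dict's key order, and any listed column that is absent from the dict is filled with NaN instead of raising.

```python
import pandas as pd

# stand-in for X_train.columns (hypothetical names)
train_columns = ["A", "B", "C"]

# request keys arrive in a different order than in training
request_data = {"C": [3.0], "B": [2.0], "A": [1.0]}

df_resp = pd.DataFrame(request_data, columns=train_columns)
print(list(df_resp.columns))     # ['A', 'B', 'C']
print(df_resp.iloc[0].tolist())  # [1.0, 2.0, 3.0]

# a key missing from the request becomes an all-NaN column
partial = pd.DataFrame({"A": [1.0], "C": [3.0]}, columns=train_columns)
print(partial["B"].isna().all())  # True
```

Because missing keys silently turn into NaN, a pipeline without an imputer may fail later or predict on NaN, so checking the request keys first can be worthwhile.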

If X_train is not available at prediction time (for example in a deployed function like this one), pickle its column list (X_train.columns) when you save the model and load it later:

loaded_cols = pickle.loads([...])
df_resp = pd.DataFrame(data=request_data, columns=loaded_cols)
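One way to produce the pickled column list that the snippet above loads, sketched with only pickle and pandas. X_train here is a made-up stand-in; in the question's setup the resulting bytes would be stored in GCP Storage next to the model, which this sketch does not do. Converting the Index to a plain list of strings is a design choice: unpickling it does not require pandas.

```python
import pickle

import pandas as pd

# hypothetical training frame standing in for the real X_train
X_train = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6]})

# at training time: serialize the column order as a plain list of strings
cols_bytes = pickle.dumps(list(X_train.columns))

# at prediction time: restore it and use it to lay out the request
loaded_cols = pickle.loads(cols_bytes)
request_data = {"C": [7], "A": [8], "B": [9]}
df_resp = pd.DataFrame(data=request_data, columns=loaded_cols)

print(loaded_cols)            # ['A', 'B', 'C']
print(list(df_resp.columns))  # ['A', 'B', 'C']
```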

This also makes the workflow more flexible: for example, you can add columns to the training data later without hard-coding the new column order in the prediction function.
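To keep that flexibility without silently predicting on NaN, the prediction function could validate the incoming fields against the saved column list before building the frame. A minimal sketch; `build_request_frame` is a hypothetical helper, and rejecting missing fields while ignoring extra ones is an assumption you may want to change.

```python
import pandas as pd


def build_request_frame(request_data, columns):
    """Return a DataFrame with columns in training order (hypothetical helper).

    Raises KeyError if a training column is missing from request_data;
    extra keys in request_data are ignored.
    """
    missing = [col for col in columns if col not in request_data]
    if missing:
        raise KeyError("request is missing columns: {}".format(missing))
    # columns= both selects the fields and fixes their order
    return pd.DataFrame(data=request_data, columns=columns)


saved_columns = ["A", "B", "C"]
frame = build_request_frame({"B": [2], "C": [3], "A": [1], "extra": [0]}, saved_columns)
print(list(frame.columns))  # ['A', 'B', 'C']
```

The helper uses str.format rather than an f-string so it would also run on the Python 3.5 environment seen in the traceback.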

https://en.xdnf.cn/q/118724.html
