sklearn Pipeline: argument of type ColumnTransformer is not iterable

2024/11/13 21:14:22

I am attempting to use a pipeline to feed an ensemble voting classifier as I want the ensemble learner to use models that train on different feature sets. For this purpose, I followed the tutorial available at [1].

Following is the code that I could develop so far.

y = df1.index
x = preprocessing.scale(df1)

phy_features = ['A', 'B', 'C']
phy_transformer = Pipeline(steps=[('imputer', SimpleImputer(strategy='median')), ('scaler', StandardScaler())])
phy_processer = ColumnTransformer(transformers=[('phy', phy_transformer, phy_features)])

fa_features = ['D', 'E', 'F']
fa_transformer = Pipeline(steps=[('imputer', SimpleImputer(strategy='median')), ('scaler', StandardScaler())])
fa_processer = ColumnTransformer(transformers=[('fa', fa_transformer, fa_features)])

pipe_phy = Pipeline(steps=[('preprocessor', phy_processer ),('classifier', SVM)])
pipe_fa = Pipeline(steps=[('preprocessor', fa_processer ),('classifier', SVM)])

ens = VotingClassifier(estimators=[pipe_phy, pipe_fa])

cv = KFold(n_splits=10, random_state=None, shuffle=True)
for train_index, test_index in cv.split(x):
    x_train, x_test = x[train_index], x[test_index]
    y_train, y_test = y[train_index], y[test_index]
    ens.fit(x_train,y_train)
    print(ens.score(x_test, y_test))

However, when running the code, I am getting an error saying TypeError: argument of type 'ColumnTransformer' is not iterable, at the line ens.fit(x_train,y_train).

Following is the complete stack trace that I am receiving.

Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "C:\Program Files\JetBrains\PyCharm 2020.1.1\plugins\python\helpers\pydev\_pydev_bundle\pydev_umd.py", line 197, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "C:\Program Files\JetBrains\PyCharm 2020.1.1\plugins\python\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "C:/Users/ASUS/PycharmProjects/swelltest/enemble.py", line 112, in <module>
    ens.fit(x_train,y_train)
  File "C:\Users\ASUS\PycharmProjects\swelltest\venv\lib\site-packages\sklearn\ensemble\_voting.py", line 265, in fit
    return super().fit(X, transformed_y, sample_weight)
  File "C:\Users\ASUS\PycharmProjects\swelltest\venv\lib\site-packages\sklearn\ensemble\_voting.py", line 65, in fit
    names, clfs = self._validate_estimators()
  File "C:\Users\ASUS\PycharmProjects\swelltest\venv\lib\site-packages\sklearn\ensemble\_base.py", line 228, in _validate_estimators
    self._validate_names(names)
  File "C:\Users\ASUS\PycharmProjects\swelltest\venv\lib\site-packages\sklearn\utils\metaestimators.py", line 77, in _validate_names
    invalid_names = [name for name in names if '__' in name]
  File "C:\Users\ASUS\PycharmProjects\swelltest\venv\lib\site-packages\sklearn\utils\metaestimators.py", line 77, in <listcomp>
    invalid_names = [name for name in names if '__' in name]
TypeError: argument of type 'ColumnTransformer' is not iterable

Following are the values in the names list when the error occurs.

1- ColumnTransformer(transformers=[('phy',Pipeline(steps=[('imputer',SimpleImputer(strategy='median')),('scaler', StandardScaler())]),['HR', 'RMSSD', 'SCL'])])
2- ColumnTransformer(transformers=[('fa',Pipeline(steps=[('imputer',SimpleImputer(strategy='median')),('scaler', StandardScaler())]),['Squality', 'Sneutral', 'Shappy'])])

What is the reason for this and how can I fix it?

Answer

The estimators parameter of VotingClassifier should be a list of pairs (name, estimator), so e.g.

ens = VotingClassifier(estimators=[('phy', pipe_phy),('fa', pipe_fa)])

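(In your code, each Pipeline is unpacked as if it were a (name, estimator) pair, so its first step, the ColumnTransformer, is taken to be the name. The check '__' in name then fails, hence the complaint that ColumnTransformer is not iterable.)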
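For reference, here is a minimal end-to-end sketch of the corrected setup. It is not the asker's actual script: the DataFrame X, the target y, and the helper make_pipe are hypothetical stand-ins built from synthetic data, SVC() is assumed for the classifier that the question calls SVM, and the fold count is reduced to 5 to keep the run short.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import VotingClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import KFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for df1 (assumption): six numeric columns, binary target.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(100, 6)), columns=list("ABCDEF"))
y = pd.Series((X["A"] + X["D"] > 0).astype(int))


def make_pipe(name, features):
    """Hypothetical helper: impute and scale one feature subset, then fit an SVC."""
    transformer = Pipeline(steps=[
        ("imputer", SimpleImputer(strategy="median")),
        ("scaler", StandardScaler()),
    ])
    processor = ColumnTransformer(transformers=[(name, transformer, features)])
    return Pipeline(steps=[("preprocessor", processor), ("classifier", SVC())])


pipe_phy = make_pipe("phy", ["A", "B", "C"])
pipe_fa = make_pipe("fa", ["D", "E", "F"])

# The fix: each entry is a (name, estimator) pair, not a bare pipeline.
ens = VotingClassifier(estimators=[("phy", pipe_phy), ("fa", pipe_fa)])

cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = []
for train_index, test_index in cv.split(X):
    # .iloc keeps X as a DataFrame, so columns can still be selected by name.
    x_train, x_test = X.iloc[train_index], X.iloc[test_index]
    y_train, y_test = y.iloc[train_index], y.iloc[test_index]
    ens.fit(x_train, y_train)
    scores.append(ens.score(x_test, y_test))

print(scores)
```

One design choice differs from the question: the sketch keeps X as a pandas DataFrame instead of passing it through preprocessing.scale first. ColumnTransformer can select columns by name only from a DataFrame, and the StandardScaler step inside each pipeline already handles the scaling.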

