GridSearch for an estimator inside a OneVsRestClassifier

2024/11/20 8:32:06

I want to run GridSearchCV on an SVC model that uses the one-vs-all (one-vs-rest) strategy. For the latter part, I can just do this:

model_to_set = OneVsRestClassifier(SVC(kernel="poly"))

My problem is with the parameters. Let's say I want to try the following values:

parameters = {"C":[1,2,4,8], "kernel":["poly","rbf"],"degree":[1,2,3,4]}

In order to perform GridSearchCV, I should do something like:

cv_generator = StratifiedKFold(y, k=10)
model_tunning = GridSearchCV(model_to_set, param_grid=parameters, score_func=f1_score, n_jobs=1, cv=cv_generator)

However, when I execute it I get:

Traceback (most recent call last):
  File "/.../main.py", line 66, in <module>
    argclass_sys.set_model_parameters(model_name="SVC", verbose=3, file_path=PATH_ROOT_MODELS)
  File "/.../base.py", line 187, in set_model_parameters
    model_tunning.fit(self.feature_encoder.transform(self.train_feats), self.label_encoder.transform(self.train_labels))
  File "/usr/local/lib/python2.7/dist-packages/sklearn/grid_search.py", line 354, in fit
    return self._fit(X, y)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/grid_search.py", line 392, in _fit
    for clf_params in grid for train, test in cv)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/externals/joblib/parallel.py", line 473, in __call__
    self.dispatch(function, args, kwargs)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/externals/joblib/parallel.py", line 296, in dispatch
    job = ImmediateApply(func, args, kwargs)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/externals/joblib/parallel.py", line 124, in __init__
    self.results = func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/grid_search.py", line 85, in fit_grid_point
    clf.set_params(**clf_params)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/base.py", line 241, in set_params
    % (key, self.__class__.__name__))
ValueError: Invalid parameter kernel for estimator OneVsRestClassifier

Basically, since the SVC is inside a OneVsRestClassifier and that's the estimator I send to the GridSearchCV, the SVC's parameters can't be accessed.
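The failure can be reproduced without running a grid search at all. The sketch below assumes a current scikit-learn install (the traceback above comes from a much older release) and only shows that the wrapper's own top-level parameters do not include kernel, so calling set_params with that key raises the same ValueError. The variable names top_level_names and error_message are just for illustration.

```python
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

model_to_set = OneVsRestClassifier(SVC(kernel="poly"))

# Parameters that belong to the wrapper itself (deep=False skips nested ones).
top_level_names = sorted(model_to_set.get_params(deep=False))
print(top_level_names)

# GridSearchCV passes each grid key to set_params(), so this is the call that fails.
try:
    model_to_set.set_params(kernel="rbf")
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```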

In order to accomplish what I want, I see two solutions:

  1. When creating the SVC, somehow tell it not to use the one-vs-one strategy but the one-vs-all.
  2. Somehow indicate to GridSearchCV that the parameters correspond to the estimator inside the OneVsRestClassifier.

I have yet to find a way to do either of these. Do you know if there's a way to do one of them? Or maybe you could suggest another way to get the same result?

Thanks!

Answer

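When you use nested estimators with grid search, you can scope each parameter by prefixing it with the name of the attribute that holds the inner estimator, using __ (a double underscore) as the separator. In this case the SVC model is stored as an attribute named estimator inside the OneVsRestClassifier model, so its parameters become estimator__C, estimator__kernel and so on: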

from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC
from sklearn.grid_search import GridSearchCV
from sklearn.metrics import f1_score

iris = load_iris()

model_to_set = OneVsRestClassifier(SVC(kernel="poly"))

parameters = {
    "estimator__C": [1,2,4,8],
    "estimator__kernel": ["poly","rbf"],
    "estimator__degree":[1, 2, 3, 4],
}

model_tunning = GridSearchCV(model_to_set, param_grid=parameters,
                             score_func=f1_score)

model_tunning.fit(iris.data, iris.target)

print model_tunning.best_score_
print model_tunning.best_params_

That yields:

0.973290762737
{'estimator__kernel': 'poly', 'estimator__C': 1, 'estimator__degree': 2}
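The snippet above targets Python 2 and an old scikit-learn release. As a rough sketch of the same idea on a current release, assuming GridSearchCV is imported from sklearn.model_selection and that scoring="f1_macro" is an acceptable stand-in for the old score_func=f1_score argument (both are my assumptions about how to port it, not part of the original answer), it could look like this. The scores and best parameters may differ from the output shown above because defaults such as SVC's gamma and the cross-validation splits have changed.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

iris = load_iris()
model_to_set = OneVsRestClassifier(SVC(kernel="poly"))

# Names that can be used as grid keys for the inner SVC.
nested_names = [name for name in model_to_set.get_params()
                if name.startswith("estimator__")]
print(nested_names)

parameters = {
    "estimator__C": [1, 2, 4, 8],
    "estimator__kernel": ["poly", "rbf"],
    "estimator__degree": [1, 2, 3, 4],
}

# 10 stratified folds, mirroring the k=10 from the question.
cv_generator = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

model_tuning = GridSearchCV(
    model_to_set,
    param_grid=parameters,
    scoring="f1_macro",  # assumption: macro-averaged F1 for the multiclass case
    cv=cv_generator,
    n_jobs=1,
)
model_tuning.fit(iris.data, iris.target)

print(model_tuning.best_score_)
print(model_tuning.best_params_)
```

Printing nested_names first is a quick way to check which prefixed keys are valid before building the grid.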