Question 1

I want to run GridSearchCV on an SVC model that uses the one-vs-all strategy. For the one-vs-all part, I can just do this:

model_to_set = OneVsRestClassifier(SVC(kernel="poly"))

My problem is with the parameters. Let's say I want to try the following values:

parameters = {"C":[1,2,4,8], "kernel":["poly","rbf"],"degree":[1,2,3,4]}

In order to perform GridSearchCV, I should do something like:

cv_generator = StratifiedKFold(y, k=10)
model_tunning = GridSearchCV(model_to_set, param_grid=parameters, score_func=f1_score, n_jobs=1, cv=cv_generator)

However, when I execute it I get:

Traceback (most recent call last):
  File "/.../main.py", line 66, in <module>
    argclass_sys.set_model_parameters(model_name="SVC", verbose=3, file_path=PATH_ROOT_MODELS)
  File "/.../base.py", line 187, in set_model_parameters
    model_tunning.fit(self.feature_encoder.transform(self.train_feats), self.label_encoder.transform(self.train_labels))
  File "/usr/local/lib/python2.7/dist-packages/sklearn/grid_search.py", line 354, in fit
    return self._fit(X, y)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/grid_search.py", line 392, in _fit
    for clf_params in grid for train, test in cv)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/externals/joblib/parallel.py", line 473, in __call__
    self.dispatch(function, args, kwargs)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/externals/joblib/parallel.py", line 296, in dispatch
    job = ImmediateApply(func, args, kwargs)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/externals/joblib/parallel.py", line 124, in __init__
    self.results = func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/grid_search.py", line 85, in fit_grid_point
    clf.set_params(**clf_params)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/base.py", line 241, in set_params
    % (key, self.__class__.__name__))
ValueError: Invalid parameter kernel for estimator OneVsRestClassifier

Basically, since the SVC is inside a OneVsRestClassifier and that's the estimator I send to the GridSearchCV, the SVC's parameters can't be accessed.

In order to accomplish what I want, I see two solutions:

When creating the SVC, somehow tell it not to use the one-vs-one strategy but the one-vs-all.
Somehow indicate to GridSearchCV that the parameters correspond to the estimator inside the OneVsRestClassifier.

I have yet to find a way to do either of these. Do you know if there's a way to do either one? Or maybe you could suggest another way to get the same result?

Thanks!

Answer

When you use nested estimators with grid search you can scope the parameters with __ as a separator. In this case the SVC model is stored as an attribute named estimator inside the OneVsRestClassifier model:

from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC
from sklearn.grid_search import GridSearchCV
from sklearn.metrics import f1_score

iris = load_iris()

model_to_set = OneVsRestClassifier(SVC(kernel="poly"))

parameters = {
    "estimator__C": [1,2,4,8],
    "estimator__kernel": ["poly","rbf"],
    "estimator__degree":[1, 2, 3, 4],
}

model_tunning = GridSearchCV(model_to_set, param_grid=parameters,
                             score_func=f1_score)

model_tunning.fit(iris.data, iris.target)

print model_tunning.best_score_
print model_tunning.best_params_

That yields:

0.973290762737
{'estimator__kernel': 'poly', 'estimator__C': 1, 'estimator__degree': 2}
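
The snippet above targets an old scikit-learn release (Python 2, sklearn.grid_search, score_func). On more recent releases the same idea can be sketched roughly as follows. This is an assumption-laden adaptation rather than the original code: it imports GridSearchCV and StratifiedKFold from sklearn.model_selection, and it swaps score_func=f1_score for scoring="f1_macro", which is my choice of averaging for the multiclass case. The get_params() call is included only to show where the estimator__ names come from.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

iris = load_iris()
model_to_set = OneVsRestClassifier(SVC(kernel="poly"))

# The wrapper exposes the inner SVC's parameters under the "estimator__"
# prefix; get_params() lists every name that set_params() will accept.
valid_names = set(model_to_set.get_params().keys())

parameters = {
    "estimator__C": [1, 2, 4, 8],
    "estimator__kernel": ["poly", "rbf"],
    "estimator__degree": [1, 2, 3, 4],
}

# Assumption: macro-averaged F1 stands in for the original score_func=f1_score.
cv_generator = StratifiedKFold(n_splits=10)
model_tuning = GridSearchCV(
    model_to_set,
    param_grid=parameters,
    scoring="f1_macro",
    cv=cv_generator,
    n_jobs=1,
)
model_tuning.fit(iris.data, iris.target)

print(model_tuning.best_score_)
print(model_tuning.best_params_)
```

The exact best score and parameters may differ from the output shown above, since defaults such as the SVC gamma setting and the cross-validation scheme have changed between releases.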

GridSearch for an estimator inside a OneVsRestClassifier
