I am trying to run a grid search for the best hyper-parameters within each fold of a ten-fold cross-validation. This worked fine in my previous multi-class classification work, but it fails this time with a multi-label problem.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
clf = OneVsRestClassifier(LinearSVC())
C_range = 10.0 ** np.arange(-2, 9)
# OneVsRestClassifier exposes the wrapped LinearSVC's parameters as estimator__<name>
param_grid = dict(estimator__C=C_range)
clf = GridSearchCV(clf, param_grid)
clf.fit(X_train, y_train)
I am getting the error:
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-65-dcf9c1d2e19d> in <module>()
      6
      7 clf = GridSearchCV(clf, param_grid)
----> 8 clf.fit(X_train, y_train)

/usr/local/lib/python2.7/site-packages/sklearn/grid_search.pyc in fit(self, X, y)
    595
    596         """
--> 597         return self._fit(X, y, ParameterGrid(self.param_grid))
    598
    599

/usr/local/lib/python2.7/site-packages/sklearn/grid_search.pyc in _fit(self, X, y, parameter_iterable)
    357                                  % (len(y), n_samples))
    358             y = np.asarray(y)
--> 359         cv = check_cv(cv, X, y, classifier=is_classifier(estimator))
    360
    361         if self.verbose > 0:

/usr/local/lib/python2.7/site-packages/sklearn/cross_validation.pyc in _check_cv(cv, X, y, classifier, warn_mask)
   1365             needs_indices = None
   1366         if classifier:
-> 1367             cv = StratifiedKFold(y, cv, indices=needs_indices)
   1368         else:
   1369             if not is_sparse:

/usr/local/lib/python2.7/site-packages/sklearn/cross_validation.pyc in __init__(self, y, n_folds, indices, shuffle, random_state)
    427         for test_fold_idx, per_label_splits in enumerate(zip(*per_label_cvs)):
    428             for label, (_, test_split) in zip(unique_labels, per_label_splits):
--> 429                 label_test_folds = test_folds[y == label]
    430                 # the test split can be too big because we used
    431                 # KFold(max(c, self.n_folds), self.n_folds) instead of

ValueError: boolean index array should have 1 dimension
This might refer to the dimensionality or the format of the label indicator matrix.
print X_train.shape, y_train.shape
gives:
(147, 1024) (147, 6)
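To see why a (147, 6) label matrix trips the `test_folds[y == label]` line in the traceback, here is a minimal NumPy sketch. The random `y` is a hypothetical stand-in for `y_train`, and `test_folds` only mimics the 1-D array of the same name inside `StratifiedKFold`. The exact exception type and message depend on the NumPy version, so the sketch catches both likely ones.

```python
import numpy as np

# Hypothetical stand-in for y_train: 147 samples, 6 binary labels each.
rng = np.random.RandomState(0)
y = rng.randint(0, 2, size=(147, 6))

# One fold id per sample: a 1-D array, like test_folds in the traceback.
test_folds = np.zeros(len(y), dtype=int)

# The element-wise comparison keeps the 2-D shape of y.
mask = y == 1
print(mask.shape)

try:
    # Indexing a 1-D array with a 2-D boolean mask is not allowed.
    test_folds[mask]
    failed = False
except (IndexError, ValueError) as exc:
    failed = True
    print(type(exc).__name__, exc)
```

With a 1-D label vector the same comparison yields a 1-D mask and the indexing succeeds, which is consistent with the multi-class case working.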
It seems that GridSearchCV uses StratifiedKFold internally.
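The traceback shows `_check_cv` picking `StratifiedKFold` whenever `is_classifier(estimator)` is true. As a small check, assuming scikit-learn is installed and that `is_classifier` is importable from `sklearn.base`, the sketch below asks whether the wrapped model is treated as a classifier, which is what sends the grid search down the stratified branch.

```python
from sklearn.base import is_classifier
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

# Same estimator as in the question, before wrapping it in GridSearchCV.
model = OneVsRestClassifier(LinearSVC())

# If this is true, the cross-validation check in the traceback
# takes the `if classifier:` branch and builds a StratifiedKFold.
treated_as_classifier = is_classifier(model)
print(treated_as_classifier)
```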
The problem arises when the stratified K-fold strategy is applied to a multi-label problem.
StratifiedKFold(y_train, 10)
gives
ValueError                                Traceback (most recent call last)
<ipython-input-87-884ffeeef781> in <module>()
----> 1 StratifiedKFold(y_train, 10)

/usr/local/lib/python2.7/site-packages/sklearn/cross_validation.pyc in __init__(self, y, n_folds, indices, shuffle, random_state)
    427         for test_fold_idx, per_label_splits in enumerate(zip(*per_label_cvs)):
    428             for label, (_, test_split) in zip(unique_labels, per_label_splits):
--> 429                 label_test_folds = test_folds[y == label]
    430                 # the test split can be too big because we used
    431                 # KFold(max(c, self.n_folds), self.n_folds) instead of

ValueError: boolean index array should have 1 dimension
For now, using the conventional K-fold strategy works fine. Is there any way to apply stratified K-fold to multi-label classification?
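For reference, the plain K-fold workaround can be sketched as below. This assumes a newer scikit-learn than the one in the traceback (the `sklearn.model_selection` import path and the `KFold(n_splits=...)` signature), and it uses random data as a hypothetical stand-in for `X_train` and `y_train`. It sidesteps stratification rather than providing it.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

# Hypothetical data: 147 samples, 20 features, 6 binary labels per sample.
rng = np.random.RandomState(0)
X = rng.rand(147, 20)
Y = (rng.rand(147, 6) > 0.5).astype(int)

# Parameters of the wrapped LinearSVC are addressed as estimator__<name>.
# A shorter C range than in the question, to keep the sketch quick.
param_grid = {"estimator__C": 10.0 ** np.arange(-2, 3)}

# An explicit, non-stratified splitter passed through the cv argument.
cv = KFold(n_splits=10, shuffle=True, random_state=0)

search = GridSearchCV(OneVsRestClassifier(LinearSVC()), param_grid, cv=cv)
search.fit(X, Y)
print(search.best_params_)
```

Passing the splitter explicitly means the folds ignore label frequencies, so a rare label can end up unevenly spread across folds.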