Running locally in a Jupyter notebook on the MNIST dataset (28k entries, 28x28 pixels per image), the following takes 27 seconds:
from sklearn.neighbors import KNeighborsClassifier
knn_clf = KNeighborsClassifier(n_jobs=1)
knn_clf.fit(pixels, labels)
However, the following takes 1722 seconds, in other words ~64 times longer:
from sklearn.model_selection import cross_val_predict
y_train_pred = cross_val_predict(knn_clf, pixels, labels, cv = 3, n_jobs=1)
My naive understanding is that cross_val_predict with cv=3 is doing 3-fold cross validation, so I'd expect it to fit the model 3 times and therefore take at least ~3 times longer, but I don't see why it would take 64x!
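To make that mental model concrete, here is a minimal sketch of what I assume cross_val_predict does with cv=3 for a classifier: split the data into three stratified folds, fit a fresh copy of the model on two of them, and predict on the held-out one. It times the fit and predict steps separately. The synthetic data from make_classification is only a small stand-in for my MNIST pixels and labels, and the names X, y, manual_pred, fit_seconds and predict_seconds are mine, not anything from scikit-learn.

```python
import time

import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the MNIST pixels/labels (an assumption, not my real data).
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=10, n_classes=3, random_state=0
)

knn = KNeighborsClassifier(n_jobs=1)
manual_pred = np.empty_like(y)
fit_seconds = 0.0
predict_seconds = 0.0

# One fit and one predict per fold, timed separately.
for train_idx, test_idx in StratifiedKFold(n_splits=3).split(X, y):
    model = clone(knn)  # fresh, unfitted copy for each fold
    t0 = time.perf_counter()
    model.fit(X[train_idx], y[train_idx])
    t1 = time.perf_counter()
    manual_pred[test_idx] = model.predict(X[test_idx])
    t2 = time.perf_counter()
    fit_seconds += t1 - t0
    predict_seconds += t2 - t1

print(f"fit: {fit_seconds:.3f}s, predict: {predict_seconds:.3f}s")
```

Swapping X and y for pixels and labels should show how the 1722 seconds splits between the two steps on the real data.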
To check whether it was something specific to my environment, I ran the same code in a Colab notebook - the difference was less extreme (15x), but still way above the ~3x I expected.
What am I missing? Why is cross_val_predict so much slower than just fitting the model?
In case it matters, I'm running scikit-learn 0.20.2.