Grid search and cross validation SVM

2024/10/11 18:18:34

I am implementing an SVM using the best parameters found by grid search with 10-fold cross-validation, and I need to understand why the prediction results differ: I get two different accuracy values when testing on the training set. Note that I need the prediction results of the best parameters on the training set for further analysis. The code and results are shown below. Any explanation?

from __future__ import print_function
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import classification_report
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
from time import time
from sklearn import metrics

# datascaled is a pandas DataFrame defined earlier (not shown here)
X = datascaled.iloc[:, 0:13]
y = datascaled['num']
np.random.seed(1)
# Hold out 30% of the data as a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
# Set the parameters by cross-validation
tuned_parameters = [
    {'kernel': ['rbf'], 'gamma': [1e-2, 1e-3, 1e-4, 1e-5], 'C': [0.001, 0.10, 0.1, 10, 25, 50, 100, 1000]},
    {'kernel': ['sigmoid'], 'gamma': [1e-2, 1e-3, 1e-4, 1e-5], 'C': [0.001, 0.10, 0.1, 10, 25, 50, 100, 1000]},
    {'kernel': ['linear'], 'C': [0.001, 0.10, 0.1, 10, 25, 50, 100, 1000]},
]
print()
clf = GridSearchCV(SVC(), tuned_parameters, cv=10, scoring='accuracy')
t0 = time()
clf.fit(X_train, y_train)
t = time() - t0
print("Best parameters set found on development set:")
print()
print(clf.best_params_)
print()
print('Training accuracy')
print(clf.best_score_)
print(clf.best_estimator_)
print()
print()
print('****Results****')
svm_pred=clf.predict(X_train)
#print("\t\taccuracytrainkfold: {}".format(metrics.accuracy_score(y_train, svm_pred)))
print("=" * 52)
print("time cost: {}".format(t))
print()
print("confusion matrix\n", metrics.confusion_matrix(y_train, svm_pred))
print()
print("\t\taccuracy: {}".format(metrics.accuracy_score(y_train, svm_pred)))
print("\t\troc_auc_score: {}".format(metrics.roc_auc_score(y_train, svm_pred)))
print("\t\tcohen_kappa_score: {}".format(metrics.cohen_kappa_score(y_train, svm_pred)))
print()
print("\t\tclassification report")
print("-" * 52)
print(metrics.classification_report(y_train, svm_pred))

Results:

Best parameters set found on development set:

{'C': 1000, 'gamma': 0.01, 'kernel': 'rbf'}

Training accuracy
0.9254658385093167

****Results****
====================================================
time cost: 7.728448867797852

confusion matrix
 [[77  2]
 [ 4 78]]

                accuracy: 0.9627329192546584
                roc_auc_score: 0.9629515282494597
                cohen_kappa_score: 0.9254744638173121

                classification report
----------------------------------------------------
             precision    recall  f1-score   support

          0       0.95      0.97      0.96        79
          1       0.97      0.95      0.96        82

avg / total       0.96      0.96      0.96       161

Answer

You are training with 10-fold cross-validation and want the prediction accuracy for each fold. I suggest the following.

Split the data into 10 folds using sklearn.model_selection.KFold and loop over the folds as follows:

    from sklearn.model_selection import KFold

    kf = KFold(n_splits=10)
    for train_index, test_index in kf.split(X):
        print("TRAIN:", train_index, "TEST:", test_index)
        # X and y are pandas objects, so select rows by position with .iloc
        X_train, X_test = X.iloc[train_index], X.iloc[test_index]
        y_train, y_test = y.iloc[train_index], y.iloc[test_index]
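To make it easier to see what kf.split() hands back on each pass, here is a minimal sketch on a toy array. The names X_demo and kf_demo, the ten-sample array and n_splits=5 are assumptions chosen only for illustration; they are not part of the original code.

```python
import numpy as np
from sklearn.model_selection import KFold

# Toy data (assumption): 10 samples with 2 features, not the asker's dataset
X_demo = np.arange(20).reshape(10, 2)

# Without shuffling, KFold cuts the samples into contiguous blocks
kf_demo = KFold(n_splits=5)

folds = []
for train_index, test_index in kf_demo.split(X_demo):
    # Each pass yields positional indices for the training and test parts
    folds.append((train_index.tolist(), test_index.tolist()))
    print("TRAIN:", train_index, "TEST:", test_index)
```

Every sample lands in the test part exactly once, and the indices are positions, which is why a DataFrame has to be indexed with .iloc inside the loop.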

Inside that loop, build and train the model using the same lines as before, repeated below. But use cv=2 rather than cv=10 inside GridSearchCV(); 2 is the smallest value it accepts, since cv=1 raises a ValueError.

    clf = GridSearchCV(SVC(), tuned_parameters, cv=2, scoring='accuracy')
    clf.fit(X_train, y_train)

After training the model on the training part of one fold, compute its accuracy on that same data with the lines from your code:

    svm_pred = clf.predict(X_train)
    print("\t\taccuracy: {}".format(metrics.accuracy_score(y_train, svm_pred)))

The complete code is given below:

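    from sklearn.model_selection import KFold

    kf = KFold(n_splits=10)
    for train_index, test_index in kf.split(X):
        print("TRAIN:", train_index, "TEST:", test_index)
        # X and y are pandas objects, so select rows by position with .iloc
        X_train, X_test = X.iloc[train_index], X.iloc[test_index]
        y_train, y_test = y.iloc[train_index], y.iloc[test_index]
        # GridSearchCV needs at least 2 inner folds; cv=1 raises a ValueError
        clf = GridSearchCV(SVC(), tuned_parameters, cv=2, scoring='accuracy')
        clf.fit(X_train, y_train)
        svm_pred = clf.predict(X_train)
        print("\t\taccuracy: {}".format(metrics.accuracy_score(y_train, svm_pred)))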
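As a complement to the loop, it may help to see where the two numbers in the question come from. clf.best_score_ is the mean accuracy over the 10 held-out folds for the best parameters, while accuracy_score(y_train, clf.predict(X_train)) scores the refit model on the very data it was trained on, so the second value is typically higher. The sketch below reproduces the three quantities on synthetic data. make_classification, the reduced parameter grid and all *_demo names are assumptions standing in for the original dataset. cross_val_predict is one way to get a prediction for every training sample from a model that did not see that sample.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_predict
from sklearn.svm import SVC

# Synthetic stand-in for the asker's data (assumption)
X_demo, y_demo = make_classification(n_samples=200, n_features=13, random_state=0)

# One fixed splitter so the search and the out-of-fold step use the same folds
cv = StratifiedKFold(n_splits=10)
# Reduced grid (assumption) to keep the sketch fast
param_grid = {"kernel": ["rbf"], "gamma": [1e-2, 1e-3], "C": [10, 1000]}

search = GridSearchCV(SVC(), param_grid, cv=cv, scoring="accuracy")
search.fit(X_demo, y_demo)

# 1) Mean accuracy over the held-out folds for the best parameters
cv_accuracy = search.best_score_

# 2) Accuracy of the refit best model on the data it was trained on
train_accuracy = accuracy_score(y_demo, search.predict(X_demo))

# 3) Out-of-fold predictions: each sample is predicted by a model
#    that was fitted without that sample
oof_pred = cross_val_predict(SVC(**search.best_params_), X_demo, y_demo, cv=cv)
oof_accuracy = accuracy_score(y_demo, oof_pred)

print("mean CV accuracy (best_score_):", cv_accuracy)
print("accuracy on the training data: ", train_accuracy)
print("out-of-fold accuracy:          ", oof_accuracy)
```

If the predictions are needed for further analysis, oof_pred is probably the safer choice, because its confusion matrix and report are not inflated by scoring a model on its own training samples.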

Hope that helps :)

https://en.xdnf.cn/q/118288.html
