How to plot a ROC curve using dataframe converted from CSV file

2024/11/18 5:32:41

I was trying to plot a ROC curve by following the documentation provided by sklearn. My data is in a CSV file, and it looks like this. It has two classes, 'Good' and 'Bad'.

screenshot of my CSV file

And my code looks like this

import numpy as np
import matplotlib.pyplot as plt
from itertools import cycle
import sys
from sklearn import svm, datasets
from sklearn.metrics import roc_curve, auc
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize
from sklearn.multiclass import OneVsRestClassifier
from scipy import interp
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import MultinomialNB
import pandas as pd

# Import some data to play with
df = pd.read_csv("E:\\autodesk\\TTI ROC curve.csv")
X =df[['TTI','Max TemperatureF','Mean TemperatureF','Min TemperatureF',' Min Humidity']].values
y = df['TTI_Category'].as_matrix()
# Binarize the output
y = label_binarize(y, classes=['Good','Bad'])
n_classes = y.shape[1]

# shuffle and split training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.5, random_state=0)

# Learn to predict each class against the other
classifier = OneVsRestClassifier(svm.SVC(kernel='linear', probability=True, random_state=random_state))
y_score = classifier.fit(X_train, y_train).decision_function(X_test)

# Compute ROC curve and ROC area for each class
fpr = dict()
tpr = dict()
roc_auc = dict()
for i in range(n_classes):
    fpr[i], tpr[i], _ = roc_curve(y_test[:, i], y_score[:, i])
    roc_auc[i] = auc(fpr[i], tpr[i])

# Compute micro-average ROC curve and ROC area
fpr["micro"], tpr["micro"], _ = roc_curve(y_test.ravel(), y_score.ravel())
roc_auc["micro"] = auc(fpr["micro"], tpr["micro"])
plt.figure()
lw = 2
plt.plot(fpr[2], tpr[2], color='darkorange',lw=lw, label='ROC curve (area = %0.2f)' % roc_auc[2])
plt.plot([0, 1], [0, 1], color='navy', lw=lw, linestyle='--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver operating characteristic example')
plt.legend(loc="lower right")
plt.show()

If I run this code, the system tells me random_state is not defined, so I changed it to random_state=True. Then the system told me

plt.plot(fpr[2], tpr[2], color='darkorange',
KeyError: 2
<matplotlib.figure.Figure at 0xd8bff60>

If I print out n_classes, the system tells me it's "1", whereas printing n_classes in the documentation example gives 3. I'm not sure if that's where the problem is. Does anyone have an answer to this traceback?

Answer

Looks like you simply don't understand how your data is structured and how your code should work.

LabelBinarizer will return a one-v-all encoding, meaning that for two classes you will get the following mapping: ['good', 'bad', 'good'] -> [[1], [0], [1]], s.t. n_classes = 1.
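The shape difference is easy to check directly. The snippet below is a small sketch using made-up label lists (not the asker's data): with two classes `label_binarize` returns a single column, while three classes give one column per class, which is why the documentation example reports 3.

```python
import numpy as np
from sklearn.preprocessing import label_binarize

# Made-up labels, only to illustrate the output shapes.
labels = np.array(['Good', 'Bad', 'Good'])

# Two classes collapse into a single 0/1 column.
y_two = label_binarize(labels, classes=['Good', 'Bad'])
print(y_two.shape)    # (3, 1)

# Three (or more) classes yield one column per class.
y_three = label_binarize(['a', 'b', 'c', 'a'], classes=['a', 'b', 'c'])
print(y_three.shape)  # (4, 3)
```

So `n_classes = y.shape[1]` is 1 here, and the only valid keys in `fpr`, `tpr` and `roc_auc` are `0` and `"micro"`.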

Why would you expect it to be 3 if you have 2 classes? Simply change plt.plot(fpr[2], tpr[2], color='darkorange', lw=lw, label='ROC curve (area = %0.2f)' % roc_auc[2]) to plt.plot(fpr[0], tpr[0], color='darkorange', lw=lw, label='ROC curve (area = %0.2f)' % roc_auc[0]) and you should be good.
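For a two-class problem there is only one ROC curve, so the per-class loop, `label_binarize` and `OneVsRestClassifier` can be dropped altogether. The following is a minimal sketch of that simplification, not the asker's exact pipeline. Assumptions: the CSV is not available, so a synthetic DataFrame with some of the same column names stands in for it; 'Good' is treated as the positive class; the output file name `roc_curve.png` is arbitrary. It also passes `random_state` as an integer and uses `to_numpy()`, since `as_matrix()` was removed in pandas 1.0.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import svm
from sklearn.metrics import roc_curve, auc
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the CSV (assumption: the real file is not available).
rng = np.random.RandomState(0)
n = 200
df = pd.DataFrame({
    'TTI': rng.normal(1.5, 0.5, n),
    'Max TemperatureF': rng.normal(80, 10, n),
    'Mean TemperatureF': rng.normal(70, 10, n),
})
# Made-up labelling rule so that the label depends on the features.
df['TTI_Category'] = np.where(df['TTI'] + rng.normal(0, 0.3, n) < 1.5, 'Good', 'Bad')

X = df[['TTI', 'Max TemperatureF', 'Mean TemperatureF']].to_numpy()
# Encode the two classes as a 1-D 0/1 vector; 'Good' as positive is an assumption.
y = (df['TTI_Category'] == 'Good').astype(int).to_numpy()

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0)

# random_state must be a value, e.g. an integer, not an undefined name.
classifier = svm.SVC(kernel='linear', random_state=0)
y_score = classifier.fit(X_train, y_train).decision_function(X_test)

# A binary problem has a single ROC curve, so no per-class dictionaries are needed.
fpr, tpr, _ = roc_curve(y_test, y_score)
roc_auc = auc(fpr, tpr)

plt.figure()
plt.plot(fpr, tpr, color='darkorange', lw=2,
         label='ROC curve (area = %0.2f)' % roc_auc)
plt.plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver operating characteristic example')
plt.legend(loc="lower right")
plt.savefig("roc_curve.png")  # or plt.show() in an interactive session
```

To use it on the real data, replace the synthetic DataFrame with `pd.read_csv(...)` and keep the rest unchanged.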
