Is numerical encoding necessary for the target variable in classification?

2024/10/14 9:22:38

I am using sklearn for text classification. All my features are numerical, but my target variable labels are text. I understand the rationale behind encoding features as numbers, but does this also apply to the target variable?

Answer

If your target variable is in textual form, you can transform it into numeric form (or you can leave it alone; please see my note below). Scikit-learn classifiers work with the classes as numeric codes from 0 to (number of classes - 1), and in an OVA (One Versus All) scheme the learning algorithm tries to guess each class by comparing it against all the remaining ones.
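If you prefer to encode the labels yourself, here is a minimal sketch using Scikit-learn's LabelEncoder. The label values are hypothetical, chosen only for illustration. LabelEncoder assigns the codes 0 to (number of classes - 1) in sorted label order, and inverse_transform maps codes back to the original strings.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical text labels for a small classification problem
labels = ["spam", "ham", "spam", "eggs", "ham"]

encoder = LabelEncoder()
codes = encoder.fit_transform(labels)  # integer codes 0 .. n_classes - 1

print(encoder.classes_.tolist())  # distinct labels, stored in sorted order
print(codes.tolist())             # one integer code per sample
print(encoder.inverse_transform(codes).tolist())  # back to the original strings
```

Keep the fitted encoder around so that predictions (which will be integer codes) can be turned back into readable labels.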

For instance, in the iris example from the Scikit-learn documentation, the class of each iris is determined by three models, each evaluating one possible class:

  • class 0 versus classes 1 and 2
  • class 1 versus classes 0 and 2
  • class 2 versus classes 0 and 1
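To make the three models explicit, the sketch below wraps an estimator in Scikit-learn's OneVsRestClassifier. The choice of LogisticRegression as the base estimator is an assumption for illustration; the documentation example is not reproduced here. The wrapper fits one binary model per class, and each model learns "class k versus all the others".

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

iris = load_iris()
X, Y = iris.data, iris.target  # Y holds the integer codes 0, 1, 2

# One binary classifier per class: class k versus the remaining classes
ova = OneVsRestClassifier(LogisticRegression(max_iter=1000))
ova.fit(X, Y)

print(len(ova.estimators_))  # one fitted binary model per class
print(ova.classes_.tolist())  # the class codes the models were built for
print(ova.predict(X[:1]).tolist())  # the class whose model scores highest
```

Using the explicit wrapper guarantees an OVA scheme regardless of how the base estimator handles multiclass problems on its own.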

Here classes 0, 1 and 2 are Setosa, Versicolor and Virginica, but the example works with them as numeric codes, as you can verify by exploring the results of the example code (where Y holds the iris targets):

>>> list(iris.target_names)
['setosa', 'versicolor', 'virginica']
>>> np.unique(Y)
array([0, 1, 2])
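The codes and the names are linked by position: each integer code is an index into target_names. A short sketch of recovering the text labels from the codes with NumPy indexing:

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
Y = iris.target

# Each integer code indexes into target_names, recovering the text label
names = iris.target_names[Y]

for code in np.unique(Y):
    print(code, iris.target_names[code])
```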

NOTE: it is true that Scikit-learn encodes string target labels by itself. On Scikit-learn's GitHub page for logistic regression (https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/linear_model/logistic.py), you can see at rows 1623 and 1624 where the code calls the label encoder, which encodes the labels automatically:

# Encode for string labels
label_encoder = LabelEncoder().fit(y)
y = label_encoder.transform(y)
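In practice this means you can pass the text labels straight to the classifier. The sketch below relies only on the public fit/predict behavior, not on those particular source lines (which may move between versions). It mimics textual targets by replacing the iris codes with their names; the fitted model stores the string labels in classes_ and predict returns strings.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

iris = load_iris()
X = iris.data
# Replace the integer codes with the text names to mimic textual target labels
y_text = iris.target_names[iris.target]

clf = LogisticRegression(max_iter=1000).fit(X, y_text)

print(clf.classes_.tolist())        # the original string labels, sorted
print(clf.predict(X[:3]).tolist())  # predictions come back as strings
```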
