How to handle unseen categorical values in a test data set using Python?

2024/9/30 9:26:20

Suppose I have a location feature. In the training set its unique values are 'NewYork' and 'Chicago', but in the test set they are 'NewYork', 'Chicago', and 'London'. When creating the one-hot encoding, how do I ignore 'London'? In other words, how do I avoid encoding categories that appear only in the test set?

Answer

You can use the handle_unknown parameter of scikit-learn's OneHotEncoder.

from sklearn.preprocessing import OneHotEncoder
ohe = OneHotEncoder(handle_unknown='ignore')

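With this setting, a category that was not seen during fit does not raise an error at transform time. Its one-hot columns are all set to zero instead, so execution continues.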
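As a minimal sketch of that behaviour, the snippet below fits the encoder on the training values from the question and then transforms the test values. The variable names (train, test, encoded) are made up for illustration, and it assumes scikit-learn is installed.

```python
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data matching the question: one "location" column.
train = [["NewYork"], ["Chicago"]]
test = [["NewYork"], ["Chicago"], ["London"]]

# Unknown categories at transform time are encoded as all zeros.
ohe = OneHotEncoder(handle_unknown="ignore")
ohe.fit(train)

# The default output is a sparse matrix; convert it to a dense array to inspect it.
encoded = ohe.transform(test).toarray()

print(ohe.categories_)  # learned categories, sorted: Chicago, NewYork
print(encoded)          # the 'London' row contains only zeros
```

The encoder is fitted on the training data only, so the number of output columns stays fixed at the two training categories. 'London' gets no column of its own and shows up as a row of zeros.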

See the documentation for more details: https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html

