I am applying OneHotEncoder to a NumPy array.
Here's the code:
from sklearn.preprocessing import OneHotEncoder
print X.shape, test_data.shape  # gives (4100, 15) (410, 15)
onehotencoder_1 = OneHotEncoder(categorical_features = [0, 3, 4, 5, 6, 8, 9, 11, 12])
X = onehotencoder_1.fit_transform(X).toarray()
onehotencoder_2 = OneHotEncoder(categorical_features = [0, 3, 4, 5, 6, 8, 9, 11, 12])
test_data = onehotencoder_2.fit_transform(test_data).toarray()
print X.shape, test_data.shape  # gives (4100, 46) (410, 43)
Both X and test_data are <type 'numpy.ndarray'>. X is my training set and test_data is my test set.
Why is the number of columns different for X and test_data? After applying OneHotEncoder they should both have 46 columns, or both have 43.
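A likely cause, assuming the test set does not contain every category value that appears in the training set: each `fit_transform` call learns its own set of categories, and one output column is created per distinct value seen in that particular array. The toy arrays below are hypothetical and use the current scikit-learn API (the `categorical_features` parameter no longer exists), but they show how two independently fitted encoders can produce different widths:

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: one categorical column per array.
# The training column contains four distinct codes, the test column only two.
train_col = np.array([[0], [1], [2], [3]])
test_col = np.array([[0], [1], [1], [0]])

# Fitting a separate encoder on each array, as in the question's code,
# makes each encoder learn only the categories present in its own input.
enc_train = OneHotEncoder()
enc_test = OneHotEncoder()
train_encoded = enc_train.fit_transform(train_col).toarray()
test_encoded = enc_test.fit_transform(test_col).toarray()

print(train_encoded.shape)  # (4, 4): one column per category seen in train
print(test_encoded.shape)   # (4, 2): one column per category seen in test
```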
I am applying OneHotEncoder only to specific attributes because they are categorical in both X and test_data.
Can someone point out what is wrong here?
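A minimal sketch of the usual fix, assuming a recent scikit-learn where `categorical_features` (deprecated in 0.20) has been replaced by `ColumnTransformer`: fit one encoder on the training set only, then reuse it with `transform` on the test set so both arrays get the same columns. The random integer arrays are placeholders standing in for X and test_data, and the column list is copied from the question. `handle_unknown="ignore"` encodes any test category not seen during training as all zeros instead of raising an error.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Column indices from the question, assumed to be the categorical ones.
categorical_cols = [0, 3, 4, 5, 6, 8, 9, 11, 12]

# Hypothetical placeholder data with 15 integer-coded columns.
rng = np.random.default_rng(0)
X = rng.integers(0, 5, size=(4100, 15))
test_data = rng.integers(0, 3, size=(410, 15))  # fewer distinct values than X

preprocessor = ColumnTransformer(
    transformers=[("onehot", OneHotEncoder(handle_unknown="ignore"), categorical_cols)],
    remainder="passthrough",  # keep the 6 non-categorical columns unchanged
    sparse_threshold=0,       # always return a dense NumPy array
)

# Learn the categories from the training set only...
X_encoded = preprocessor.fit_transform(X)
# ...then apply the same fitted encoder to the test set.
test_encoded = preprocessor.transform(test_data)

print(X_encoded.shape, test_encoded.shape)  # same number of columns in both
```

Fitting a second encoder on the test set (as `onehotencoder_2` does) is what lets the widths drift apart; calling only `transform` on the test data keeps the column layout tied to the training set.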