Categorical dtype changes after using melt

2024/7/27 10:48:31

In answering this question, I found that after using melt on a pandas dataframe, a column that was previously an ordered Categorical dtype becomes an object. Is this intended behaviour?

Note: I'm not looking for a solution, just wondering whether there is a reason for this behaviour or whether it is unintended.

Example:

Using the following dataframe df:

  Cat  L_1  L_2  L_3
0   A    1    2    3
1   B    4    5    6
2   C    7    8    9

df['Cat'] = pd.Categorical(df['Cat'], categories = ['C','A','B'], ordered=True)

# As you can see `Cat` is a category
>>> df.dtypes
Cat    category
L_1       int64
L_2       int64
L_3       int64
dtype: object

melted = df.melt('Cat')

>>> melted
  Cat variable  value
0   A      L_1      1
1   B      L_1      4
2   C      L_1      7
3   A      L_2      2
4   B      L_2      5
5   C      L_2      8
6   A      L_3      3
7   B      L_3      6
8   C      L_3      9

Now, if I look at Cat, it's become an object:

>>> melted.dtypes
Cat         object
variable    object
value        int64
dtype: object

Is this intended?

Answer

In the pandas 0.22.0 source code (my old version), melt builds the id columns like this:

for col in id_vars:
    mdata[col] = np.tile(frame.pop(col).values, K)

mcolumns = id_vars + var_name + [value_name]

np.tile turns the Categorical values into a plain NumPy array, so the column comes back with dtype object.
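A minimal sketch of that effect, assuming a reasonably current pandas and NumPy. The variable names are made up for illustration and this is not the pandas source itself:

```python
import numpy as np
import pandas as pd

# Illustrative ordered categorical with the same categories as in the question.
cat_values = pd.Series(
    pd.Categorical(['A', 'B', 'C'], categories=['C', 'A', 'B'], ordered=True)
).values  # a pandas Categorical, not a NumPy array

K = 3  # number of value columns being melted (L_1, L_2, L_3)

# np.tile works on NumPy arrays, so the Categorical is converted to a plain
# array of labels first and the category information is dropped.
tiled = np.tile(cat_values, K)

print(type(cat_values).__name__)
print(type(tiled).__name__, tiled.dtype)
```

The tiled result only carries the labels; the categories and their order are not part of it any more.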

It has been fixed in 0.23.4 (the version I have after updating pandas):

df.melt('Cat')
Out[6]: 
  Cat variable  value
0   A      L_1      1
1   B      L_1      4
2   C      L_1      7
3   A      L_2      2
4   B      L_2      5
5   C      L_2      8
6   A      L_3      3
7   B      L_3      6
8   C      L_3      9
df.melt('Cat').dtypes
Out[7]: 
Cat         category
variable      object
value          int64
dtype: object
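To see which behaviour an installed pandas has, something like the following can be run. It is only a sketch: it rebuilds the frame from the question (values copied from the example there) and prints the version next to the resulting dtypes.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    'Cat': ['A', 'B', 'C'],
    'L_1': [1, 4, 7],
    'L_2': [2, 5, 8],
    'L_3': [3, 6, 9],
})
df['Cat'] = pd.Categorical(df['Cat'], categories=['C', 'A', 'B'], ordered=True)

melted = df.melt('Cat')

print(pd.__version__)
print(melted.dtypes)

# If the dtype survived, show whether the ordering and categories did too.
if str(melted['Cat'].dtype) == 'category':
    print(melted['Cat'].cat.ordered, list(melted['Cat'].cat.categories))
```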

More detail on how it was fixed:

for col in id_vars:
    id_data = frame.pop(col)
    if is_extension_type(id_data):  # True for a Categorical, so concat is used instead of np.tile
        id_data = concat([id_data] * K, ignore_index=True)
    else:
        id_data = np.tile(id_data.values, K)
    mdata[col] = id_data
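
The concat branch can be tried outside of melt. The following is a rough standalone sketch using the public pd.concat, with illustrative names, rather than the internal code path itself:

```python
import pandas as pd

# Illustrative ordered categorical column, as in the question.
id_data = pd.Series(
    pd.Categorical(['A', 'B', 'C'], categories=['C', 'A', 'B'], ordered=True)
)

K = 3  # repeat once per melted value column

# Concatenating Series that share the same categorical dtype keeps that dtype.
repeated = pd.concat([id_data] * K, ignore_index=True)

print(repeated.dtype)
print(repeated.cat.ordered, list(repeated.cat.categories))
```

Because every piece being concatenated has identical categories and ordering, the result stays categorical instead of falling back to object.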