pandas - concat with columns of same categories turns to object

2024/10/3 10:37:42

I want to concatenate two dataframes that have category-type columns, after first adding the missing categories to each column.

import pandas as pd

df = pd.DataFrame({"a": pd.Categorical(["foo", "foo", "bar"]), "b": [1, 2, 1]})
df2 = pd.DataFrame({"a": pd.Categorical(["baz"]), "b": [1]})

# add the categories each frame is missing
df["a"] = df["a"].cat.add_categories("baz")
df2["a"] = df2["a"].cat.add_categories(["foo", "bar"])

In theory, both "a" columns now have the same categories:

In [33]: df.a.cat.categories
Out[33]: Index(['bar', 'foo', 'baz'], dtype='object')

In [34]: df2.a.cat.categories
Out[34]: Index(['baz', 'foo', 'bar'], dtype='object')
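One way to see the difference is to compare the two category indexes both as sets and element by element. The sketch below rebuilds the frames from above; `same_set` and `same_order` are just illustrative names. `Index.equals` compares position by position, so it reports the two indexes as different even though they hold the same values:

```python
import pandas as pd

# Rebuild the two frames from the question
df = pd.DataFrame({"a": pd.Categorical(["foo", "foo", "bar"]), "b": [1, 2, 1]})
df2 = pd.DataFrame({"a": pd.Categorical(["baz"]), "b": [1]})
df["a"] = df["a"].cat.add_categories("baz")
df2["a"] = df2["a"].cat.add_categories(["foo", "bar"])

cats1 = df["a"].cat.categories   # ['bar', 'foo', 'baz']
cats2 = df2["a"].cat.categories  # ['baz', 'foo', 'bar']

# Same values when order is ignored...
same_set = set(cats1) == set(cats2)
# ...but Index.equals is order-sensitive
same_order = cats1.equals(cats2)

print(same_set, same_order)  # True False
```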

However, when concatenating the two dataframes, I get an object-type "a" column:

In [35]: pd.concat([df, df2]).info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 4 entries, 0 to 0
Data columns (total 2 columns):
a    4 non-null object
b    4 non-null int64
dtypes: int64(1), object(1)
memory usage: 96.0+ bytes

The documentation says that concatenating category columns with the same categories should produce a category-type column. Does the order of the categories matter, even though the categorical is unordered? I am using pandas 0.20.3.

Answer

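Yes. For concat to keep the category dtype, the categories must be identical, including their order, even when the categorical itself is unordered. You can use reorder_categories to give df2's column the same category order as df's: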

df2["a"] = df2.a.cat.reorder_categories(df.a.cat.categories)

In [43]: pd.concat([df, df2]).info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 4 entries, 0 to 0
Data columns (total 2 columns):
a    4 non-null category
b    4 non-null int64
dtypes: category(1), int64(1)
memory usage: 172.0 bytes
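With more than two frames, or if you would rather not add the missing categories by hand, the same idea can be generalized. The helper below is a sketch, not part of pandas (`concat_aligned` is a hypothetical name), and it swaps in set_categories instead of reorder_categories: it collects the union of all categories in first-seen order, assigns exactly that list to the column in every frame, and only then concatenates, so the categories match in both content and order:

```python
import pandas as pd


def concat_aligned(frames, column):
    """Hypothetical helper: give `column` identical categories (same values,
    same order) in every frame, then concatenate the frames."""
    # Union of all categories, preserving first-seen order
    categories = []
    for frame in frames:
        for cat in frame[column].cat.categories:
            if cat not in categories:
                categories.append(cat)

    aligned = []
    for frame in frames:
        frame = frame.copy()  # avoid mutating the caller's frames
        # set_categories keeps existing values and uses exactly this order
        frame[column] = frame[column].cat.set_categories(categories)
        aligned.append(frame)
    return pd.concat(aligned)


df = pd.DataFrame({"a": pd.Categorical(["foo", "foo", "bar"]), "b": [1, 2, 1]})
df2 = pd.DataFrame({"a": pd.Categorical(["baz"]), "b": [1]})

result = concat_aligned([df, df2], "a")
print(result["a"].dtype)                 # category
print(list(result["a"].cat.categories))  # ['bar', 'foo', 'baz']
print(list(result["a"]))                 # ['foo', 'foo', 'bar', 'baz']
```

Because every frame ends up with the same ordered category list, this sidesteps the order question entirely; note that it assumes the column is unordered in all frames.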
