What is the closest equivalent to an R Factor variable in Python pandas?
This question is about a year old, but since it is still open, here is an update. pandas has introduced a categorical
dtype that behaves much like factors
in R. See this link for more information:
http://pandas-docs.github.io/pandas-docs-travis/categorical.html
Reproducing a snippet from the link above showing how to create a "factor" variable in pandas.
In [1]: s = Series(["a","b","c","a"], dtype="category")
In [2]: s
Out[2]:
0 a
1 b
2 c
3 a
dtype: category
Categories (3, object): [a < b < c]
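
For anyone coming from R, here is a rough sketch of how the pieces map over, assuming a reasonably recent pandas imported as pd. The variable names (levels, codes, sizes, is_above_low) and the "low"/"medium"/"high" values are only illustrative and are not from the linked docs. The .cat.categories attribute plays roughly the role of levels() in R, .cat.codes gives the underlying integer codes (0-based, unlike R), and passing ordered=True to pd.Categorical is the closest thing I know of to an ordered factor.

```python
import pandas as pd

# Same data as the snippet above, with an explicit pandas import.
s = pd.Series(["a", "b", "c", "a"], dtype="category")

# Roughly levels(f) in R: the distinct categories.
levels = s.cat.categories.tolist()   # ['a', 'b', 'c']

# Roughly as.integer(f) in R, but 0-based instead of 1-based.
codes = s.cat.codes.tolist()         # [0, 1, 2, 0]

# Roughly factor(x, levels=..., ordered=TRUE) in R.
# The category names here are made up for illustration.
sizes = pd.Series(
    pd.Categorical(
        ["low", "high", "medium", "low"],
        categories=["low", "medium", "high"],
        ordered=True,
    )
)

# Ordered categoricals support comparisons against one of their categories.
is_above_low = (sizes > "low").tolist()
```

Setting the category order explicitly, rather than relying on the sorted default, is what makes comparisons such as sizes > "low" meaningful.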