Pandas Groupby Unique Multiple Columns

2024/10/5 13:29:48

I have a dataframe.

import pandas as pd

df = pd.DataFrame({
    'number': [0, 0, 0, 1, 1, 2, 2, 2, 2],
    'id1': [100, 100, 100, 300, 400, 700, 700, 800, 700],
    'id2': [100, 100, 200, 500, 600, 700, 800, 900, 1000],
})

   id1   id2  number
0  100   100       0
1  100   100       0
2  100   200       0
3  300   500       1
4  400   600       1
5  700   700       2
6  700   800       2
7  800   900       2
8  700  1000       2

(This is a small sample of a much larger dataframe I am working with, on the order of millions of rows.)

I can apply groupby().unique() to one column at a time:

df.groupby(['number'])['id1'].unique()

number
0         [100]
1    [300, 400]
2    [700, 800]
Name: id1, dtype: object

df.groupby(['number'])['id2'].unique()

number
0               [100, 200]
1               [500, 600]
2    [700, 800, 900, 1000]
Name: id2, dtype: object

I want to take the unique values over both columns at once and get the result, in order, as a dataframe:

number
0               [100, 200]
1     [300, 400, 500, 600]
2    [700, 800, 900, 1000]

When I try to do this for both columns, I get an error:

pd.DataFrame(df.groupby(['number'])['id1', 'id2'].unique())

Traceback (most recent call last):
  File "C:\Python34\lib\site-packages\IPython\core\interactiveshell.py", line 2885, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-15-bfc6026e241e>", line 9, in <module>
    df.groupby(['number'])['id1', 'id2'].unique()
  File "C:\Python34\lib\site-packages\pandas\core\groupby.py", line 498, in __getattr__
    (type(self).__name__, attr))
AttributeError: 'DataFrameGroupBy' object has no attribute 'unique'

What should I do? Is it preferable to use a multi-index?

Edit: In addition, is it possible to get the output in the following form?

number
0 100
0 200
1 300
1 400
1 500
1 600
2 700
2 800
2 900
2 1000
Answer

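unique() is only defined for a single grouped column, so select both columns with a list, [['id1', 'id2']] (recent pandas versions no longer accept the bare 'id1', 'id2' form), and flatten each group's values before taking the unique values. pd.unique keeps the order of first appearance:

s = (df.groupby(['number'])[['id1', 'id2']]
       .apply(lambda x: pd.unique(x.values.ravel()).tolist()))
print(s)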
number
0               [100, 200]
1     [300, 500, 400, 600]
2    [700, 800, 900, 1000]
dtype: object

Or, if you want the values sorted, use np.unique:

import numpy as np

s2 = (df.groupby(['number'])[['id1', 'id2']]
        .apply(lambda x: np.unique(x.values.ravel()).tolist()))
print(s2)
number
0               [100, 200]
1     [300, 400, 500, 600]
2    [700, 800, 900, 1000]
dtype: object
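
Since the real dataframe has millions of rows, it may be worth avoiding a Python lambda per group. A minimal sketch, assuming the sample df from the question: melt the two id columns into one (the column name value is the default that melt assigns) and then call the built-in unique() on the grouped column. Within each group the values come out in order of first appearance, with id1 values seen before id2 values. This is not benchmarked here, so time it on the real data before relying on it.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'number': [0, 0, 0, 1, 1, 2, 2, 2, 2],
    'id1': [100, 100, 100, 300, 400, 700, 700, 800, 700],
    'id2': [100, 100, 200, 500, 600, 700, 800, 900, 1000],
})

# Stack id1 and id2 into a single 'value' column, keeping 'number' alongside
long = df.melt(id_vars='number', value_vars=['id1', 'id2'])

# SeriesGroupBy.unique returns one array of unique values per group
s3 = long.groupby('number')['value'].unique()
print(s3)
```

Each element of s3 is a NumPy array; call .tolist() on an element if plain Python lists are needed.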

EDIT:

If you need the output as a column, first reshape with stack and then call drop_duplicates:

df1 = (df.set_index('number')[['id1', 'id2']]
         .stack()
         .reset_index(level=1, drop=True)
         .reset_index(name='a')
         .drop_duplicates())
print(df1)

    number     a
0        0   100
5        0   200
6        1   300
7        1   500
8        1   400
9        1   600
10       2   700
13       2   800
15       2   900
17       2  1000
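
An alternative sketch for the long format, again assuming the sample df from the question: melt the id columns, drop duplicate (number, a) pairs, and sort so that the values within each number are ascending, as in the layout requested in the question. The column name a follows the answer above; the sort and the index reset are additions and can be dropped if order and row labels do not matter.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'number': [0, 0, 0, 1, 1, 2, 2, 2, 2],
    'id1': [100, 100, 100, 300, 400, 700, 700, 800, 700],
    'id2': [100, 100, 200, 500, 600, 700, 800, 900, 1000],
})

df2 = (df.melt(id_vars='number', value_vars=['id1', 'id2'], value_name='a')
         .drop(columns='variable')       # the source column name is not needed
         .drop_duplicates()              # one row per (number, a) pair
         .sort_values(['number', 'a'])   # ascending within each number
         .reset_index(drop=True))
print(df2)
```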