I have a large data frame that looks like so (and is copy-pasteable with df=pd.read_clipboard(sep='\s\s+')
:
user_nm month unique_ips shifted_ips halves quarters mo_pairs100118231 2 set([142.136]) set([]) h1 q1 p1100118231 3 set([142.136]) set([142.136]) h1 q1 p2100118231 6 set([108.0]) set([142.136]) h1 q2 p3100118231 7 set([108.0]) set([108.0]) h2 q3 p4100118231 8 set([142.136]) set([108.0]) h2 q3 p4100118231 9 set([142.136]) set([142.136]) h2 q3 p5100118231 10 set([142.136]) set([142.136]) h2 q4 p5100118231 11 set([142.136]) set([142.136]) h2 q4 p6100406016 3 set([50.192]) set([]) h1 q1 p2100406016 7 set([50.192]) set([50.192]) h2 q3 p4
for each user, I want to group by halves
(or quarters
, or mo_pairs
) and get the union of unique_ips
and shifted_ips
.
I can groupby the fields like so:
In [265]: a=df.groupby(['user_nm','halves'])In [266]: a.head()
Out[266]: user_nm month unique_ips shifted_ips halves quarters mo_pairs
user_nm halves
100118231 h1 0 100118231 2 set([142.136]) set([]) h1 q1 p11 100118231 3 set([142.136]) set([142.136]) h1 q1 p22 100118231 6 set([108.0]) set([142.136]) h1 q2 p3h2 3 100118231 7 set([108.0]) set([108.0]) h2 q3 p44 100118231 8 set([142.136]) set([108.0]) h2 q3 p45 100118231 9 set([142.136]) set([142.136]) h2 q3 p56 100118231 10 set([142.136]) set([142.136]) h2 q4 p57 100118231 11 set([142.136]) set([142.136]) h2 q4 p6
100406016 h1 8 100406016 3 set([50.192]) set([]) h1 q1 p2h2 9 100406016 7 set([50.192]) set([50.192]) h2 q3 p4
However, when I attempt to union these rows, I get an error:
In [267]: a.apply(lambda x: x[2] & x[3], axis=1)
TypeError: <lambda>() got an unexpected keyword argument 'axis'
Ideally, I would like something like this:
unique_ips shifted_ips
user_nm halves
100118231 h1 set([142.136, 108.0]) set([142.136])
100118231 h2 set([142.136,108.0]) set([142.136,108.0])
100406016 h1 set([50.192]) set([])
100406016 h2 set([50.192]) set([50.192])
I've also tried set_index
, but that does not group the dataframe appropriately
b=df.set_index(['user_nm','halves'])
This seems like a relatively simple task, what am I missing?