Python Pandas rolling aggregate a column of lists

2024/11/17 21:27:55

I have a simple dataframe df with a column of lists lists. I would like to generate an additional column based on lists.

The df looks like:

import pandas as pd
lists={1:[[1]],2:[[1,2,3]],3:[[2,9,7,9]],4:[[2,7,3,5]]}
#create test dataframe
df=pd.DataFrame.from_dict(lists,orient='index')
df=df.rename(columns={0:'lists'})
dflists
1           [1]
2     [1, 2, 3]
3  [2, 9, 7, 9]
4  [2, 7, 3, 5]

I would like df to look like this:

df
Out[9]: lists                 rolllists
1           [1]                       [1]
2     [1, 2, 3]              [1, 1, 2, 3]
3  [2, 9, 7, 9]     [1, 2, 3, 2, 9, 7, 9]
4  [2, 7, 3, 5]  [2, 9, 7, 9, 2, 7, 3, 5]

Basically I want to 'sum'/append the rolling 2 lists. Note that row 1, because I only have 1 list 1, rolllists is that list. But in row 2, I have 2 lists that I want appended. Then for row three, append df[2].lists and df[3].lists etc. I have worked on similar things before, reference this:Pandas Dataframe, Column of lists, Create column of sets of cumulative lists, and record by record differences.
In addition, if we can get this part above, then I want to do this in a groupby (so the example below would be 1 group for example, so for instance the df might look like this in the groupby):

  Group         lists                 rolllists
1     A           [1]                       [1]
2     A     [1, 2, 3]              [1, 1, 2, 3]
3     A  [2, 9, 7, 9]     [1, 2, 3, 2, 9, 7, 9]
4     A  [2, 7, 3, 5]  [2, 9, 7, 9, 2, 7, 3, 5]
5     B           [1]                       [1]
6     B     [1, 2, 3]              [1, 1, 2, 3]
7     B  [2, 9, 7, 9]     [1, 2, 3, 2, 9, 7, 9]
8     B  [2, 7, 3, 5]  [2, 9, 7, 9, 2, 7, 3, 5]

I have tried various things like df.lists.rolling(2).sum() and I get this error:

TypeError: cannot handle this type -> object 

in Pandas 0.24.1 and unfortunatley in Pandas 0.22.0 the command doesn't error, but instead returns the exact same values as in lists. So Looks like newer versions of Pandas can't sum lists? That's a secondary issue.

Love any help! Have Fun!

Answer

You can start with

import pandas as pd
mylists={1:[[1]],2:[[1,2,3]],3:[[2,9,7,9]],4:[[2,7,3,5]]}
mydf=pd.DataFrame.from_dict(mylists,orient='index')
mydf=mydf.rename(columns={0:'lists'})
mydf = pd.concat([mydf, mydf], axis=0, ignore_index=True)
mydf['group'] = ['A']*4 + ['B']*4# initialize your new series
mydf['newseries'] = mydf['lists']# define the function that appends lists overs rows
def append_row_lists(data):for i in data.index:try: data.loc[i+1, 'newseries'] = data.loc[i, 'lists'] + data.loc[i+1, 'lists']except: passreturn data# loop over your groups
for gp in mydf.group.unique():condition = mydf.group == gpmydf[condition] = append_row_lists(mydf[condition])

Output

          lists Group                 newseries
0           [1]     A                       [1]
1     [1, 2, 3]     A              [1, 1, 2, 3]
2  [2, 9, 7, 9]     A     [1, 2, 3, 2, 9, 7, 9]
3  [2, 7, 3, 5]     A  [2, 9, 7, 9, 2, 7, 3, 5]
4           [1]     B                       [1]
5     [1, 2, 3]     B              [1, 1, 2, 3]
6  [2, 9, 7, 9]     B     [1, 2, 3, 2, 9, 7, 9]
7  [2, 7, 3, 5]     B  [2, 9, 7, 9, 2, 7, 3, 5]
https://en.xdnf.cn/q/71134.html

Related Q&A

Easy way of overriding default methods in custom Python classes?

I have a class called Cell:class Cell:def __init__(self, value, color, size):self._value = valueself._color = colorself._size = size# and other methods...Cell._value will store a string, integer, etc. …

Return first non NaN value in python list

What would be the best way to return the first non nan value from this list?testList = [nan, nan, 5.5, 5.0, 5.0, 5.5, 6.0, 6.5]edit:nan is a float

How to subplot pie chart in plotly?

How can I subplot pie1 in fig, so it be located at the first position. this is how I am doing it but it doesnt work out import pandas as pdimport numpy as npimport seaborn as snsimport plotly.offline a…

Example of use \G in negative variable-length lookbehinds to limit how far back the lookbehind goes

In the pypi page of the awesome regex module (https://pypi.python.org/pypi/regex) it is stated that \G can be used "in negative variable-length lookbehinds to limit how far back the lookbehind goe…

Regex with lookbehind not working using re.match

The following python code:import reline="http://google.com" procLine = re.match(r(?<=http).*, line) if procLine.group() == "":print(line + ": did not match regex") els…

testing python multiprocessing pool code with nose

I am trying to write tests with nose that get set up with something calculated using multiprocessing.I have this directory structure:code/tests/tests.pytests.py looks like this:import multiprocessing a…

Python verify url goes to a page

I have a list of urls (1000+) which have been stored for over a year now. I want to run through and verify them all to see if they still exist. What is the best / quickest way to check them all and re…

Bokeh: Synchronizing hover tooltips in linked plots

I have two linked plots. When hovering, I would like to have a tooltip appear in both plots. I already use the linked selection with great success, but now I want to link the tooltips also.Below is an …

Pipe STDIN to a script that is itself being piped to the Python interpreter?

I need to implement an SVN pre-commit hook which executes a script that itself is stored in SVN.I can use the svn cat command to pipe that script to the Python interpreter, as follows:svn cat file://$R…

subprocess.call using cygwin instead of cmd on Windows

Im programming on Windows 7 and in one of my Python projects I need to call bedtools, which only works with Cygwin on Windows. Im new to Cygwin, installed the default version + everything needed for be…