Fastest way to iterate through a pandas dataframe?

2024/9/20 19:45:26

How do I run through a dataframe and return only the rows which meet a certain condition? This condition has to be tested on previous rows and columns. For example:

          #1    #2    #3    #4
1/1/1999   4     2     4     5
1/2/1999   5     2     3     3
1/3/1999   5     2     3     8
1/4/1999   6     4     2     6
1/5/1999   8     3     4     7
1/6/1999   3     2     3     8
1/7/1999   1     3     4     1

I would like to test a few conditions for each row, and if all conditions pass, append the row to a list. For example:

for row in dataframe:
    if [row-1, column 0] + [row-2, column 3] >= 6:
        append row to a list

I may have up to 3 conditions which must all be true for the row to be returned. The way I am thinking about doing it is to make a list of the rows that satisfy each condition, and then make a separate list of the rows that appear in all three lists.

My two questions are the following:

What is the fastest way to get all of the rows that meet a certain condition based on previous rows? Looping through a dataframe of 5,000 rows seems like it may take too long, especially if up to 3 conditions have to be tested.

What is the best way to get a list of rows which meet all 3 conditions?

Answer

The quickest way to select rows is to not iterate through the rows of the dataframe. Instead, create a mask (boolean array) with True values for the rows you wish to select, and then call df[mask] to select them:

mask = (df['column 0'].shift(1) + df['column 3'].shift(2) >= 6)
newdf = df[mask]
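
As a minimal sketch, here is the same idea applied to the sample data from the question. The column names '#1' to '#4' are taken from the table in the question, and the date index is rebuilt with pd.date_range as an assumption about how the dates are stored. Columns are picked by position with iloc to mirror "column 0" and "column 3" in the pseudocode. Note that shift leaves NaN in the first rows, and any comparison with NaN is False, so those rows are never selected.

```python
import pandas as pd

# Sample data copied from the question; the date index is an assumption.
df = pd.DataFrame(
    {
        "#1": [4, 5, 5, 6, 8, 3, 1],
        "#2": [2, 2, 2, 4, 3, 2, 3],
        "#3": [4, 3, 3, 2, 4, 3, 4],
        "#4": [5, 3, 8, 6, 7, 8, 1],
    },
    index=pd.date_range("1999-01-01", periods=7),
)

# [row-1, column 0] + [row-2, column 3] >= 6, with columns picked by position.
prev_first = df.iloc[:, 0].shift(1)    # first column, one row back
prev2_fourth = df.iloc[:, 3].shift(2)  # fourth column, two rows back
mask = (prev_first + prev2_fourth) >= 6

# The first two rows have no complete history, so their mask value is False.
selected = df[mask]
print(selected)
```

The whole column is evaluated in one vectorized step, so no Python-level loop over the rows is needed.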

To combine more than one condition with logical-and, use &:

mask = ((...) & (...))

For logical-or use |:

mask = ((...) | (...))

For example,

In [75]: df = pd.DataFrame({'A':range(5), 'B':range(10,20,2)})

In [76]: df
Out[76]: 
   A   B
0  0  10
1  1  12
2  2  14
3  3  16
4  4  18

In [77]: mask = (df['A'].shift(1) + df['B'].shift(2) > 12)

In [78]: mask
Out[78]: 
0    False
1    False
2    False
3     True
4     True
dtype: bool

In [79]: df[mask]
Out[79]: 
   A   B
3  3  16
4  4  18
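
For the second question (rows that meet all 3 conditions), one possible sketch is to build one boolean Series per condition and combine them with &, instead of keeping three lists and intersecting them. The data comes from the question; the second and third conditions (cond2, cond3) are invented purely for illustration, since the question only spells out the first one.

```python
import pandas as pd

# Sample data copied from the question; the date index is an assumption.
df = pd.DataFrame(
    {
        "#1": [4, 5, 5, 6, 8, 3, 1],
        "#2": [2, 2, 2, 4, 3, 2, 3],
        "#3": [4, 3, 3, 2, 4, 3, 4],
        "#4": [5, 3, 8, 6, 7, 8, 1],
    },
    index=pd.date_range("1999-01-01", periods=7),
)

# Condition from the question: previous row of '#1' plus '#4' two rows back.
cond1 = df["#1"].shift(1) + df["#4"].shift(2) >= 6
# Hypothetical extra conditions, made up for this sketch.
cond2 = df["#2"].shift(1) >= 3          # previous row of '#2' is at least 3
cond3 = df["#3"] > df["#3"].shift(1)    # '#3' increased since the previous row

# A row is kept only if every condition is True for it.
mask = cond1 & cond2 & cond3

# Plain Python list of the matching rows, if a list is really needed.
rows = df[mask].to_numpy().tolist()
print(rows)
```

Keeping each condition in its own named Series makes it easy to inspect them separately before combining them.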