Calculate the queue for orders based on creation and delivery date, by product group

2024/11/13 10:28:40

I have a pandas DataFrame containing records for a large number of orders, one record per order. Each record has order_id, category_id, created_at and picked_at. I need to calculate the queue length for each order at the time of its creation. That is, for each record current_order, I need to count the number of rows that satisfy the following conditions:

  • must have the same category_id as the current_order
  • must be created before created_at of the current_order
  • must be picked after created_at of the current_order

The dataframe is quite large, so doing the calculation with a loop is too time-consuming. How can I do this faster?

Any help would be greatly appreciated.

Edited

A sample of dataframe:

          id  category_id          created_at           picked_at
0  123228779        69558 2021-05-22 00:08:46 2021-05-22 00:22:45
1  123228972        69558 2021-05-22 00:12:39 2021-05-22 00:17:00
2  123229120         6725 2021-05-22 00:15:47 2021-05-22 00:42:50
3  123229210        41358 2021-05-22 00:17:44 2021-05-22 00:35:34
4  123229152         6725 2021-05-22 00:16:29 2021-05-22 01:05:43
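
For reproducibility, the sample above can be rebuilt as shown here, together with a naive loop that applies the three conditions literally. This is only a sketch of the slow baseline: the function name naive_queue is hypothetical, and both comparisons are assumed to be strict.

```python
import pandas as pd

# Rebuild the five sample rows from the question.
df = pd.DataFrame({
    "id": [123228779, 123228972, 123229120, 123229210, 123229152],
    "category_id": [69558, 69558, 6725, 41358, 6725],
    "created_at": pd.to_datetime([
        "2021-05-22 00:08:46", "2021-05-22 00:12:39", "2021-05-22 00:15:47",
        "2021-05-22 00:17:44", "2021-05-22 00:16:29",
    ]),
    "picked_at": pd.to_datetime([
        "2021-05-22 00:22:45", "2021-05-22 00:17:00", "2021-05-22 00:42:50",
        "2021-05-22 00:35:34", "2021-05-22 01:05:43",
    ]),
})


def naive_queue(frame):
    """Hypothetical baseline: for each order, count same-category orders
    created strictly before it and picked strictly after its creation."""
    counts = []
    for row in frame.itertuples(index=False):
        mask = (
            (frame["category_id"] == row.category_id)
            & (frame["created_at"] < row.created_at)
            & (frame["picked_at"] > row.created_at)
        )
        counts.append(int(mask.sum()))
    return pd.Series(counts, index=frame.index)


# One full scan of the frame per order, hence quadratic in the number of rows.
baseline = naive_queue(df)
print(baseline.tolist())
```

Each order triggers a scan of the whole frame, which is why this becomes too slow on a large dataframe.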
Answer

Let's start by reshaping the dataframe so that created_at and picked_at are in the same column. Then we calculate the queue value as a running sum per category: +1 for each creation and -1 for each pick.

df2 = (df.melt(id_vars=['id', 'category_id'],
               var_name='type',
               value_name='time')
         .sort_values(by=['category_id', 'time'])  # not required to sort by "category_id",
                                                   # but done here for clarity
      )
df2['queue'] = (df2['type'].map({'created_at': 1, 'picked_at': -1})
                           .groupby(df2['category_id'])  # keep the running sum per category
                           .cumsum()
               )
>>> df2
          id  category_id        type                time  queue
2  123229120         6725  created_at 2021-05-22 00:15:47      1
4  123229152         6725  created_at 2021-05-22 00:16:29      2
7  123229120         6725   picked_at 2021-05-22 00:42:50      1
9  123229152         6725   picked_at 2021-05-22 01:05:43      0
3  123229210        41358  created_at 2021-05-22 00:17:44      1
8  123229210        41358   picked_at 2021-05-22 00:35:34      0
0  123228779        69558  created_at 2021-05-22 00:08:46      1
1  123228972        69558  created_at 2021-05-22 00:12:39      2
6  123228972        69558   picked_at 2021-05-22 00:17:00      1
5  123228779        69558   picked_at 2021-05-22 00:22:45      0

Finally, we reshape the queue to the original dataframe:

df['queue'] = (df2.pivot(columns=['type'], values=['queue'])
                  .loc[:, ('queue', 'created_at')]
                  .dropna()
                  .astype(int)
              )

output:

          id  category_id          created_at           picked_at  queue
0  123228779        69558 2021-05-22 00:08:46 2021-05-22 00:22:45      1
1  123228972        69558 2021-05-22 00:12:39 2021-05-22 00:17:00      2
2  123229120         6725 2021-05-22 00:15:47 2021-05-22 00:42:50      1
3  123229210        41358 2021-05-22 00:17:44 2021-05-22 00:35:34      1
4  123229152         6725 2021-05-22 00:16:29 2021-05-22 01:05:43      2

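NB. This gives the queue, per category_id, just after creation, so each count includes the order itself. To count only the other orders still waiting when an order is created, as the question asks, subtract 1.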
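
Putting the steps together, here is a minimal end-to-end sketch on the sample data. A few things in it are assumptions rather than part of the answer: the names events, delta and waiting_ahead are made up for illustration, the result is mapped back through id (assumed unique) instead of relying on index alignment, and picks are ordered before creations when timestamps tie. Two orders created at exactly the same time in the same category are not handled specially.

```python
import pandas as pd

# Sample rows from the question.
df = pd.DataFrame({
    "id": [123228779, 123228972, 123229120, 123229210, 123229152],
    "category_id": [69558, 69558, 6725, 41358, 6725],
    "created_at": pd.to_datetime([
        "2021-05-22 00:08:46", "2021-05-22 00:12:39", "2021-05-22 00:15:47",
        "2021-05-22 00:17:44", "2021-05-22 00:16:29",
    ]),
    "picked_at": pd.to_datetime([
        "2021-05-22 00:22:45", "2021-05-22 00:17:00", "2021-05-22 00:42:50",
        "2021-05-22 00:35:34", "2021-05-22 01:05:43",
    ]),
})

# One row per event: each order contributes a creation and a pick.
events = df.melt(id_vars=["id", "category_id"], var_name="type", value_name="time")
events["delta"] = events["type"].map({"created_at": 1, "picked_at": -1})

# Sort by time; on equal timestamps, picks (-1) come before creations (+1).
events = events.sort_values(["time", "delta"])

# Running number of open orders within each category.
events["queue"] = events.groupby("category_id")["delta"].cumsum()

# Keep the value seen at each order's creation and map it back by id.
created = events[events["type"] == "created_at"].set_index("id")["queue"]
df["queue"] = df["id"].map(created)

# The running count includes the order itself; drop it to get orders ahead.
df["waiting_ahead"] = df["queue"] - 1

print(df[["id", "category_id", "queue", "waiting_ahead"]])
```

The sort dominates the cost, so this scales roughly as n log n rather than the n² of a row-by-row loop.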

