Calculate Time Difference based on Conditionals

2024/9/20 21:32:17

I have a dataframe that looks something like this (actual dataframe is millions of rows):

ID Category Site Task Completed Access Completed
1 A X 1/2/22 12:00:00AM 1/1/22 12:00:00 AM
1 A Y 1/3/22 12:00:00AM 1/2/22 12:00:00 AM
1 A X 1/1/22 1:00:00AM 1/1/22 12:00:00 AM
1 B X 1/1/22 1:00:00AM 1/1/22 12:00:00 AM
2 A X 1/2/22 12:00:00AM 1/1/22 12:00:00 AM
2 A X 1/1/22 1:00:00AM 1/1/22 12:00:00 AM

A few things to point out about this dataframe. First, the 'access completed' date is constant for a ID/category/site combination. So, for ID 1, Cat A, Site X, the access completed will always be 1/1/22 12:00:00 AM no matter how many 'task completed' there are for that combo. Second, there can be multiple task completed dates for a ID/Category/Site combination.

What I want to find is the time difference (in hours) between Access Completed and first Task Completed for every ID/Category/Site combination within the dataset. The intended result would look something like this:

ID Category Site Time Difference
1 A X 1
1 A Y 24
1 B X 1
2 A X 1

Thanks so much for your help.

Answer

Try as follows:

  • First, use pd.to_datetime to turn date/time strings into datetime.
  • Sort the df on column "Task Completed", apply df.groupby and get .first.
  • Now, subtract "Access Completed" from "Task Completed" using Series.sub, and use .dt.total_seconds on the Timedelta values to get all seconds. Since we want hours, we apply .div(3600).
  • Lastly, reset the index and use df.rename to assign a meaningful column name.
df[['Task Completed','Access Completed']] = \df[['Task Completed','Access Completed']].apply(lambda x: pd.to_datetime(x))res = df.sort_values('Task Completed').groupby(['ID','Category','Site']).first()
res = res['Task Completed'].sub(res['Access Completed'])\.dt.total_seconds().div(3600).reset_index(drop=False).rename(columns={0:'Time Difference'})print(res)ID Category Site  Time Difference
0   1        A    X              1.0
1   1        A    Y             24.0
2   1        B    X              1.0
3   2        A    X              1.0
https://en.xdnf.cn/q/119277.html

Related Q&A

Cannot open jpg images with PIL or open()

I am testing to save ImageField in Django, but for some reason all the *.jpg files Ive tried dont work while the one png I had works. Using django shell in WSL VCode terminal. python 3.7 django 3.0 pil…

how to delete tensorflow model before retraining

I cant retrain my image classifier with new images, I get the following error:AssertionError: Export directory already exists. Please specify a different export directory: /tmp/saved_models/1/How do I …

Use beautifulsoup to scrape a table within a webpage?

I am scraping a county website that posts emergency calls and their locations. I have found success webscraping basic elements, but am having trouble scraping the rows of the table. (Here is an example…

Encrypt folder or zip file using python

So I am trying to encrypt a directory using python and Im not sure what the best way to do that is. I am easily able to turn the folder into a zip file, but from there I have tried looking up how to en…

Use Python Element Tree to parse xml in ASCII text file [closed]

Closed. This question needs debugging details. It is not currently accepting answers.Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to repro…

Time series plot showing unique occurrences per day

I have a dataframe, where I would like to make a time series plot with three different lines that each show the daily occurrences (the number of rows per day) for each of the values in another column. …

Problem accessing indexed results two stage stochastic programming Pyomo

When running a stochastic programming problem in Pyomo, the resulting solution works only when running 10 precisely the same scenarios but the results remain zero when running different scenarios. I ai…

Pandas python + format for values

This is the code:import pandas as pd from pandas import Series, DataFrame import numpy as np import matplotlib.pyplot as pltdf.head(3).style.format({Budget: "€ {:,.0f}"}) Year …

Is there any implementation of deconvolution?

Some one may prefer to call it the transposed convolution, as introduced here. Im looking forward to an implementation of the transposed convolution, in Python or C/C++. Thank you all for helping me!

discord.py How to check if user is on server?

I need to check if the user is on the server. Please help me