Python: Split Start and End Date into All Days Between Start and End Date

2024/10/5 15:07:20

I've got data called 'Planned Leave' which includes 'Start Date', 'End Date', 'User ID' and 'Leave Type'.

I want to be able to create a new data-frame which shows all days between Start and End Date, per 'User ID'.

So far, I've only been able to create a date_list which supplies a range of dates between start and end date, but I cannot find a way to include this for each 'User ID' and 'Leave Type'.

Here is my current function:

def datesplit(data):
    x = pd.DataFrame(columns=['Date'])
    for i in plannedleave.iterrows():
        start = data['Start Date'][i]
        end = data['End Date'][i]
        date_list = [start + dt.timedelta(days=x) for x in range((end-start).days)]
        x.append(date_list)
    return x

>>> datesplit(plannedleave)
>>> Value Error: Can only Tuple-index with a MultiIndex

Here's what the data looks like:

>>> plannedleave.dtypes
Employee ID             int64
First Name             object
Last Name              object
Leave Type             object
Start Date     datetime64[ns]
End Date       datetime64[ns]
dtype: object

I'd be forever grateful if you could find a solution here! :-)

Answer

A loop is necessary here, so for performance I prefer DataFrame.itertuples over DataFrame.iterrows, used inside a list comprehension:

def datesplit(df):
    df1 = df.rename(columns={'Start Date':'sdate','End Date':'edate', 'Employee ID':'ID'})
    return (pd.concat([pd.Series(r.ID, pd.date_range(r.sdate, r.edate))
                       for r in df1.itertuples()])
              .rename_axis('Date')
              .reset_index(name='Employee ID'))

df = datesplit(plannedleave)
print (df)
         Date  Employee ID
0  2020-05-10         1001
1  2020-05-11         1001
2  2020-05-12         1001
3  2020-05-13         1001
4  2020-05-14         1001
5  2020-05-15         1001
6  2020-05-18         1002
7  2020-05-19         1002
8  2020-05-20         1002
9  2020-05-21         1002
10 2020-05-22         1002
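
The question also asks for 'Leave Type' on each expanded row, which the function above does not carry over. As a minimal sketch of the same itertuples idea, each row can be expanded into a small DataFrame instead of a Series so that more than one column survives. The sample values, the short name `ltype` and the function name `datesplit_with_type` are assumptions made for illustration, not part of the original answer:

```python
import pandas as pd

# Hypothetical sample data shaped like the question's plannedleave frame;
# the 'Leave Type' values are invented for illustration.
plannedleave = pd.DataFrame({
    'Employee ID': [1001, 1002],
    'Leave Type': ['Annual', 'Sick'],
    'Start Date': pd.to_datetime(['2020-05-10', '2020-05-18']),
    'End Date': pd.to_datetime(['2020-05-15', '2020-05-22']),
})

def datesplit_with_type(df):
    # Rename to identifier-safe names so itertuples exposes them as attributes.
    df1 = df.rename(columns={'Start Date': 'sdate', 'End Date': 'edate',
                             'Employee ID': 'ID', 'Leave Type': 'ltype'})
    # One small frame per leave record; the scalar ID and leave type are
    # broadcast across every date in the inclusive range.
    parts = [
        pd.DataFrame({'Date': pd.date_range(r.sdate, r.edate),
                      'Employee ID': r.ID,
                      'Leave Type': r.ltype})
        for r in df1.itertuples(index=False)
    ]
    return pd.concat(parts, ignore_index=True)

out = datesplit_with_type(plannedleave)
print(out)
```

Note that pd.date_range includes both the start and the end date, whereas the range((end-start).days) loop in the question stops one day before the end date.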

Performance with 200 rows:

plannedleave = pd.concat([plannedleave] * 100, ignore_index=True)

def datesplit(df):
    df1 = df.rename(columns={'Start Date':'sdate','End Date':'edate', 'Employee ID':'ID'})
    return (pd.concat([pd.Series(r.ID, pd.date_range(r.sdate, r.edate))
                       for r in df1.itertuples()])
              .rename_axis('Date')
              .reset_index(name='Employee ID'))

def datesplitvb(data):
    parts = []
    for idx, row in data.iterrows():
        parts.append(pd.DataFrame(row['Employee ID'], columns=['Employee ID'],
                                  index=pd.date_range(start=row['Start Date'],
                                                      end=row['End Date'],
                                                      name='Date')))
    return pd.concat(parts).reset_index()

In [152]: %timeit datesplit(plannedleave.copy())
98.2 ms ± 4.96 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [153]: %timeit datesplitvb(plannedleave.copy())
193 ms ± 30.5 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
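
For comparison, here is a different technique from the two timed above: building a list of dates per row and expanding it with DataFrame.explode (available in pandas 0.25 and later). This is only a sketch under assumed sample data. The name `datesplit_explode` is hypothetical, and it was not part of the timings above, so no claim is made about how fast it is:

```python
import pandas as pd

# Hypothetical sample data shaped like the question's plannedleave frame;
# the 'Leave Type' values are invented for illustration.
plannedleave = pd.DataFrame({
    'Employee ID': [1001, 1002],
    'Leave Type': ['Annual', 'Sick'],
    'Start Date': pd.to_datetime(['2020-05-10', '2020-05-18']),
    'End Date': pd.to_datetime(['2020-05-15', '2020-05-22']),
})

def datesplit_explode(df):
    # One list of dates per leave record, inclusive of both endpoints.
    dates = pd.Series(
        [list(pd.date_range(s, e)) for s, e in zip(df['Start Date'], df['End Date'])],
        index=df.index,
    )
    out = df.drop(columns=['Start Date', 'End Date']).assign(Date=dates)
    # explode repeats the other columns once per element of each list.
    out = out.explode('Date').reset_index(drop=True)
    # The exploded column has object dtype, so convert it back to datetime64.
    out['Date'] = pd.to_datetime(out['Date'])
    return out

exploded = datesplit_explode(plannedleave)
print(exploded)
```

All remaining columns are kept automatically, so 'Employee ID' and 'Leave Type' appear on every expanded row without being named in the function.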