How to calculate number of dates within a year of a date in pandas

2024/9/20 15:34:21

I have the following dataframe and I need to calculate the amount of ER visit Dates with a score of 1 that are one year after the PheneDate for that pheneDate for a given subject. So basically phenevisit 003v1 has 2 dates within a year of 11/23/05 with a score of 1, which are 5/5/06 and 8/5/06, so its score will be 2 and so on for the other phenevisits.

PheneVisit  PheneDate   Score   ER Date    SubjectIDN/A     0       10/25/05   phchp003
phchp003v1  11/23/05    0                  phchp003N/A     1       5/5/06     phchp003
phchp003v2  5/10/06     0                  phchp003N/A     0       6/22/06    phchp003N/A     1       8/5/06     phchp003
phchp003v4  2/7/14      0                  phchp003N/A     1       10/13/14   phchp003N/A     0       2/15/15    phchp003N/A     1       8/14/15    phchp003
phchp004v2  4/27/12     0                  phchp004
phchp004v3  8/15/12     0                  phchp004N/A     1       5/18/13    phchp004N/A     0       6/21/13    phchp004
phchp004v4  6/3/15      0                  phchp004N/A     0       8/27/15    phchp004N/A     1       9/3/15     phchp004N/A     1       8/22/16    phchp004N/A     1       11/19/16   phchp004
phchp005v1  2/8/06      0                  phchp005N/A     1       3/24/06    phchp005N/A     1       4/16/06    phchp005N/A     1       4/25/06    phchp005N/A     1       5/18/06    phchp005N/A     0       5/25/06    phchp005N/A     0       6/2/06     phchp005

I am looking to get this column for the given phenevisits within each subject:

PheneVisit  First Year Hosp0
phchp003v1  20
phchp003v2  200
phchp003v4  1000
phchp004v2  0
phchp004v3  200
phchp004v4  20000
phchp005v1  4

Let me know if there is anything I can clariy, thanks.

Answer

try this:

import pandas as pd
import numpy as np
from io import StringIOinputtxt = StringIO("""
PheneVisit  PheneDate   Score   ER Date    SubjectID
N/A             N/A     0       10/25/05   phchp003
phchp003v1  11/23/05    0       N/A         phchp003
N/A             N/A     1       5/5/06     phchp003
phchp003v2  5/10/06     0       N/A        phchp003
N/A             N/A     0       6/22/06    phchp003
N/A             N/A     1       8/5/06     phchp003
phchp003v4  2/7/14      0       N/A        phchp003
N/A             N/A     1       10/13/14   phchp003
N/A             N/A     0       2/15/15    phchp003
N/A             N/A     1       8/14/15    phchp003
phchp004v2  4/27/12     0       N/A        phchp004
phchp004v3  8/15/12     0       N/A        phchp004
N/A             N/A     1       5/18/13    phchp004
N/A             N/A     0       6/21/13    phchp004
phchp004v4  6/3/15      0       N/A        phchp004
N/A             N/A     0       8/27/15    phchp004
N/A             N/A     1       9/3/15     phchp004
N/A             N/A     1       8/22/16    phchp004
N/A             N/A     1       11/19/16   phchp004
phchp005v1  2/8/06      0       N/A        phchp005
N/A             N/A     1       3/24/06    phchp005
N/A             N/A     1       4/16/06    phchp005
N/A             N/A     1       4/25/06    phchp005
N/A             N/A     1       5/18/06    phchp005
N/A             N/A     0       5/25/06    phchp005
N/A             N/A     0       6/2/06     phchp005
""")df = pd.read_csv(inputtxt, sep='\s\s+', engine='python')df['PheneDate'] = pd.to_datetime(df['PheneDate'], format='%m/%d/%y')df['ER Date'] = pd.to_datetime(df['ER Date'], format='%m/%d/%y')df['pi'] = pd.IntervalIndex.from_arrays(df['PheneDate'], df['PheneDate'] + pd.DateOffset(years=1))
df
def f(x):x = x.set_index('pi')x['Number of First Year'] = np.sum(np.vstack([x.index.contains(i) for i in x.loc[x['Score'] == 1, 'ER Date']]), 0)return x.reset_index(drop=True)df.groupby('SubjectID').apply(f).groupby('PheneVisit')['Number of First Year'].transform('sum')

Output:

SubjectID   
phchp003   0    NaN1    2.02    NaN3    1.04    NaN5    NaN6    1.07    NaN8    NaN9    NaN
phchp004   0    0.01    1.02    NaN3    NaN4    1.05    NaN6    NaN7    NaN8    NaN
phchp005   0    4.01    NaN2    NaN3    NaN4    NaN5    NaN6    NaN
Name: Number of First Year, dtype: float64
https://en.xdnf.cn/q/119479.html

Related Q&A

Remove substring from string if substring in list in data frame column

I have the following data frame df1string lists 0 i have a dog [fox, dog, cat] 1 there is a cat [dog, house, car] 2 hello everyone [hi, hello, everyone] 3 …

how to save data in the db django model?

Good day, I cant really understand what Im doing wrong in here. I was using this function base view to store my scrap data in the database with the django model, but now its not saving any more. I cant…

Move existing jointplot legend

I tried answers from a previous question to no avail in Matplotlib 1.5.1. I have a seaborn figure:import seaborn as sns %matplotlib inline import matplotlib.pyplot as plt import numpy as np tips = sns.…

timezone conversion of a large list of timestamps from an excel file with python

I have an excel file named "hello.xlsx". There is a column of timestamps that has a lot of rows (more than 80,000 rows for now). The file basically looks like this:03/29/2018 19:24:5003/29/20…

N_gram frequency python NTLK

I want to write a function that returns the frequency of each element in the n-gram of a given text. Help please. I did this code fo counting frequency of 2-gramcode:from nltk import FreqDistfrom nltk.…

Is there a way to have a list of 4 billion numbers in Python?

I made a binary search function and Im curious what would happen if I used it on 4 billion numbers, but I get a MemoryError every time I use it. Is there a way to store the list without this issue?

ValueError: invalid literal for int() with base 10: when it worked before

Im having some issues with my program, basically what Im trying to do is Stenography, insert an image into another image and then extract the secret image.My program is able to insert just fine, but ex…

How to fetch the current branch from Jenkins?

I would like to query Jenkins using its API and Python to fetch the branch that is currently ready to be built.How can I do that?

How to vertically stretch graphs with matplotlib subplot [duplicate]

This question already has answers here:How do I change the size of figures drawn with Matplotlib?(16 answers)Closed 5 years ago.With the following code, I try to plot 12 different histograms in one pi…

Python Selenium Traceback (most recent call last):

Im trying to use selenium for a python web scraper but when I try to run the program I get the following error: "/Applications/Python 3.8/IDLE.app/Contents/MacOS/Python" "/Applications/P…