pandas resample with fill_method: how to know which row the data was copied from?

2024/9/8 11:20:13

I am trying to use the resample method to fill the gaps in time series data, but I also want to know which row was used to fill each missing value.

This is my input series.

In [28]: data
Out[28]: 
Date
2002-09-09    233.25
2002-09-11    233.05
2002-09-16    230.25
2002-09-18    230.10
2002-09-19    230.05
Name: Price

With resample, I get this:

In [29]: data.resample("D", fill_method='bfill')
Out[29]: 
Date
2002-09-09    233.25
2002-09-10    233.05
2002-09-11    233.05
2002-09-12    230.25
2002-09-13    230.25
2002-09-14    230.25
2002-09-15    230.25
2002-09-16    230.25
2002-09-17    230.10
2002-09-18    230.10
2002-09-19    230.05
Freq: D

I am looking for this output instead, with the source date next to each value:

Out[29]: 
Date
2002-09-09    233.25  2002-09-09
2002-09-10    233.05  2002-09-11
2002-09-11    233.05  2002-09-11
2002-09-12    230.25  2002-09-16
2002-09-13    230.25  2002-09-16
2002-09-14    230.25  2002-09-16
2002-09-15    230.25  2002-09-16
2002-09-16    230.25  2002-09-16
2002-09-17    230.10  2002-09-18
2002-09-18    230.10  2002-09-18
2002-09-19    230.05  2002-09-19

Any help?

Answer

After converting the Series to a DataFrame, copy the index into its own column. (DatetimeIndex.format() is useful here because it returns a string representation of the index rather than Timestamp/datetime objects.)

In [510]: df = pd.DataFrame(data)

In [511]: df['OrigDate'] = df.index.format()

In [513]: df
Out[513]: 
             Price    OrigDate
Date                          
2002-09-09  233.25  2002-09-09
2002-09-11  233.05  2002-09-11
2002-09-16  230.25  2002-09-16
2002-09-18  230.10  2002-09-18
2002-09-19  230.05  2002-09-19

For resampling without aggregation, there is a helper method asfreq().

In [528]: df.asfreq("D", method='bfill')
Out[528]: 
             Price    OrigDate
2002-09-09  233.25  2002-09-09
2002-09-10  233.05  2002-09-11
2002-09-11  233.05  2002-09-11
2002-09-12  230.25  2002-09-16
2002-09-13  230.25  2002-09-16
2002-09-14  230.25  2002-09-16
2002-09-15  230.25  2002-09-16
2002-09-16  230.25  2002-09-16
2002-09-17  230.10  2002-09-18
2002-09-18  230.10  2002-09-18
2002-09-19  230.05  2002-09-19
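
For readers on a recent pandas release, the two steps above can be sketched as a standalone script. This is a minimal sketch, not the answer's original code: the Series is rebuilt by hand from the question's data, and strftime("%Y-%m-%d") is swapped in for DatetimeIndex.format() on the assumption that format() is deprecated in newer versions. The variable name filled is only illustrative.

```python
import pandas as pd

# Rebuild the question's input Series by hand (values copied from the question).
dates = pd.to_datetime(
    ["2002-09-09", "2002-09-11", "2002-09-16", "2002-09-18", "2002-09-19"]
)
data = pd.Series(
    [233.25, 233.05, 230.25, 230.10, 230.05],
    index=pd.DatetimeIndex(dates, name="Date"),
    name="Price",
)

# Convert to a DataFrame and copy the index into its own column as strings.
# Assumption: strftime stands in for DatetimeIndex.format() used in the answer.
df = data.to_frame()
df["OrigDate"] = df.index.strftime("%Y-%m-%d")

# Upsample to daily frequency; bfill copies both Price and OrigDate
# from the next available row, so OrigDate records the source row.
filled = df.asfreq("D", method="bfill")
print(filled)
```

Because OrigDate is filled along with Price, each inserted row carries the date of the row it was copied from. Assigning df.index directly instead of the formatted strings should keep the column as timestamps, if that is preferred.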

This is effectively shorthand for the following, where last() is invoked on the intermediate DataFrameGroupBy objects.

In [529]: df.resample("D", how='last', fill_method='bfill')
Out[529]: 
             Price    OrigDate
Date                          
2002-09-09  233.25  2002-09-09
2002-09-10  233.05  2002-09-11
2002-09-11  233.05  2002-09-11
2002-09-12  230.25  2002-09-16
2002-09-13  230.25  2002-09-16
2002-09-14  230.25  2002-09-16
2002-09-15  230.25  2002-09-16
2002-09-16  230.25  2002-09-16
2002-09-17  230.10  2002-09-18
2002-09-18  230.10  2002-09-18
2002-09-19  230.05  2002-09-19
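
The how= and fill_method= keywords shown above belong to an older resample API; recent pandas releases no longer accept them and expect method chaining instead. A rough equivalent, assuming a current pandas version and a hand-built copy of the question's data (the name resampled is illustrative), might look like this:

```python
import pandas as pd

# Hand-built copy of the question's data, with the index duplicated
# into a string column as in the answer.
dates = pd.to_datetime(
    ["2002-09-09", "2002-09-11", "2002-09-16", "2002-09-18", "2002-09-19"]
)
df = pd.DataFrame(
    {"Price": [233.25, 233.05, 230.25, 230.10, 230.05]},
    index=pd.DatetimeIndex(dates, name="Date"),
)
df["OrigDate"] = df.index.strftime("%Y-%m-%d")

# Chained form of resample("D", how='last', fill_method='bfill'):
# take the last row of each daily bin, then backfill the empty bins.
resampled = df.resample("D").last().bfill()
print(resampled)
```

Days with no observation come out of last() as missing values, and bfill() then copies the next available Price and OrigDate into them.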