Fill missing timeseries data using pandas or numpy

2024/11/13 13:02:17

I have a list of dictionaries which looks like this :

L=[
{
"timeline": "2014-10", 
"total_prescriptions": 17
}, 
{
"timeline": "2014-11", 
"total_prescriptions": 14
}, 
{
"timeline": "2014-12", 
"total_prescriptions": 8
},
{
"timeline": "2015-1", 
"total_prescriptions": 4
}, 
{
"timeline": "2015-3", 
"total_prescriptions": 10
}, 
{
"timeline": "2015-4", 
"total_prescriptions": 3
} 
]

This basically is the result of a SQL query which when given a start date and an end date gives the count of total prescriptions for each month starting from the start date till the end month.However,for months where the prescriptions count is 0(Feb 2015),it completely skips that month.Is it possible using pandas or numpy to alter this list so that it adds an entry for the missing month with 0 as the total prescription as follows:

[
{
"timeline": "2014-10", 
"total_prescriptions": 17
}, 
{
"timeline": "2014-11", 
"total_prescriptions": 14
}, 
{
"timeline": "2014-12", 
"total_prescriptions": 8
{
"timeline": "2015-1", 
"total_prescriptions": 4
}, 
{
"timeline": "2015-2",   # 2015-2 to be inserted for missing month
"total_prescriptions": 0 # 0 to be inserted for total prescription
}, 
{
"timeline": "2015-3", 
"total_prescriptions": 10
}, 
{
"timeline": "2015-4", 
"total_prescriptions": 3
} 
]
Answer

What you are talking about is called "Resampling" in Pandas; first convert the your time to a numpy datetime and set as your index:

df = pd.DataFrame(L)
df.index=pd.to_datetime(df.timeline,format='%Y-%m')
dftimeline  total_prescriptions
timeline                                
2014-10-01  2014-10                   17
2014-11-01  2014-11                   14
2014-12-01  2014-12                    8
2015-01-01   2015-1                    4
2015-03-01   2015-3                   10
2015-04-01   2015-4                    3

Then you can add in your missing months with resample('MS') (MS stands for "month start" I guess), and use fillna(0) to convert null values to zero as in your requirement.

df = df.resample('MS').fillna(0)
dftotal_prescriptions
timeline                       
2014-10-01                   17
2014-11-01                   14
2014-12-01                    8
2015-01-01                    4
2015-02-01                  NaN
2015-03-01                   10
2015-04-01                    3

To convert back to your original format, convert the datetime index back to string using to_native_types, and then export using to_dict('records'):

df['timeline']=df.index.to_native_types()
df.to_dict('records')
[{'timeline': '2014-10-01', 'total_prescriptions': 17.0},{'timeline': '2014-11-01', 'total_prescriptions': 14.0},{'timeline': '2014-12-01', 'total_prescriptions': 8.0},{'timeline': '2015-01-01', 'total_prescriptions': 4.0},{'timeline': '2015-02-01', 'total_prescriptions': 0.0},{'timeline': '2015-03-01', 'total_prescriptions': 10.0},{'timeline': '2015-04-01', 'total_prescriptions': 3.0}]
https://en.xdnf.cn/q/72104.html

Related Q&A

Can Biopython perform Seq.find() accounting for ambiguity codes

I want to be able to search a Seq object for a subsequnce Seq object accounting for ambiguity codes. For example, the following should be true:from Bio.Seq import Seq from Bio.Alphabet.IUPAC import IUP…

MySQL and lock a table, read, and then truncate

I am using mysqldb in python.I need to do the following for a table.1) Lock 2) Read 3) Truncate the table 4) UnlockWhen I run the below code, I get the below error. So, I am rather unsure on how to lo…

Train and predict on variable length sequences

Sensors (of the same type) scattered on my site are manually reporting on irregular intervals to my backend. Between reports the sensors aggregate events and report them as a batch. The following datas…

What should a Python project structure look like for Travis CI to find and run tests?

I currently have a project with the following .travis.yml file:language: python install: "pip install tox" script: "tox"Locally, tox properly executes and runs 35 tests, but on Trav…

Having trouble building a Dns Packet in Python

Im trying to build a dns packet to send over a socket. I dont want to use any libraries because I want direct access to the socket variable that sends it. Whenever I send the DNS packet, wireshark says…

Element wise comparison between 1D and 2D array

Want to perform an element wise comparison between an 1D and 2D array. Each element of the 1D array need to be compared (e.g. greater) against the corresponding row of 2D and a mask will be created. He…

Jquery ajax post request not working

I have a simple form submission with ajax, but it keeps giving me an error. All the error says is "error". No code, no description. No nothing, when I alert it when it fails.Javascript with …

Why is the main() function not defined inside the if __main__?

You can often see this (variation a):def main():do_something()do_sth_else()if __name__ == __main__:main()And I am now wondering why not this (variation b):if __name__ == __main__:do_something()do_sth_e…

Using selenium inside gitlab CI/CD

Ive desperetaly tried to set a pytest pipeline CI/CD for my personal projet hosted by gitlab. I tried to set up a simple project with two basic files: file test_core.py, witout any other dependencies f…

VSCode issue with Python versions and environments from Jupyter Notebooks

Issue: I am having issues with the environment and version of Python not matching the settings in VSCode, and causing issues with the packages I am trying to use in Jupyter notebooks. I am using a Wind…