Do full-outer-join with pandas.merge_asof

2024/9/21 20:29:50

Hi I need to align some time series data with nearest timestamps, so I think pandas.merge_asof could be a good candidate. However, it does not have an option to set how='outer' like in the standard merge method.

An example can be:

df1:

                                   Value1
Time
2020-07-17 14:25:03.535906075      108
2020-07-17 14:25:05.457247019      110
2020-07-17 14:25:07.467777014      126

df2:

                                  Value2
Time
2020-07-17 14:25:03.535018921     222
2020-07-17 14:25:04.545104980     150
2020-07-17 14:25:07.476825953      60

Then for example, do this merge_asof:

pd.merge_asof(df1, df2, left_index=True, right_index=True, direction='nearest', tolerance=pd.Timedelta('0.3s'))

The results will be:

                               Value1  Value2
Time
2020-07-17 14:25:03.535906075     108   222.0
2020-07-17 14:25:05.457247019     110     NaN
2020-07-17 14:25:07.467777014     126    60.0

But what I want is:

                               Value1  Value2
Time
2020-07-17 14:25:03.535906075     108   222.0
2020-07-17 14:25:04.545104980     NaN   150.0   <---- this is the difference
2020-07-17 14:25:05.457247019     110     NaN
2020-07-17 14:25:07.467777014     126    60.0

basically like a full outer join.

Any suggestion? Thanks in advance.

EDIT:

So this is the case with 2 dataframes. What if, say for example, there are 10 dataframes (i.e. df1, df2, ..., df10) need to do this "nearest" merging, what will be a good way to do?

Answer
  1. Unfortunately, there is no how parameter in pd.merge_asof like there is with pd.merge, otherwise you could simply pass how='outer'.
  2. As a workaround, you can append the unmatched values from the other dataframe manually
  3. Then, sort the index with .sort_index()

df3 = pd.merge_asof(df1, df2, left_index=True, right_index=True, direction='nearest', tolerance=pd.Timedelta('0.3s'))
df4 = pd.merge_asof(df2, df1, left_index=True, right_index=True, direction='nearest', tolerance=pd.Timedelta('0.3s'))
df5 = df3.append(df4[df4['Value1'].isnull()]).sort_index()
df5
Out[1]: Value1  Value2
Time                                         
2020-07-17 14:25:03.535906075   108.0   222.0
2020-07-17 14:25:04.545104980     NaN   150.0
2020-07-17 14:25:05.457247019   110.0     NaN
2020-07-17 14:25:07.467777014   126.0    60.0
https://en.xdnf.cn/q/72021.html

Related Q&A

order of calling constructors in Python

#!/usr/bin/pythonclass Parent(object): # define parent classparentAttr = 100def __init__(self):print "Calling parent constructor"def parentMethod(self):print Calling parent methoddef s…

How do I access data from a python thread

I have a very simple threading example using Python 3.4.2. In this example I am creating a five threads that just returns the character string "Result" and appends it to an array titled thre…

How to tell if a full-screen application is running?

Is it possible in python to tell if a full screen application on linux is running? I have a feeling it might be possible using Xlib but I havent found a way.EDIT: By full screen I mean the WHOLE scree…

Pretty printers for maps throwing a type error

Ive configured pretty printers using http://wiki.eclipse.org/CDT/User/FAQ#How_can_I_inspect_the_contents_of_STL_containers.3F. It successfully works for vector and other containers. However I cant get …

Return PDF generated with FPDF in Flask

I can generate a PDF with an image using the code below. How can I return the generated PDF from a Flask route?from fpdf import FPDF pdf = FPDF() img = input(enter file name) g = img + .jpg pdf.add_p…

Tensorflow not found on pip install inside Docker Container using Mac M1

Im trying to run some projects using the new Mac M1. Those projects already work on Intel processor and are used by other developers that use Intel. I am not able to build this simple Dockerfile: FROM …

Fast fuse of close points in a numpy-2d (vectorized)

I have a question similar to the question asked here: simple way of fusing a few close points. I want to replace points that are located close to each other with the average of their coordinates. The c…

I use to_gbq on pandas for updating Google BigQuery and get GenericGBQException

While trying to use to_gbq for updating Google BigQuery table, I get a response of:GenericGBQException: Reason: 400 Error while reading data, error message: JSON table encountered too many errors, givi…

Something wrong with Keras code Q-learning OpenAI gym FrozenLake

Maybe my question will seem stupid.Im studying the Q-learning algorithm. In order to better understand it, Im trying to remake the Tenzorflow code of this FrozenLake example into the Keras code.My code…

How to generate month names as list in Python? [duplicate]

This question already has answers here:Get month name from number(18 answers)Closed 2 years ago.I have tried using this but the output is not as desired m = [] import calendar for i in range(1, 13):m.a…