Merging same-indexed rows by taking non-NaNs from all of them in pandas dataframe

2024/9/20 23:45:55

I have a sparse dataframe with duplicate indices. How can I merge the same-indexed rows in a way that I keep all the non-NaN data from the conflicting rows?

I know that you can achieve something very close with the built-in drop_duplicates function, but you can only keep either the first or the last row with the same index:

df.reset_index().drop_duplicates(subset='index', keep='first').set_index('index').sort_index()

What I need is all the non-NaN values from any of the conflicting rows.

Before:

DataFrame with A,B,C,D columns, and 1,2,2,3,4,5 rows. In the first 2-indexed row there is a 2.0 in the B column, in the second, there is a 3.0 in the D column.

After:

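DataFrame with the same A,B,C,D columns, and 1,2,3,4,5 rows. The single 2-indexed row has the 2.0 in the B column and the 3.0 in the D column.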

Answer
df.reset_index().groupby('index').max()

This keeps the non-NaN values from the conflicting rows, because max() skips NaN within each group. If more than one conflicting row has a value in the same column, the result is the maximum of those values.
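As a rough illustration, here is a minimal sketch on made-up data shaped like the example in the question (the actual values in the original frame are not known, so the numbers outside the two 2-indexed rows are assumptions). It groups on the index level directly with groupby(level=0), which should give the same values as the reset_index() version, and also shows first() as an alternative when the maximum is not what you want for conflicting values.

```python
import numpy as np
import pandas as pd

# Hypothetical data: index 2 appears twice, with B filled in one row
# and D filled in the other (as described in the question).
df = pd.DataFrame(
    {
        "A": [1.0, np.nan, np.nan, np.nan, 4.0, np.nan],
        "B": [np.nan, 2.0, np.nan, np.nan, np.nan, 5.0],
        "C": [np.nan, np.nan, np.nan, 3.0, np.nan, np.nan],
        "D": [np.nan, np.nan, 3.0, np.nan, np.nan, np.nan],
    },
    index=[1, 2, 2, 3, 4, 5],
)

# Group on the index itself; max() ignores NaN within each group,
# so each column keeps whatever non-NaN value exists for that index.
merged_max = df.groupby(level=0).max()

# Alternative: first() takes the first non-null value per column
# instead of the largest one.
merged_first = df.groupby(level=0).first()

print(merged_max)
```

With no overlapping values, as here, max() and first() agree. They differ only when two rows with the same index both hold a value in the same column.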
