python - Pandas: groupby ffill for multiple columns

2024/10/1 12:10:27

I have the following DataFrame with some missing values. I want to use ffill() to fill the missing values in both var1 and var2, grouped by date and building. I can do that for one variable at a time, but when I try to do it for both in a single assignment, it raises a ValueError. How can I fill both variables at once while leaving var3 and var4 unchanged?

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'date': ['2019-01-01', '2019-01-01', '2019-01-01', '2019-01-01',
             '2019-02-01', '2019-02-01', '2019-02-01', '2019-02-01'],
    'building': ['a', 'a', 'b', 'b', 'a', 'a', 'b', 'b'],
    'var1': [1.5, np.nan, 2.1, 2.2, 1.2, 1.3, 2.4, np.nan],
    'var2': [100, 110, 105, np.nan, 102, np.nan, 103, 107],
    'var3': [10, 11, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
    'var4': [1, 2, 3, 4, 5, 6, 7, 8]
})
df

         date  building  var1   var2  var3  var4
0  2019-01-01         a   1.5  100.0  10.0     1
1  2019-01-01         a   NaN  110.0  11.0     2
2  2019-01-01         b   2.1  105.0   NaN     3
3  2019-01-01         b   2.2    NaN   NaN     4
4  2019-02-01         a   1.2  102.0   NaN     5
5  2019-02-01         a   1.3    NaN   NaN     6
6  2019-02-01         b   2.4  103.0   NaN     7
7  2019-02-01         b   NaN  107.0   NaN     8

# This works
df['var1'] = df.groupby(['date', 'building'])['var1'].ffill()
df['var2'] = df.groupby(['date', 'building'])['var2'].ffill()
df

         date  building  var1   var2  var3  var4
0  2019-01-01         a   1.5  100.0  10.0     1
1  2019-01-01         a   1.5  110.0  11.0     2
2  2019-01-01         b   2.1  105.0   NaN     3
3  2019-01-01         b   2.2  105.0   NaN     4
4  2019-02-01         a   1.2  102.0   NaN     5
5  2019-02-01         a   1.3  102.0   NaN     6
6  2019-02-01         b   2.4  103.0   NaN     7
7  2019-02-01         b   2.4  107.0   NaN     8

# This doesn't work
df[['var1', 'var2']] = df.groupby(['date', 'building'])[['var1', 'var2']].ffill()
ValueError: Columns must be same length as key
Answer

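I think you need to call fillna on the two columns and pass it the result of the grouped ffill(). fillna aligns on index and column labels, so only var1 and var2 are filled, and var3 and var4 are left untouched.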

df[["var1", "var2"]] = df[["var1", "var2"]].fillna(df.groupby(['date', 'building'])[["var1", "var2"]].ffill())

         date  building  var1   var2  var3  var4
0  2019-01-01         a   1.5  100.0  10.0     1
1  2019-01-01         a   1.5  110.0  11.0     2
2  2019-01-01         b   2.1  105.0   NaN     3
3  2019-01-01         b   2.2  105.0   NaN     4
4  2019-02-01         a   1.2  102.0   NaN     5
5  2019-02-01         a   1.3  102.0   NaN     6
6  2019-02-01         b   2.4  103.0   NaN     7
7  2019-02-01         b   2.4  107.0   NaN     8
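
For reference, here is a self-contained sketch of the same approach on the question's data. The names `cols` and `filled` are introduced only for illustration. As for the cause of the error, this is an assumption rather than something confirmed in the question: some older pandas releases seem to have returned the grouping columns alongside the selected ones from a grouped ffill(), which would put four columns on the right-hand side of a two-column assignment. Newer releases may accept the direct assignment without complaint.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'date': ['2019-01-01'] * 4 + ['2019-02-01'] * 4,
    'building': ['a', 'a', 'b', 'b', 'a', 'a', 'b', 'b'],
    'var1': [1.5, np.nan, 2.1, 2.2, 1.2, 1.3, 2.4, np.nan],
    'var2': [100, 110, 105, np.nan, 102, np.nan, 103, 107],
    'var3': [10, 11, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
    'var4': [1, 2, 3, 4, 5, 6, 7, 8],
})

# Columns to fill (illustrative name, not from the original post).
cols = ['var1', 'var2']

# Forward-fill inside each (date, building) group; the result keeps df's index.
filled = df.groupby(['date', 'building'])[cols].ffill()

# Inspect what came back. If key columns show up here, a direct
# two-column assignment would not match in width.
print(filled.columns.tolist())

# fillna aligns on index and column labels, so any extra columns in
# `filled` are ignored and only var1/var2 are updated.
df[cols] = df[cols].fillna(filled)

print(df)
```

If `filled` turns out to contain extra columns, selecting `filled[cols]` before assigning should be an equally workable way to line the two sides up.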