Changes to Pandas DataFrames are not preserved after the end of a for loop

2024/11/16 10:36:28

I have a list of Pandas DataFrames and I want to perform some operations on them. To be more precise, I want to clean their column names and add a new column. So I have written the following code:

import numpy as np
import pandas as pd
from janitor import clean_names
rng = np.random.RandomState(2019)
dataset = [pd.DataFrame(rng.randint(0, 10, (3, 3)), columns = ['Column A', 'Column B', 'Column C']) for i in range(4)]
for df in dataset:
    df = df.clean_names()
    df['column_d'] = df['column_a'] + df['column_b']

However, the changes are not preserved despite the explicit assignment. The following code still shows the original DataFrame:

dataset[1]

   Column A  Column B  Column C
0         8         5         3
1         0         2         5
2         7         8         5

What am I missing?

Answer

This is what is happening:

for df in dataset:

On each iteration, this makes the name df refer to the next item in the list.

df = df.clean_names()

df.clean_names() returns a new object, different from df itself. The assignment makes df refer to that new object instead of the original; the list still holds the original.

df['column_d'] = df['column_a'] + df['column_b']

This changes df in place, but df is no longer the original object, so the original in the list remains untouched.
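The same rebinding behavior can be reproduced with plain Python lists, which may make it easier to see. This is only a minimal sketch; the names items and item are illustrative and do not come from the question:

```python
items = [[1, 2], [3, 4]]

for item in items:
    # Rebinding: `item` now names a brand-new list; `items` is not affected.
    item = item + [0]
    # This mutates the new list only, not the one stored in `items`.
    item.append(99)

print(items)  # [[1, 2], [3, 4]]

for item in items:
    # In-place mutation: `item` still names the object stored in `items`.
    item.append(99)

print(items)  # [[1, 2, 99], [3, 4, 99]]
```

The first loop corresponds to the code in the question: once the loop variable is reassigned, nothing done to it reaches the list.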

You should get what you want by using an index to the list to actually replace each item. Something like this:

for i, df in enumerate(dataset):
    df = df.clean_names()
    df['column_d'] = df['column_a'] + df['column_b']
    dataset[i] = df

Not the prettiest thing in the world but I don't have time to think of something better.
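One possibly tidier variant is to move the per-frame work into a function and rebuild the list with a list comprehension. The sketch below rests on some assumptions: add_column_d is a hypothetical helper, and the renaming uses plain pandas rename() to approximate what clean_names() does for these particular column names, so the example runs without pyjanitor.

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(2019)
dataset = [
    pd.DataFrame(rng.randint(0, 10, (3, 3)),
                 columns=['Column A', 'Column B', 'Column C'])
    for _ in range(4)
]

def add_column_d(df):
    # Hypothetical helper. rename() returns a new DataFrame; the lambda is
    # an assumed stand-in for clean_names() on these column names only.
    cleaned = df.rename(columns=lambda c: c.lower().replace(' ', '_'))
    # assign() also returns a new DataFrame with the extra column.
    return cleaned.assign(column_d=cleaned['column_a'] + cleaned['column_b'])

# Rebuild the list so that it holds the new objects.
dataset = [add_column_d(df) for df in dataset]

print(dataset[1].columns.tolist())
# ['column_a', 'column_b', 'column_c', 'column_d']
```

Because the list itself is replaced, there is no loop variable whose rebinding could be lost.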

