Why does one use of iloc give a SettingWithCopyWarning, but the other doesn't?

2024/10/18 12:49:51

Inside a method of a class I use this statement:

self.__datacontainer.iloc[-1]['c'] = value

Doing this, I get a "SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame".

I then tried to reproduce this warning with the following simple code:

import pandas, numpy
df = pandas.DataFrame(numpy.random.randn(5,3),columns=list('ABC'))
df.iloc[-1]['C'] = 3

There I get no warning. Why do I get a warning for the first statement but not for the second?

Answer

Chained indexing

As the documentation and a couple of other answers on this site ([1], [2]) suggest, chained indexing is considered bad practice and should be avoided.

Since there doesn't seem to be a graceful way of making assignments with integer-position-based indexing (i.e. .iloc) without violating the chained indexing rule (as of pandas v0.23.4), it is advisable to use label-based indexing (i.e. .loc) for assignments whenever possible.
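A minimal sketch of the label-based alternative, assuming the last row can be identified through df.index. The frame and the value 42 are illustrative and do not come from the question:

```python
import pandas as pd

# Illustrative frame (an assumption, not the asker's data).
df = pd.DataFrame({'a': [4, 5, 6], 'c': [3, 2, 1]})

# Look up the label of the last row, then assign with a single .loc call.
# Row and column are addressed together, so no intermediate object is
# created and there is nothing for pandas to warn about.
last_label = df.index[-1]
df.loc[last_label, 'c'] = 42
```

Note that .loc selects by label, so if the index contains duplicate labels this would assign to every row carrying that label.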

However, if you absolutely need to access data by row number, you can use

df.iloc[-1, df.columns.get_loc('c')] = 42

or

df.iloc[[-1, 1], df.columns.get_indexer(['a', 'c'])] = 42
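Both forms can be tried on a small frame. The sketch below uses an illustrative all-integer frame (an assumption, chosen so that writing 42 into either column needs no dtype change):

```python
import pandas as pd

# Illustrative frame (an assumption, not the asker's data).
df = pd.DataFrame({'a': [4, 5, 6], 'c': [3, 2, 1]})

# Single cell: translate the column label into its integer position first,
# then pass row and column positions to one .iloc call.
c_pos = df.columns.get_loc('c')
df.iloc[-1, c_pos] = 42

# Several rows and columns: get_indexer translates a list of labels
# into an array of integer positions.
col_positions = df.columns.get_indexer(['a', 'c'])
df.iloc[[-1, 1], col_positions] = 42
```

In both cases the assignment goes through a single indexing call on df itself, which is what distinguishes it from df.iloc[-1]['c'] = 42.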

Pandas behaving oddly

As far as I understand, you're absolutely right to expect the warning when trying to reproduce the problem artificially.

What I've found so far is that it depends on how the dataframe is constructed:

import pandas as pd

df = pd.DataFrame({'a': [4, 5, 6], 'c': [3, 2, 1]})
df.iloc[-1]['c'] = 42 # no warning

df = pd.DataFrame({'a': ['x', 'y', 'z'], 'c': ['t', 'u', 'v']})
df.iloc[-1]['c'] = 'f' # no warning

df = pd.DataFrame({'a': ['x', 'y', 'z'], 'c': [3, 2, 1]})
df.iloc[-1]['c'] = 42 # SettingWithCopyWarning: ...

It seems that pandas (at least v0.23.4) handles mixed-type and single-type dataframes differently when it comes to chained assignments [3]:

def _check_is_chained_assignment_possible(self):
    """
    Check if we are a view, have a cacher, and are of mixed type.
    If so, then force a setitem_copy check.

    Should be called just near setting a value

    Will return a boolean if it we are a view and are cached, but a
    single-dtype meaning that the cacher should be updated following
    setting.
    """
    if self._is_view and self._is_cached:
        ref = self._get_cacher()
        if ref is not None and ref._is_mixed_type:
            self._check_setitem_copy(stacklevel=4, t='referant',
                                     force=True)
        return True
    elif self._is_copy:
        self._check_setitem_copy(stacklevel=4, t='referant')
    return False

This appears really odd to me, although I'm not sure whether it is actually unintended.
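To see which case a given frame falls into, the chained assignment can be run inside warnings.catch_warnings and the recorded warnings inspected. The following is only a sketch: chained_set_warnings is a hypothetical helper name, and df.dtypes.nunique() > 1 is a rough public-API stand-in for the private _is_mixed_type flag, not the same check. Which warnings appear (if any) depends on the installed pandas version, so no expected output is shown.

```python
import warnings

import pandas as pd


def chained_set_warnings(df, column, value):
    """Run the chained assignment on a copy and return the warning class names.

    Hypothetical helper for experimentation only; the chained assignment
    is performed here purely to observe how pandas reacts to it.
    """
    frame = df.copy()
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, even repeats
        frame.iloc[-1][column] = value
    return [w.category.__name__ for w in caught]


single = pd.DataFrame({'a': [4, 5, 6], 'c': [3, 2, 1]})
mixed = pd.DataFrame({'a': ['x', 'y', 'z'], 'c': [3, 2, 1]})

for name, frame in [('single', single), ('mixed', mixed)]:
    has_several_dtypes = frame.dtypes.nunique() > 1
    print(name, has_several_dtypes, chained_set_warnings(frame, 'c', 42))
```

The helper works on a copy so that the frames passed in are not modified by the experiment, regardless of whether the chained assignment happens to write through.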

However, there's an old bug report describing similar behaviour.


UPDATE

According to the developers, the above behaviour is expected.
