When should I use dt.column vs dt['column'] in pandas?

2024/9/19 9:25:38

I was doing some calculations and row manipulations and realised that, for some tasks such as mathematical operations, both forms work, e.g.:

d['c3'] = d.c1 / d.c2
d['c3'] = d['c1'] / d['c2']

I was wondering whether there are cases where one is better than the other, or which one most people use.

Answer

You should really just stop accessing columns as attributes and get into the habit of using square brackets []. This avoids errors when a column name is not a valid Python identifier (illegal characters, embedded spaces), when it shares its name with a built-in DataFrame method, and when the usage is ambiguous, for instance with a column named index:

In[13]:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(5,4), columns=[' a', 'mean', 'index', '2'])
df.columns.tolist()
Out[13]: [' a', 'mean', 'index', '2']

So if we now try to access column 2:

In[14]:
df.2
  File "<ipython-input-14-0490d6ae2ca0>", line 1
    df.2
       ^
SyntaxError: invalid syntax

It fails because 2 is not a valid Python identifier, but df['2'] would work.
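
As a minimal sketch of the bracket form working here (the frame is rebuilt with the same column labels and random values; the variable name col_2 is only for illustration):

```python
import numpy as np
import pandas as pd

# Same column labels as in the answer; the values are random.
df = pd.DataFrame(np.random.randn(5, 4), columns=[' a', 'mean', 'index', '2'])

# Bracket lookup takes the label as a string, so it does not have to be
# a valid Python identifier.
col_2 = df['2']
print(type(col_2).__name__, col_2.name)
```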

In[15]:
df.a
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-15-b9872a8755ac> in <module>()
----> 1 df.a

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\generic.py in __getattr__(self, name)
   3079             if name in self._info_axis:
   3080                 return self[name]
-> 3081             return object.__getattribute__(self, name)
   3082 
   3083     def __setattr__(self, name, value):

AttributeError: 'DataFrame' object has no attribute 'a'

Because the column is really ' a' with a leading space, this fails with an AttributeError (it would also fail if there were spaces anywhere in the column name).
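
A small sketch of the same case with brackets, assuming the same column labels as above (col_a and has_attr_a are illustrative names): the exact label, including its leading space, can be passed as a string, whereas no attribute called a exists.

```python
import numpy as np
import pandas as pd

# Same column labels as in the answer; the values are random.
df = pd.DataFrame(np.random.randn(5, 4), columns=[' a', 'mean', 'index', '2'])

# The label is ' a' (with a leading space), so it must be quoted exactly.
col_a = df[' a']

# Attribute lookup has no way to spell a name containing a space.
has_attr_a = hasattr(df, 'a')
print(col_a.name == ' a', has_attr_a)
```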

In[16]:
df.mean
Out[16]: 
<bound method DataFrame.mean of           a      mean     index         2
0 -0.022122  1.858308  1.823314  0.238105
1 -0.461662  0.482116  1.848322  1.946922
2  0.615889 -0.285043  0.201804 -0.656065
3  0.159351 -1.151883 -1.858024  0.088460
4  1.066735  1.015585  0.586550 -1.898469>

This is more subtle: it looks like it did something, but in fact it just returns the bound method DataFrame.mean rather than the column 'mean'. IPython is simply pretty-printing the method's repr, which includes the DataFrame.
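
To make the difference concrete, here is a rough sketch (same labels, random values; the names method and column are only for illustration) that compares what each form returns:

```python
import numpy as np
import pandas as pd

# Same column labels as in the answer; the values are random.
df = pd.DataFrame(np.random.randn(5, 4), columns=[' a', 'mean', 'index', '2'])

method = df.mean      # attribute access finds the DataFrame.mean method first
column = df['mean']   # bracket access returns the column labelled 'mean'

print(callable(method))            # the method can be called, e.g. method()
print(type(column).__name__)       # the column is a Series
```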

In[17]:
df.index
Out[17]: RangeIndex(start=0, stop=5, step=1)

Here the intention is ambiguous: because index is an attribute of the DataFrame, the row index is returned instead of the column 'index'.
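
A short sketch of the same ambiguity, again assuming the labels from the answer (row_index and index_column are illustrative names): the attribute gives the row index, while brackets give the column.

```python
import numpy as np
import pandas as pd

# Same column labels as in the answer; the values are random.
df = pd.DataFrame(np.random.randn(5, 4), columns=[' a', 'mean', 'index', '2'])

row_index = df.index          # the DataFrame's row index, not a column
index_column = df['index']    # the column whose label is 'index'

print(type(row_index).__name__)
print(type(index_column).__name__)
```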

So you should stop accessing columns as attributes and always use square brackets, as this avoids all of the problems above.
