I was doing some calculations and row manipulations and realised that for some tasks, such as mathematical operations, both attribute access and square-bracket access worked, e.g.
d['c3'] = d.c1 / d.c2
d['c3'] = d['c1'] / d['c2']
I was wondering whether there are instances where one is better than the other, or which one most people use.
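For plain names such as c1 and c2, the two spellings do read the same data. A minimal sketch, assuming a small frame with made-up values (only the column names come from the question):

```python
import pandas as pd

# Hypothetical data; the numbers are illustrative only.
d = pd.DataFrame({'c1': [1.0, 4.0, 9.0], 'c2': [2.0, 2.0, 3.0]})

by_attribute = d.c1 / d.c2
by_brackets = d['c1'] / d['c2']

# Both expressions produce the same Series.
same = by_attribute.equals(by_brackets)
print(same)

# Store the result with brackets, as in the question.
d['c3'] = by_brackets
```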
You should really just stop accessing columns as attributes and get into the habit of using square brackets []. This avoids errors where your column names contain characters that are not legal in a Python identifier (including embedded spaces), where a column shares its name with a built-in method, and ambiguous usage where, for instance, you have a column named index:
In[13]:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(5,4), columns=[' a', 'mean', 'index', '2'])
df.columns.tolist()
Out[13]: [' a', 'mean', 'index', '2']
So if we now try to access column 2:
In[14]:
df.2
  File "<ipython-input-14-0490d6ae2ca0>", line 1
    df.2
       ^
SyntaxError: invalid syntax
It fails because 2 is not a valid attribute name, but df['2'] would work.
In[15]:
df.a
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-15-b9872a8755ac> in <module>()
----> 1 df.a

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\generic.py in __getattr__(self, name)
   3079             if name in self._info_axis:
   3080                 return self[name]
-> 3081             return object.__getattribute__(self, name)
   3082 
   3083     def __setattr__(self, name, value):

AttributeError: 'DataFrame' object has no attribute 'a'
Because the column is really ' a' with a leading space (this would also fail if there were spaces anywhere in the column name), attribute access fails with an AttributeError.
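If the stray whitespace is unintentional, one option is to normalise the labels up front. A minimal sketch, assuming the same column names as above; stripping the labels is a choice made here for illustration, not something the original session does:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(5, 4), columns=[' a', 'mean', 'index', '2'])

# Attribute lookup for 'a' fails because the real label is ' a'.
try:
    df.a
    attribute_failed = False
except AttributeError:
    attribute_failed = True

# Bracket access with the exact label works regardless of the space.
col_a = df[' a']

# Assumed cleanup step: strip surrounding whitespace from every label.
cleaned = df.rename(columns=str.strip)
print(cleaned.columns.tolist())
```

Even after cleaning, cleaned['a'] is the spelling that keeps working if the label later changes to something that is not a valid identifier.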
In[16]:
df.mean
Out[16]:
<bound method DataFrame.mean of           a      mean     index         2
0 -0.022122  1.858308  1.823314  0.238105
1 -0.461662  0.482116  1.848322  1.946922
2  0.615889 -0.285043  0.201804 -0.656065
3  0.159351 -1.151883 -1.858024  0.088460
4  1.066735  1.015585  0.586550 -1.898469>
This is more subtle: it looks like it did something, but in fact it just returns the bound method DataFrame.mean rather than the column 'mean'; IPython is simply pretty-printing the method's repr.
In[17]:
df.index
Out[17]: RangeIndex(start=0, stop=5, step=1)
Above we have ambiguous intentions: because index is a member of the DataFrame, the row index is returned instead of the column 'index'.
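The last two cases share a cause: attribute lookup finds the DataFrame's own members before it ever considers column labels. A small sketch, assuming the same column names as above, that contrasts the two access styles:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(5, 4), columns=[' a', 'mean', 'index', '2'])

# Attribute access resolves to the DataFrame's own members.
mean_attr = df.mean      # bound method, not data
index_attr = df.index    # the row labels

# Bracket access resolves to the column with that label.
mean_col = df['mean']
index_col = df['index']

print(callable(mean_attr), type(index_attr).__name__)
print(type(mean_col).__name__, type(index_col).__name__)
```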
So you should stop accessing columns as attributes and always use square brackets, as this avoids all the problems above.
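To see which labels in an existing frame are affected, a check along these lines might help. This is only a sketch: attribute_unsafe_columns is a hypothetical helper name, not a pandas function, and it treats a label as unsafe if it is not a string, not a valid Python identifier, a Python keyword, or already the name of a DataFrame member.

```python
import keyword

import numpy as np
import pandas as pd


def attribute_unsafe_columns(frame):
    """Return the column labels that df.<name> cannot reach reliably.

    Hypothetical helper for illustration; not part of pandas.
    """
    # Names already taken by the DataFrame class (methods, properties, ...).
    reserved = set(dir(type(frame)))
    unsafe = []
    for name in frame.columns:
        if (
            not isinstance(name, str)
            or not name.isidentifier()   # e.g. ' a' or '2'
            or keyword.iskeyword(name)   # e.g. 'class'
            or name in reserved          # e.g. 'mean' or 'index'
        ):
            unsafe.append(name)
    return unsafe


df = pd.DataFrame(np.random.randn(5, 4), columns=[' a', 'mean', 'index', '2'])
unsafe = attribute_unsafe_columns(df)
print(unsafe)

# A frame with plain names, like the one in the question, reports nothing.
d = pd.DataFrame({'c1': [1.0, 2.0], 'c2': [4.0, 8.0]})
safe_report = attribute_unsafe_columns(d)
print(safe_report)
```

Every column of the example frame is flagged, while c1 and c2 pass. Square brackets work in both cases, which is the point of the advice above.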