What are the different methods to retrieve elements in a pandas Series?

2024/9/20 12:38:40

There are at least 4 ways to retrieve elements in a pandas Series: .iloc, .loc .ix and using directly the [] operator.

What's the difference between them ? How do they handle missing labels/out of range positions ?

Answer

The general idea is that while .iloc and .loc are guaranteed to perform the look-up by position and index(label) respectively, they are a bit slower than using .ix or directly the [] operator. These two former methods perform the look-up by index or position depending of the type of index in the Series to look-up and the data that should be looked-up.

There is however a bit of inconsistency also in using .iloc and .loc, as described in this page.

The following tables summarise the behaviour of these 4 methods of lookup depending (a) if the Series to look-up has an integer or a string index (I do not consider for the moment the date index), (b) if the required data is a single element, a slice index or a list (yes, the behaviour change!) and (c) if the index is found or not in the data.

The following examples works with pandas 0.17.1, NumPy 1.10.4, Python 3.4.3.

Case 1: Integer index

s = pd.Series(np.arange(100,105), index=np.arange(10,15))
s
10    100
11    101
12    102
13    103
14    104** Single element **             ** Slice **                                       ** Tuple **
s[0]       -> LAB -> KeyError    s[0:2]        -> POS -> {10:100, 11:101}          s[[1,3]]        -> LAB -> {1:NaN, 3:Nan}
s[13]      -> LAB -> 103         s[10:12]      -> POS -> empty Series              s[[12,14]]      -> LAB -> {12:102, 14:104}
---                              ---                                               ---
s.ix[0]    -> LAB -> KeyError    s.ix[0:2]     -> LAB -> empty Series              s.ix[[1,3]]     -> LAB -> {1:NaN, 3:Nan}
s.ix[13]   -> LAB -> 103         s.ix[10:12]   -> LAB -> {10:100, 11:101, 12:102}  s.ix[[12,14]]   -> LAB -> {12:102, 14:104}
---                              ---                                               ---
s.iloc[0]  -> POS -> 100         s.iloc[0:2]   -> POS -> {10:100, 11:101}          s.iloc[[1,3]]   -> POS -> {11:101, 13:103}
s.iloc[13] -> POS -> IndexError  s.iloc[10:12] -> POS -> empty Series              s.iloc[[12,14]] -> POS -> IndexError
---                              ---                                               ---
s.loc[0]   -> LAB -> KeyError    s.loc[0:2]    -> LAB -> empty Series              s.loc[[1,3]]    -> LAB -> KeyError
s.loc[13]  -> LAB -> 103         s.loc[10:12]  -> LAB -> {10:100, 11:101, 12:102}  s.loc[[12,14]]  -> LAB -> {12:102, 14:104}

Case 2: Series with string index

s = pd.Series(np.arange(100,105), index=['a','b','c','d','e'])
s
a    100
b    101
c    102
d    103
e    104** Single element **                             ** Slice **                                           ** Tuple **
s[0]        -> POS -> 100                        s[0:2]          -> POS -> {'a':100,'b':101}           s[[0,2]]          -> POS -> {'a':100,'c':102} 
s[10]       -> LAB, POS -> KeyError, IndexError  s[10:12]        -> POS -> Empty Series                s[[10,12]]        -> POS -> IndexError 
s['a']      -> LAB -> 100                        s['a':'c']      -> LAB -> {'a':100,'b':101, 'c':102}  s[['a','c']]      -> LAB -> {'a':100,'b':101, 'c':102} 
s['g']      -> POS,LAB -> TypeError, KeyError    s['f':'h']      -> LAB -> Empty Series                s[['f','h']]      -> LAB -> {'f':NaN, 'h':NaN}
---                                              ---                                                   ---
s.ix[0]     -> POS -> 100                        s.ix[0:2]       -> POS -> {'a':100,'b':101}           s.ix[[0,2]]       -> POS -> {'a':100,'c':102} 
s.ix[10]    -> POS -> IndexError                 s.ix[10:12]     -> POS -> Empty Series                s.ix[[10,12]]     -> POS -> IndexError 
s.ix['a']   -> LAB -> 100                        s.ix['a':'c']   -> LAB -> {'a':100,'b':101, 'c':102}  s.ix[['a','c']]   -> LAB -> {'a':100,'b':101, 'c':102} 
s.ix['g']   -> POS, LAB -> TypeError, KeyError   s.ix['f':'h']   -> LAB -> Empty Series                s.ix[['f','h']]   -> LAB -> {'f':NaN, 'h':NaN}
---                                              ---                                                   ---
s.iloc[0]   -> POS -> 100                        s.iloc[0:2]     -> POS -> {'a':100,'b':101}           s.iloc[[0,2]]     -> POS -> {'a':100,'c':102} 
s.iloc[10]  -> POS -> IndexError                 s.iloc[10:12]   -> POS -> Empty Series                s.iloc[[10,12]]   -> POS -> IndexError 
s.iloc['a'] -> LAB -> TypeError                  s.iloc['a':'c'] -> POS -> ValueError                  s.iloc[['a','c']] -> POS -> TypeError    
s.iloc['g'] -> LAB -> TypeError                  s.iloc['f':'h'] -> POS -> ValueError                  s.iloc[['f','h']] -> POS -> TypeError
---                                              ---                                                   ---
s.loc[0]    -> LAB -> KeyError                   s.loc[0:2]     -> LAB -> TypeError                   s.loc[[0,2]]     -> LAB -> KeyError 
s.loc[10]   -> LAB -> KeyError                   s.loc[10:12]   -> LAB -> TypeError                   s.loc[[10,12]]   -> LAB -> KeyError 
s.loc['a']  -> LAB-> 100                         s.loc['a':'c'] -> LAB -> {'a':100,'b':101, 'c':102}  s.loc[['a','c']] -> LAB -> {'a':100,'c':102}    
s.loc['g']  -> LAB -> KeyError                   s.loc['f':'h'] -> LAB -> Empty Series                s.loc[['f','h']] -> LAB -> KeyError

Note that there are three ways to handle not found labels/out of range positions: an exception is thrown, a null Series is returned or a Series with the demanded keys associated to NaN values is returned.

Also note that when querying using slicing by position the end element is excluded, but when querying by label the ending element is included.

https://en.xdnf.cn/q/119221.html

Related Q&A

Speaker recognition - Bad Request error on microsoft oxford

I am using the python wrapper that has been given in the SDK section. Ive been trying to enroll a voice file for a created profile using the python API.I was able to create a profile and list all profi…

Remove list of phrases from string

I have an array of phrases: bannedWords = [hi, hi you, hello, and you]I want to take a sentence like "hi, how are tim and you doing" and get this:", how are tim doing"Exact case mat…

ImportError: No module named... (basics?) [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.Want to improve this question? Add details and clarify the problem by editing this post.Closed 10 years ago.Improv…

How to pass python list address

I want to convert c++ code to python. I have created a python module using SWIG to access c++ classes.Now I want to pass the following c++ code to PythonC++#define LEVEL 3double thre[LEVEL] = { 1.0l, 1…

Python open says file doesnt exist when it does

I am trying to work out why my Python open call says a file doesnt exist when it does. If I enter the exact same file url in a browser the photo appears.The error message I get is:No such file or direc…

How to map one list and dictionary in python [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.Want to improve this question? Add details and clarify the problem by editing this post.Closed 4 years ago.Improve…

Sorting date inside of files (Python)

I have a txt file with names and dates like this name0 - 05/09/2020 name1 - 14/10/2020 name2 - 02/11/2020 How can I sort the text file by date? so that the file will end up like this name2 - 02/11/202…

Dicing in python

Code:-df = pd.DataFrame({col1:t, col2:wordList}) df.columns=[DNT,tweets] df.DNT = pd.to_datetime(df.DNT, errors=coerce) check=df[ (df.DNT < 09:20:00) & (df.DNT > 09:00:00) ]Dont know why this…

How to collect all specified hrefs?

In this test model I can collect the href value for the first (tr, class_=rowLive), Ive tried to create a loop to collect all the others href but it always gives IndentationError: expected an indented …

How do I sort data highest to lowest in Python from a text file?

I have tried multiple methods in doing this but none of them seem to work The answer comes up alphabetically insteadf=open("class2.txt", "r") scores=myfile.readlines() print(sorted(…