There are at least four ways to retrieve elements from a pandas Series: .iloc, .loc, .ix and the [] operator used directly.
What is the difference between them? How do they handle missing labels and out-of-range positions?
The general idea is that .iloc and .loc are guaranteed to perform the look-up by position and by index label respectively, but they are a bit slower than .ix or the [] operator used directly. These two latter methods perform the look-up by label or by position depending on the type of the index of the Series and on the form of the key being looked up.
There is, however, also a bit of inconsistency in how .iloc and .loc behave, as described in this page.
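As a minimal illustration of the position/label distinction, the sketch below uses only .loc and .iloc on a Series whose integer labels deliberately differ from the positions. The variable names are illustrative, and .ix is left out because it was removed from later pandas releases, so this is meant to run on a recent version rather than to reproduce the exact setup of this note.

```python
import numpy as np
import pandas as pd

# Integer labels 10..14 that deliberately differ from the positions 0..4.
s = pd.Series(np.arange(100, 105), index=np.arange(10, 15))

first_by_position = s.iloc[0]   # position 0 holds the value 100
by_label = s.loc[13]            # label 13 holds the value 103

# .loc only knows labels: 0 is a valid position but not a label.
try:
    s.loc[0]
    loc_outcome = "found"
except KeyError:
    loc_outcome = "KeyError"

# .iloc only knows positions: 13 is a valid label but not a position.
try:
    s.iloc[13]
    iloc_outcome = "found"
except IndexError:
    iloc_outcome = "IndexError"

print(first_by_position, by_label, loc_outcome, iloc_outcome)
```

The same key (0 or 13) therefore succeeds or fails depending only on which accessor is used, which is the guarantee the two accessors are meant to give.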
The following tables summarise the behaviour of these four look-up methods depending on (a) whether the Series has an integer or a string index (I do not consider date indexes for the moment), (b) whether the requested key is a single element, a slice or a list (yes, the behaviour changes!) and (c) whether the key is found in the data or not. In each cell, LAB means that the look-up was performed by label and POS that it was performed by position.
The following examples work with pandas 0.17.1, NumPy 1.10.4 and Python 3.4.3.
    import numpy as np
    import pandas as pd

    s = pd.Series(np.arange(100,105), index=np.arange(10,15))
    s

    10    100
    11    101
    12    102
    13    103
    14    104

| **Single element** | **Slice** | **List** |
| --- | --- | --- |
| s[0] -> LAB -> KeyError | s[0:2] -> POS -> {10:100, 11:101} | s[[1,3]] -> LAB -> {1:NaN, 3:NaN} |
| s[13] -> LAB -> 103 | s[10:12] -> POS -> empty Series | s[[12,14]] -> LAB -> {12:102, 14:104} |
| --- | --- | --- |
| s.ix[0] -> LAB -> KeyError | s.ix[0:2] -> LAB -> empty Series | s.ix[[1,3]] -> LAB -> {1:NaN, 3:NaN} |
| s.ix[13] -> LAB -> 103 | s.ix[10:12] -> LAB -> {10:100, 11:101, 12:102} | s.ix[[12,14]] -> LAB -> {12:102, 14:104} |
| --- | --- | --- |
| s.iloc[0] -> POS -> 100 | s.iloc[0:2] -> POS -> {10:100, 11:101} | s.iloc[[1,3]] -> POS -> {11:101, 13:103} |
| s.iloc[13] -> POS -> IndexError | s.iloc[10:12] -> POS -> empty Series | s.iloc[[12,14]] -> POS -> IndexError |
| --- | --- | --- |
| s.loc[0] -> LAB -> KeyError | s.loc[0:2] -> LAB -> empty Series | s.loc[[1,3]] -> LAB -> KeyError |
| s.loc[13] -> LAB -> 103 | s.loc[10:12] -> LAB -> {10:100, 11:101, 12:102} | s.loc[[12,14]] -> LAB -> {12:102, 14:104} |

    s = pd.Series(np.arange(100,105), index=['a','b','c','d','e'])
    s

    a    100
    b    101
    c    102
    d    103
    e    104

| **Single element** | **Slice** | **List** |
| --- | --- | --- |
| s[0] -> POS -> 100 | s[0:2] -> POS -> {'a':100, 'b':101} | s[[0,2]] -> POS -> {'a':100, 'c':102} |
| s[10] -> LAB, POS -> KeyError, IndexError | s[10:12] -> POS -> empty Series | s[[10,12]] -> POS -> IndexError |
| s['a'] -> LAB -> 100 | s['a':'c'] -> LAB -> {'a':100, 'b':101, 'c':102} | s[['a','c']] -> LAB -> {'a':100, 'c':102} |
| s['g'] -> POS, LAB -> TypeError, KeyError | s['f':'h'] -> LAB -> empty Series | s[['f','h']] -> LAB -> {'f':NaN, 'h':NaN} |
| --- | --- | --- |
| s.ix[0] -> POS -> 100 | s.ix[0:2] -> POS -> {'a':100, 'b':101} | s.ix[[0,2]] -> POS -> {'a':100, 'c':102} |
| s.ix[10] -> POS -> IndexError | s.ix[10:12] -> POS -> empty Series | s.ix[[10,12]] -> POS -> IndexError |
| s.ix['a'] -> LAB -> 100 | s.ix['a':'c'] -> LAB -> {'a':100, 'b':101, 'c':102} | s.ix[['a','c']] -> LAB -> {'a':100, 'c':102} |
| s.ix['g'] -> POS, LAB -> TypeError, KeyError | s.ix['f':'h'] -> LAB -> empty Series | s.ix[['f','h']] -> LAB -> {'f':NaN, 'h':NaN} |
| --- | --- | --- |
| s.iloc[0] -> POS -> 100 | s.iloc[0:2] -> POS -> {'a':100, 'b':101} | s.iloc[[0,2]] -> POS -> {'a':100, 'c':102} |
| s.iloc[10] -> POS -> IndexError | s.iloc[10:12] -> POS -> empty Series | s.iloc[[10,12]] -> POS -> IndexError |
| s.iloc['a'] -> POS -> TypeError | s.iloc['a':'c'] -> POS -> ValueError | s.iloc[['a','c']] -> POS -> TypeError |
| s.iloc['g'] -> POS -> TypeError | s.iloc['f':'h'] -> POS -> ValueError | s.iloc[['f','h']] -> POS -> TypeError |
| --- | --- | --- |
| s.loc[0] -> LAB -> KeyError | s.loc[0:2] -> LAB -> TypeError | s.loc[[0,2]] -> LAB -> KeyError |
| s.loc[10] -> LAB -> KeyError | s.loc[10:12] -> LAB -> TypeError | s.loc[[10,12]] -> LAB -> KeyError |
| s.loc['a'] -> LAB -> 100 | s.loc['a':'c'] -> LAB -> {'a':100, 'b':101, 'c':102} | s.loc[['a','c']] -> LAB -> {'a':100, 'c':102} |
| s.loc['g'] -> LAB -> KeyError | s.loc['f':'h'] -> LAB -> empty Series | s.loc[['f','h']] -> LAB -> KeyError |

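Since the exact outcomes depend on the pandas version, it may be handy to regenerate rows of these tables on your own installation. The sketch below is one possible way to do that: `probe` is a hypothetical helper (not part of pandas) that returns either the result of a look-up or the name of the exception it raised. Passing `None` as the accessor uses the plain [] operator.

```python
import numpy as np
import pandas as pd

def probe(series, accessor, key):
    """Hypothetical helper: return the look-up result or the exception name."""
    try:
        # accessor is None for plain [], otherwise a name such as "loc" or "iloc".
        target = series if accessor is None else getattr(series, accessor)
        return target[key]
    except Exception as exc:
        # Report the exception type instead of letting it propagate.
        return type(exc).__name__

s_int = pd.Series(np.arange(100, 105), index=np.arange(10, 15))
s_str = pd.Series(np.arange(100, 105), index=['a', 'b', 'c', 'd', 'e'])

# Print a few single-element look-ups for both kinds of index.
for name, series, key in [("int index", s_int, 13), ("str index", s_str, "a")]:
    for accessor in ("loc", "iloc"):
        print(name, accessor, repr(key), "->", probe(series, accessor, key))
```

Because the accessor is resolved inside the `try` block, asking for an accessor that your version does not provide is reported as an `AttributeError` string rather than stopping the loop.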
Note that there are three ways in which missing labels and out-of-range positions are handled: an exception is raised, an empty Series is returned, or a Series with the requested keys associated with NaN values is returned.
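The three outcomes can be sketched as follows on a string-indexed Series. One assumption to flag: recent pandas versions raise KeyError when a list contains missing labels, so the NaN-filled case is produced here with `Series.reindex` as a stand-in, not with the list look-ups shown in the tables.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(100, 105), index=['a', 'b', 'c', 'd', 'e'])

# 1. Exception: a single missing label with .loc raises KeyError.
try:
    s.loc['g']
    raised = False
except KeyError:
    raised = True

# 2. Empty Series: a label slice on a sorted index that matches nothing.
empty = s.loc['f':'h']

# 3. NaN-filled Series: reindex (a stand-in, see the note above)
#    keeps the requested labels and fills the missing ones with NaN.
nan_filled = s.reindex(['f', 'h'])

print(raised, len(empty), bool(nan_filled.isna().all()))
```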
Also note that when slicing by position the end element is excluded, but when slicing by label the end element is included.
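A short sketch of this difference, using .iloc for the positional slice and .loc for the label slice so that the behaviour does not depend on how [] resolves the key (variable names are illustrative):

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(100, 105), index=['a', 'b', 'c', 'd', 'e'])

by_position = s.iloc[0:2]    # positions 0 and 1; position 2 is excluded
by_label = s.loc['a':'c']    # labels 'a', 'b' and 'c'; 'c' is included

print(list(by_position.index))  # ['a', 'b']
print(list(by_label.index))     # ['a', 'b', 'c']
```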