For a relatively large pandas DataFrame (a few hundred thousand rows), I'd like to create a Series from the result of a row-wise apply. The problem is that the function is slow, and I was hoping it could be sped up somehow.
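import math

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'value-1': [1, 2, 3, 4, 5],
    'value-2': [0.1, 0.2, 0.3, 0.4, 0.5],
    'value-3': somenumbers...,
    'value-4': more numbers...,
    'choice-index': [1, 1, np.nan, 2, 1]
})

def func(row):
    # choice-index selects which value-N column to read; NaN means "no value"
    i = row['choice-index']
    return np.nan if math.isnan(i) else row['value-%d' % i]

# reduce=True is only accepted by older pandas versions; drop it on pandas >= 1.0
df['value'] = df.apply(func, axis=1, reduce=True)
# expected value = [1, 2, np.nan, 0.4, 5]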
Any suggestions are welcome.
Update
A very small speedup (about 1.1x) can be achieved by pre-caching the selected column names. func would change to:
cached_columns = [None, 'value-1', 'value-2', 'value-3', 'value-4']
def func(row):
    i = row['choice-index']
    # choice-index is a float (the column contains NaN), so cast before indexing the list
    return np.nan if math.isnan(i) else row[cached_columns[int(i)]]
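For reference, a comparison like this can be timed roughly as follows. This is only a sketch: the DataFrame size, the random seed, the random values and the names `func_format` and `func_cached` are assumptions for illustration, not my real data, and the measured ratio will vary by machine and pandas version.

```python
import math
import timeit

import numpy as np
import pandas as pd

# Synthetic stand-in data (assumption): size, seed and values are made up.
rng = np.random.default_rng(0)
n = 2_000
df = pd.DataFrame({
    "value-1": rng.random(n),
    "value-2": rng.random(n),
    "value-3": rng.random(n),
    "value-4": rng.random(n),
    "choice-index": rng.choice([1.0, 2.0, 3.0, 4.0, np.nan], size=n),
})

cached_columns = [None, "value-1", "value-2", "value-3", "value-4"]

def func_format(row):
    # Builds the column name with string formatting on every call.
    i = row["choice-index"]
    return np.nan if math.isnan(i) else row["value-%d" % i]

def func_cached(row):
    # Looks the column name up in a pre-built list instead.
    i = row["choice-index"]
    return np.nan if math.isnan(i) else row[cached_columns[int(i)]]

# Take the best of three runs for each variant to reduce timing noise.
t_format = min(timeit.repeat(lambda: df.apply(func_format, axis=1), number=1, repeat=3))
t_cached = min(timeit.repeat(lambda: df.apply(func_cached, axis=1), number=1, repeat=3))
print("format: %.3fs, cached: %.3fs, ratio: %.2f" % (t_format, t_cached, t_format / t_cached))
```

Both variants still call a Python function once per row, which is presumably why the difference between them is small.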
But I was hoping for greater speedups...
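One direction that might give a larger gain is to skip apply entirely and do the lookup with NumPy integer indexing. The following is a minimal sketch, not something I have benchmarked on the full data. It assumes the value columns can all be converted to float, that every non-NaN choice-index is a valid 1-based column number, and it uses only 'value-1' and 'value-2' because the other columns are elided above. The names `value_columns`, `values`, `choice`, `valid`, `rows` and `cols` are made up for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "value-1": [1, 2, 3, 4, 5],
    "value-2": [0.1, 0.2, 0.3, 0.4, 0.5],
    "choice-index": [1, 1, np.nan, 2, 1],
})
value_columns = ["value-1", "value-2"]  # assumption: ordered so that choice-index N maps to position N-1

# 2-D float array of the candidate values, one column per value-N.
values = df[value_columns].to_numpy(dtype=float)
choice = df["choice-index"].to_numpy(dtype=float)

# Rows with a usable choice-index.
valid = ~np.isnan(choice)
rows = np.flatnonzero(valid)
# choice-index is 1-based, array columns are 0-based.
cols = choice[valid].astype(int) - 1

# Start with NaN everywhere, then fill in the selected value for valid rows.
result = np.full(len(df), np.nan)
result[valid] = values[rows, cols]
df["value"] = result
```

On the small example this produces the same values as the apply version, with NaN wherever choice-index is NaN.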