Question 1

Please help me convert the code below to run in parallel. I am mapping a nested dictionary onto pandas column values. The code below works correctly but takes a lot of time, so I am looking to parallelize the for loop. (Note: df.replace(Source_Dictionary) also did the job, but it took three times as long as the code below.)

import pandas as pd

df = pd.DataFrame({'one':['bab'],'two':['abb'],'three':['bb']})
Source_Dictionary = {'one':{'dadd':1,'bab':1.5},'two':{'ab':2},'three':{'cc':1,'bb':3}}
required_columns = ['one','two','three']

def Feature_Map(x):
    # Map each value through the column's dictionary; unmatched values become 0
    df[x] = df[x].map(Source_Dictionary[x]).fillna(0)

for i in required_columns:
    Feature_Map(i)

print(df)

   one  two  three
0  1.5  0.0      3

Answer 1

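To speed up execution you can use a worker pool from the multiprocessing module. The number of workers you can run, and the performance you get from them, depends on the resources available. Let's suppose you can afford 4 workers running in parallel. Note that the ThreadPool used here runs the work in threads rather than in separate processes.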

Your function:

def Feature_Map(x):
    df[x] = df[x].map(Source_Dictionary[x]).fillna(0)

Multiprocessing:

from multiprocessing.pool import ThreadPool
pool = ThreadPool(processes=4)
for i in required_columns:
    # args must be a tuple: (i,) is a one-element tuple, whereas (i) is just i
    pool.apply_async(Feature_Map, (i,))

You can also add code that waits until all submitted tasks have finished before the program exits.
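One way to wait for completion, as a minimal sketch: keep the AsyncResult objects returned by apply_async, call close() and join() on the pool, then call get() on each result so that any exception raised in a worker is surfaced instead of being silently dropped. In this sketch the worker returns the mapped column and the main thread assigns it to df; that is a choice made for the example, not something taken from the answer, and whether threads give a real speedup will depend on the workload.

```python
from multiprocessing.pool import ThreadPool

import pandas as pd

df = pd.DataFrame({'one': ['bab'], 'two': ['abb'], 'three': ['bb']})
Source_Dictionary = {'one': {'dadd': 1, 'bab': 1.5},
                     'two': {'ab': 2},
                     'three': {'cc': 1, 'bb': 3}}
required_columns = ['one', 'two', 'three']


def Feature_Map(x):
    # Assumption for this sketch: return the mapped column rather than
    # writing to df from inside a worker thread.
    return x, df[x].map(Source_Dictionary[x]).fillna(0)


pool = ThreadPool(processes=4)
# (i,) is a one-element tuple of positional arguments for Feature_Map.
results = [pool.apply_async(Feature_Map, (i,)) for i in required_columns]
pool.close()  # no further tasks will be submitted
pool.join()   # block until every submitted task has finished

for r in results:
    name, column = r.get()  # re-raises any exception from the worker
    df[name] = column

print(df)
```

Calling get() matters here because apply_async does not raise worker errors on its own; without it, a failing task can go unnoticed.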

You can refer to https://docs.python.org/2/library/multiprocessing.html for detailed usage.

Make a for loop execute in parallel over pandas columns
