Using complex conditions to build a pandas DataFrame from an existing one

2024/10/12 8:16:06

I've got the following dataframe containing function names, their arguments, the default values of the arguments and argument types:

'function1'  'f1_arg1'   NAN         'NoneType'   
'function1'  'f1_arg2'   NAN          NAN
'function1'  'f1_arg3'   NAN          NAN
'function2'  'f2_arg1'   0            'int'
'function3'  'f3_arg1'   True         'bool'
'function3'  'f3_arg2'   'something'  'str'

This dataframe can be reproduced as follows:

import pandas as pd

D = {
    'FULL_NAME': ['function1', 'function1', 'function1', 'function2', 'function3', 'function3'],
    'ARGUMENT': ['f1_arg1', 'f1_arg2', 'f1_arg3', 'f2_arg1', 'f3_arg1', 'f3_arg2'],
    'DEF_VAL': [float('nan'), float('nan'), float('nan'), 0, True, 'something'],
    'TYPE': ['NoneType', float('nan'), float('nan'), 'int', 'bool', 'str'],
}
dataframe = pd.DataFrame(D)

The result I'm trying to obtain should look like this:

args                       function
[a1=NONE, a2=, a3=]        function1(f1_arg1=a1, f1_arg2=a2, f1_arg3=a3)
[a1=0]                     function2(f2_arg1=a1)
[a1=True, a2=something]    function3(f3_arg1=a1, f3_arg2=a2)

All the values in the columns 'FULL_NAME' and 'ARGUMENT' are strings.

As for a{i}: it should be set to the argument's default value, unless both the default value and the type are NAN, in which case a{i} is followed only by the '=' sign with no value after it. If the default value is NAN but the type is NoneType, then a{i} must be None.
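To make the rule concrete, here is a minimal sketch in plain Python. The names `is_missing` and `format_alias` are hypothetical helpers introduced only for illustration, and the sketch spells the NoneType case as `None` (the first expected table above spells it `NONE`).

```python
import math


def is_missing(value):
    """True for float NaN, which is how NAN appears in the frame."""
    return isinstance(value, float) and math.isnan(value)


def format_alias(i, default, type_name):
    """Return the 'a{i}=...' string for one argument."""
    if is_missing(default):
        if type_name == 'NoneType':
            # Missing default, but the type says the default is None.
            return f'a{i}=None'
        if is_missing(type_name):
            # Missing default and missing type: nothing after '='.
            return f'a{i}='
    # Otherwise use the default value itself.
    return f'a{i}={default}'
```

For example, `format_alias(1, 0, 'int')` gives `'a1=0'`, while a NAN default with a NAN type gives `'a2='`.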

This can be achieved in the following way (the solution was suggested here):

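df = dataframe.copy()  # 'dataframe' is the frame built above

# Number the arguments a1, a2, ... within each function
df['args'] = 'a' + (df.groupby('FULL_NAME').cumcount() + 1).astype(str)
df['ARGUMENT'] = df['ARGUMENT'] + '=' + df['args']
df['args'] += '='
# Append 'NONE' for NoneType, nothing for a missing type, otherwise the default value
df['args'] = df.apply(lambda x: x['args'] + 'NONE' if x['TYPE'] == 'NoneType' else x['args'] if pd.isnull(x['TYPE']) else x['args'] + str(x['DEF_VAL']), axis=1)

# Collect the per-function argument tuples and alias lists side by side
ndf = pd.concat([pd.DataFrame(df.groupby('FULL_NAME')['ARGUMENT'].apply(tuple)), pd.DataFrame(df.groupby('FULL_NAME')['args'].apply(list))], axis=1)
ndf['function'] = (ndf.reset_index()['FULL_NAME'] + ndf.reset_index()['ARGUMENT'].apply(str)).tolist()
ndf = ndf.reset_index(drop=True).drop(columns='ARGUMENT')
# Strip the quotes and the trailing comma of one-element tuples
ndf['function'] = ndf['function'].replace(["'", r",\)"], ["", ")"], regex=True)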

However, I would like to impose one important condition. Namely, some of those functions are actually class methods and the initial dataframe may look like this:

'function1'  'self'      NAN          NAN
'function1'  'f1_arg2'   0            'int'
'function1'  'f1_arg3'   NAN          'NoneType'
'function2'  'f2_arg1'   0            'int'
'function3'  'f3_arg1'   True         'bool'
'function3'  'f3_arg2'   'something'  'str'

In this case I would like 'self' to be ignored and the resulting frame to look like this:

args                       function
[a1=0, a2=None]            function1(f1_arg2=a1, f1_arg3=a2)
[a1=0]                     function2(f2_arg1=a1)
[a1=True, a2=something]    function3(f3_arg1=a1, f3_arg2=a2)

The self argument is ignored. How do I achieve this using pandas?


You can run df = df[df['ARGUMENT'] != 'self'].copy(deep=True) to remove all the rows whose ARGUMENT equals "self" before applying the solution.

P.S. I am guessing you only care about removing "self" when it is the first argument. In that case, the appropriate preprocessing step would be

df = df[~((df['ARGUMENT'] == 'self') & (df.groupby('FULL_NAME').cumcount() == 0))]
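For reference, here is a rough end-to-end sketch that applies the first-argument filter and then builds the requested frame. It is not the solution quoted in the question: it swaps in a per-group `agg` instead of the tuple/regex approach, and the names `alias`, `call_arg`, `format_arg` and `result` are assumptions introduced for illustration. It also spells the NoneType case as `None`, matching the second expected table.

```python
import pandas as pd

# Sample data from the question, with 'self' as the first argument of function1.
df = pd.DataFrame({
    'FULL_NAME': ['function1', 'function1', 'function1', 'function2', 'function3', 'function3'],
    'ARGUMENT': ['self', 'f1_arg2', 'f1_arg3', 'f2_arg1', 'f3_arg1', 'f3_arg2'],
    'DEF_VAL': [float('nan'), 0, float('nan'), 0, True, 'something'],
    'TYPE': [float('nan'), 'int', 'NoneType', 'int', 'bool', 'str'],
})

# Drop 'self' only when it is the first argument of its function.
is_first = df.groupby('FULL_NAME').cumcount() == 0
df = df[~((df['ARGUMENT'] == 'self') & is_first)].copy()

# Number the remaining arguments a1, a2, ... within each function.
df['alias'] = 'a' + (df.groupby('FULL_NAME').cumcount() + 1).astype(str)
df['call_arg'] = df['ARGUMENT'] + '=' + df['alias']


def format_arg(row):
    """Build 'a{i}=<default>' for one row (hypothetical helper)."""
    if row['TYPE'] == 'NoneType':
        return row['alias'] + '=None'
    if pd.isna(row['TYPE']):
        return row['alias'] + '='
    return row['alias'] + '=' + str(row['DEF_VAL'])


df['args'] = df.apply(format_arg, axis=1)

# Collapse to one row per function, keeping the original function order.
grouped = df.groupby('FULL_NAME', sort=False)
result = pd.DataFrame({
    'args': grouped['args'].agg(list),
    'function': grouped['call_arg'].agg(', '.join),
}).reset_index()
result['function'] = result['FULL_NAME'] + '(' + result['function'] + ')'
result = result[['args', 'function']]
print(result)
```

Note that a function whose only argument is 'self' would disappear from `result` entirely with this approach, since no rows remain for it after the filter.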

