Using complex conditions to form a pandas DataFrame from an existing one

2024/10/12 8:16:06

I've got the following dataframe containing function names, their arguments, the default values of the arguments and argument types:

FULL_NAME    ARGUMENT    DEF_VAL     TYPE
'function1'  'f1_arg1'   NAN         'NoneType'   
'function1'  'f1_arg2'   NAN          NAN
'function1'  'f1_arg3'   NAN          NAN
'function2'  'f2_arg1'   0            'int'
'function3'  'f3_arg1'   True         'bool'
'function3'  'f3_arg2'   'something'  'str'

This dataframe can be reproduced as follows:

import pandas as pd

D = {'FULL_NAME': ['function1', 'function1', 'function1', 'function2', 'function3', 'function3'],
     'ARGUMENT': ['f1_arg1', 'f1_arg2', 'f1_arg3', 'f2_arg1', 'f3_arg1', 'f3_arg2'],
     'DEF_VAL': [float('nan'), float('nan'), float('nan'), 0, True, 'something'],
     'TYPE': ['NoneType', float('nan'), float('nan'), 'int', 'bool', 'str']}
dataframe = pd.DataFrame(D)

What I'm trying to obtain as a result must look this way:

args                       function
[a1=NONE, a2=, a3=]        function1(f1_arg1=a1, f1_arg2=a2, f1_arg3=a3)
[a1=0]                     function2(f2_arg1=a1)
[a1=True, a2=something]    function3(f3_arg1=a1, f3_arg2=a2)

All the values in the columns 'FULL_NAME' and 'ARGUMENT' are strings.

Each a{i} should equal the argument's default value, unless both the default value and the type are NAN; in that case a{i} is followed only by the '=' sign, with no value. If the default value is NAN but the type is NoneType, a{i} must be None.
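The rule above can be sketched as a small helper. This is only an illustration: format_arg is a hypothetical name that does not appear in the original solution, and it writes None where the solution below writes NONE.

```python
import pandas as pd


def format_arg(index, def_val, type_name):
    """Return 'a{index}=<value>' according to the rule described above.

    format_arg is a hypothetical helper, not part of the original solution.
    """
    prefix = f"a{index}="
    if pd.isnull(def_val) and pd.isnull(type_name):
        # No default value and no type: keep only the '=' sign.
        return prefix
    if pd.isnull(def_val) and type_name == "NoneType":
        # The default value is literally None.
        return prefix + "None"
    return prefix + str(def_val)


print(format_arg(1, float("nan"), "NoneType"))
print(format_arg(2, float("nan"), float("nan")))
print(format_arg(1, 0, "int"))
```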

This can be achieved in the following way (the solution was suggested here):

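df = dataframe.copy()  # work on a copy of the frame built above

# Number the arguments within each function: a1, a2, ...
df['args'] = 'a' + (df.groupby('FULL_NAME').cumcount() + 1).astype(str)
df['ARGUMENT'] = df['ARGUMENT'] + '=' + df['args']
df['args'] += '='

# Append the default value: 'NONE' for NoneType, nothing if the type is missing
df['args'] = df.apply(
    lambda x: x['args'] + 'NONE' if x['TYPE'] == 'NoneType'
    else x['args'] if pd.isnull(x['TYPE'])
    else x['args'] + str(x['DEF_VAL']),
    axis=1,
)

# Collect the arguments of each function into a single row
ndf = pd.concat(
    [pd.DataFrame(df.groupby('FULL_NAME')['ARGUMENT'].apply(tuple)),
     pd.DataFrame(df.groupby('FULL_NAME')['args'].apply(list))],
    axis=1,
)
ndf['function'] = (ndf.reset_index()['FULL_NAME']
                   + ndf.reset_index()['ARGUMENT'].apply(str)).tolist()
ndf = ndf.reset_index(drop=True).drop('ARGUMENT', axis=1)

# Strip the quotes and the trailing comma of one-element tuples
ndf['function'] = ndf['function'].replace(["'", r",\)"], ["", ")"], regex=True)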

However, I would like to impose one important condition. Namely, some of those functions are actually class methods and the initial dataframe may look like this:

FULL_NAME    ARGUMENT    DEF_VAL      TYPE
'function1'  'self'      NAN          NAN
'function1'  'f1_arg2'   0            'int'
'function1'  'f1_arg3'   NAN          'NoneType'
'function2'  'f2_arg1'   0            'int'
'function3'  'f3_arg1'   True         'bool'
'function3'  'f3_arg2'   'something'  'str'

In this case I would like 'self' to be ignored and the resulting frame to look like this:

args                       function
[a1=0, a2=None]            function1(f1_arg2=a1, f1_arg3=a2)
[a1=0]                     function2(f2_arg1=a1)
[a1=True, a2=something]    function3(f3_arg1=a1, f3_arg2=a2)

That is, the self argument is dropped and the remaining arguments are numbered from a1. How do I achieve this using pandas?

Answer

You can run df = df[df['ARGUMENT'] != 'self'].copy(deep=True) to remove all rows whose ARGUMENT equals 'self' before applying the solution.
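A minimal end-to-end sketch of that idea follows, using the second frame from the question. It assumes the type of f1_arg3 is 'NoneType', and it builds the function column with a plain string join rather than the tuple-and-regex approach of the quoted solution. The names default_text, name and call are made up for this example.

```python
import pandas as pd

nan = float("nan")
df = pd.DataFrame({
    "FULL_NAME": ["function1", "function1", "function1",
                  "function2", "function3", "function3"],
    "ARGUMENT": ["self", "f1_arg2", "f1_arg3",
                 "f2_arg1", "f3_arg1", "f3_arg2"],
    "DEF_VAL": [nan, 0, nan, 0, True, "something"],
    "TYPE": [nan, "int", "NoneType", "int", "bool", "str"],
})

# Drop 'self' first, so the numbering below starts at a1 for real arguments.
df = df[df["ARGUMENT"] != "self"].copy()

# a1, a2, ... within each function.
df["name"] = "a" + (df.groupby("FULL_NAME").cumcount() + 1).astype(str)


def default_text(row):
    """Text placed after '=' (hypothetical helper for this sketch)."""
    if row["TYPE"] == "NoneType":
        return "None"
    if pd.isnull(row["TYPE"]):
        return ""
    return str(row["DEF_VAL"])


df["args"] = df["name"] + "=" + df.apply(default_text, axis=1)
df["call"] = df["ARGUMENT"] + "=" + df["name"]

# One row per function, keeping the order of first appearance.
grouped = df.groupby("FULL_NAME", sort=False)
result = pd.DataFrame({
    "args": grouped["args"].apply(list),
    "function": grouped["call"].apply(", ".join),
}).reset_index()
result["function"] = result["FULL_NAME"] + "(" + result["function"] + ")"
result = result.drop(columns="FULL_NAME")

print(result)
```

Because the filter runs before cumcount(), f1_arg2 becomes a1 instead of a2.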

P.S. I am guessing you only care about removing "self" when it is the first argument. In that case, the appropriate preprocessing step would be:

df = df[~((df['ARGUMENT'] == 'self')
          & (df.groupby('FULL_NAME').cumcount() == 0))].copy(deep=True)
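To see how the two filters differ, here is a small sketch on made-up data: method1 is a method, while function2 is a plain function that happens to have a parameter called self in second position. The frame and the names is_self, is_first, drop_all and drop_leading are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical data, not taken from the question.
df = pd.DataFrame({
    "FULL_NAME": ["method1", "method1", "function2", "function2"],
    "ARGUMENT": ["self", "x", "y", "self"],
})

is_self = df["ARGUMENT"] == "self"
# cumcount() gives each row's position within its function, starting at 0.
is_first = df.groupby("FULL_NAME").cumcount() == 0

# Filter 1: drop every 'self', wherever it appears.
drop_all = df[~is_self].copy()

# Filter 2: drop 'self' only when it is the first argument of its function.
drop_leading = df[~(is_self & is_first)].copy()

print(drop_all["ARGUMENT"].tolist())
print(drop_leading["ARGUMENT"].tolist())
```

The first filter removes both self rows; the second keeps the self parameter of function2 because it is not in first position.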