Inputs required in Python on CSV files

2024/11/17 3:54:44

I have a problem that I need to solve with pandas/Python. I am not sure how to approach it, and it would be great if someone could help me build the logic.

I have to generate an output file from input data like this:

import pandas as pd

df = pd.DataFrame({
    'priority': [1, 1, 1, 2, 2, 3],
    'db_name': ['corp', 'corp', 'corp', 'sales', 'sales', 'market'],
    'tbl_name': ['c_tbl1', 'c_tbl1', 'c_tbl1', 's_tbl1', 's_tbl2', 'm_tbl1'],
    'partition': ['202301', '202302', '202303', '202301', '202302', '202301'],
    'size_gb': [5, 5, 10, 1, 2, 3],
})

The logic is as follows. Priority 1 has three entries with different sizes. If a single entry is 10 GB, it becomes one entry in the output file with t_size = XL; otherwise, entries are summed until they reach 10 GB, and that sum becomes one entry with t_size = XL. The other priorities work the same way: if the size is less than 3 GB, then t_size = S; otherwise t_size = M.

I tried looping over the pandas DataFrame, but couldn't get any further as I am not proficient in Python.

Answer

Try this:

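import numpy as np
import pandas as pd

df = pd.read_csv("F:\\sales_data.csv", header=0)

# Collapse to one row per table: join the partitions and sum the sizes
df = pd.DataFrame(
    df.pivot_table(
        index=['priority', 'db_name', 'tbl_name'],
        values=['partition', 'size_gb'],
        aggfunc={'partition': lambda x: ";".join(str(v) for v in x), 'size_gb': 'sum'},
    ).to_records()
)

# Label each row by total size: XL at 10 GB or more, S under 3 GB, M otherwise
df['t_size'] = np.where(df['size_gb'] >= 10, 'XL',
                        np.where(df['size_gb'] < 3, 'S', 'M'))
df.drop(columns=['size_gb'], inplace=True)
print(df)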

Output:

   priority db_name tbl_name      partition t_size
0         1    corp   c_tbl1  202301;202302     XL
1         1    corp   c_tbl1         202303     XL
2         2   sales   s_tbl1         202301      S
3         2   sales   s_tbl2         202302      S
4         3  market   m_tbl1         202301      M
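
Note that pivot_table groups by priority, db_name and tbl_name, so on the sample data it would put all three c_tbl1 partitions into a single row, whereas the output listed above splits them once 10 GB is reached. If that split is what is wanted, one possible approach is to walk through each table's partitions and close a chunk whenever the running total hits the limit. The following is only a sketch under that reading of the question: the function name pack_partitions and the limit_gb parameter are made up for illustration, the 10 GB and 3 GB thresholds are taken from the question, and the sample DataFrame from the question is used so that no CSV file is needed.

```python
import pandas as pd

def pack_partitions(df, limit_gb=10):
    """Greedily pack each table's partitions into chunks and label each chunk.

    Hypothetical helper: a chunk is closed as soon as its running total
    reaches limit_gb; whatever is left over forms a final, smaller chunk.
    """
    keys = ["priority", "db_name", "tbl_name"]
    rows = []
    for key, group in df.groupby(keys, sort=False):
        parts, total = [], 0
        for part, size in zip(group["partition"], group["size_gb"]):
            parts.append(str(part))
            total += size
            if total >= limit_gb:
                # Chunk is full: emit it and start a new one
                rows.append((*key, ";".join(parts), total))
                parts, total = [], 0
        if parts:
            # Leftover partitions that never reached the limit
            rows.append((*key, ";".join(parts), total))

    out = pd.DataFrame(rows, columns=keys + ["partition", "size_gb"])
    # XL at limit_gb or more, S under 3 GB, M otherwise (thresholds from the question)
    out["t_size"] = out["size_gb"].map(
        lambda s: "XL" if s >= limit_gb else ("S" if s < 3 else "M")
    )
    return out.drop(columns="size_gb")

# Sample data from the question
df = pd.DataFrame({
    "priority": [1, 1, 1, 2, 2, 3],
    "db_name": ["corp", "corp", "corp", "sales", "sales", "market"],
    "tbl_name": ["c_tbl1", "c_tbl1", "c_tbl1", "s_tbl1", "s_tbl2", "m_tbl1"],
    "partition": ["202301", "202302", "202303", "202301", "202302", "202301"],
    "size_gb": [5, 5, 10, 1, 2, 3],
})

result = pack_partitions(df)
print(result)
```

The packing is greedy and follows the row order of the input, so it does not try to find the best possible grouping; if partitions should be combined in a different order, sort the DataFrame before calling the function.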