How to optimize this Pandas code to run faster

2024/7/7 6:07:24

I have this code to create a swarmplot from data from a DataFrame:

import numpy as np
import pandas as pd
import seaborn as sns

df = pd.DataFrame({"Refined_ID": some_id_list, "Refined_Age": age_list, "Name": name_list})
# Creating dataframe with strings from the lists
select = df.apply(lambda row: any(isinstance(e, str) for e in row), axis=1)
# Excluding data from select in a new dataframe (copy to avoid SettingWithCopyWarning)
dfAnalysis = df[~select].copy()
dfAnalysis['Refined_Age'] = dfAnalysis['Refined_Age'].replace('', np.nan)
dfAnalysis = dfAnalysis.dropna()
dfAnalysis['Refined_Age'] = dfAnalysis['Refined_Age'].astype(int)
# print(dfAnalysis)
# Positional lookup: the original row labels may no longer exist after filtering
print(type(dfAnalysis['Refined_Age'].iloc[0]))
g = sns.swarmplot(x=dfAnalysis['Refined_ID'], y=dfAnalysis['Refined_Age'], hue=dfAnalysis['Name'], orient="v")
g.set_xticklabels(g.get_xticklabels(), rotation=30)
# print(g)

It's taking a crazy amount of time to run (14 hours and counting!). How can I speed it up? Also, why is the code so slow in the first place?

The three lists that go into the DataFrame come from a CouchDB database with about 320k documents.

UPDATE 1

I had intended to view the first 20 categories only but excluded the code to do so.

The line should have been:

x = dfAnalysis['Refined_ID'].iloc[:20]

Answer

Do you really mean to draw a swarmplot with several hundred thousand points? Besides taking forever, it makes no sense: with that many points the plot is unreadable. Try it with the first 1000 points and see what kind of mess you get, then use a boxplot or a violinplot instead. Try to understand your tools before using them.
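A minimal sketch of that suggestion, under a few assumptions: the column names `Refined_ID` and `Refined_Age` are taken from the question, the helper names `sample_rows` and `plot_distribution` are hypothetical, and the demo data is synthetic. It draws a random sample rather than literally the first 1000 rows, which is a substitution on my part; `df.head(1000)` would match the wording above more closely.

```python
import numpy as np
import pandas as pd


def sample_rows(df, n=1000, seed=0):
    """Return at most n randomly chosen rows of df (hypothetical helper)."""
    return df.sample(n=min(n, len(df)), random_state=seed)


def plot_distribution(df, x="Refined_ID", y="Refined_Age", kind="box"):
    """Draw a box or violin plot of y grouped by x and return the Axes."""
    # Imported here so the sampling helper can be used without seaborn installed.
    import seaborn as sns

    plotter = {"box": sns.boxplot, "violin": sns.violinplot}[kind]
    ax = plotter(data=df, x=x, y=y)
    ax.tick_params(axis="x", labelrotation=30)
    return ax


# Synthetic stand-in for the real data; the values are made up for illustration.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "Refined_ID": rng.integers(0, 20, size=5000),
    "Refined_Age": rng.integers(18, 90, size=5000),
})
subset = sample_rows(demo, n=1000)
print(len(subset))
```

Calling `plot_distribution(demo, kind="violin")` on the full frame should stay fast, because box and violin plots summarize each group instead of placing every point individually. If individual points are still wanted, `sns.swarmplot` on `subset` is a much smaller job than on the full data.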

From the docstring:

[...] it does not scale as well to large numbers of observations (both in terms of the ability to show all the points and in terms of the computation needed to arrange them).
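Since the cost grows with the number of points, cutting the data down before plotting is what helps. On the update in the question: `.iloc[:20]` selects the first 20 rows, not the first 20 categories. If the goal is the first 20 distinct IDs, something like the following may be closer. This is a sketch only; `keep_first_categories` is a hypothetical helper and the demo values are made up.

```python
import pandas as pd


def keep_first_categories(df, column, n=20):
    """Keep only rows whose value in `column` is among the first n distinct values."""
    # drop_duplicates preserves order of first appearance.
    first = df[column].drop_duplicates().head(n)
    return df[df[column].isin(first)]


# Tiny made-up frame; n=3 keeps the IDs 3, 1 and 2 (the first three seen).
demo = pd.DataFrame({
    "Refined_ID": [3, 1, 3, 2, 1, 4, 2, 5],
    "Refined_Age": [30, 41, 29, 55, 62, 18, 47, 33],
})
small = keep_first_categories(demo, "Refined_ID", n=3)
print(sorted(small["Refined_ID"].unique().tolist()))
```

The filtered frame can then be passed to the plotting call as a whole (for example via `data=small` with column names for `x`, `y` and `hue`), so that all three arguments have the same length.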
