Question 1

I have this code to create a swarmplot from data from a DataFrame:

df = pd.DataFrame({"Refined__Some_ID":some_id_list,"Refined_Age":age_list,"Name":name_list                   })
#Creating dataframe with strings from the lists
select  = df.apply(lambda row : any([isinstance(e, str) for e in row  ]),axis=1) 
#Exlcluding data from select in a new dataframe
dfAnalysis = df[~select]
dfAnalysis['Refined_Age'].replace('', np.nan, inplace=True)
dfAnalysis = dfAnalysis.dropna()
dfAnalysis['Refined_Age'] = dfAnalysis['Refined_Age'].apply(int)
# print dfAnalysis
print type(dfAnalysis['Refined_Patient_Age'][1])
g = sns.swarmplot(x = dfAnalysis['Refined_ID'],y = dfAnalysis['Refined_Age'], hue = dfAnalysis['Name'], orient="v")
g.set_xticklabels(g.get_xticklabels(),rotation=30)
# print g

It's taking a crazy amount of time to run (14 hours and counting!). How can I speed it up? Also, why is the code so slow in the first place?

The 3 lists being included in the dataframe are from a Couchdb database with about 320k documents.

UPDATE 1

I had intended to view the first 20 categories only but excluded the code to do so.

The line should have been:

x = dfAnalysis['Refined_ID'].iloc[:20]

Question 2

Do you really mean a swarmplot with several hundred thousand points? Besides it's gonna take forever, it's nonsense. Try with the first 1000 and see what kind of mess you get. Then use a boxplot or a violinplot instead. Try to understand your tools before using them.

From the docstring:

[...] it does not scale as well to large numbers of observations (both in terms of the ability to show all the points and in terms of the computation neededto arrange them).

How to optimize this Pandas code to run faster

Related Q&A

i was creating a REST api using flask and while i was about to test it on postman I saw that error

Web scrape get drop-down menu data python

TypeError(unsupported operand type(s) for ** or pow(): str and int,)

How can I do assignment in a List Comprehension? [duplicate]

KeyError: column_name

Find starting and ending indices of list chunks satisfying given condition

How can I Scrape Business Email Contact with python?

Python class that works as list of lists

GPA Python Assignment [closed]

Tuple Errors Python