Normal Distribution Plot by name from pandas dataframe

2024/10/11 10:20:17

I have a dataframe like below:

dateTime        Name    DateTime        day seconds zscore
11/1/2016 15:17 james   11/1/2016 15:17 Tue 55020   1.158266091
11/1/2016 13:41 james   11/1/2016 13:41 Tue 49260   -0.836236954
11/1/2016 15:17 james   11/1/2016 15:17 Tue 55020   1.158266091
11/1/2016 15:17 james   11/1/2016 15:17 Tue 55020   1.158266091
11/1/2016 15:17 james   11/1/2016 15:17 Tue 55020   1.158266091
11/1/2016 15:17 james   11/1/2016 15:17 Tue 55020   1.158266091
11/1/2016 15:17 james   11/1/2016 15:17 Tue 55020   1.158266091
11/1/2016 15:17 james   11/1/2016 15:17 Tue 55020   1.158266091
11/1/2016 15:17 james   11/1/2016 15:17 Tue 55020   1.158266091
11/1/2016 15:17 james   11/1/2016 15:17 Tue 55020   1.158266091
11/1/2016 15:17 james   11/1/2016 15:17 Tue 55020   1.158266091
11/1/2016 13:41 james   11/1/2016 13:41 Tue 49260   -0.836236954
11/1/2016 13:41 james   11/1/2016 13:41 Tue 49260   -0.836236954
11/1/2016 13:41 james   11/1/2016 13:41 Tue 49260   -0.836236954
11/1/2016 13:41 james   11/1/2016 13:41 Tue 49260   -0.836236954
11/1/2016 13:41 james   11/1/2016 13:41 Tue 49260   -0.836236954
11/1/2016 13:41 james   11/1/2016 13:41 Tue 49260   -0.836236954
11/1/2016 13:41 james   11/1/2016 13:41 Tue 49260   -0.836236954
11/1/2016 13:42 james   11/1/2016 13:42 Tue 49320   -0.81546088
11/1/2016 13:42 james   11/1/2016 13:42 Tue 49320   -0.81546088
11/1/2016 13:42 james   11/1/2016 13:42 Tue 49320   -0.81546088
11/1/2016 13:42 james   11/1/2016 13:42 Tue 49320   -0.81546088
11/1/2016 13:42 james   11/1/2016 13:42 Tue 49320   -0.81546088
11/1/2016 13:42 james   11/1/2016 13:42 Tue 49320   -0.81546088
11/1/2016 9:07  matt    11/1/2016 9:07  Tue 32820   -0.223746683
11/1/2016 9:07  matt    11/1/2016 9:07  Tue 32820   -0.223746683
11/1/2016 9:07  matt    11/1/2016 9:07  Tue 32820   -0.223746683
11/1/2016 9:07  matt    11/1/2016 9:07  Tue 32820   -0.223746683
11/1/2016 9:07  matt    11/1/2016 9:07  Tue 32820   -0.223746683
11/1/2016 9:07  matt    11/1/2016 9:07  Tue 32820   -0.223746683
11/1/2016 9:07  matt    11/1/2016 9:07  Tue 32820   -0.223746683
11/1/2016 9:07  matt    11/1/2016 9:07  Tue 32820   -0.223746683
11/1/2016 9:07  matt    11/1/2016 9:07  Tue 32820   -0.223746683
11/1/2016 9:07  matt    11/1/2016 9:07  Tue 32820   -0.223746683
11/1/2016 9:07  matt    11/1/2016 9:07  Tue 32820   -0.223746683
11/1/2016 9:07  matt    11/1/2016 9:07  Tue 32820   -0.223746683
11/1/2016 9:07  matt    11/1/2016 9:07  Tue 32820   -0.223746683
11/1/2016 9:07  matt    11/1/2016 9:07  Tue 32820   -0.223746683
11/1/2016 9:07  matt    11/1/2016 9:07  Tue 32820   -0.223746683
11/1/2016 9:07  matt    11/1/2016 9:07  Tue 32820   -0.223746683
11/1/2016 9:07  matt    11/1/2016 9:07  Tue 32820   -0.223746683
11/1/2016 9:07  matt    11/1/2016 9:07  Tue 32820   -0.223746683
11/1/2016 9:07  matt    11/1/2016 9:07  Tue 32820   -0.223746683
11/1/2016 9:08  matt    11/1/2016 9:08  Tue 32880   -0.111873342
11/1/2016 9:48  matt    11/1/2016 9:48  Tue 35280   4.363060322

zscore is calculated as below:

grp2 = df.groupby(['Name'])['seconds']
df['zscore'] = grp2.transform(lambda x: (x - x.mean()) / x.std(ddof=1))

I would like to plot my data in a bell curve / normal distribution plot and save this as a picture/pdf file for each Name in my dataframe.

I have tried to plot the zscores like below:

df['by_name'].plot(kind='hist', normed=True)
range = np.arange(-7, 7, 0.001)
plt.plot(range, norm.pdf(range,0,1))
plt.show()

How would I go about plotting the by_name zscores column for each name in my data?

Answer
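import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

np.random.seed([3,1415])
df = pd.DataFrame(dict(Name='matt joe adam farley'.split() * 100,
                       Seconds=np.random.randint(4000, 5000, 400)))
# z-score per name: (value - group mean) / group standard deviation
df['Zscore'] = df.groupby('Name').Seconds.transform(lambda x: (x - x.mean()) / x.std(ddof=1))
# one density curve per name on shared axes (kde requires scipy)
df.groupby('Name').Zscore.plot.kde()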

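To confirm that a per-name score really is standardized, one can check that each group ends up with mean 0 and sample standard deviation 1. The following is a minimal sketch on synthetic data; the seed, the generator, and the `summary` name are assumptions made for illustration, not part of the original post.

```python
import numpy as np
import pandas as pd

# Hypothetical synthetic data with the same columns as the example
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'Name': 'matt joe adam farley'.split() * 100,
    'Seconds': rng.integers(4000, 5000, 400),
})

# Standardize within each name: subtract the group mean,
# then divide by the group sample standard deviation (ddof=1 is the pandas default)
grp = df.groupby('Name')['Seconds']
df['Zscore'] = (df['Seconds'] - grp.transform('mean')) / grp.transform('std')

# Every group should now have mean ~0 and std ~1
summary = df.groupby('Name')['Zscore'].agg(['mean', 'std'])
print(summary.round(6))
```

Using the string aggregations `'mean'` and `'std'` with `transform` avoids a Python-level lambda and gives the same result as the lambda form in the question.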


split out plots

g = df.groupby('Name').Zscore
n = g.ngroups
fig, axes = plt.subplots(n // 2, 2, figsize=(6, 6), sharex=True, sharey=True)
for i, (name, group) in enumerate(g):
    r, c = i // 2, i % 2
    group.plot.kde(title=name, ax=axes[r, c])
fig.tight_layout()

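The `n // 2` grid above assumes an even number of names greater than two. With an odd count one group has no cell, and with a single row `plt.subplots` returns a 1-D array, so `axes[r, c]` raises an `IndexError`. A possible sketch for an arbitrary number of names follows. The three-name data set is hypothetical, and a density histogram stands in for the kde only to keep the example free of the scipy dependency; `group.plot.kde(title=name, ax=ax)` can be used in its place.

```python
import math
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data with an odd number of names
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'Name': 'matt joe adam'.split() * 100,
    'Zscore': rng.standard_normal(300),
})

g = df.groupby('Name')['Zscore']
ncols = 2
nrows = math.ceil(g.ngroups / ncols)  # round up so every name gets a cell

# squeeze=False keeps `axes` two-dimensional even for a single row
fig, axes = plt.subplots(nrows, ncols, figsize=(6, 3 * nrows),
                         sharex=True, sharey=True, squeeze=False)
flat = axes.ravel()

for ax, (name, group) in zip(flat, g):
    group.plot.hist(title=name, ax=ax, density=True, alpha=.5)

# Hide grid cells that have no group to show
for ax in flat[g.ngroups:]:
    ax.set_visible(False)

fig.tight_layout()
```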


kde + hist

g = df.groupby('Name').Zscore
n = g.ngroups
fig, axes = plt.subplots(n // 2, 2, figsize=(6, 6), sharex=True, sharey=True)
for i, (name, group) in enumerate(g):
    r, c = i // 2, i % 2
    a1 = axes[r, c]
    a2 = a1.twinx()
    group.plot.hist(ax=a2, alpha=.3)
    group.plot.kde(title=name, ax=a1, c='r')
fig.tight_layout()

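The question also asks for one saved picture or PDF per name with a bell curve drawn over the data. One way to do this, sketched under assumptions, is to loop over the groups, draw a density histogram with the standard normal density on top, and call `savefig` on each figure. The output directory, the `<name>_zscore.pdf` file naming, and the synthetic z-scores are all hypothetical, and the names are assumed to be safe to use as file names.

```python
import os
import tempfile
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical z-scores; in practice use the Zscore column computed per name
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'Name': 'matt joe adam farley'.split() * 100,
    'Zscore': rng.standard_normal(400),
})

out_dir = tempfile.mkdtemp()  # hypothetical output location

# Standard normal density evaluated on a fixed grid
x = np.linspace(-4, 4, 400)
pdf = np.exp(-x ** 2 / 2) / np.sqrt(2 * np.pi)

saved = []
for name, group in df.groupby('Name')['Zscore']:
    fig, ax = plt.subplots(figsize=(5, 3))
    ax.hist(group, bins=20, density=True, alpha=.3)  # empirical distribution
    ax.plot(x, pdf, color='r')                       # reference bell curve
    ax.set_title(name)
    path = os.path.join(out_dir, f'{name}_zscore.pdf')
    fig.savefig(path)  # the file extension selects the output format
    plt.close(fig)     # release the figure before the next iteration
    saved.append(path)
```

Changing the extension to `.png` writes an image instead of a PDF. Closing each figure inside the loop keeps memory use flat when there are many names.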

