Question 1

I'm currently working through this SciPy example on kernel density estimation, in particular the one labelled "Univariate estimation". Instead of generating random data, I am using asset returns. However, my second estimate (and even the simple normal PDF I plot for comparison) shows a density that peaks at 20, which makes no sense to me. My code is as follows:

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# data is loaded elsewhere; the first row is skipped in both series
x1 = np.array(data['actual'].values)[1:]
xs1 = np.linspace(x1.min()-1, x1.max()+1, len(x1))
std1 = x1.std()
mean1 = x1.mean()

x2 = np.array(data['log_moves'].values)[1:]
xs2 = np.linspace(x2.min()-.01, x2.max()+.01, len(x2))
#xs2 = np.linspace(x2.min()-1,x2.max()+2,len(x2))
std2 = x2.std()
mean2 = x2.mean()

kde1 = stats.gaussian_kde(x1)  # actuals
kde2 = stats.gaussian_kde(x1, bw_method='silverman')

kde3 = stats.gaussian_kde(x2)  # log returns
kde4 = stats.gaussian_kde(x2, bw_method='silverman')

fig = plt.figure(figsize=(10, 8))

ax1 = fig.add_subplot(211)
ax1.plot(x1, np.zeros(x1.shape), 'b+', ms=12)  # rug plot
ax1.plot(xs1, kde1(xs1), 'k-', label="Scott's Rule")
ax1.plot(xs1, kde2(xs1), 'b-', label="Silverman's Rule")
ax1.plot(xs1, stats.norm.pdf(xs1, mean1, std1), 'r--', label="Normal PDF")
ax1.set_xlabel('x')
ax1.set_ylabel('Density')
ax1.set_title("Absolute (top) and Returns (bottom) distributions")
ax1.legend(loc=1)

ax2 = fig.add_subplot(212)
ax2.plot(x2, np.zeros(x2.shape), 'b+', ms=12)  # rug plot
ax2.plot(xs2, kde3(xs2), 'k-', label="Scott's Rule")
ax2.plot(xs2, kde4(xs2), 'b-', label="Silverman's Rule")
ax2.plot(xs2, stats.norm.pdf(xs2, mean2, std2), 'r--', label="Normal PDF")
ax2.set_xlabel('x')
ax2.set_ylabel('Density')

plt.show()

My result: [plot image: results]

And for reference, the first and second moments of the input data:

print(std1)
print(mean1)
print(std2)
print(mean2)

4.66416718334
0.0561365678347
0.0219996729055
0.00027330546845

Further, if I change the second chart to plot a lognormal PDF, I get a flat line (which, if the y-axis were scaled correctly like the top chart, I'm sure would show the distribution I'd expect).

Answer

The result of a kernel density estimate is a probability density. While probability can't be larger than 1, a density can.
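To see why a peak near 20 is plausible here, a rough check is to compute the height of a normal PDF at its mean, which is 1 / (sigma * sqrt(2 * pi)). The sketch below uses the two standard deviations printed in the question; the helper name `normal_peak_density` is made up for illustration.

```python
import math

def normal_peak_density(sigma):
    """Height of a normal PDF at its mean: 1 / (sigma * sqrt(2*pi))."""
    return 1.0 / (sigma * math.sqrt(2.0 * math.pi))

# Standard deviations reported in the question
std_actual = 4.66416718334          # std1 (absolute values)
std_log_returns = 0.0219996729055   # std2 (log returns)

peak_actual = normal_peak_density(std_actual)
peak_log_returns = normal_peak_density(std_log_returns)

print(peak_actual)       # well below 1
print(peak_log_returns)  # roughly 18, the same order as the peak in the plot
```

The smaller the spread of the data, the taller the density has to be for the area under it to stay at 1.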

Given a probability density curve, you can find the probability within a range (x_1, x_2) by integrating the probability density in that range. Judging by eye, the integral under both your curves is approximately 1, so the output appears to be correct.
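Rather than judging by eye, the integral can be checked numerically. The following is a minimal sketch using synthetic data as a stand-in for the log returns (an assumption: normally distributed with a standard deviation of about 0.022, since the original data is not available). It relies on the `integrate_box_1d` method of `scipy.stats.gaussian_kde`.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic stand-in for the log returns (assumption, not the asker's data)
sample = rng.normal(loc=0.0, scale=0.022, size=1000)

kde = stats.gaussian_kde(sample)

# Evaluate the density on a grid, as in the question's plotting code
grid = np.linspace(sample.min() - 0.01, sample.max() + 0.01, 500)
max_density = kde(grid).max()

# Probability mass over a range wide enough to cover essentially everything
total_prob = kde.integrate_box_1d(sample.min() - 1.0, sample.max() + 1.0)

# Probability of falling within roughly one standard deviation of zero
prob_within_one_std = kde.integrate_box_1d(-0.022, 0.022)

print(max_density)          # much larger than 1
print(total_prob)           # very close to 1
print(prob_within_one_std)  # a genuine probability, between 0 and 1
```

The density values exceed 1, yet every probability obtained by integrating over a range stays between 0 and 1.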

Python - SciPy Kernal Estimation Example - Density 1
