Fitting and Plotting Lognormal

2024/10/14 9:25:17

I'm having trouble doing something as relatively simple as:

  1. Draw N samples from a Gaussian with some mean and variance
  2. Exponentiate those N samples (so they follow a lognormal distribution)
  3. Fit a lognormal (using stats.lognorm.fit)
  4. Spit out a nice and smooth lognormal pdf without inf values (using stats.lognorm.pdf)

Here's a small working example of the output I'm getting:

from scipy import stats
import numpy as np
import matplotlib.pyplot as plt
import math
%matplotlib inline

def lognormDrive(mu, variance):
    size = 1000
    sigma = math.sqrt(variance)
    np.random.seed(1)
    gaussianData = stats.norm.rvs(loc=mu, scale=sigma, size=size)
    logData = np.exp(gaussianData)
    shape, loc, scale = stats.lognorm.fit(logData, floc=mu)
    return stats.lognorm.pdf(logData, shape, loc, scale)

plt.plot(lognormDrive(37, 0.8))

[Image: plot of the output]

And as you might notice, the plot makes absolutely no sense.

Any ideas?

I've followed these posts: POST1 POST2

Thanks in advance!

Elaboration: I am building a small script that will

  1. Take raw data and fit a kernel distribution (empirical distribution)
  2. Assume other distributions with the same mean and variance as the data, namely a Gaussian and a lognormal
  3. Plot those distributions together with the empirical distribution using interact
  4. Calculate the Kullback-Leibler divergence between the different distributions as one turns the knobs for the mean and variance (and, eventually, skew)
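Steps 1, 2 and 4 of that plan could be sketched roughly as below. This is only an illustration under several assumptions: the "raw data" is hypothetical (drawn from a lognormal), the Gaussian and lognormal are moment-matched to the data's mean and variance (for the lognormal, sigma^2 = ln(1 + var/mean^2) and mu = ln(mean) - sigma^2/2, with loc fixed at 0), and the KL divergence is approximated by evaluating the densities on a shared uniform grid and passing them to `scipy.stats.entropy`. The helper names `moment_matched_lognorm` and `kl_on_grid` are made up for this sketch.

```python
import numpy as np
from scipy import stats


def moment_matched_lognorm(mean, var):
    """Hypothetical helper: lognormal (loc fixed at 0) with the given mean and variance."""
    # For X = exp(N(mu, s^2)): mean = exp(mu + s^2/2) and var = (exp(s^2) - 1) * mean^2
    s2 = np.log1p(var / mean**2)
    mu = np.log(mean) - s2 / 2
    return stats.lognorm(s=np.sqrt(s2), scale=np.exp(mu))


def kl_on_grid(p_pdf, q_pdf, grid):
    """Hypothetical helper: approximate KL(p || q) from both densities on one uniform grid."""
    # stats.entropy normalizes both arrays to sum to 1, then returns sum(pk * log(pk / qk))
    return stats.entropy(p_pdf(grid), q_pdf(grid))


# Hypothetical "raw data": positive and right-skewed.
rng = np.random.default_rng(1)
data = rng.lognormal(mean=1.0, sigma=0.5, size=2000)

kde = stats.gaussian_kde(data)  # empirical (kernel) distribution
m, v = data.mean(), data.var()
gauss = stats.norm(loc=m, scale=np.sqrt(v))
logn = moment_matched_lognorm(m, v)

# Compare over the observed range only; every x there is > 0, so the lognormal PDF is positive.
grid = np.linspace(data.min(), data.max(), 2000)
kl_gauss = kl_on_grid(kde.pdf, gauss.pdf, grid)
kl_logn = kl_on_grid(kde.pdf, logn.pdf, grid)
print(f"KL(kde || normal)    = {kl_gauss:.4f}")
print(f"KL(kde || lognormal) = {kl_logn:.4f}")
```

For the interactive part, the lines after `kde = ...` could be wrapped in a function of `(m, v)` and handed to `interact`, so the divergences update as the sliders move. Note that KL is not symmetric, so KL(kde || model) and KL(model || kde) generally differ.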
Answer

In the call to lognorm.fit(), use floc=0, not floc=mu.

(The location parameter of the lognorm distribution simply translates the distribution. You almost never want to do that with the log-normal distribution.)
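A quick way to see what the three parameters mean: with `loc` fixed at 0, SciPy's `shape` is the sigma of the underlying normal and `scale` is exp(mu), and a nonzero `loc` only shifts the curve along x. The sketch below checks both points; the values `mu = 1.5`, `variance = 0.8` and the shift of 2.0 are arbitrary choices made for readability, not taken from the question.

```python
import numpy as np
from scipy import stats

# Arbitrary example values (smaller than the question's mu=37 so the numbers stay readable).
mu, variance = 1.5, 0.8
sigma = np.sqrt(variance)

rng = np.random.default_rng(1)
samples = np.exp(rng.normal(mu, sigma, size=10_000))  # log-normally distributed

# With loc fixed at 0: shape estimates sigma and scale estimates exp(mu).
shape, loc, scale = stats.lognorm.fit(samples, floc=0)
mu_hat, var_hat = np.log(scale), shape**2
print("loc:", loc, "mu estimate:", mu_hat, "var estimate:", var_hat)

# loc only translates the PDF: pdf(x; s, loc, scale) == pdf(x - loc; s, 0, scale).
x = np.linspace(3.0, 20.0, 50)
shifted = stats.lognorm.pdf(x, shape, loc=2.0, scale=scale)
unshifted = stats.lognorm.pdf(x - 2.0, shape, loc=0.0, scale=scale)
print("translation check:", np.allclose(shifted, unshifted))
```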

See A lognormal distribution in python

By the way, you are plotting the PDF of the unsorted sample values, so the plot in the corrected script won't look much different. You might find it more useful to plot the PDF against the sorted values. Here's a modification of your script that creates a plot of the PDF using the sorted samples:

from scipy import stats
import numpy as np
import matplotlib.pyplot as plt
import math

def lognormDrive(mu, variance):
    size = 1000
    sigma = math.sqrt(variance)
    np.random.seed(1)
    gaussianData = stats.norm.rvs(loc=mu, scale=sigma, size=size)
    logData = np.exp(gaussianData)
    # Fix the location parameter at 0 instead of mu
    shape, loc, scale = stats.lognorm.fit(logData, floc=0)
    print("Estimated mu:", np.log(scale))
    print("Estimated var: ", shape**2)
    # Sort the samples so plt.plot draws the PDF from left to right
    logData.sort()
    return logData, stats.lognorm.pdf(logData, shape, loc, scale)

x, y = lognormDrive(37, 0.8)
plt.plot(x, y)
plt.grid()
plt.show()

The script prints:

Estimated mu: 37.0347152587
Estimated var:  0.769897988163

and creates the following plot:

[Image: plot of the fitted PDF against the sorted samples]
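An alternative to evaluating the PDF at the sorted samples, swapped in here as an assumption rather than taken from the answer, is to evaluate the fitted PDF on a regular grid spanning the data. Because a lognormal is normal in log(x), a log-spaced grid (`np.geomspace`) gives a smooth curve with evenly spread points. The function name `lognorm_curve` and its defaults are hypothetical.

```python
import numpy as np
from scipy import stats


def lognorm_curve(mu, variance, size=1000, n_points=500, seed=1):
    """Hypothetical helper: fit a lognormal (loc fixed at 0) and evaluate its PDF on a grid."""
    rng = np.random.default_rng(seed)
    samples = np.exp(rng.normal(mu, np.sqrt(variance), size=size))
    shape, loc, scale = stats.lognorm.fit(samples, floc=0)
    # Log-spaced points between the smallest and largest sample: evenly spaced in
    # log(x), where the underlying normal lives, so the curve stays smooth.
    x = np.geomspace(samples.min(), samples.max(), n_points)
    return x, stats.lognorm.pdf(x, shape, loc, scale)


x, y = lognorm_curve(37, 0.8)
print("PDF peaks near x =", x[np.argmax(y)])
```

The arrays can go straight into `plt.plot(x, y)`, or `plt.semilogx(x, y)` if a logarithmic x axis reads better for values around exp(37).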

https://en.xdnf.cn/q/117970.html