I have a set of raw data and need to identify its distribution. What is the easiest way to plot a probability distribution function? I have tried fitting the data to a normal distribution.
But I am more curious to know which distribution the data actually follows.
I have no code to show my progress, as I have not found any Python functions that let me test the distribution of a dataset. I do not want to slice the data and force it to fit a normal or skewed distribution.
Is there any way to determine the distribution of the dataset? Any suggestions are appreciated.
Is this a correct approach? Example
This is close to what I am looking for, but again it fits the data to a normal distribution. Example
EDIT:
The input has a million rows; a short sample is given below:
Hashtag,Frequency
#Car,45
#photo,4
#movie,6
#life,1
The frequency ranges from 1 to 20,000, and I am trying to identify the distribution of the keyword frequencies. I tried plotting a simple histogram, but the output is a single bar.
Code:
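import pandas
import matplotlib.pyplot as plt
# Load the hashtag counts (columns: Hashtag, Frequency)
df = pandas.read_csv('Paris_random_hash.csv', sep=',')
# Histogram with matplotlib's default equal-width bins
plt.hist(df['Frequency'])
plt.show()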
Output
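One possible reason for the single bar, assuming the frequencies are heavily skewed toward small values: with equal-width bins spanning 1 to 20,000, each bin is roughly 2,000 wide, so almost every hashtag lands in the first one. A minimal sketch of logarithmically spaced bins is below. The `frequencies` list is a made-up stand-in for `df['Frequency']`, and `log_bin_edges`, `log_binned_counts`, and `bins_per_decade` are hypothetical names introduced only for illustration.

```python
import math

# Hypothetical stand-in for df['Frequency']; the real data has far more rows.
frequencies = [45, 4, 6, 1, 1, 1, 2, 3, 120, 20000]


def log_bin_edges(values, bins_per_decade=2):
    """Return logarithmically spaced bin edges covering strictly positive values."""
    lo, hi = min(values), max(values)
    if lo <= 0:
        raise ValueError("log-spaced bins need strictly positive values")
    decades = math.log10(hi / lo)
    n_bins = max(1, math.ceil(decades * bins_per_decade))
    # Fall back to a single one-decade bin when all values are identical.
    step = decades / n_bins if decades > 0 else 1.0
    return [lo * 10 ** (i * step) for i in range(n_bins + 1)]


def log_binned_counts(values, edges):
    """Count how many values fall into each log-spaced bin."""
    n_bins = len(edges) - 1
    lo = edges[0]
    step = math.log10(edges[-1] / lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        idx = int(math.log10(v / lo) / step)
        # Clamp so the maximum value lands in the last bin despite rounding.
        counts[min(max(idx, 0), n_bins - 1)] += 1
    return counts


edges = log_bin_edges(frequencies)
counts = log_binned_counts(frequencies, edges)
for left, right, count in zip(edges, edges[1:], counts):
    print(f"{left:10.1f} to {right:10.1f}: {count}")
```

The same edges could be passed to matplotlib as `plt.hist(df['Frequency'], bins=edges)` followed by `plt.xscale('log')` before `plt.show()`. This only changes how the data is displayed; it does not by itself identify which distribution the frequencies follow.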