Question 1

I have a set of raw data and I need to identify its distribution. What is the easiest way to plot a probability distribution function? I have tried fitting it to a normal distribution.

But I am more curious to know which distribution the data actually follows.

I have no code to show my progress, as I have failed to find any functions in Python that will let me test the distribution of the dataset. I do not want to slice the data and force it to fit, say, a normal or skewed distribution.

Is there any way to determine the distribution of the dataset? Any suggestions appreciated.

Is this a correct approach? Example
This is close to what I am looking for, but again it fits the data to a normal distribution. Example

EDIT:

The input has a million rows; a short sample is given below:

Hashtag,Frequency
#Car,45
#photo,4
#movie,6
#life,1

The frequency ranges from 1 to 20,000, and I am trying to identify the distribution of the keyword frequencies. I tried plotting a simple histogram, but the output is a single bar.

Code:

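import pandas
import matplotlib.pyplot as plt

# Read the two-column CSV (Hashtag, Frequency)
df = pandas.read_csv('Paris_random_hash.csv', sep=',')
# Histogram of the Frequency column (default: 10 equal-width bins)
plt.hist(df['Frequency'])
plt.show()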

Output: [image: output of frequency count]
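A likely reason for the single bar, assuming most hashtags have small counts: with values spanning 1 to 20,000, the default 10 equal-width bins are each about 2,000 wide, so nearly every value lands in the first bin. A minimal sketch of one workaround uses logarithmically spaced bins and log-log axes. The function name `log_binned_density` and the synthetic Pareto-distributed counts are hypothetical stand-ins for `df['Frequency']`, not part of the original question.

```python
import numpy as np
import matplotlib.pyplot as plt

def log_binned_density(values, nbins=30):
    """Histogram of positive values using logarithmically spaced bins.

    Returns (bin_centers, density). Centers are geometric midpoints of
    adjacent edges, which suit log-scaled axes.
    """
    values = np.asarray(values, dtype=float)
    values = values[values > 0]  # log-spaced bins need strictly positive values
    # geomspace sets the first and last edges exactly to min and max,
    # so the largest value is not lost to floating-point rounding
    edges = np.geomspace(values.min(), values.max(), nbins + 1)
    density, edges = np.histogram(values, bins=edges, density=True)
    centers = np.sqrt(edges[:-1] * edges[1:])  # geometric mean of each edge pair
    return centers, density

if __name__ == "__main__":
    # Hypothetical stand-in for df['Frequency']: heavy-tailed integer counts
    # clipped to the 1..20,000 range mentioned in the question.
    rng = np.random.default_rng(0)
    freq = np.clip(np.round(rng.pareto(1.2, 100_000) + 1), 1, 20_000)

    centers, density = log_binned_density(freq)
    plt.loglog(centers, density, marker="o", linestyle="none")
    plt.xlabel("Frequency")
    plt.ylabel("Probability density")
    plt.show()
```

With the real data this would be `log_binned_density(df['Frequency'])`. Because the counts are integers, some narrow bins near 1 can be empty; those zero densities are simply not drawn on log axes.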

Question 2

This is a minimal working example for showing a histogram. It only solves part of your question, but it can be a step towards your goal. Note that np.histogram returns the bin edges (one more value than the number of bins), not the bin centers, so you have to take the midpoint of each pair of adjacent edges to get the center value.

import numpy as np
import matplotlib.pyplot as pl

x = np.random.randn(10000)  # sample data
nbins = 20
n, bins = np.histogram(x, nbins, density=True)  # n: densities, bins: nbins+1 edges
pdfx = np.zeros(n.size)
pdfy = np.zeros(n.size)
for k in range(n.size):
    pdfx[k] = 0.5*(bins[k]+bins[k+1])  # bin center = midpoint of two edges
    pdfy[k] = n[k]
pl.plot(pdfx, pdfy)
pl.show()
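The midpoint loop in the example can also be written with NumPy slicing, which avoids preallocating arrays. A minimal sketch; `bin_centers` is a hypothetical helper name, not a NumPy function:

```python
import numpy as np

def bin_centers(edges):
    """Midpoints of adjacent histogram bin edges (len(edges) - 1 values)."""
    edges = np.asarray(edges, dtype=float)
    return 0.5 * (edges[:-1] + edges[1:])

x = np.random.default_rng(0).standard_normal(10000)
n, bins = np.histogram(x, 20, density=True)
pdfx = bin_centers(bins)  # same values as the loop's pdfx
pdfy = n                  # densities are already one value per bin
```

`edges[:-1]` and `edges[1:]` are the left and right edge of every bin, so their average is the center of each bin in one vectorized step.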

You can fit your data using the example shown in:

Fitting empirical distribution to theoretical ones with Scipy (Python)?
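In the spirit of that linked approach, here is a minimal sketch of fitting several candidate distributions with `scipy.stats` and ranking them by the Kolmogorov-Smirnov statistic. The candidate list, the choice of KS as the score, and the helper name `rank_fits` are assumptions for illustration; other criteria (log-likelihood, AIC) are equally valid.

```python
import numpy as np
from scipy import stats

# Candidate families to try (an assumption: adjust to your data).
# Positive-support families get loc fixed at 0 via floc=0 for a stabler fit.
CANDIDATES = {
    "norm": {},
    "expon": {"floc": 0},
    "lognorm": {"floc": 0},
    "gamma": {"floc": 0},
}

def rank_fits(data, candidates=CANDIDATES):
    """Fit each candidate by maximum likelihood and rank by KS statistic.

    Returns a list of (name, ks_statistic, params) sorted best-first.
    KS p-values are not valid when the parameters were estimated from the
    same data, so only the statistic is used, as a relative score.
    """
    data = np.asarray(data, dtype=float)
    results = []
    for name, fit_kwargs in candidates.items():
        dist = getattr(stats, name)
        params = dist.fit(data, **fit_kwargs)  # (shape params..., loc, scale)
        ks = stats.kstest(data, name, args=params).statistic
        results.append((name, ks, params))
    return sorted(results, key=lambda r: r[1])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sample = rng.lognormal(mean=0.0, sigma=1.0, size=5000)  # synthetic test data
    for name, ks, params in rank_fits(sample):
        print(f"{name:8s} KS={ks:.4f} params={params}")
```

For the question's data the call would be `rank_fits(df['Frequency'])`. Since those counts are integers, treat continuous fits as an approximation, and always compare the best fit visually against a histogram.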

Probability Distribution Function Python
