Probability Distribution Function Python

2024/7/8 7:01:46

I have a set of raw data and I have to identify the distribution of that data. What is the easiest way to plot a probability distribution function? I have tried fitting it to a normal distribution.

But I am more curious to know which distribution the data itself follows.

I have no code to show my progress, as I have failed to find any functions in Python that would let me test the distribution of the dataset. I do not want to slice the data and force it to fit, say, a normal or skewed distribution.

Is there any way to determine the distribution of the dataset? Any suggestions appreciated.

Is this a correct approach? Example
This is close to what I am looking for, but again it fits the data to a normal distribution. Example

EDIT:

The input has a million rows; a short sample is given below.

Hashtag,Frequency
#Car,45
#photo,4
#movie,6
#life,1

The frequency ranges from 1 to 20,000, and I am trying to identify the distribution of the keyword frequencies. I tried plotting a simple histogram, but the output is a single bar.

Code:

import pandas
import matplotlib.pyplot as plt

df = pandas.read_csv('Paris_random_hash.csv', sep=',')
plt.hist(df['Frequency'])
plt.show()

Output: [image: Output of frequency count]

Answer

This is a minimal working example for showing a histogram. It only solves part of your question, but it can be a step towards your goal. Note that np.histogram returns the bin edges (nbins + 1 values), not the bin centers, so you have to average each pair of adjacent edges to get the center of each bin.

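import numpy as np
import matplotlib.pyplot as pl

x = np.random.randn(10000)  # sample data: 10,000 standard normal draws
nbins = 20
# n holds the density per bin, bins holds the nbins + 1 bin edges
n, bins = np.histogram(x, nbins, density=True)
pdfx = np.zeros(n.size)
pdfy = np.zeros(n.size)
for k in range(n.size):
    pdfx[k] = 0.5*(bins[k]+bins[k+1])  # bin center = midpoint of its two edges
    pdfy[k] = n[k]
pl.plot(pdfx, pdfy)
pl.show()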
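
The single bar in the question's histogram is probably a consequence of the data range: with values from 1 to 20,000 and most counts small, the default ten equal-width bins put nearly every row in the first bin. A minimal sketch of one way around this, assuming the frequencies are strictly positive, is to use logarithmically spaced bins and log-log axes. The function names log_binned_density and plot_log_binned_density are hypothetical helpers introduced here, not part of NumPy or matplotlib.

```python
import numpy as np


def log_binned_density(values, nbins=30):
    """Estimate a density with logarithmically spaced bins.

    Returns (centers, density, edges). Only strictly positive values are used.
    """
    values = np.asarray(values, dtype=float)
    values = values[values > 0]
    # Bin edges grow geometrically, so each decade gets the same number of bins.
    edges = np.geomspace(values.min(), values.max(), nbins + 1)
    density, edges = np.histogram(values, bins=edges, density=True)
    # The geometric mean of adjacent edges is the midpoint on a log axis.
    centers = np.sqrt(edges[:-1] * edges[1:])
    return centers, density, edges


def plot_log_binned_density(values, nbins=30):
    """Plot the density on log-log axes (matplotlib is imported lazily)."""
    import matplotlib.pyplot as plt

    centers, density, _ = log_binned_density(values, nbins)
    keep = density > 0  # empty bins cannot be drawn on a log axis
    plt.loglog(centers[keep], density[keep], "o-")
    plt.xlabel("Frequency")
    plt.ylabel("Density")
    plt.show()
```

With the question's data this would be called as plot_log_binned_density(df['Frequency']). A roughly straight line on the log-log plot would hint at a power-law-like distribution, which is common for word and hashtag counts, but the plot alone does not prove it.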

You can fit your data using the example shown in:

Fitting empirical distribution to theoretical ones with Scipy (Python)?
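
As a rough illustration of that general idea (this is a sketch, not the code from the linked answer), one can fit several scipy.stats distributions by maximum likelihood and rank them by the Kolmogorov-Smirnov statistic. The helper name rank_candidate_fits and the default candidate list are assumptions chosen for the example.

```python
import numpy as np
from scipy import stats


def rank_candidate_fits(data, candidates=("norm", "expon", "lognorm", "gamma")):
    """Fit each named scipy.stats distribution and sort by KS statistic.

    Returns a list of (name, ks_statistic, p_value, params), best fit first.
    """
    data = np.asarray(data, dtype=float)
    results = []
    for name in candidates:
        dist = getattr(stats, name)
        params = dist.fit(data)  # maximum-likelihood shape, loc and scale
        # Compare the empirical CDF with the fitted distribution's CDF.
        ks_stat, p_value = stats.kstest(data, name, args=params)
        results.append((name, ks_stat, p_value, params))
    # A smaller KS statistic means the fitted CDF is closer to the data.
    return sorted(results, key=lambda r: r[1])
```

Two caveats apply. The p-values are optimistic because the parameters were estimated from the same data being tested, so they are better used for ranking than as a formal test. Hashtag frequencies are also discrete counts, so fitting continuous distributions to them is only an approximation.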

https://en.xdnf.cn/q/120398.html
