Python Histogram using matplotlib on top words

2024/10/5 10:59:53

I am reading a file and calculating the frequency of the top 100 words. I am able to find that and create the following list:

[('test', 510), ('Hey', 362), ("please", 753), ('take', 446), ('herbert', 325), ('live', 222), ('hate', 210), ('white', 191), ('simple', 175), ('harry', 172), ('woman', 170), ('basil', 153), ('things', 129), ('think', 126), ('bye', 124), ('thing', 120), ('love', 107), ('quite', 107), ('face', 107), ('eyes', 107), ('time', 106), ('himself', 105), ('want', 105), ('good', 105), ('really', 103), ('away',100), ('did', 100), ('people', 99), ('came', 97), ('say', 97), ('cried', 95), ('looked', 94), ('tell', 92), ('look', 91), ('world', 89), ('work', 89), ('project', 88), ('room', 88), ('going', 87), ('answered', 87), ('mr', 87), ('little', 87), ('yes', 84), ('silly', 82), ('thought', 82), ('shall', 81), ('circle', 80), ('hallward', 80), ('told', 77), ('feel', 76), ('great', 74), ('art', 74), ('dear',73), ('picture', 73), ('men', 72), ('long', 71), ('young', 70), ('lady', 69), ('let', 66), ('minute', 66), ('women', 66), ('soul', 65), ('door', 64), ('hand',63), ('went', 63), ('make', 63), ('night', 62), ('asked', 61), ('old', 61), ('passed', 60), ('afraid', 60), ('night', 59), ('looking', 58), ('wonderful', 58), ('gutenberg-tm', 56), ('beauty', 55), ('sir', 55), ('table', 55), ('turned', 54), ('lips', 54), ("one's", 54), ('better', 54), ('got', 54), ('vane', 54), ('right',53), ('left', 53), ('course', 52), ('hands', 52), ('portrait', 52), ('head', 51), ("can't", 49), ('true', 49), ('house', 49), ('believe', 49), ('black', 49), ('horrible', 48), ('oh', 48), ('knew', 47), ('curious', 47), ('myself', 47)]
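The question does not show how `counts` was built. `counts.most_common(100)` suggests a `collections.Counter`, so here is a minimal sketch under that assumption. The `top_words` helper and its regex tokenizer are hypothetical; the list above keeps `'Hey'` capitalized, so the asker's code may not lowercase its words the way this sketch does.

```python
import re
from collections import Counter


def top_words(text, n=100):
    """Return the n most common words in text as (word, count) pairs.

    Assumption (hypothetical tokenizer): a word is a run of letters, digits,
    apostrophes or hyphens, compared case-insensitively. Adjust the regex to
    match however your own file should be split.
    """
    words = re.findall(r"[\w'-]+", text.lower())
    # most_common(n) returns a list of (word, count) tuples, largest count first
    return Counter(words).most_common(n)


sample = "the cat and the hat and the bat"
print(top_words(sample, 3))
```

In practice you would pass the file's contents, e.g. `top_words(open(path).read())`, where `path` is whatever file you are reading.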

After getting this list, I want to draw a histogram using matplotlib. I am trying the code below, but I cannot get a proper histogram.

My question: how do I pass the total frequency of each word to the graph? Right now all of my bars have the same height, and even the bin centers are not correct. How should I pass data to the ax.hist method in the code below? I am trying to adapt the example from http://matplotlib.org/1.2.1/examples/api/histogram_demo.html.

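import matplotlib.pyplot as plt
import numpy as np

# counts is a collections.Counter of the file's words (built earlier, not shown)
totalWords = counts.most_common(100)
print(totalWords)
words = [word for word, _ in totalWords]
x = np.arange(len(words))  # positions 0..99, not the word counts
i, s = 100, 15
fig = plt.figure()
ax = fig.add_subplot(111)
# newer matplotlib versions use density=True in place of normed=1
n, bins, patches = ax.hist(x, 50, density=True, facecolor='green', alpha=0.75)
bincenters = 0.5 * (bins[1:] + bins[:-1])
# normal pdf, as mlab.normpdf computed (that helper is gone from recent matplotlib)
y = np.exp(-0.5 * ((bincenters - i) / s) ** 2) / (s * np.sqrt(2 * np.pi))
l = ax.plot(bincenters, y, 'r--', linewidth=1)
ax.set_xlabel('Words')
ax.set_ylabel('Frequency')
ax.set_xlim(50, 160)
ax.set_ylim(0, 0.04)
ax.grid(True)
plt.show()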
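Counting what ax.hist actually receives in this code makes the equal heights concrete: `x = np.arange(len(words))` holds the positions 0 to 99, one of each, and the word counts never reach the histogram. ax.hist uses the same equal-width binning as `np.histogram`, so this small NumPy check (assuming 100 words and the 50 bins from the code) reproduces the bar heights before normalization:

```python
import numpy as np

# The same input the code passes to ax.hist: positions 0..99, not frequencies
x = np.arange(100)

# 50 equal-width bins spanning [0, 99], as in ax.hist(x, 50, ...)
counts, edges = np.histogram(x, bins=50)

# Every bin holds exactly two positions, so every bar has the same height
print(counts)
```

With `density=True` each of those equal counts is simply rescaled by the same factor, so the bars stay level.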
Answer

It's a little unclear exactly what you want to graph, and how relevant the matplotlib demo you are adapting actually is.

I'll run through some options and try to answer your specific questions in each case:

  • If you follow the matplotlib demo, you only need to give ax.hist the list of word frequencies, x = [count for word, count in words], but this only shows how often each frequency value occurs: most of the words occur fewer than 100 times, while a few occur much more often. This is also why your code above draws bars of equal height: x = np.arange(len(words)) gives ax.hist the numbers 0 to 99, each exactly once, so every bin receives the same number of values. Note that this approach doesn't show the individual words.

  • Otherwise, I think you want a bar chart with one bar per word, each labelled with its word.

This worked for me.

import matplotlib.pyplot as plt
words = [('test', 510), ('Hey', 362), ('please', 753), ('take', 446), ('herbert', 325), ('live', 222), ('hate', 210), ('white', 191), ('simple', 175), ('harry', 172), ('woman', 170), ('basil', 153), ('things', 129), ('think', 126), ('bye', 124), ('thing', 120), ('love', 107), ('quite', 107), ('face', 107), ('eyes', 107), ('time', 106), ('himself', 105), ('want', 105), ('good', 105), ('really', 103), ('away', 100), ('did', 100), ('people', 99), ('came', 97), ('say', 97), ('cried', 95), ('looked', 94), ('tell', 92), ('look', 91), ('world', 89), ('work', 89), ('project', 88), ('room', 88), ('going', 87), ('answered', 87), ('mr', 87), ('little', 87), ('yes', 84), ('silly', 82), ('thought', 82), ('shall', 81), ('circle', 80), ('hallward', 80), ('told', 77), ('feel', 76), ('great', 74), ('art', 74), ('dear', 73), ('picture', 73), ('men', 72), ('long', 71), ('young', 70), ('lady', 69), ('let', 66), ('minute', 66), ('women', 66), ('soul', 65), ('door', 64), ('hand', 63), ('went', 63), ('make', 63), ('night', 62), ('asked', 61), ('old', 61), ('passed', 60), ('afraid', 60), ('night', 59), ('looking', 58), ('wonderful', 58), ('gutenberg-tm', 56), ('beauty', 55), ('sir', 55), ('table', 55), ('turned', 54), ('lips', 54), ("one's", 54), ('better', 54), ('got', 54), ('vane', 54), ('right', 53), ('left', 53), ('course', 52), ('hands', 52), ('portrait', 52), ('head', 51), ("can't", 49), ('true', 49), ('house', 49), ('believe', 49), ('black', 49), ('horrible', 48), ('oh', 48), ('knew', 47), ('curious', 47), ('myself', 47)]
# Unpack into labels and counts; a dict would silently drop one of the two 'night' entries
labels, freqs = zip(*words)
plt.bar(range(len(words)), freqs, align='center')
# One tick per bar, labelled with its word; vertical labels so 100 words don't overlap
plt.xticks(range(len(words)), labels, rotation='vertical')
plt.show()
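The first option from the list above, a histogram of the frequencies themselves, needs only the counts. A minimal sketch follows. To keep it short it uses a hand-picked subset of the question's list, and `bins=10` and the axis labels are arbitrary choices not taken from the answer:

```python
import matplotlib.pyplot as plt

# A hand-picked subset of the question's (word, count) pairs; use the full list in practice
words = [('test', 510), ('Hey', 362), ('please', 753), ('take', 446),
         ('herbert', 325), ('live', 222), ('hate', 210), ('white', 191),
         ('love', 107), ('time', 106), ('people', 99), ('world', 89),
         ('door', 64), ('night', 62), ('night', 59), ('myself', 47)]

# Only the counts go into the histogram; the words themselves are not shown
freqs = [count for _, count in words]

fig, ax = plt.subplots()
# Each bar shows how many words have a count falling in that bin's range
n, bins, patches = ax.hist(freqs, bins=10, facecolor='green', alpha=0.75)
ax.set_xlabel('Occurrences of a word')
ax.set_ylabel('Number of words')
plt.show()
```

With real word counts, most of the mass lands in the leftmost bins, which is the skew described in the first option.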