Inverted Index in Python not returning desired results

2024/11/19 14:40:19

I'm having trouble getting the right results from an inverted index in Python. I load a list of strings into the variable 'strlist', and my inverse index function is meant to loop over the strings and return each word along with where it occurs. Here is what I have so far:

def inverseIndex(strlist):
    d={}
    for x in range(len(strlist)):
        for y in strlist[x].split():
            for index, word in set(enumerate([y])):
                if word in d:
                    d=d.update(index)
                else:
                    d._setitem_(index,word)
                break
            break
        break
    return d

Now when I run inverseIndex(strlist)

all it returns is {0:'This'}, whereas what I want is a dictionary mapping all the words in 'strlist' to the set d.

Is my initial approach wrong? Am I tripping up in the if/else? Any help pointing me in the right direction is greatly appreciated.

Answer

Based on what you're saying, I think you're trying to get some data like this:

strlist = ["hello world", "foo bar", "red cat"]
data_wanted = {
    "foo": 1,
    "hello": 0,
    "cat": 2,
    "world": 0,
    "red": 2,
    "bar": 1,
}

So what you should be doing is adding the words as keys to a dictionary, and have their values be the index of the substring in strlist in which they are located.

def locateWords(strlist):
    d = {}
    for i, substr in enumerate(strlist):  # gives you the index and the item itself
        for word in substr.split():
            d[word] = i  # a later string overwrites an earlier index
    return d

If the word occurs in more than one string in strlist, you should change the code to the following:

def locateWords(strlist):
    d = {}
    for i, substr in enumerate(strlist):
        for word in substr.split():
            if word not in d:
                d[word] = [i]  # first time this word is seen
            else:
                d[word].append(i)
    return d

This changes the values to lists, which contain the indices of the substrings in strlist which contain that word.
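The same list-valued index can also be sketched with collections.defaultdict, which removes the explicit "if word not in d" check. This is only an alternative spelling of the function above, not a different algorithm; the name locate_words_defaultdict is made up for this example.

```python
from collections import defaultdict


def locate_words_defaultdict(strlist):
    # Missing keys start out as an empty list, so no membership test is needed.
    d = defaultdict(list)
    for i, substr in enumerate(strlist):
        for word in substr.split():
            d[word].append(i)
    # Convert back to a plain dict so lookups of unknown words raise KeyError.
    return dict(d)


print(locate_words_defaultdict(["hello world", "hello foo"]))
# {'hello': [0, 1], 'world': [0], 'foo': [1]}
```

Note that a word repeated inside a single string will have that string's index appended once per occurrence.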

Some of your code's problems explained

  1. {} is not a set, it's a dictionary.
  2. break forces a loop to terminate immediately - you didn't want to end the loop early because you still had data to process.
  3. d.update(index) will give you a TypeError: 'int' object is not iterable. This method actually takes a mapping or an iterable of key/value pairs and updates the dictionary with it. Normally you would use a list of tuples for this: [("foo",1), ("hello",0)]. It just adds the data to the dictionary. It also returns None, so d = d.update(...) would replace your dictionary with None even if the call succeeded.
  4. You don't normally want to use d.__setitem__ (which you typed wrong anyway). You'd just use d[key] = value.
  5. You can iterate using a "for each" style loop instead, like my code above shows. Looping over the range means you are looping over the indices. (Not exactly a problem, but it could lead to extra bugs if you're not careful to use the indices properly).
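Points 3 and 4 can be checked in a few lines. This is a small illustrative sketch; the variable names are arbitrary and not taken from the question.

```python
d = {}

# update() accepts an iterable of (key, value) pairs (or another mapping).
d.update([("foo", 1), ("hello", 0)])
print(d)  # {'foo': 1, 'hello': 0}

# Passing a bare int fails, because an int cannot be iterated over.
try:
    d.update(0)
    raised = False
except TypeError:
    raised = True
print(raised)  # True

# update() modifies the dictionary in place and returns None,
# so assigning its result back to d would throw the dictionary away.
result = d.update({"bar": 2})
print(result)  # None

# Plain item assignment is the usual way to set a single key.
d["cat"] = 2
print(d["cat"])  # 2
```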

It looks like you are coming from another programming language in which braces indicate sets and there is a keyword which ends control blocks (like if, fi). It's easy to confuse syntax when you're first starting - but if you run into trouble running the code, look at the exceptions you get and search them on the web!

P.S. I'm not sure why you wanted a set - if there are duplicates, you probably want to know all of their locations, not just the first or the last one or anything in between. Just my $0.02.
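If a set is still wanted, one possible reading (an assumption, since the question does not spell it out) is a set of string indices per word, so that a word repeated within one string is recorded only once. A minimal sketch, with the hypothetical name locate_words_as_sets:

```python
def locate_words_as_sets(strlist):
    d = {}
    for i, substr in enumerate(strlist):
        for word in substr.split():
            # setdefault creates an empty set the first time a word is seen.
            d.setdefault(word, set()).add(i)
    return d


index = locate_words_as_sets(["red cat red", "cat nap"])
print(sorted(index["red"]))  # [0]
print(sorted(index["cat"]))  # [0, 1]
```

Sets are unordered, so the example sorts them before printing.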

