I'm having trouble getting proper results from an inverted index in Python. I'm trying to load a list of strings into the variable 'strlist' and then loop over the strings with my inverted index to return each word plus where it occurs. Here is what I have so far:
    def inverseIndex(strlist):
        d={}
        for x in range(len(strlist)):
            for y in strlist[x].split():
                for index, word in set(enumerate([y])):
                    if word in d:
                        d=d.update(index)
                    else:
                        d._setitem_(index,word)
                    break
                break
            break
        return d
Now when I run inverseIndex(strlist), all it returns is {0:'This'}, whereas what I want is a dictionary mapping all the words in 'strlist' to the set d.
Is my initial approach wrong? Am I tripping up in the if/else? Any help pointing me in the right direction is greatly appreciated.
Based on what you're saying, I think you're trying to get some data like this:
    input = ["hello world", "foo bar", "red cat"]
    data_wanted = {
        "foo": 1,
        "hello": 0,
        "cat": 2,
        "world": 0,
        "red": 2,
        "bar": 1,
    }
So what you should be doing is adding the words as keys to a dictionary, with each value being the index of the string in strlist in which that word is located.
    def locateWords(strlist):
        d = {}
        for i, substr in enumerate(strlist):  # gives you the index and the item itself
            for word in substr.split():
                d[word] = i
        return d
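To see what this version does with a repeated word, here is a minimal sketch. The function name `locate_words_last` and the sample input are made up for illustration; the logic is the same as above.

```python
def locate_words_last(strlist):
    """Map each word to the index of the last string that contains it."""
    d = {}
    for i, substr in enumerate(strlist):
        for word in substr.split():
            d[word] = i  # a later occurrence overwrites an earlier one
    return d


sample = ["red cat", "red dog"]  # hypothetical input with a repeated word
result = locate_words_last(sample)
print(result)  # {'red': 1, 'cat': 0, 'dog': 1}
```

Note that "red" ends up mapped to 1 only, so the occurrence in string 0 is lost.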
If the word occurs in more than one string in strlist, you should change the code to the following:
    def locateWords(strlist):
        d = {}
        for i, substr in enumerate(strlist):
            for word in substr.split():
                if word not in d:
                    d[word] = [i]
                else:
                    d[word].append(i)
        return d
This changes the values to lists, which contain the indices of the strings in strlist that contain that word.
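As an alternative sketch of the same idea, `collections.defaultdict` from the standard library can replace the explicit `if word not in d` check. The function name `locate_words_grouped` and the sample input are assumptions for illustration, not part of the original code.

```python
from collections import defaultdict


def locate_words_grouped(strlist):
    """Map each word to a list of indices of the strings containing it."""
    d = defaultdict(list)  # missing keys start out as an empty list
    for i, substr in enumerate(strlist):
        for word in substr.split():
            d[word].append(i)
    return dict(d)  # convert back to a plain dict for the caller


sample = ["red cat", "red dog", "cat nap"]  # hypothetical input
index = locate_words_grouped(sample)
print(index["red"])  # [0, 1]
print(index["cat"])  # [0, 2]
```

If a word appears twice within the same string, its index is appended twice. Swapping `list` for `set` and `append` for `add` would avoid that, at the cost of losing order.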
Some of your code's problems explained
- {} is not a set, it's a dictionary.
- break forces a loop to terminate immediately. You didn't want to end the loop early because you still had data to process.
- d.update(index) will give you a TypeError: 'int' object is not iterable. This method takes a mapping or an iterable of key/value pairs and updates the dictionary with it. Normally you would use a list of tuples for this: [("foo",1), ("hello",0)]. It just adds the data to the dictionary.
- You don't normally want to use d.__setitem__ (which you typed wrong anyway). You'd just use d[key] = value.
- You can iterate using a "for each" style loop instead, like my code above shows. Looping over the range means you are looping over the indices. (Not exactly a problem, but it could lead to extra bugs if you're not careful to use the indices properly).
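A few of the points above can be checked directly in a short sketch. The keys and values here are illustrative only. One extra detail worth knowing: `update()` changes the dictionary in place and returns `None`, so a line like `d = d.update(...)` would replace the dictionary with `None`.

```python
d = {}
print(type(d).__name__)  # dict, because {} creates a dictionary, not a set

d["foo"] = 1  # plain item assignment, preferred over d.__setitem__
d.update([("hello", 0), ("cat", 2)])  # an iterable of (key, value) pairs
returned = d.update({"bar": 1})  # a mapping works too; the call returns None
print(d)         # {'foo': 1, 'hello': 0, 'cat': 2, 'bar': 1}
print(returned)  # None

# Passing a bare int, as in d.update(index), raises a TypeError.
try:
    {}.update(0)
    error = None
except TypeError as exc:
    error = exc
print(type(error).__name__)  # TypeError
```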
It looks like you are coming from another programming language in which braces indicate sets and there is a keyword which ends control blocks (like if, fi). It's easy to confuse syntax when you're first starting, but if you run into trouble running the code, look at the exceptions you get and search them on the web!
P.S. I'm not sure why you wanted a set - if there are duplicates, you probably want to know all of their locations, not just the first or the last one or anything in between. Just my $0.02.