How to extract quotations from text using NLTK [duplicate]
2024/11/20 9:41:22
I have a project in which I need to extract quotations from a huge set of articles. By quotations I mean things said by people, e.g.: Alen said "text to be extracted." I'm using NLTK for my other NLP-related tasks, so any solution using NLTK or any other Python library would be quite useful.
Thanks
Answer
As Mayur mentioned, you can use a regex to pick up everything between quotes:
import re
quotes = re.findall(r'".*?"', text)
The problem you'll run into is that a surprisingly large number of things appear between quotation marks that are not actually quotations.
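To make that concrete, here is a minimal sketch of the naive approach. The sample sentence is made up for illustration, and the pattern uses a capture group so the surrounding quote marks are dropped. It only handles straight double quotes.

```python
import re

# Hypothetical sample: one real quotation, one title, one scare-quoted word.
text = 'Alen said "text to be extracted." He was reading "Moby Dick" and called it "fine".'

# Non-greedy match of everything between a pair of straight double quotes.
matches = re.findall(r'"(.*?)"', text)
print(matches)
```

All three quoted spans come back, although only the first is something a person said.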
If you're working with academic articles, you can look for a number after the closing quotation mark to pick up the footnote number. For non-academic articles, you could anchor the match on a reporting verb instead:
"(said|writes|argues|concludes)(,)? \".*?\""
This is more precise, but it risks missing quotes such as blockquotes (blockquotes will cause you problems anyway, because they can include a newline before the closing quotation mark).
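A rough sketch of the verb-anchored idea follows. The verb list (including the extra inflections) and the function name `extract_attributed_quotes` are illustrative assumptions, not a complete solution. It uses `re.DOTALL` so that a quotation broken across lines can still match.

```python
import re

# Reporting verbs from the answer plus a few inflections (illustrative, not exhaustive).
VERBS = r"\b(?:said|says|writes|wrote|argues|argued|concludes|concluded)"

# Verb, optional comma or colon, whitespace, then a quoted span.
# re.DOTALL lets "." cross newlines, so a quote split over lines still matches.
ATTRIBUTED_QUOTE = re.compile(VERBS + r'[,:]?\s+"(.*?)"', re.DOTALL)

def extract_attributed_quotes(text):
    """Return the quoted spans that directly follow a reporting verb."""
    return [m.group(1) for m in ATTRIBUTED_QUOTE.finditer(text)]

sample = 'Alen said "text to be\nextracted." He was reading "Moby Dick".'
print(extract_attributed_quotes(sample))
```

The title "Moby Dick" is skipped because no reporting verb precedes it. Note that with `re.DOTALL` an unbalanced quotation mark can make a match run on until the next quote character, and that curly quotes are not handled here.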
As for using NLTK, I can't think of anything there that will be of much help, other than perhaps WordNet for finding synonyms for "said".