regex to extract a set number of words around a matched word

2024/11/17 12:58:41

I was looking around for a way to grab the words around a found match, but the solutions I found were much too complicated for my case. All I need is a regex to grab, let's say, 10 words before and after a matched word. Would anybody be able to help me set up a pattern to do that?

For example, let's take the sentence (won't make sense):

    sentence = "The hairy yellow, stinkin' dog, sat round' the c4mpfir3 and ate the brown/yellow smore's that the kids(*adults) were makin."

and let's say we want to match 3 words before and after smore's (already cleaned to match). The output would be:

   "ate the brown/yellow smore's that the were"

Now let's take the example of wanting one word before and after stinkin':

   "yellow, stinkin' dog"

Another example. "sat":

   "yellow, stinkin' dog, round' the and"

Let's make a new sentence now:

   sentence = "If the problem is still there after 30 minutes. Give up"

If I were trying to match the word there and take 2 words before and after, the output would be:

   "is still there after minutes"

I know it's not 10, but I think you get the example? If not, let me know and I will provide more. As I made this, I realized how much more I want than I originally thought. I'm rather new to regex, but I'm going to give the pattern a shot.

    ('[a-zA-Z\'.,/]{3}(word_to_match)[a-zA-Z\'.,/]{3}')

Thanks

Answer

This regex will get you started:

    ((?:\w*\s*){2})\s*word3\s*((?:\s*\w*){2})

Group 1 will hold the words before your target (word3 in the pattern above), and group 2 will hold the words that come after it.

In the example I chose to capture 2 words, but you can adjust this at will by changing the {2} quantifiers.
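As a rough sketch of how the pattern above might be used from Python, the snippet below builds it for an arbitrary target word and word count. The helper name `words_around` and its parameters are made up for illustration; only the `re` module calls are real.

```python
import re

def words_around(sentence, word, n=2):
    """Return (before, after) strings around the first match of `word`.

    Hypothetical helper: it only wraps the pattern from the answer,
    with the word count `n` and the target word substituted in.
    """
    pattern = r"((?:\w*\s*){%d})\s*%s\s*((?:\s*\w*){%d})" % (
        n,
        re.escape(word),  # escape characters such as ' so they match literally
        n,
    )
    match = re.search(pattern, sentence)
    if match is None:
        return None
    # Group 1 holds the words before the target, group 2 the words after.
    return match.group(1).strip(), match.group(2).strip()

sentence = "If the problem is still there after 30 minutes. Give up"
print(words_around(sentence, "there", 2))  # ('is still', 'after 30')
```

Note that `\w` matches digits, so "30" counts as a word here, unlike in the desired output from the question. `\w` also does not match apostrophes, commas, or slashes, so tokens such as brown/yellow in the first sentence would need a broader character class.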

Let me know how it goes and if it works on your input.

You can improve your question by reading this short piece of advice: http://worksol.be/regex.html
