Splitting a Python string

2024/11/15 14:06:08

I have a string in Python that I want to split in a very particular manner. I want to split it into a list containing each separate word, except when a group of words is bordered by a particular character. For example, the following strings would be split as follows.

'Jimmy threw his ball through the window.'

becomes

['Jimmy', 'threw', 'his', 'ball', 'through', 'the', 'window.']

However, with a border character I'd want

'Jimmy |threw his ball| through the window.'

to become

['Jimmy', 'threw his ball', 'through', 'the', 'window.']

As an additional requirement, a '-' that appears just outside the grouping phrase should end up inside it after splitting, i.e.,

'Jimmy |threw his| ball -|through the| window.'

would become

['Jimmy', 'threw his', 'ball', '-through the', 'window.']

I cannot find a simple, Pythonic way to do this without a lot of complicated for loops and if statements. Is there a simple way to handle something like this?

Answer

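There's no out-of-the-box solution for this, but here's a reasonably Pythonic function that handles your examples. Note that the pattern only recognizes groups made up of word characters and spaces, so a group containing punctuation would not be kept together.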

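import re

def extract_groups(s):
    # Split on |...| groups (optionally prefixed by '-'); the capturing
    # parentheses make re.split keep the matched groups in the result.
    separator = re.compile(r"(-?\|[\w ]+\|)")
    components = separator.split(s)
    groups = []
    for component in components:
        component = component.strip()
        if len(component) == 0:
            continue
        elif component[0] in ['-', '|']:
            # A bordered group: drop the border characters, keep it whole.
            groups.append(component.replace('|', ''))
        else:
            # Ordinary text: split into individual words.
            groups.extend(component.split(' '))
    return groups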

Using your examples:

>>> extract_groups('Jimmy threw his ball through the window.')
['Jimmy', 'threw', 'his', 'ball', 'through', 'the', 'window.']
>>> extract_groups('Jimmy |threw his ball| through the window.')
['Jimmy', 'threw his ball', 'through', 'the', 'window.']
>>> extract_groups('Jimmy |threw his| ball -|through the| window.')
['Jimmy', 'threw his', 'ball', '-through the', 'window.']
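
As a possible alternative, the same result can be had by matching tokens directly rather than splitting around the groups. The following is only a minimal sketch: the names TOKEN and tokenize_groups are made up for illustration, and it assumes that borders are balanced and that each group starts at the beginning of a whitespace-separated token. Unlike the pattern above, it accepts any character except '|' inside a group.

```python
import re

# Hypothetical alternative to extract_groups: match tokens instead of splitting.
# A token is either a |...| group with an optional leading '-', or a run of
# non-whitespace characters. The group alternative is listed first so that it
# takes priority over the plain-word alternative.
TOKEN = re.compile(r"-?\|[^|]*\||\S+")

def tokenize_groups(s):
    """Return the words of s, keeping each |...| group (and a leading '-') whole."""
    # Removing '|' strips the borders from groups. It also removes any stray
    # '|' inside an ordinary word, which is assumed not to occur here.
    return [token.replace("|", "") for token in TOKEN.findall(s)]

print(tokenize_groups("Jimmy |threw his| ball -|through the| window."))
# ['Jimmy', 'threw his', 'ball', '-through the', 'window.']
```

Because re.findall skips whatever the pattern does not match, runs of several spaces between words do not produce empty strings in the result.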