Remove values before and after special character

2024/9/20 6:47:58

I have a DataFrame, df, in which I would like to remove the text in col1 that comes before the first underscore '_' and after the second underscore '_', essentially keeping the middle part.

I also want to keep the digits at the end and concatenate them with the extracted middle part.

Data

col1         col2
a_bu1        dd
a_lap_aa1    d     
a_lap_aa2    d
h_bb_led1    dd

Desired

col1    col2
bu1     dd
lap1    d      
lap2    d
bb1     dd

Doing

re.sub(r'^.*?I', 'I', stri)

However, the entire dataset is not being maintained. I am still researching. Any advice is appreciated.

Answer

To remove the text before the first '_' and after the second '_', keeping only the middle part, you can use .str.extract() with a regex, as follows:

df['col1'] = df['col1'].str.extract(r'\w*?_([^_]*)(?:_)?', expand=False)  # expand=False returns a Series

Result:

print(df)

  col1 col2
0  bu1   dd
1  lap    d
2  lap    d
3   bb   dd
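To reproduce the result above end to end, here is a minimal self-contained sketch. It builds the sample data from the question (the DataFrame construction is an assumption, since the question only shows the table) and applies the same pattern; expand=False is used so that the single capture group comes back as a Series rather than a one-column DataFrame.

```python
import pandas as pd

# Sample data from the question (constructed here by hand)
df = pd.DataFrame({
    "col1": ["a_bu1", "a_lap_aa1", "a_lap_aa2", "h_bb_led1"],
    "col2": ["dd", "d", "d", "dd"],
})

# Lazily skip up to the first '_', then capture the run of non-underscore
# characters that follows it (the "middle" part).
df["col1"] = df["col1"].str.extract(r"\w*?_([^_]*)(?:_)?", expand=False)
print(df)
```

All four rows are kept, because .str.extract works element-wise on the whole column instead of on a single string.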

Edit

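To also keep the digits at the end, you can do:

s_df = df['col1'].str.split('_', expand=True)  # one column per '_'-separated piece; missing pieces are None
s_df[2] = s_df[2].str.extract(r'(\d+)$', expand=False).fillna('')  # trailing digits of the third piece, or ''
df['col1'] = s_df[1] + s_df[2]  # middle piece + trailing digits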

Result:

print(df)

   col1 col2
0   bu1   dd
1  lap1    d
2  lap2    d
3   bb1   dd
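As an alternative to splitting into helper columns, the same result can be sketched with a single .str.extract call that uses two capture groups: one for the middle piece and an optional one for the trailing digits. This pattern is my own swapped-in variant, not part of the answer above, and it assumes the third piece (when present) is non-digits followed by digits, as in the sample data; rows that do not fit that shape come back as NaN.

```python
import pandas as pd

# Sample data from the question (constructed here by hand)
df = pd.DataFrame({
    "col1": ["a_bu1", "a_lap_aa1", "a_lap_aa2", "h_bb_led1"],
    "col2": ["dd", "d", "d", "dd"],
})

# Group 0: the piece between the first and second '_'.
# Group 1 (optional): the digits at the end of the third piece, if there is one.
parts = df["col1"].str.extract(r"^[^_]*_([^_]*)(?:_\D*(\d+))?$")

# Rows without a third piece have NaN in group 1; treat that as an empty string.
df["col1"] = parts[0] + parts[1].fillna("")
print(df)
```

With two unnamed groups, .str.extract returns a DataFrame whose columns are labelled 0 and 1, which is why they are indexed that way before concatenation.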
https://en.xdnf.cn/q/119407.html
