Loop and arrays of strings in Python

2024/10/12 2:28:17

I have the following data set:

column1
HL111
PG3939HL11
HL339PG
RC--HL--PG

I am attempting to write a function that does the following:

  1. Loop through each row of column1.
  2. Pull out only the letters and put them into an array.
  3. If the array has "HL" in it, remove it from the array UNLESS HL is the only word in the array.
  4. Take the first word in the array and output the result.

So for the above example, my array (step2) would look like this:

[HL]
[PG,HL]
[HL,PG]
[RC,HL,PG]

and my desired final output (step4) would look like this:

desired_column
HL
PG
PG
RC

I have the code for step 2, and it seems to work fine:

df['array_column'] = (df.column1.str.extractall('([A-Z]+)').unstack().values.tolist())

But I don't know how to get from here to my final output (step4).

Answer

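You can get the result by first removing every character that is not an uppercase letter, then extracting pairs of letters, and finally applying some custom logic to pick the required value from each list. Note that the snippet expects the column it is called on (array_column here) to hold the original strings, not lists: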

>>> df['array_column'].str.replace('[^A-Z]+', '', regex=True).str.findall('([A-Z]{2})').apply(lambda d: [''] if len(d) == 0 else d).apply(lambda x: 'HL' if len(x) == 1 and x[0] == 'HL' else [m for m in x if m != 'HL'][0])
0    HL
1    PG
2    PG
3    RC
Name: array_column, dtype: object

Details

  • .str.replace('[^A-Z]+', '') - removes all characters other than uppercase letters
  • .str.findall('([A-Z]{2})') - extracts pairs of letters
  • .apply(lambda d: [''] if len(d) == 0 else d) - adds an empty string item if the previous step found no match
  • .apply(lambda x: 'HL' if len(x) == 1 and x[0] == 'HL' else [m for m in x if m != 'HL'][0]) - custom logic: if the list holds a single item equal to HL, keep it; otherwise remove every HL and take the first remaining element
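
The same steps can also be written as a small named function, which may be easier to read and test than the chained lambdas. This is only a sketch: the name first_code and its skip parameter are made up for illustration, and it assumes every code is exactly two uppercase letters, as in the sample data. Unlike the one-liner, it returns HL when a row contains nothing but repeated HL codes instead of raising an IndexError.

```python
import re


def first_code(value, skip="HL"):
    """Return the first two-letter code in value, ignoring `skip`
    unless `skip` is the only code present (hypothetical helper)."""
    # Drop everything except uppercase letters, then split into pairs.
    letters = re.sub(r"[^A-Z]+", "", value)
    codes = re.findall(r"[A-Z]{2}", letters)

    # Prefer the first code that is not `skip`.
    others = [code for code in codes if code != skip]
    if others:
        return others[0]

    # Only `skip` codes were found, or no codes at all.
    return skip if codes else ""


rows = ["HL111", "PG3939HL11", "HL339PG", "RC--HL--PG"]
results = [first_code(row) for row in rows]
print(results)  # ['HL', 'PG', 'PG', 'RC']
```

Assuming column1 holds the raw strings, this could be applied to the DataFrame with df['desired_column'] = df['column1'].apply(first_code).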