Extract number between text and | with RegEx Python

2024/10/9 19:17:15

I want to extract the text between CVE and |, but only for the first occurrence of CVE in the txt file.

I currently have the following code:

import re

f = open('/Users/anna/PycharmProjects/extractData/DiarioOficial/aaa1381566.pdf.txt', 'r')
mensaje = f.read()
mensaje = mensaje.replace("\n", "")
print(re.findall(r'\sCVE\s+([^|]*)', mensaje))

Here is the txt file:

CVE 1381566     |     Director: Juan Jorge Lazo Rodríguez    Sitio Web:   www.diarioficial.cl    |     Mesa Central:   +562 2486 3600    Email:    [email protected]   Dirección:    Dr. Torres Boonen N°511, Providencia, Santiago, Chile.       Este documento ha sido firmado electrónicamente de acuerdo con la ley N°19.799 e incluye sellado de tiempo y firma electrónica  avanzada. Para verificar la autenticidad de una representación impresa del mismo, ingrese este código en el sitio web www.diarioficial.cl                           DIARIO OFICIAL    DE LA REPUBLICA DE CHILE    Ministerio del Interior y Seguridad Pública      V    SECCIÓN       CONSTITUCIONES, MODIFICACIONES Y DISOLUCIONES DE SOCIEDADES Y COOPERATIVAS                      Núm. 42.031    |    Viernes 13 de Abril de 2018    |    Página 1 de 1      Empresas y Cooperativas    CVE 1381566        EXTRACTO     MARÍA SOLEDAD LÁSCAR MERINO, Notario Público Titular de la Sexta Notaría de  Antofagasta, Prat Nº 482, local 25, certifica: Escritura hoy ante mí: CARLOS ANDRES ROJAS  ANGEL, calle Antilhue Nº 1613; CAROLINA ANDREA ROJAS VALERO, calle Catorce de  Febrero Nº 2339; NADIA TATIANA LEON BELMAR, calle Azapa Nº 4831; MARIO  ANTONIO LUQUE HERRERA, calle Huanchaca Nº 398; PEDRO EDUARDO BARRAZA  ZAPATA, Avenida Andrés Sabella Nº 2766; JOSE ANTONIO REYES RASSE, calle Altos del  Mar Nº 1147, casa 15; y PATRICIA ALICIA MARCHANT ROJAS, calle Ossa N° 2741; todos  domicilios Antofagasta, rectificaron y complementaron sociedad "CENTRO DE  ACONDICIONAMIENTO FISICO LEFTRARU LIMITADA, LEFTRARU LIMITADA  nombre de fantasía "LEFTRARU BOX LTDA"., constituida escritura este oficio, fecha 20 de  febrero de 2018, publicada en extracto Diario Oficial fecha 13 de marzo de 2018, edición Nº  42006; sentido señalar que la razón social correcta de la sociedad es: CENTRO DE  ACONDICIONAMIENTO FISICO LEFTRARU LIMITADA; y su nombre de fantasía es  LEFTRARU BOX LTDA.; y no "CENTRO DE ACONDICIONAMIENTO FISICO  LEFTRARU, y nombre fantasía 
"LEFTRARU LTDA"., como erróneamente allí se menciona.-  Demás estipulaciones escritura.- ANTOFAGASTA, 27 de marzo de 2018.-   
Answer

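Your pattern requires a whitespace character before CVE, so it cannot match the first CVE, which sits at the very start of the text. Instead of matching \s at the start, you could match a whitespace character zero or more times with \s* (or assert the start of the string with ^), and use re.search to find the first location where the pattern produces a match.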

Then get the value from the capturing group:

mensaje = mensaje.replace("\n", "")
regex = r"\s*CVE\s+([^|]*)"
matches = re.search(regex, mensaje)  # search stops at the first match
if matches:
    print(matches.group(1).strip())  # 1381566
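
To see the difference between the two approaches without the original file, here is a minimal self-contained sketch. The sample string is a shortened, single-spaced stand-in for the txt file (an assumption for illustration), and first_cve is a hypothetical helper name, not part of the question's code.

```python
import re

# Shortened stand-in for the txt file (assumed for illustration).
sample = ("CVE 1381566 | Director: Juan Jorge Lazo Rodríguez | "
          "Página 1 de 1 Empresas y Cooperativas CVE 1381566 EXTRACTO")


def first_cve(text):
    """Return the text between the first 'CVE' and the next '|', or None."""
    match = re.search(r"\s*CVE\s+([^|]*)", text)
    return match.group(1).strip() if match else None


# The question's pattern needs whitespace before CVE, so it skips the
# occurrence at the start of the string and only finds the second one.
print(re.findall(r"\sCVE\s+([^|]*)", sample))  # ['1381566 EXTRACTO']

# re.search with the relaxed pattern returns the first occurrence only.
print(first_cve(sample))          # 1381566
print(first_cve("no code here"))  # None
```

Returning None when nothing matches keeps the caller from hitting an AttributeError on a missing match object.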
