How do I use a regular expression to match a name?

2024/11/15 12:26:00

I am a newbie in Python. I want to write a regular expression for some name checking. My input string can contain a-z, A-Z, 0-9, and ' _ ', but it should start with either a-z or A-Z (not 0-9 and ' _ '). I want to write a regular expression for this. I tried, but nothing was matching perfectly.

Once the input string follows the regular expression rules, I can proceed further, otherwise discard that string.

Answer

Here's an answer to your question:

Interpreting that you want _ (not -), this should do the job:

>>> tests = ["a", "A", "a1", "a_1", "1a", "_a", "a\n", "", "z_"]
>>> for test in tests:
...    print repr(test), bool(re.match(r"[A-Za-z]\w*\Z", test))
...
'a' True
'A' True
'a1' True
'a_1' True
'1a' False
'_a' False
'a\n' False
'' False
'z_' True
>>>

Stoutly resist the temptation to use $; here's why:

Hello, hello, using $ is WRONG, use \Z instead

>>> re.match(r"[a-zA-Z][\w-]*$","A")
<_sre.SRE_Match object at 0x00BAFE90>
>>> re.match(r"[a-zA-Z][\w-]*$","A\n")
<_sre.SRE_Match object at 0x00BAFF70> # WRONG; SHOULDN'T MATCH
>>>>>> re.match(r"[a-zA-Z][\w-]*\Z","A")
<_sre.SRE_Match object at 0x00BAFE90>
>>> re.match(r"[a-zA-Z][\w-]*\Z","A\n")
>>> # CORRECT: NO MATCH

The Fine Manual says:

'$'
Matches the end of the string or just before the newline at the end of the string [my emphasis], and in MULTILINE mode also matches before a newline. foo matches both ‘foo’ and ‘foobar’, while the regular expression foo$ matches only ‘foo’. More interestingly, searching for foo.$ in 'foo1\nfoo2\n' matches ‘foo2’ normally, but ‘foo1’ in MULTILINE mode; searching for a single $ in 'foo\n' will find two (empty) matches: one just before the newline, and one at the end of the string.

and

\Z
Matches only at the end of the string.

=== And now for something completely different ===

>>> import string
>>> letters = set(string.ascii_letters)
>>> ok_chars = letters | set(string.digits + "_")
>>>
>>> def is_valid_name(strg):
...     return strg and strg[0] in letters and all(c in ok_chars for c in strg)
...
>>> for test in tests:
...     print repr(test), repr(is_valid_name(test))
...
'a' True
'A' True
'a1' True
'a_1' True
'1a' False
'_a' False
'a\n' False
'' ''
'z_' True
>>>
https://en.xdnf.cn/q/71886.html

Related Q&A

python - multiprocessing module

Heres what I am trying to accomplish - I have about a million files which I need to parse & append the parsed content to a single file. Since a single process takes ages, this option is out. Not us…

How to make VSCode always run main.py

I am writing my first library in Python, When developing I want my run code button in VS Code to always start running the code from the main.py file in the root directory. I have added a new configurat…

Why does tesseract fail to read text off this simple image?

I have read mountains of posts on pytesseract, but I cannot get it to read text off a dead simple image; It returns an empty string.Here is the image:I have tried scaling it, grayscaling it, and adjust…

python click subcommand unified error handling

In the case where there are command groups and every sub-command may raise exceptions, how can I handle them all together in one place?Given the example below:import click@click.group() def cli():pass…

Data structure for large ranges of consecutive integers?

Suppose you have a large range of consecutive integers in memory, each of which belongs to exactly one category. Two operations must be O(log n): moving a range from one category to another, and findin…

polars slower than numpy?

I was thinking about using polars in place of numpy in a parsing problem where I turn a structured text file into a character table and operate on different columns. However, it seems that polars is ab…

namespace error lxml xpath python

I am transforming word documents to xml to compare them using the following code:word = win32com.client.Dispatch(Word.Application) wd = word.Documents.Open(inFile) # Converts the word infile to xml out…

lark grammar: How does the escaped string regex work?

The lark parser predefines some common terminals, including a string. It is defined as follows:_STRING_INNER: /.*?/ _STRING_ESC_INNER: _STRING_INNER /(?<!\\)(\\\\)*?/ ESCAPED_STRING : "\&quo…

Pycharm unresolved reference on join of os.path

After upgrade pycharm to 2018.1, and upgrade python to 3.6.5, pycharm reports "unresolved reference join". The last version of pycharm doesnt show any warning for the line below:from os.path …

Apply Border To Range Of Cells Using Openpyxl

I am using python 2.7.10 and openpyxl 2.3.2 and I am a Python newbie.I am attempting to apply a border to a specified range of cells in an Excel worksheet (e.g. C3:H10). My attempt below is failing wit…