How to retrieve only Arabic text from a string using a regular expression?

2024/9/21 16:43:03

I have a string that contains both Arabic and English sentences. I want to extract only the Arabic sentences.

my_string="""
What is the reason
ذَلِكَ الْكِتَابُ لَا رَيْبَ فِيهِ هُدًى لِلْمُتَّقِينَ
behind this?
ذَلِكَ الْكِتَابُ لَا رَيْبَ فِيهِ هُدًى لِلْمُتَّقِينَ
"""

The Unicode range for Arabic letters is 0600-06FF (U+0600 to U+06FF).

So my first, very basic attempt was:

import re
print re.findall(r'[\u0600-\u06FF]+',my_string)

But this fails miserably: it returns the following list.

['What', 'is', 'the', 'reason', 'behind', 'this?']

As you can see, this is exactly the opposite of what I want. What am I missing here?

N.B.

I know I can match the Arabic letters by using inverse matching like below:

print re.findall(r'[^a-zA-Z\s0-9]+',my_string)

But, I don't want that.

Answer

You can use re.sub to replace the ASCII letters (and the question mark) with an empty string.

>>> import re
>>> my_string="""
... What is the reason
... ذَلِكَ الْكِتَابُ لَا رَيْبَ فِيهِ هُدًى لِلْمُتَّقِينَ
... behind this?
... ذَلِكَ الْكِتَابُ لَا رَيْبَ فِيهِ هُدًى لِلْمُتَّقِينَ
... """
>>> print(re.sub(r'[a-zA-Z?]', '', my_string).strip())
ذَلِكَ الْكِتَابُ لَا رَيْبَ فِيهِ هُدًى لِلْمُتَّقِينَ

ذَلِكَ الْكِتَابُ لَا رَيْبَ فِيهِ هُدًى لِلْمُتَّقِينَ
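Note that the pattern above only removes ASCII letters and "?", so digits or other punctuation would survive, and the removed words leave whitespace behind. A minimal Python 3 sketch of a stricter variant, assuming you want one cleaned string per line that contains Arabic (the helper name strip_non_arabic is made up for this illustration):

```python
import re

def strip_non_arabic(text):
    """Return the Arabic part of each line, skipping lines with no Arabic.

    Hypothetical helper for illustration; assumes Python 3, where str is Unicode.
    """
    # Remove every character that is neither in U+0600-U+06FF nor whitespace.
    cleaned = re.sub(r'[^\u0600-\u06FF\s]+', '', text)
    # Trim leftover spaces and drop lines that ended up empty.
    return [line.strip() for line in cleaned.splitlines() if line.strip()]
```

Calling strip_non_arabic(my_string) should give a list with one entry per Arabic line, with the spaces between the Arabic words kept.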

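Your regex didn't work because you are using Python 2 and my_string is a byte string (str). In a Python 2 byte-string pattern, \u is not an escape, so [\u0600-\u06FF] is read as a set of literal ASCII characters that includes the range from "0" to "u"; that range covers the digits, the uppercase letters, "?" and the lowercase letters up to "u", which is why it matched the English words. You need to convert my_string to unicode and use a unicode pattern for it to work. The original pattern works unchanged on Python 3.x.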

>>> print "".join(re.findall(ur'[\u0600-\u06FF]', unicode(my_string, "utf-8"), re.UNICODE))
ذَلِكَالْكِتَابُلَارَيْبَفِيهِهُدًىلِلْمُتَّقِينَذَلِكَالْكِتَابُلَارَيْبَفِيهِهُدًىلِلْمُتَّقِينَ
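For Python 3, here is a rough sketch of the same idea, assuming the input is an ordinary str. The function names are hypothetical and only for illustration. The first helper is the pattern from the question, the second keeps each run of Arabic words together instead of gluing the characters, and the third reproduces the Python 2 misparse by writing out the character class the way the byte-string pattern was actually read.

```python
import re

# Assumption: Python 3, where str is Unicode and \uXXXX is understood by re.
ARABIC_WORD = re.compile(r'[\u0600-\u06FF]+')
# One or more Arabic words separated by spaces or tabs (not newlines).
ARABIC_RUN = re.compile(r'[\u0600-\u06FF]+(?:[ \t]+[\u0600-\u06FF]+)*')

def arabic_words(text):
    """Return each Arabic word as a separate match."""
    return ARABIC_WORD.findall(text)

def arabic_runs(text):
    """Return each stretch of space-separated Arabic words as one string."""
    return ARABIC_RUN.findall(text)

def py2_style_match(text):
    """Emulate how Python 2 read the byte-string pattern.

    With \\u treated as a plain "u", the class contains the range "0"-"u",
    so it matches ASCII words and never the UTF-8 bytes of Arabic text.
    """
    return re.findall(rb'[u0600-u06FF]+', text.encode('utf-8'))

if __name__ == "__main__":
    my_string = """
What is the reason
ذَلِكَ الْكِتَابُ لَا رَيْبَ فِيهِ هُدًى لِلْمُتَّقِينَ
behind this?
ذَلِكَ الْكِتَابُ لَا رَيْبَ فِيهِ هُدًى لِلْمُتَّقِينَ
"""
    for run in arabic_runs(my_string):
        print(run)
    print(py2_style_match(my_string))
```

On the string from the question, py2_style_match returns the same English words the question reported (as bytes), while arabic_runs gives the Arabic lines with their spaces intact.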