I have a string that contains both Arabic and English sentences. I want to extract only the Arabic sentences.
my_string="""
What is the reason
ذَلِكَ الْكِتَابُ لَا رَيْبَ فِيهِ هُدًى لِلْمُتَّقِينَ
behind this?
ذَلِكَ الْكِتَابُ لَا رَيْبَ فِيهِ هُدًى لِلْمُتَّقِينَ
"""
This link shows that the Unicode range for Arabic letters is 0600-06FF.
So, the most basic attempt that came to mind was:
import re
print re.findall(r'[\u0600-\u06FF]+',my_string)
But this fails miserably, as it returns the following list:
['What', 'is', 'the', 'reason', 'behind', 'this?']
As you can see, this is exactly the opposite of what I want. What am I missing here?
N.B.
I know I can match the Arabic letters by using inverse matching like below:
print re.findall(r'[^a-zA-Z\s0-9]+',my_string)
But I don't want that.
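For reference, here is a minimal sketch of the range-based approach, assuming the cause is the string type rather than the range itself. The `print` statement suggests Python 2, where a plain (byte) string pattern does not treat `\u` as an escape, so `[\u0600-\u06FF]` most likely ends up as a class of ASCII characters (including the range from `0` to `u`), which would explain the English matches. The sketch uses unicode literals for both the text and the pattern; the names `arabic_run` and `arabic_sentence` are illustrative only, and joining words on spaces is one possible way to define a "sentence".

```python
# -*- coding: utf-8 -*-
import re

# Sample text from the question, declared as a unicode string (u"...").
my_string = u"""
What is the reason
ذَلِكَ الْكِتَابُ لَا رَيْبَ فِيهِ هُدًى لِلْمُتَّقِينَ
behind this?
ذَلِكَ الْكِتَابُ لَا رَيْبَ فِيهِ هُدًى لِلْمُتَّقِينَ
"""

# Non-raw unicode literal: \u0600 and \u06FF become real characters in the
# string itself, so the class is a genuine code point range.
arabic_run = u'[\u0600-\u06FF]+'

# A "sentence" here is assumed to be runs of Arabic characters joined by spaces.
arabic_sentence = re.compile(arabic_run + u'(?: +' + arabic_run + u')*')

sentences = arabic_sentence.findall(my_string)
for s in sentences:
    print(s)
```

Matching `arabic_run` on its own would return individual words instead, because the space character is outside the 0600-06FF block.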