How to remove English text from an Arabic string in Python?

2024/5/20 15:52:11

I have an Arabic string that also contains English text and punctuation. I need to keep only the Arabic text, so I tried removing the punctuation and English words using the string module. However, I lost the spacing between the Arabic words. What am I doing wrong?

import string

exclude = set(string.punctuation)
main_text = "وزارة الداخلية: لا تتوفر لدينا معلومات رسمية عن سعوديين موقوفين في ليبيا http://alriyadh.com/1031499"
main_text = ''.join(ch for ch in main_text if ch not in exclude)
# main_text is now: "وزارة الداخلية لا تتوفر لدينا معلومات رسمية عن سعوديين موقوفين في ليبيا httpalriyadhcom1031499"
n = ''.join(filter(lambda x: x not in string.printable, main_text))
print(n)
# Output: وزارةالداخليةلاتتوفرلدينامعلوماترسميةعنسعوديينموقوفينفيليبيا

I am able to remove the punctuation and English text, but I lose the spaces between the words. How can I keep the words separated?

Answer

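You can keep the spaces in your string by letting them through the filter explicitly:

n = ''.join(filter(lambda x: True if x == ' ' else x not in string.printable, main_text))

or, more concisely:

n = ''.join(filter(lambda x: x == ' ' or x not in string.printable, main_text))

Each character is kept if it is a space; otherwise it is kept only if it is not in string.printable. The spaces were lost because string.printable includes whitespace as well as the ASCII letters, digits, and punctuation. The ''.join(...) wrapper is needed on Python 3, where filter returns an iterator rather than a string.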
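As a minimal sketch of the same idea for Python 3, the filter can be wrapped in a small helper. The function name keep_non_ascii_and_spaces and the constant ASCII_TO_DROP are hypothetical names chosen for illustration; they are not part of the original answer. The sketch also collapses the extra spaces left behind where English words were removed, which goes one step beyond the answer above. Note that it drops printable ASCII only, so any other non-ASCII characters in the input would be kept as well.

```python
import string

# Characters to drop: every printable ASCII character except the space.
# (Hypothetical name, introduced for this sketch.)
ASCII_TO_DROP = set(string.printable) - {" "}


def keep_non_ascii_and_spaces(text):
    """Return text without printable ASCII characters, keeping word spacing."""
    kept = "".join(ch for ch in text if ch not in ASCII_TO_DROP)
    # Collapse runs of spaces left behind by removed words and trim the ends.
    return " ".join(kept.split())


main_text = "وزارة الداخلية: لا تتوفر لدينا معلومات رسمية عن سعوديين موقوفين في ليبيا http://alriyadh.com/1031499"
print(keep_non_ascii_and_spaces(main_text))
```

Doing the punctuation and English removal in a single pass avoids the intermediate step from the question, and the final split/join keeps exactly one space between the remaining words.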

