Extract Text with its Font Details (Style,Size,color,Italic etc) from a PDF in Python [closed]

2024/9/30 17:35:04

I am looking to Extract Text with its Font Details (Style,Size,color,Italic etc) from a PDF in Python.

I need to extract text and its metadata for translation purpose.Can anyone suggest any libraries for the same.

Answer

There is a python library for that. Please have a look at PDFMiner.

http://www.unixuser.org/~euske/python/pdfminer/index.html.

pdftext.py gives you the text extracted out of pdf and it also gives you other information like font and font size etc.

You can try that.

Note: Python 3 is not supported

https://en.xdnf.cn/q/71056.html

Related Q&A

How to keep track of status with multiprocessing and pool.map?

Im setting up a multiprocessing module for the first time, and basically, I am planning to do something along the lines offrom multiprocessing import pool pool = Pool(processes=102) results = pool.map(…

How to get time 17:00:00 today or yesterday?

If 17:00:00 today is already passed, then it should be todays date, otherwise - yesterdays. Todays time I get with:test = datetime.datetime.now().replace(hour=17,minute=0,second=0,microsecond=0)But I d…

PyMongo Aggregate how to get executionStats

I am trying to get executionStats of a Particular mongo aggregate query. I run db.command but that doesnt give "execution status"This is what I am trying to do. how to get Python Mongo Aggreg…

Is it possible to do parallel reads on one h5py file using multiprocessing?

I am trying to speed up the process of reading chunks (load them into RAM memory) out of a h5py dataset file. Right now I try to do this via the multiprocessing library. pool = mp.Pool(NUM_PROCESSES) g…

Where is a django validator functions return value stored?

In my django app, this is my validator.py from django.core.exceptions import ValidationError from django.core.validators import URLValidatordef validate_url(value):url_validator = URLValidator()url_inv…

Modifying YAML using ruamel.yaml adds extra new lines

I need to add an extra value to an existing key in a YAML file. Following is the code Im using.with open(yaml_in_path, r) as f:doc, ind, bsi = load_yaml_guess_indent(f, preserve_quotes=True) doc[phase1…

How to get the background color of a button or label (QPushButton, QLabel) in PyQt

I am quite new to PyQt. Does anyone tell me how to get the background color of a button or label (QPushButton, QLabel) in PyQt.

Is it possible to make sql join on several fields using peewee python ORM?

Assuming we have these three models.class Item(BaseModel):title = CharField()class User(BaseModel):name = CharField()class UserAnswer(BaseModel):user = ForeignKeyField(User, user_answers)item = Foreign…

Django multiple form factory

What is the best way to deal with multiple forms? I want to combine several forms into one. For example, I want to combine ImangeFormSet and EntryForm into one form:class ImageForm(forms.Form):image =…

How to include the private key in paramiko after fetching from string?

I am working with paramiko, I have generated my private key and tried it which was fine. Now I am working with Django based application where I have already copied the private key in database.I saved m…