Extract a region of a PDF page by coordinates

2024/9/25 18:06:20

I am looking for a tool to extract a given rectangular region (by coordinates) of a 1-page PDF file and produce a 1-page PDF file with the specified region:

# file.pdf is a 1-page PDF file
extract file.pdf 0 0 100 100 > out.pdf
# out.pdf is now a 1-page pdf file with a page of size 100x100
# it contains the region (0, 0) to (100, 100) of file.pdf

I could convert the PDF to an image and crop it with ImageMagick's convert, but the resulting PDF would then be rasterized instead of vector-based, which is not acceptable (I want to be able to zoom).

I would ideally like to perform this task with a command-line tool or a Python library.

Thanks!

Answer

Using pyPdf, you could do something like this:

import sys
import pyPdf

def extract(in_file, coords, out_file):
    with open(in_file, 'rb') as infp:
        reader = pyPdf.PdfFileReader(infp)
        page = reader.getPage(0)
        writer = pyPdf.PdfFileWriter()
        page.mediaBox.lowerLeft = coords[:2]
        page.mediaBox.upperRight = coords[2:]
        # you could do the same for page.trimBox and page.cropBox
        writer.addPage(page)
        with open(out_file, 'wb') as outfp:
            writer.write(outfp)

if __name__ == '__main__':
    in_file = sys.argv[1]
    coords = [int(i) for i in sys.argv[2:6]]
    out_file = sys.argv[6]
    extract(in_file, coords, out_file)
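
The four numbers given to the media box are PDF user-space coordinates: by default they are measured in points (1/72 inch) from the lower-left corner of the page. The answer does not validate them, so here is a minimal sketch of a helper that orders the corners and clips the region to the page before it is applied. The names normalize_box and points_from_mm are hypothetical, not part of pyPdf, and the sketch assumes the page's own media box starts at (0, 0), which is common but not guaranteed.

```python
def normalize_box(x0, y0, x1, y1, page_width, page_height):
    """Return (left, bottom, right, top) in PDF points, clipped to the page.

    Hypothetical helper: assumes the page's media box starts at (0, 0).
    """
    # Accept the two corners in any order.
    left, right = sorted((x0, x1))
    bottom, top = sorted((y0, y1))

    # Clip so the new box never extends past the original page.
    left = max(0, min(left, page_width))
    right = max(0, min(right, page_width))
    bottom = max(0, min(bottom, page_height))
    top = max(0, min(top, page_height))

    if left == right or bottom == top:
        raise ValueError("requested region is empty after clipping")
    return left, bottom, right, top


def points_from_mm(mm):
    """Convert millimetres to PDF points (1 point = 1/72 inch, 1 inch = 25.4 mm)."""
    return mm * 72.0 / 25.4
```

With the result of normalize_box, the first two values would go to page.mediaBox.lowerLeft and the last two to page.mediaBox.upperRight, in the same way coords[:2] and coords[2:] are used above.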
