Is there a way to shallow copy an existing file-object?

2024/9/8 10:22:31

The use case for this would be creating multiple generators based on some file-object without any of them trampling each other's read state.

Originally I (thought I) had a working implementation using seek() and tell() where each generator was decorated by a meta-generator which maintained the file-handle position. This worked fine on things like StringIO, but failed on real files due the to read-ahead buffer mutilating the offset.

Using readline() or otherwise mocking the real file-object isn't viable as the reason for doing this was the excessively large files prompting a generator expression in the first place. So losing the read-ahead buffer isn't really a good option (as an aside, why was Python implemented this way in the first place? Shouldn't the buffer be like a cache and not actually exposed to the user? Proper encapsulation should have prevented this tell() issue in the first place...)

I then tried to use copy.copy, but that results in something like this: <closed file '<uninitialized file>', mode '<uninitialized file>' at 0x7f722ffda810>. Which appears unusable.

Does there exist an alternative way to copy? Is there a way to initialize a file-object? Or should I give up on this use case entirely because it is not possible in Python?

Answer

You are looking for itertools.tee.

from itertools import tee
with open("somefile.txt", "r") as fh:fh1, fh2, fh3 = tee(fh, 3)

Once you call tee, do not use the parent iterator again. The iterators returned from tee may be used freely and independently, however.

For file objects specifically (to keep file-specific methods like read), you can just open a file multiple times; each file object will maintain its own file pointer as it reads the file.

fh1, fh2, fh3 = [open("somefile.txt") for i in range(3)]

or, if you already have a file object fh:

fh1, fh2, fh3 = [open(fh.name) for i in range(3)]

This doesn't preserve an already advanced file pointer, but it's easy enough to jump ahead:

for x in fh1, fh2, fh3:x.seek(fh.tell())
https://en.xdnf.cn/q/72584.html

Related Q&A

Python - Import package modules before as well as after setup.py install

Assume a Python package (e.g., MyPackage) that consists of several modules (e.g., MyModule1.py and MyModule2.py) and a set of unittests (e.g., in MyPackage_test.py).. ├── MyPackage │ ├── __ini…

Where can I find the _sre.py python built-in module? [closed]

Closed. This question is seeking recommendations for books, tools, software libraries, and more. It does not meet Stack Overflow guidelines. It is not currently accepting answers.We don’t allow questi…

When numba is effective?

I know numba creates some overheads and in some situations (non-intensive computation) it become slower that pure python. But what I dont know is where to draw the line. Is it possible to use order of …

How to launch a Windows shortcut using Python

I want to launch a shortcut named blender.ink located at "D://games//blender.ink". I have tryed using:-os.startfile ("D://games//blender.ink")But it failed, it only launches exe fil…

Providing context in TriggerDagRunOperator

I have a dag that has been triggered by another dag. I have passed through to this dag some configuration variables via the DagRunOrder().payload dictionary in the same way the official example has don…

Install gstreamer support for opencv python package

I have built my own opencv python package from source. import cv2 print(cv2.__version__)prints: 3.4.5Now the issue I am facing is regarding the use of gstreamer from the VideoCapture class of opencv. I…

How to use query function with bool in python pandas?

Im trying to do something like df.query("column == a").count()but withdf.query("column == False").count()What is the right way of using query with a bool column?

Text Extraction from image after detecting text region with contours

I want to build an OCR for an image using machine learning in python. I have preprocessed image by converting it to grayscale , applied otsu thresholding . I then used the contours to find the text re…

Pyinstaller executable fails importing torchvision

This is my main.py:import torchvision input("Press key")It runs correctly in the command line: python main.pyI need an executable for windows. So I did : pyinstaller main.pyBut when I launche…

Embedding Python in C: Having problems importing local modules

I need to run Python scripts within a C-based app. I am able to import standard modules from the Python libraries i.e.:PyRun_SimpleString("import sys")But when I try to import a local module …