Read a large big-endian binary file

2024/9/28 13:18:47

I have a very large big-endian binary file. I know how many numbers in this file. I found a solution how to read big-endian file using struct and it works perfect if file is small:

    data = []file = open('some_file.dat', 'rb')for i in range(0, numcount)data.append(struct.unpack('>f', file.read(4))[0])

But this code works very slow if file size is more than ~100 mb. My current file has size 1.5gb and contains 399.513.600 float numbers. The above code works with this file an about 8 minutes.

I found another solution, that works faster:

    datafile = open('some_file.dat', 'rb').read()f_len = ">" + "f" * numcount   #numcount = 399513600numbers = struct.unpack(f_len, datafile)

This code runs in about ~1.5 minute, but this is too slow for me. Earlier I wrote the same functional code in Fortran and it run in about 10 seconds.

In Fortran I open the file with flag "big-endian" and I can simply read file in REAL array without any conversion, but in python I have to read file as a string and convert every 4 bites in float using struct. Is it possible to make the program run faster?

Answer

You can use numpy.fromfile to read the file, and specify that the type is big-endian specifying > in the dtype parameter:

numpy.fromfile(filename, dtype='>f')

There is an array.fromfile method too, but unfortunately I cannot see any way in which you can control endianness, so depending on your use case this might avoid the dependency on a third party library or be useless.

https://en.xdnf.cn/q/71332.html

Related Q&A

SWIG Python Structure Array

Ive been searching for a few days trying to figure out how to turn an array of structures into a Python list. I have a function that returns a pointer to the beginning of the array.struct foo {int memb…

Hashing tuple in Python causing different results in different systems

I was practicing tuple hashing. In there I was working on Python 2.7. Below is the code:num = int(raw_input()) num_list = [int(x) for x in raw_input().split()] print(hash(tuple(num_list)))The above cod…

ctypes pointer into the middle of a numpy array

I know how to get a ctypes pointer to the beginning of a numpy array:a = np.arange(10000, dtype=np.double) p = a.ctypes.data_as(POINTER(c_double)) p.contents c_double(0.0)however, I need to pass the po…

Extracting unsigned char from array of numpy.uint8

I have code to extract a numeric value from a python sequence, and it works well in most cases, but not for a numpy array.When I try to extract an unsigned char, I do the followingunsigned char val = b…

How to hold keys down with pynput?

Im using pynput and I would like to be able to hold keys down, specifically wasd but when I try and run this code it only presses the key and doesnt hold it for 2 seconds. If anyone knows what Im doing…

How well does your language support unicode in practice?

Im looking into new languages, kind of craving for one where I no longer need to worry about charset problems amongst inordinate amounts of other niggles I have with PHP for a new project.I tend to fin…

analogy to scipy.interpolate.griddata?

I want to interpolate a given 3D point cloud:I had a look at scipy.interpolate.griddata and the result is exactly what I need, but as I understand, I need to input "griddata" which means some…

Mongodb TTL expires documents early

I am trying insert a document into a Mongo database and have it automatically expire itself after a predetermine time. So far, my document get inserted but always get deleted from the database from 0 -…

sqlalchemy, hybrid property case statement

This is the query Im trying to produce through sqlalchemySELECT "order".id AS "id", "order".created_at AS "created_at", "order".updated_at AS "u…

Whats the newest way to develop gnome panel applets (using python)

Today Ive switched to GNOME (from XFCE) and found some of the cool stuff missing and I would like to (try to) do them on my own. I tried to find information on how to develop Gnome applets (items you p…