Question

I have a very large big-endian binary file, and I know how many numbers are in it. I found a way to read a big-endian file using struct, and it works perfectly if the file is small:

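    import struct

    numcount = 399513600  # number of floats in the file, known in advance
    data = []
    with open('some_file.dat', 'rb') as file:
        for i in range(numcount):
            # read and decode one 4-byte big-endian float per iteration
            data.append(struct.unpack('>f', file.read(4))[0])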

But this code runs very slowly once the file is larger than ~100 MB. My current file is 1.5 GB and contains 399,513,600 float numbers; the code above takes about 8 minutes on it.
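A likely reason for the slowness is that the loop makes one read() call and one struct.unpack() call per number, i.e. roughly 400 million Python-level calls. One possible middle ground, offered as an untimed sketch rather than a measured fix, is to read large chunks and decode each one with struct.iter_unpack (Python 3.4+). The helper name read_be_floats_chunked and the default chunk size are hypothetical choices, and the sketch assumes the file length is a multiple of 4 bytes (iter_unpack raises struct.error otherwise).

```python
import os
import struct
import tempfile


def read_be_floats_chunked(path, chunk_floats=1_000_000):
    """Hypothetical helper: read big-endian float32 values from `path` chunk by chunk."""
    values = []
    chunk_bytes = 4 * chunk_floats  # a multiple of 4, so every chunk holds whole floats
    with open(path, 'rb') as f:
        while True:
            chunk = f.read(chunk_bytes)
            if not chunk:
                break
            # decode every 4-byte big-endian float in this chunk with a single call
            values.extend(v for (v,) in struct.iter_unpack('>f', chunk))
    return values


# Usage sketch: write a tiny big-endian file, then read it back in 3-float chunks.
sample = [1.0, -2.5, 0.25, 1024.0]
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(struct.pack('>4f', *sample))
result = read_be_floats_chunked(tmp.name, chunk_floats=3)
os.remove(tmp.name)
print(result)
```

The result is still a list of Python float objects, so memory use stays high for 400 million values; the chunking only reduces per-call overhead.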

I found another solution that works faster:

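    import struct

    numcount = 399513600
    with open('some_file.dat', 'rb') as f:
        datafile = f.read()            # read the whole file into memory at once
    f_len = ">" + "f" * numcount       # big-endian, one "f" per number
    numbers = struct.unpack(f_len, datafile)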
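A side note on the format string: struct accepts a repeat count, so ">" + "f" * numcount can be written as f">{numcount}f" without building a 400-million-character string. This is a small sketch on four sample values; it still returns a tuple of Python floats, so it may not change the overall runtime much.

```python
import struct

# Four big-endian float32 values packed into 16 bytes as stand-in file contents.
raw = struct.pack('>4f', 1.0, 2.0, 3.0, 4.0)
numcount = len(raw) // 4

# ">4f" means the same as ">ffff": big-endian, four 4-byte floats.
numbers = struct.unpack(f'>{numcount}f', raw)
print(numbers)
```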

This code runs in about 1.5 minutes, but that is still too slow for me. I previously wrote equivalent code in Fortran, and it ran in about 10 seconds.

In Fortran I open the file with the "big-endian" flag and can simply read the file into a REAL array without any explicit conversion, but in Python I have to read the file as a string and convert every 4 bytes into a float using struct. Is it possible to make the program run faster?
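To make concrete what "convert every 4 bytes" involves: the same float has its bytes in opposite order in big-endian and little-endian layouts, and most desktop machines (x86/x86-64) are little-endian, so big-endian data has to be byte-swapped before it can be used natively. A minimal illustration with struct:

```python
import struct
import sys

# 1.0 is 0x3F800000 in IEEE 754 single precision.
big = struct.pack('>f', 1.0)     # most significant byte first
little = struct.pack('<f', 1.0)  # least significant byte first
print(big.hex(), little.hex())
print(sys.byteorder)             # 'little' on x86/x86-64 machines

# Decoding big-endian bytes as little-endian yields a different (wrong) number.
wrong = struct.unpack('<f', big)[0]
print(wrong)
```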

Answer

You can use numpy.fromfile to read the file and declare the type as big-endian by putting > in the dtype parameter:

    import numpy
    numbers = numpy.fromfile(filename, dtype='>f')
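Here is a self-contained sketch of that call, using a small temporary file as a stand-in for some_file.dat. The dtype '>f4' is the explicit spelling of a big-endian 4-byte float; numpy reads the raw bytes in one call instead of decoding each value in Python. The astype step, which converts to the machine's native byte order, is optional and is an addition of this sketch, not part of the answer above.

```python
import os
import struct
import tempfile

import numpy as np

# Write a small big-endian float32 file to stand in for the real data file.
sample = [1.0, -2.5, 0.25, 1024.0]
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(struct.pack('>4f', *sample))

# '>f4' = big-endian 4-byte float; the whole file is read in a single call.
arr = np.fromfile(tmp.name, dtype='>f4')

# Optional: convert to native byte order before doing heavy arithmetic.
native = arr.astype(np.float32)
os.remove(tmp.name)
print(native.tolist())
```

For files too large to hold comfortably in RAM, numpy.memmap(filename, dtype='>f4', mode='r') maps the file with the same dtype instead of reading it all at once.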

There is an array.fromfile method too. It has no parameter for endianness and always reads in the machine's native byte order, but on a little-endian machine you can call the array's byteswap() method afterwards. Depending on your use case, this might let you avoid the dependency on a third-party library.
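A standard-library-only sketch of the array route: array.fromfile reads values in native byte order, and array.byteswap() reverses the bytes of every item, so swapping only when sys.byteorder is 'little' gives big-endian decoding. The helper name read_be_floats_array is hypothetical, and the sketch assumes the C float type is 4 bytes, which holds on common platforms.

```python
import array
import os
import struct
import sys
import tempfile


def read_be_floats_array(path, count):
    """Hypothetical helper: read `count` big-endian float32 values with the stdlib."""
    values = array.array('f')  # C float; assumed to be 4 bytes
    with open(path, 'rb') as f:
        values.fromfile(f, count)  # raises EOFError if fewer items are available
    # fromfile used native byte order; swap on little-endian machines.
    if sys.byteorder == 'little':
        values.byteswap()
    return values


# Usage sketch with a tiny temporary big-endian file.
sample = [1.0, -2.5, 0.25, 1024.0]
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(struct.pack('>4f', *sample))
result = read_be_floats_array(tmp.name, len(sample))
os.remove(tmp.name)
print(result.tolist())
```

The values stay packed in one contiguous buffer rather than as separate Python float objects, which should use far less memory than a list or tuple of 400 million floats.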

Read a large big-endian binary file
