Question 1

Background:

A binary file is read on a Linux machine using the following Fortran code:

        parameter(nx=720, ny=360, nday=365)
c dimension tmax(nx,ny,nday),nmax(nx,ny,nday)dimension tmin(nx,ny,nday),nmin(nx,ny,nday)
c open(10,&file='FILE',&access='direct',recl=nx*ny*4)
cdo k=1,ndayread(10,rec=(k-1)*4+1)((tmax(i,j,k),i=1,nx),j=1,ny) read(10,rec=(k-1)*4+2)((nmax(i,j,k),i=1,nx),j=1,ny) read(10,rec=(k-1)*4+3)((tmin(i,j,k),i=1,nx),j=1,ny) read(10,rec=(k-1)*4+4)((nmin(i,j,k),i=1,nx),j=1,ny) end do

File Details:

options  little_endian
title global daily analysis (grid box mean, the grid shown is the center of the grid box)
undef -999.0
xdef 720 linear    0.25 0.50
ydef 360  linear -89.75 0.50
zdef 1 linear 1 1
tdef 365 linear 01jan2015 1dy
vars 4
tmax     1  00 daily maximum temperature (C)
nmax     1  00 number of reports for maximum temperature (C)
tmin     1  00 daily minimum temperature (C)
nmin     1  00 number of reports for minimum temperature (C)
ENDVARS

Attempts at a solution:

I am trying to parse this into an array in python using the following code (purposely leaving out two attributes):

with gzip.open("/FILE.gz", "rb") as infile:data = numpy.frombuffer(infile.read(), dtype=numpy.dtype('<f4'), count = -1)while x <= len(data) / 4:tmax.append(data[(x-1)*4])tmin.append(data[(x-1)*4 + 2])x += 1data_full = zip(tmax, tmin)

When testing some records, the data does not seem to line up with some sample records from the file when using Fortran. I have also tried dtype=numpy.float32 as well with no success. It seems as though I am reading the file in correctly in terms of number of observations though. I was also using struct before I learned the file was created with Fortran. That was not working

There are similar questions out here, some of which have answers that I have tried adapting with no luck.

UPDATE

I am a little bit closer after trying out this code:

#Define numpy variables and empty arrays
nx = 720 #number of lon
ny = 360 #number of lat
nday = 0 #iterate up to 364 (or 365 for leap year)   
tmax = numpy.empty([0], dtype='<f', order='F')
tmin = numpy.empty([0], dtype='<f', order='F')#Parse the data into numpy arrays, shifting records as the date increments
while nday < 365:tmax = numpy.append(tmax, data[(nx*ny)*nday:(nx*ny)*(nday + 1)].reshape((nx,ny), order='F'))tmin = numpy.append(tmin, data[(nx*ny)*(nday + 2):(nx*ny)*(nday + 3)].reshape((nx,ny), order='F'))nday += 1

I get the correct data for the first day, but for the second day I get all zeros, the third day the max is lower than the min, and so on.

Question 2

While the exact format of Fortran binary files is compiler dependent, in all cases I'm aware of direct access files (files opened with access='direct' as in this question) do not have any record markers between records. Each record is of a fixed size, as given by the recl= specifier in the OPEN statement. That is, the record N starts at offset (N - 1) * RECL bytes in the file.

One portability gotcha is that the unit of the recl= is in terms of file storage units. For most compilers, the file storage unit specifies the size in 8-bit octets (as recommended in recent versions of the Fortran standard), but for the Intel Fortran compiler, recl= is in units of 32 bits; there is a commandline option -assume byterecl which can be used to make Intel Fortran match most other compilers.

So in the example given here and assuming a 8-bit file storage unit, your recl would be 1036800 bytes.

Further, looking at the code, it seems to assume the arrays are of a 4-byte type (e.g. integer or single precision real). So if it's single precision real, and the file has been created in little endian, then the numpy dtype <f4 that you have used seems to be the correct choice.

Now, getting back to the Intel Fortran compiler gotcha, if the file has been created by ifort without -assume byterecl then the data you want will be in the first quarter of each record, with the rest being padding (all zeros or maybe even random data?). Then you'll have to do some extra gymnastics to extract the correct data in python and not the padding. It should be easy to check this by checking the size of the file, is it nx * ny * 4 * nday *4 or is it nx * ny * 4 * nday * 4 * 4 bytes?

Reading direct access binary file format in Python

Related Q&A

Fabric/Python: AttributeError: NoneType object has no attribute partition

Python object @property

Tkinter - Inserting text into canvas windows

How to run nosetests without showing of my matplotlibs graph?

Variable name to string in Python

In Python, how to print FULL ISO 8601 timestamp, including current timezone

TextVariable not working

Is there a Python module to get next runtime from a crontab-style time definition?

Django Generic Relations error: cannot resolve keyword content_object into field

How to benchmark unit tests in Python without adding any code