Question 1

I am opening an extremely large binary file in Python 3.5 in file1.py:

    with open(pathname, 'rb') as file:
        for i, line in enumerate(file):
            # parsing here

However, I naturally get an error because the file is opened in binary mode, so each line is a bytes object. Inside the for loop, the parsing code then passes str arguments to bytes methods, and that is where the code fails.

If I were reading all the lines at once, I would do this:

    with open(fname, 'rb') as f:
        lines = [x.decode('utf8').strip() for x in f.readlines()]

However, I am iterating with for i, line in enumerate(file):. What is the correct approach in this case? Do I decode each line as it is yielded?

Here is the actual code I am running:

    with open(bam_path, 'rb') as file:
        for i, line in enumerate(file):
            line_data = pd.DataFrame({k.strip(): v.strip()
                for k, _, v in (e.partition(':')
                for e in line.split('\t'))}, index=[i])

And here is the error:

    Traceback (most recent call last):
      File "file1.py", line 18, in <module>
        for e in line.split('\t'))}, index=[i])
    TypeError: a bytes-like object is required, not 'str'

Answer 1

You could feed a generator with the decoded lines to enumerate:

for i, line in enumerate(l.decode(errors='ignore') for l in f):

This yields every line in f after decoding it, one line at a time. I've added errors='ignore' because opening the file in text mode ('r') failed with an invalid start byte error. Note that this option silently drops any bytes that cannot be decoded.
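To make the generator approach concrete, here is a minimal sketch. It assumes an in-memory io.BytesIO object standing in for open(bam_path, 'rb'), made-up sample bytes, and a hypothetical helper parse_line that mirrors the dict comprehension from the question without the pandas step:

```python
import io

# Stand-in for open(bam_path, 'rb'); the sample bytes are invented for illustration.
raw = io.BytesIO(b"name:read1\tflag:0\n\xffbad:byte\tflag:16\n")

def parse_line(line):
    """Split a tab-separated str line of key:value fields into a dict (hypothetical helper)."""
    return {k.strip(): v.strip()
            for k, _, v in (e.partition(':') for e in line.split('\t'))}

rows = []
# The generator decodes one line at a time, so the whole file is never held in memory.
# errors='ignore' drops the undecodable 0xff byte instead of raising UnicodeDecodeError.
for i, line in enumerate(l.decode('utf-8', errors='ignore') for l in raw):
    rows.append((i, parse_line(line)))
```

Each entry in rows pairs the line index with a dict of str keys and values, which could then be passed to pd.DataFrame(..., index=[i]) as in the question.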

As an aside, you could instead replace the string literals with bytes literals when operating on bytes, i.e. partition(b':') and split(b'\t'), and do all your work on bytes (pretty sure pandas works fine with them).
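A minimal sketch of the bytes-only variant, again assuming an io.BytesIO stand-in for the real file, made-up sample data, and a hypothetical helper name parse_line_bytes:

```python
import io

# Stand-in for open(bam_path, 'rb'); the sample bytes are invented for illustration.
raw = io.BytesIO(b"name:read1\tflag:0\n")

def parse_line_bytes(line):
    """Same parsing as in the question, but with bytes separators so no decoding is needed."""
    return {k.strip(): v.strip()
            for k, _, v in (e.partition(b':') for e in line.split(b'\t'))}

parsed = [parse_line_bytes(line) for line in raw]
```

The keys and values stay as bytes objects here (for example b'name' rather than 'name'), so anything downstream that expects str would still need a decode step.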

How to decode binary file with for index, line in enumerate(file)?
