How to decode binary file with for index, line in enumerate(file)?

2024/11/13 10:06:37

I am opening an extremely large binary file in Python 3.5, in file1.py:

with open(pathname, 'rb') as file:
    for i, line in enumerate(file):
        # parsing here

However, I naturally get an error because the file is opened in binary mode, so each line comes back as a bytes object. Inside the for loop, my code then mixes str and bytes, and that is where it fails.

If I were reading all the lines into a list, I would do this:

with open(fname, 'rb') as f:
    lines = [x.decode('utf8').strip() for x in f.readlines()]

However, I am iterating with for index, line in enumerate(file):. What is the correct approach in this case? Do I decode each object as the iterator yields it?

Here is the actual code I am running:

with open(bam_path, 'rb') as file:
    for i, line in enumerate(file):
        line_data = pd.DataFrame({k.strip(): v.strip() for k, _, v in (e.partition(':')
                                  for e in line.split('\t'))}, index=[i])

And here is the error:

Traceback (most recent call last):
  File "file1.py", line 18, in <module>
    for e in line.split('\t'))}, index=[i])
TypeError: a bytes-like object is required, not 'str'

Answer

You could feed a generator with the decoded lines to enumerate:

for i, line in enumerate(l.decode(errors='ignore') for l in f):

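This yields every line in f after decoding it, one at a time, so the whole file is never held in memory. I've added errors='ignore' because opening the file in text mode ('r') failed on an undecodable start byte. Note that errors='ignore' silently drops any bytes that cannot be decoded.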
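To make that concrete, here is a minimal sketch of the generator approach applied to the question's parsing logic. The helper name iter_decoded_lines and the sample data are made up for illustration, and io.BytesIO stands in for the real file opened with 'rb'; the pandas step is left out so the snippet runs on its own.

```python
import io


def iter_decoded_lines(binary_file, encoding="utf-8", errors="ignore"):
    """Return an iterator of (index, text) pairs, decoding each line lazily."""
    # The generator expression decodes one bytes line at a time,
    # so nothing is read ahead or stored in a list.
    decoded = (raw.decode(encoding, errors=errors) for raw in binary_file)
    return enumerate(decoded)


# Hypothetical stand-in for open(pathname, 'rb').
# The second line contains a byte (0xff) that is not valid UTF-8.
sample = io.BytesIO(b"name:alice\tage:30\nname:b\xffob\tage:41\n")

rows = []
for i, line in iter_decoded_lines(sample):
    # line is now a str, so str separators work as in the question.
    fields = {k.strip(): v.strip()
              for k, _, v in (e.partition(":") for e in line.split("\t"))}
    rows.append((i, fields))

print(rows)
```

With errors='ignore', the invalid byte in the second line is dropped rather than raising an error, which may or may not be what you want for real data.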

As an aside, you could instead replace all string literals with byte literals when operating on bytes, i.e. partition(b':') and split(b'\t'), and do your work using bytes (pretty sure pandas works fine with them).
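A rough sketch of the bytes-only variant, again with made-up sample data and io.BytesIO standing in for the file opened with 'rb'. It stops at building plain dicts and does not test the pandas part.

```python
import io

# Hypothetical stand-in for open(bam_path, 'rb').
sample = io.BytesIO(b"name:alice\tage:30\nname:bob\tage:41\n")

records = []
for i, line in enumerate(sample):
    # line is bytes, so every separator must be a bytes literal too.
    record = {k.strip(): v.strip()
              for k, _, v in (e.partition(b":") for e in line.split(b"\t"))}
    records.append(record)

print(records)
```

Keep in mind that the keys and values stay bytes here, so anything downstream that expects str (column names, for instance) would still need a decode step.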
