How to skip blank lines with read_fwf in pandas?

2024/10/12 0:27:16

I am using the pandas.read_fwf() function in pandas 0.19.2 to read a file, fwf.txt, that has the following content:

# Column1 Column2
      123     abc

      456     def
#
#

My code is the following:

import pandas as pd
file_path = "fwf.txt"
widths = [len("# Column1"), len(" Column2")]
names = ["Column1", "Column2"]
data = pd.read_fwf(filepath_or_buffer=file_path, widths=widths, names=names, skip_blank_lines=True, comment="#")

The printed dataframe is like this:

    Column1 Column2
0   123.0   abc
1   NaN     NaN
2   456.0   def
3   NaN     NaN

It looks like the skip_blank_lines=True argument is ignored, since the dataframe contains rows of NaN values.

What combination of pandas.read_fwf() arguments would ensure that blank lines are skipped?

Answer
import io
import pandas as pd
file_path = "fwf.txt"
widths = [len("# Column1 "), len("Column2")]
names = ["Column1", "Column2"]

class FileLike(io.TextIOBase):
    def __init__(self, iterable):
        self.iterable = iterable

    def readline(self):
        return next(self.iterable)

with open(file_path, 'r') as f:
    lines = (line for line in f if line.strip())
    data = pd.read_fwf(FileLike(lines), widths=widths, names=names, comment='#')
    print(data)

prints

   Column1 Column2
0      123     abc
1      456     def

with open(file_path, 'r') as f:
    lines = (line for line in f if line.strip())

defines a generator expression (i.e. an iterable) which yields lines from the file with blank lines removed.
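One detail worth illustrating: a generator expression is lazy, so the lines are only read when something consumes the generator, and that has to happen while the file is still open. The sketch below uses io.StringIO as a stand-in for the opened file; the sample text is an assumed reconstruction of fwf.txt, not taken from a real file.

```python
import io

# Assumed stand-in for the contents of fwf.txt, including one blank line.
sample = (
    "# Column1 Column2\n"
    "      123     abc\n"
    "\n"
    "      456     def\n"
    "#\n"
    "#\n"
)

with io.StringIO(sample) as f:
    # Nothing is read yet; the generator only describes the filtering.
    lines = (line for line in f if line.strip())
    # Consuming the generator inside the with block reads from the open stream.
    kept = list(lines)

print(len(kept))
```

Consuming the generator after the with block has closed the stream would fail, which is why the read_fwf call in the answer sits inside the with block.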

The pd.read_fwf function can accept TextIOBase objects. You can subclass TextIOBase so that its readline method returns lines from an iterable:

class FileLike(io.TextIOBase):
    def __init__(self, iterable):
        self.iterable = iterable

    def readline(self):
        return next(self.iterable)
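As a rough illustration of how such a wrapper behaves on its own, here is a standard-library-only sketch. It is a variant of the class above, not the answer's exact code: readline returns an empty string once the iterable is exhausted, which is the usual end-of-file signal for file objects. The input lines are assumed sample data.

```python
import io

class FileLike(io.TextIOBase):
    """Read-only text stream backed by an iterable of lines (illustrative variant)."""

    def __init__(self, iterable):
        self.iterator = iter(iterable)

    def readline(self, size=-1):
        # Return "" when the iterable is exhausted, signalling end of file.
        return next(self.iterator, "")

# Assumed sample lines, with one blank line to be filtered out.
raw_lines = [
    "# Column1 Column2\n",
    "      123     abc\n",
    "\n",
    "      456     def\n",
    "#\n",
]
non_blank = (line for line in raw_lines if line.strip())

stream = FileLike(non_blank)
# Iterating a TextIOBase object calls readline() until it returns "".
kept = list(stream)
at_eof = stream.readline()
print(kept)
```

Because iteration over an io.TextIOBase object is built on readline, defining that one method is enough for code that reads the stream line by line.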

Putting these two together gives you a way to manipulate/modify lines of a file before passing them to pd.read_fwf.
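A possible alternative, not part of the answer above, is to let read_fwf parse the file as-is and drop the all-NaN rows afterwards with DataFrame.dropna(how="all"). This is only a sketch, and it assumes that a row that is NaN in every column can only come from a blank or comment-only line. The sample text is an assumed reconstruction of fwf.txt, read through io.StringIO so the example is self-contained.

```python
import io
import pandas as pd

# Assumed stand-in for the contents of fwf.txt, including one blank line.
sample = (
    "# Column1 Column2\n"
    "      123     abc\n"
    "\n"
    "      456     def\n"
    "#\n"
    "#\n"
)

widths = [len("# Column1 "), len("Column2")]
names = ["Column1", "Column2"]

data = pd.read_fwf(io.StringIO(sample), widths=widths, names=names, comment="#")

# Drop rows in which every column is NaN, then renumber the index from 0.
data = data.dropna(how="all").reset_index(drop=True)
print(data)
```

Depending on the pandas version, Column1 may come back as integers or as floats, since a column that held NaN values at parse time is read as float.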

