How can you read a gzipped parquet file in Python?

2024/10/13 0:37:12

I need to open a gzipped file that has a parquet file inside with some data. I am having a lot of trouble trying to print or read what is inside the file. I tried the following:

import gzip
with gzip.open("myFile.parquet.gzip", "rb") as f:
    data = f.read()

This does not seem to work, as I get an error that my file is not a gz file. Thanks!

Answer

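You can use the read_parquet function from the pandas module. A file named .parquet.gzip is usually a regular parquet file whose contents are gzip-compressed internally, not a gzip archive wrapping a parquet file, which would explain why gzip.open rejects it. read_parquet handles that internal compression transparently:

  1. Install pandas and pyarrow:
pip install pandas pyarrow
  2. Use read_parquet, which returns a DataFrame:
import pandas as pd
data = pd.read_parquet("myFile.parquet.gzip")
print(data.count()) # example of an operation on the returned DataFrame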
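
If it is unclear whether the file is a real gzip archive or a parquet file that only carries a .gzip suffix, one option is to look at its first bytes: a gzip stream starts with the bytes 1f 8b, while a standard parquet file starts with the ASCII marker PAR1. The sketch below is only an illustration under that assumption. The helper names detect_container and read_parquet_any are made up for this example and are not part of pandas or pyarrow.

```python
import gzip
import io

GZIP_MAGIC = b"\x1f\x8b"   # leading two bytes of a gzip stream
PARQUET_MAGIC = b"PAR1"    # leading four bytes of a standard parquet file


def detect_container(path):
    """Return "gzip", "parquet", or "unknown" based on the file's leading bytes.

    Hypothetical helper for illustration; it only inspects the first 4 bytes.
    """
    with open(path, "rb") as f:
        head = f.read(4)
    if head.startswith(GZIP_MAGIC):
        return "gzip"
    if head == PARQUET_MAGIC:
        return "parquet"
    return "unknown"


def read_parquet_any(path):
    """Load a parquet file into a DataFrame, unwrapping an outer gzip layer if present.

    Hypothetical helper; assumes pandas and pyarrow are installed.
    """
    import pandas as pd  # imported here so detect_container works without pandas

    if detect_container(path) == "gzip":
        # Real gzip archive: decompress into memory, then parse the parquet bytes.
        with gzip.open(path, "rb") as f:
            return pd.read_parquet(io.BytesIO(f.read()))
    # Otherwise let pandas read the file directly; compression inside the
    # parquet file itself is handled by the parquet engine.
    return pd.read_parquet(path)
```

Calling detect_container("myFile.parquet.gzip") and getting "parquet" would be consistent with the "not a gz file" error from gzip.open, and read_parquet_any("myFile.parquet.gzip") should then behave the same as calling pd.read_parquet on the path. Note that the gzip branch reads the whole decompressed file into memory, which may not suit very large files.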