Note: I have solved this problem; see the solution below.
I can use to_csv to write to stdout in Python / pandas. Something like this works fine:
final_df.to_csv(sys.stdout, index=False)
I would like to read in an actual Excel file (not a CSV) from stdin. I want to output CSV, but input xlsx. I tried this line:
bls_df = pd.read_excel(sys.stdin, sheet_name="MSA_dl", index_col=None)
But that doesn't seem to work. Is it possible to do what I'm trying and, if so, how does one do it?
Notes:
- The actual input file is "MSA_M2018_dl.xlsx" which is in the zip file https://www.bls.gov/oes/special.requests/oesm18ma.zip.
I download and extract the datafile like this:
curl -o oesm18ma.zip 'https://www.bls.gov/oes/special.requests/oesm18ma.zip'
7z x oesm18ma.zip
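I have solved the problem as follows, with a script, test01.py, that reads from stdin and writes to stdout. NOTE the use of sys.stdin.buffer in the read_excel() call: sys.stdin is a text-mode stream, while an xlsx file is binary data, so read_excel() needs the underlying binary buffer.
import sys
import os
import pandas as pd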
BLS_DF = pd.read_excel(sys.stdin.buffer, sheet_name="MSA_dl", index_col=None)
BLS_DF.to_csv(sys.stdout, index=False)
I invoke this as:
cat MSA_M2018_dl.xlsx | python3 test01.py
This is a small test program to illustrate the idea while removing complexity. It's not the actual program I'm working on.
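A possible variation, in case read_excel() complains that the piped stream is not seekable (an xlsx file is a zip archive, and a pipe cannot be rewound): read all of stdin into memory first and hand pandas an io.BytesIO. This is only a sketch under that assumption; the helper names buffer_stream and xlsx_stream_to_csv are made up for illustration, and it assumes pandas plus an xlsx engine such as openpyxl are installed.

```python
import io


def buffer_stream(stream):
    """Read a binary stream to the end and return a seekable in-memory copy."""
    return io.BytesIO(stream.read())


def xlsx_stream_to_csv(in_stream, out_stream, sheet_name="MSA_dl"):
    """Convert one sheet of an xlsx workbook read from in_stream to CSV on out_stream.

    in_stream must be a binary stream (for example sys.stdin.buffer);
    out_stream is a text stream (for example sys.stdout).
    """
    # Imported here so buffer_stream() above stays usable without pandas.
    import pandas as pd

    # Hypothetical helper from above: makes the piped data seekable.
    workbook_bytes = buffer_stream(in_stream)
    df = pd.read_excel(workbook_bytes, sheet_name=sheet_name, index_col=None)
    df.to_csv(out_stream, index=False)
```

In a script like test01.py, the call would be xlsx_stream_to_csv(sys.stdin.buffer, sys.stdout) after an import sys, invoked with the same cat ... | python3 pipeline as above. The whole workbook is held in memory, which seems acceptable for a file of this size but may not suit very large inputs.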