Question 1

I have a Python module that relies on a variety of data files (a set of CSV files representing curves) that need to be loaded at runtime. The csv module works very well:

  # curvefile = "ntc.10k.csv"
  raw = csv.reader(open(curvefile, 'rb'), delimiter=',')

But if I import this module into another script, I need to find the full path to the data file.

/project
    /shared
        curve.py
        ntc.10k.csv
        ntc.2k5.csv
    /apps
        script.py

I want the script.py to just refer to the curves by basic filename, not with full paths. In the module code, I can use:

pkgutil.get_data("curve", "ntc.10k.csv")

which works very well at finding the file, but it returns the contents of the csv file already read into memory, whereas csv.reader requires the file handle itself. Is there any way to make these two modules play well together? They're both standard library modules, so I wasn't really expecting problems. I know I can start splitting the binary data that pkgutil returns myself, but then I might as well not be using the csv library.

I know I can just use the following in the module code and forget about pkgutil, but it seems like this is exactly what pkgutil is for.

this_dir, this_filename = os.path.split(__file__)
DATA_PATH = os.path.join(this_dir, curvefile)
raw = csv.reader(open(DATA_PATH, "rb"))
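One way to bridge the two modules, sketched here under the assumption of Python 3 and UTF-8 encoded files (the snippets above are Python 2 style), is to wrap the bytes returned by pkgutil.get_data in an in-memory text buffer, since csv.reader accepts any iterable of lines rather than only a real file handle. The helper names rows_from_bytes and load_curve are hypothetical and only for illustration.

```python
import csv
import io
import pkgutil


def rows_from_bytes(data, encoding="utf-8"):
    """Parse CSV rows from raw bytes, such as the value pkgutil.get_data() returns."""
    # Assumption: the file is text in the given encoding (UTF-8 by default).
    # newline="" leaves line endings untouched so the csv module handles them.
    buffer = io.StringIO(data.decode(encoding), newline="")
    return list(csv.reader(buffer, delimiter=","))


def load_curve(package, resource):
    """Hypothetical helper: load a packaged CSV file as a list of rows."""
    data = pkgutil.get_data(package, resource)
    if data is None:
        # get_data() returns None when the package cannot be located or
        # its loader does not support get_data().
        raise FileNotFoundError(resource)
    return rows_from_bytes(data)
```

With the layout above, load_curve("curve", "ntc.10k.csv") would be the call inside the module, provided "curve" is importable from the calling script.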

Question 2

I opened up the source code for get_data, and it is trivial to have it return the path to the file instead of the loaded contents. This function should do the trick. Use the keyword as_string=True to return the file read into memory, or as_string=False to return the path.

import os, sys
from pkgutil import get_loader

def get_data_smart(package, resource, as_string=True):
    """Rewrite of pkgutil.get_data() that actually lets the user determine if data should
    be returned read into memory (aka as_string=True) or just return the file path.
    """
    loader = get_loader(package)
    if loader is None or not hasattr(loader, 'get_data'):
        return None
    mod = sys.modules.get(package) or loader.load_module(package)
    if mod is None or not hasattr(mod, '__file__'):
        return None

    # Modify the resource name to be compatible with the loader.get_data
    # signature - an os.path format "filename" starting with the dirname of
    # the package's __file__
    parts = resource.split('/')
    parts.insert(0, os.path.dirname(mod.__file__))
    resource_name = os.path.join(*parts)
    if as_string:
        return loader.get_data(resource_name)
    else:
        return resource_name
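As a rough sketch of how the path-returning variant could feed csv.reader, the following resolves the file next to a module and opens it. It is not the function above: it swaps in importlib.import_module for the get_loader/load_module calls, assumes Python 3, and assumes the module lives on a regular filesystem (not inside a zip archive). The names package_file_path and read_curve are hypothetical.

```python
import csv
import importlib
import os


def package_file_path(package, resource):
    """Return the filesystem path of a file stored next to a package or module."""
    mod = importlib.import_module(package)
    # Same idea as get_data_smart(..., as_string=False): join the resource
    # onto the directory that holds the module's __file__.
    return os.path.join(os.path.dirname(mod.__file__), *resource.split("/"))


def read_curve(package, resource):
    """Hypothetical helper: read a packaged CSV file into a list of rows."""
    # The csv documentation recommends opening files with newline="".
    with open(package_file_path(package, resource), newline="") as handle:
        return list(csv.reader(handle, delimiter=","))
```

A caller such as script.py could then use read_curve("curve", "ntc.10k.csv") without knowing where the data file lives, as long as the module itself can be imported.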

how to use pkgutils.get_data with csv.reader in python?
