I would like to import a .py file that contains some functions. I have saved the files __init__.py and util_func.py under this folder:
/usr/local/lib/python3.4/site-packages/myutil
The util_func.py file contains all the functions I would like to use. I also need to create a PySpark UDF so I can use it to transform my DataFrame. My code looks like this:
import pyspark.sql.functions
from pyspark.sql.types import StringType

import myutil
from myutil import util_func

myudf = pyspark.sql.functions.udf(util_func.ConvString, StringType())
Somewhere further down in the code, I use this to convert one of the columns in my DataFrame:
df = df.withColumn("newcol", myudf(df["oldcol"]))
Then I try to see whether it converted the column by calling:
df.head()
It fails with the error "No module named myutil".
I am able to call the functions from within IPython, so somehow the PySpark engine does not see the module. Any idea how to make sure that the PySpark engine picks up the module?
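As a sanity check on the driver side (this is just my own debugging snippet, not part of the job), plain Python can report where the myutil name would resolve from, without importing it:

```python
import importlib.util

# Ask the import machinery where it would load "myutil" from,
# without actually importing it.
spec = importlib.util.find_spec("myutil")

if spec is None:
    print("myutil is not importable from this interpreter")
else:
    print("myutil resolves to:", spec.origin)
```

From IPython this resolves to the site-packages path above, which is why I suspect the problem is only on the PySpark side.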