I'm trying out the MongoDB Hadoop integration with Spark, but I can't figure out how to make the jars accessible to an IPython notebook.
Here's what I'm trying to do:
# set up parameters for reading from MongoDB via Hadoop input format
config = {"mongo.input.uri": "mongodb://localhost:27017/db.collection"}
inputFormatClassName = "com.mongodb.hadoop.MongoInputFormat"
# these values worked but others might as well
keyClassName = "org.apache.hadoop.io.Text"
valueClassName = "org.apache.hadoop.io.MapWritable"
# Do some reading from mongo
items = sc.newAPIHadoopRDD(inputFormatClassName, keyClassName, valueClassName, None, None, config)
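Afterwards I just do a quick sanity check on the RDD, something like this (the exact call isn't important, it's only to confirm the read works):
# pull one (key, value) pair back from MongoDB to confirm the read works
print(items.first())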
This code works fine when I launch it in pyspark using the following command:
spark-1.4.1/bin/pyspark --jars 'mongo-hadoop-core-1.4.0.jar,mongo-java-driver-3.0.2.jar'
where mongo-hadoop-core-1.4.0.jar and mongo-java-driver-2.10.1.jar allow using MongoDB from Java. However, when I do this:
IPYTHON_OPTS="notebook" spark-1.4.1/bin/pyspark --jars 'mongo-hadoop-core-1.4.0.jar,mongo-java-driver-3.0.2.jar'
the jars are no longer available and I get the following error:
java.lang.ClassNotFoundException: com.mongodb.hadoop.MongoInputFormat
Does anyone know how to make jars available to Spark in the IPython notebook? I'm pretty sure this is not specific to Mongo, so maybe someone has already succeeded in adding jars to the classpath while using the notebook?
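In case it's relevant, I also wondered whether I should build the SparkContext myself inside the notebook and point it at the jars through the config, roughly like the sketch below. The spark.jars / spark.driver.extraClassPath settings are just my guess at what's relevant, and pyspark already creates sc for me, so I'm not sure this is the right direction:
from pyspark import SparkConf, SparkContext

# guessing at the relevant settings; these are the same jars passed to --jars above
conf = (SparkConf()
        .setAppName("mongo-notebook-test")
        .set("spark.jars", "mongo-hadoop-core-1.4.0.jar,mongo-java-driver-3.0.2.jar")
        .set("spark.driver.extraClassPath", "mongo-hadoop-core-1.4.0.jar:mongo-java-driver-3.0.2.jar"))
sc = SparkContext(conf=conf)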