Add jar to pyspark when using notebook

2024/9/22 10:25:04

I'm trying the mongodb hadoop integration with spark but can't figure out how to make the jars accessible to an IPython notebook.

Here what I'm trying to do:

# set up parameters for reading from MongoDB via Hadoop input format
config = {"mongo.input.uri": "mongodb://localhost:27017/db.collection"}
inputFormatClassName = "com.mongodb.hadoop.MongoInputFormat"# these values worked but others might as well
keyClassName = "org.apache.hadoop.io.Text"
valueClassName = "org.apache.hadoop.io.MapWritable"# Do some reading from mongo
items = sc.newAPIHadoopRDD(inputFormatClassName, keyClassName, valueClassName, None, None, config)

This code works fine when I launch it in pyspark using the following command:

spark-1.4.1/bin/pyspark --jars 'mongo-hadoop-core-1.4.0.jar,mongo-java-driver-3.0.2.jar'

where mongo-hadoop-core-1.4.0.jar and mongo-java-driver-2.10.1.jar allows using mongodb from java. However, when I do this:

IPYTHON_OPTS="notebook" spark-1.4.1/bin/pyspark --jars 'mongo-hadoop-core-1.4.0.jar,mongo-java-driver-3.0.2.jar'

The jars are not available anymore and I get the following error:

java.lang.ClassNotFoundException: com.mongodb.hadoop.MongoInputFormat

Does anyone know how to make jars available to the spark in the IPython notebook? I'm pretty sure this is not specific to mongo so maybe someone already has succeeded in adding jars to the classpath while using the notebook?

Answer

Very similar, please let me know if this helps: https://issues.apache.org/jira/browse/SPARK-5185

https://en.xdnf.cn/q/71955.html

Related Q&A

how do I create a python list with a negative index

Im new to python and need to create a list with a negative index but have not been successful so far.Im using this code:a = [] for i in xrange( -20, 0, -1 ):a[i] = -(i)log.info(a[{i}]={v}.format(i=i, v…

Select subset of Data Frame rows based on a list in Pandas

I have a data frame df1 and list x:In [22] : import pandas as pd In [23]: df1 = pd.DataFrame({C: range(5), "B":range(10,20,2), "A":list(abcde)}) In [24]: df1 Out[24]:A B C 0 a …

convert csv to json (nested objects)

I am new to python, and I am having to convert a csv file to json in following format:CSV File :firstname, lastname, email, customerid, dateadded, customerstatus john, doe, [email protected], 124,26/11…

How can I read exactly one response chunk with pythons http.client?

Using http.client in Python 3.3+ (or any other builtin python HTTP client library), how can I read a chunked HTTP response exactly one HTTP chunk at a time?Im extending an existing test fixture (writt…

ValueError: cannot reindex from a duplicate axis in groupby Pandas

My dataframe looks like this:SKU # GRP CATG PRD 0 54995 9404000 4040 99999 1 54999 9404000 4040 99999 2 55037 9404000 4040 1556894 3 55148 9404000 4040 1556894 4 55254 94…

How to calculate class weights of a Pandas DataFrame for Keras?

Im tryingprint(Y) print(Y.shape)class_weights = compute_class_weight(balanced,np.unique(Y),Y) print(class_weights)But this gives me an error:ValueError: classes should include all valid labels that can…

How to change the layout of a Gtk application on fullscreen?

Im developing another image viewer using Python and Gtk and the viewer is currently very simple: it consists of a GtkWindow with a GtkTreeView on the left side displaying the list of my images, and a G…

How to upload multiple file in django admin models

file = models.FileField(upload_to=settings.FILE_PATH)For uploading a file in django models I used the above line. But For uploading multiple file through django admin model what should I do? I found t…

Convert numpy array to list of datetimes

I have a 2D array of dates of the form:[Y Y Y ... ] [M M M ... ] [D D D ... ] [H H H ... ] [M M M ... ] [S S S ... ]So it looks likedata = np.array([[2015, 2015, 2015, 2015, 2015, 2015], # ...[ 1, …

PyQt: how to handle event without inheritance

How can I handle mouse event without a inheritance, the usecase can be described as follows:Suppose that I wanna let the QLabel object to handel MouseMoveEvent, the way in the tutorial often goes in th…