Spark-submit fails to import SparkContext

2024/11/16 7:19:56

I'm running Spark 1.4.1 on my local Mac laptop and am able to use pyspark interactively without any issues. Spark was installed through Homebrew and I'm using Anaconda Python. However, as soon as I try to use spark-submit, I get the following error:

15/09/04 08:51:09 ERROR SparkContext: Error initializing SparkContext.
java.io.FileNotFoundException: Added file file:test.py does not exist.
    at org.apache.spark.SparkContext.addFile(SparkContext.scala:1329)
    at org.apache.spark.SparkContext.addFile(SparkContext.scala:1305)
    at org.apache.spark.SparkContext$$anonfun$15.apply(SparkContext.scala:458)
    at org.apache.spark.SparkContext$$anonfun$15.apply(SparkContext.scala:458)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:458)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:61)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
    at py4j.Gateway.invoke(Gateway.java:214)
    at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79)
    at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68)
    at py4j.GatewayConnection.run(GatewayConnection.java:207)
    at java.lang.Thread.run(Thread.java:745)
15/09/04 08:51:09 ERROR SparkContext: Error stopping SparkContext after init error.
java.lang.NullPointerException
    at org.apache.spark.network.netty.NettyBlockTransferService.close(NettyBlockTransferService.scala:152)
    at org.apache.spark.storage.BlockManager.stop(BlockManager.scala:1216)
    at org.apache.spark.SparkEnv.stop(SparkEnv.scala:96)
    at org.apache.spark.SparkContext.stop(SparkContext.scala:1659)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:565)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:61)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
    at py4j.Gateway.invoke(Gateway.java:214)
    at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79)
    at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68)
    at py4j.GatewayConnection.run(GatewayConnection.java:207)
    at java.lang.Thread.run(Thread.java:745)
Traceback (most recent call last):
  File "test.py", line 35, in <module>
    sc = SparkContext("local","test")
  File "/usr/local/Cellar/apache-spark/1.4.1/libexec/python/lib/pyspark.zip/pyspark/context.py", line 113, in __init__
  File "/usr/local/Cellar/apache-spark/1.4.1/libexec/python/lib/pyspark.zip/pyspark/context.py", line 165, in _do_init
  File "/usr/local/Cellar/apache-spark/1.4.1/libexec/python/lib/pyspark.zip/pyspark/context.py", line 219, in _initialize_context
  File "/usr/local/Cellar/apache-spark/1.4.1/libexec/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 701, in __call__
  File "/usr/local/Cellar/apache-spark/1.4.1/libexec/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.io.FileNotFoundException: Added file file:test.py does not exist.
    at org.apache.spark.SparkContext.addFile(SparkContext.scala:1329)
    at org.apache.spark.SparkContext.addFile(SparkContext.scala:1305)
    at org.apache.spark.SparkContext$$anonfun$15.apply(SparkContext.scala:458)
    at org.apache.spark.SparkContext$$anonfun$15.apply(SparkContext.scala:458)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:458)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:61)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
    at py4j.Gateway.invoke(Gateway.java:214)
    at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79)
    at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68)
    at py4j.GatewayConnection.run(GatewayConnection.java:207)
    at java.lang.Thread.run(Thread.java:745)

Here is my code:

from pyspark import SparkContext

if __name__ == "__main__":
    sc = SparkContext("local", "test")
    sc.parallelize([1, 2, 3, 4])
    sc.stop()

If I move the file to anywhere in the /usr/local/Cellar/apache-spark/1.4.1/ directory, then spark-submit works fine. I have my environment variables set as follows:

export SPARK_HOME="/usr/local/Cellar/apache-spark/1.4.1"
export PATH=$SPARK_HOME/bin:$PATH
export PYTHONPATH=$SPARK_HOME/libexec/python:$SPARK_HOME/libexec/python/lib/py4j-0.8.2.1-src.zip

I'm sure something is set incorrectly in my environment, but I can't seem to track it down.
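The error message hints at what is going on: `Added file file:test.py does not exist` means the relative path `test.py` could not be resolved. Here is a minimal sketch in plain Python (not Spark's actual code) of how a bare `file:` path with no leading `/` ends up being resolved against the current working directory, which is why the lookup fails unless you happen to be in the right directory:

```python
import os

def resolve(path):
    """Illustrative only: strip a bare 'file:' prefix and resolve the
    remainder against the current working directory, the way any
    relative path is looked up on the local filesystem."""
    if path.startswith("file:"):
        path = path[len("file:"):]
    return os.path.abspath(path)

# 'file:test.py' is relative, so it only exists if the current working
# directory happens to contain test.py:
print(resolve("file:test.py"))
print(os.path.exists(resolve("file:test.py")))
```

So the same script can work from one directory and fail from another, depending only on where the command is run.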

Answer

The Python files executed by spark-submit should be on the PYTHONPATH. Either add the full path of the script's directory:

export PYTHONPATH=full/path/to/dir:$PYTHONPATH

or, if you are already inside the directory containing the Python script, add '.' to the PYTHONPATH:

export PYTHONPATH='.':$PYTHONPATH

Thanks to @Def_Os for pointing that out!
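Putting the answer together, a typical session might look like the sketch below. `/path/to/dir` is a placeholder for wherever `test.py` actually lives; substitute your own path:

```shell
# Placeholder: the directory that contains test.py.
SCRIPT_DIR="/path/to/dir"

# Option 1: prepend the full directory path to PYTHONPATH.
export PYTHONPATH="$SCRIPT_DIR:$PYTHONPATH"

# Option 2: if your shell is already in that directory, '.' works instead:
# export PYTHONPATH=".:$PYTHONPATH"

# Then submit the script (shown commented out here, since it needs a
# working Spark install):
# spark-submit "$SCRIPT_DIR/test.py"
echo "$PYTHONPATH"
```

Prepending (rather than appending) keeps your script's directory ahead of any other entries when Python resolves module and file lookups.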

