I'm running Spark 1.4.1 on my local Mac laptop and can use pyspark interactively without any issues. Spark was installed through Homebrew, and I'm using Anaconda Python. However, as soon as I try to use spark-submit, it fails.
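For reference, I'm launching the job essentially like this, from the directory that contains test.py (I'm not passing a master on the command line since it's set in the code; the relative path matches the file:test.py in the error below):

spark-submit test.py

Here is the full error: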
15/09/04 08:51:09 ERROR SparkContext: Error initializing SparkContext.
java.io.FileNotFoundException: Added file file:test.py does not exist.
    at org.apache.spark.SparkContext.addFile(SparkContext.scala:1329)
    at org.apache.spark.SparkContext.addFile(SparkContext.scala:1305)
    at org.apache.spark.SparkContext$$anonfun$15.apply(SparkContext.scala:458)
    at org.apache.spark.SparkContext$$anonfun$15.apply(SparkContext.scala:458)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:458)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:61)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
    at py4j.Gateway.invoke(Gateway.java:214)
    at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79)
    at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68)
    at py4j.GatewayConnection.run(GatewayConnection.java:207)
    at java.lang.Thread.run(Thread.java:745)
15/09/04 08:51:09 ERROR SparkContext: Error stopping SparkContext after init error.
java.lang.NullPointerException
    at org.apache.spark.network.netty.NettyBlockTransferService.close(NettyBlockTransferService.scala:152)
    at org.apache.spark.storage.BlockManager.stop(BlockManager.scala:1216)
    at org.apache.spark.SparkEnv.stop(SparkEnv.scala:96)
    at org.apache.spark.SparkContext.stop(SparkContext.scala:1659)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:565)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:61)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
    at py4j.Gateway.invoke(Gateway.java:214)
    at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79)
    at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68)
    at py4j.GatewayConnection.run(GatewayConnection.java:207)
    at java.lang.Thread.run(Thread.java:745)
Traceback (most recent call last):
  File "test.py", line 35, in <module>
    sc = SparkContext("local","test")
  File "/usr/local/Cellar/apache-spark/1.4.1/libexec/python/lib/pyspark.zip/pyspark/context.py", line 113, in __init__
  File "/usr/local/Cellar/apache-spark/1.4.1/libexec/python/lib/pyspark.zip/pyspark/context.py", line 165, in _do_init
  File "/usr/local/Cellar/apache-spark/1.4.1/libexec/python/lib/pyspark.zip/pyspark/context.py", line 219, in _initialize_context
  File "/usr/local/Cellar/apache-spark/1.4.1/libexec/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 701, in __call__
  File "/usr/local/Cellar/apache-spark/1.4.1/libexec/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.io.FileNotFoundException: Added file file:test.py does not exist.
    at org.apache.spark.SparkContext.addFile(SparkContext.scala:1329)
    at org.apache.spark.SparkContext.addFile(SparkContext.scala:1305)
    at org.apache.spark.SparkContext$$anonfun$15.apply(SparkContext.scala:458)
    at org.apache.spark.SparkContext$$anonfun$15.apply(SparkContext.scala:458)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:458)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:61)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
    at py4j.Gateway.invoke(Gateway.java:214)
    at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79)
    at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68)
    at py4j.GatewayConnection.run(GatewayConnection.java:207)
    at java.lang.Thread.run(Thread.java:745)
Here is my code:
from pyspark import SparkContext

if __name__ == "__main__":
    # minimal reproduction: create a local context, build a trivial RDD, stop
    sc = SparkContext("local", "test")
    sc.parallelize([1, 2, 3, 4])
    sc.stop()
If I move the file anywhere under the /usr/local/Cellar/apache-spark/1.4.1/ directory, then spark-submit works fine. For example (the exact destination is just an illustration; any location under that directory behaves the same for me):
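cp test.py /usr/local/Cellar/apache-spark/1.4.1/
cd /usr/local/Cellar/apache-spark/1.4.1/
spark-submit test.py    # works here; the same command fails from my original directory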
I have my environment variables set as follows:
export SPARK_HOME="/usr/local/Cellar/apache-spark/1.4.1"
export PATH=$SPARK_HOME/bin:$PATH
export PYTHONPATH=$SPARK_HOME/libexec/python:$SPARK_HOME/libexec/python/lib/py4j-0.8.2.1-src.zip
I'm sure something is set incorrectly in my environment, but I can't seem to track it down.
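In case it helps narrow this down, here's a minimal sanity check I could drop at the top of test.py to confirm what the driver process actually sees (plain Python, nothing Spark-specific; the variable names are the ones exported above):

import os

# where is the driver running from, and which Spark install does it see?
print("cwd:        ", os.getcwd())
print("SPARK_HOME: ", os.environ.get("SPARK_HOME"))
print("PYTHONPATH: ", os.environ.get("PYTHONPATH"))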