So, I'm running this simple program on a 16-core system. I launch it with the following command.
spark-submit --master local[*] pi.py
And here is the code of the program.
#"""pi.py"""
from pyspark import SparkContext
import random

N = 12500000

def sample(p):
    x, y = random.random(), random.random()
    return 1 if x*x + y*y < 1 else 0

sc = SparkContext("local", "Test App")
count = sc.parallelize(xrange(0, N)).map(sample).reduce(lambda a, b: a + b)
print "Pi is roughly %f" % (4.0 * count / N)
When I use top to watch CPU consumption, only 1 core is being utilized. Why is that? Secondly, the Spark documentation says that the default parallelism is stored in the property spark.default.parallelism. How can I read this property from within my Python program?
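For the second question, is something like the snippet below the right way to read it? I'm not sure whether sc.getConf() or sc.defaultParallelism is the intended approach, so this is just my guess.

# My guess at reading the parallelism settings -- not sure this is correct:
conf_value = sc.getConf().get("spark.default.parallelism", "not set")
print "spark.default.parallelism = %s" % conf_value
print "sc.defaultParallelism = %d" % sc.defaultParallelism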