Question 1

I'm running this simple program on a 16-core system. I launch it with the following command.

spark-submit --master local[*] pi.py

And the code of that program is the following.

"""pi.py"""
from pyspark import SparkContext
import random

N = 12500000

def sample(p):
    x, y = random.random(), random.random()
    return 1 if x*x + y*y < 1 else 0

sc = SparkContext("local", "Test App")
count = sc.parallelize(xrange(0, N)).map(sample).reduce(lambda a, b: a + b)
print "Pi is roughly %f" % (4.0 * count / N)

When I use top to watch CPU consumption, only 1 core is being utilized. Why is that? Secondly, the Spark documentation says that the default parallelism is held in the property spark.default.parallelism. How can I read this property from within my Python program?

Answer

As none of the above really worked for me (maybe because I didn't really understand them), here is my two cents.

I was starting my job with spark-submit program.py, and inside the file I had sc = SparkContext("local", "Test"). I tried to verify the number of cores Spark sees with sc.defaultParallelism. It turned out to be 1. When I changed the context initialization to sc = SparkContext("local[*]", "Test"), it became 16 (the number of cores on my system) and my program used all the cores.
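A minimal sketch of the check described above, assuming PySpark is installed and the script is launched through spark-submit. The helper names build_context and main are hypothetical, and the code uses Python 3 syntax rather than the Python 2 of the question. Instead of hardcoding "local[*]", this variant leaves the master out of the code by default so that the --master flag decides:

```python
import random

N = 12500000  # same sample count as in the question


def sample(_):
    """Return 1 if a random point in the unit square falls inside the unit circle."""
    x, y = random.random(), random.random()
    return 1 if x * x + y * y < 1 else 0


def build_context(app_name="Test App", master=None):
    """Create a SparkContext; hypothetical helper, not part of PySpark."""
    # Imported lazily so that sample() can be used without Spark installed.
    from pyspark import SparkConf, SparkContext

    conf = SparkConf().setAppName(app_name)
    if master is not None:
        # A master set here takes precedence over spark-submit --master.
        conf.setMaster(master)
    return SparkContext(conf=conf)


def main():
    # Master left unset: `spark-submit --master local[*] pi.py` then applies.
    sc = build_context()
    print("master: %s" % sc.master)
    print("defaultParallelism: %d" % sc.defaultParallelism)
    count = sc.parallelize(range(N)).map(sample).reduce(lambda a, b: a + b)
    print("Pi is roughly %f" % (4.0 * count / N))
    sc.stop()
```

To try it, save it as pi.py, add a final main() call, and launch it with the same spark-submit command as in the question. With the master left unset in the code, sc.defaultParallelism should then reflect the value passed on the command line.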

I am quite new to Spark, but my understanding is that local by default means a single core, and because it is set inside the program it overrides the other settings (in my case it certainly overrode those from the configuration files and environment variables).
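The behavior described above can be modeled in a few lines of plain Python. This is only an illustrative sketch, not Spark's own code: local_threads and effective_master are hypothetical names, the model ignores other master forms such as local[K,F], and it assumes the documented precedence in which a value set in code wins over a spark-submit flag, which in turn wins over spark-defaults.conf.

```python
import os
import re


def local_threads(master):
    """Return the worker-thread count implied by a local master URL.

    Returns None when the URL is not one of the simple local forms.
    """
    if master == "local":
        return 1  # plain "local" means a single worker thread
    match = re.fullmatch(r"local\[(\*|\d+)\]", master)
    if match is None:
        return None
    if match.group(1) == "*":
        return os.cpu_count() or 1  # one thread per logical core
    return int(match.group(1))


def effective_master(in_code=None, command_line=None, defaults_file=None):
    """Pick the master that applies: the first source that is set wins."""
    for candidate in (in_code, command_line, defaults_file):
        if candidate is not None:
            return candidate
    return None


# The situation from the question: "local" in code, local[*] on the command line.
print(local_threads(effective_master(in_code="local", command_line="local[*]")))
# Without a master in code, the command-line value applies.
print(local_threads(effective_master(command_line="local[*]")))
```

The first call prints 1, which matches the single busy core seen in top. The second prints the number of logical cores on the machine running it.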

Why is this simple Spark program not utilizing multiple cores?
