Why is this simple Spark program not utilizing multiple cores?

2024/11/17 20:40:49

So, I'm running this simple program on a 16-core system. I run it with the following command.

spark-submit --master local[*] pi.py

And the code of that program is the following.

#"""pi.py"""
from pyspark import SparkContext
import random

N = 12500000

def sample(p):
    x, y = random.random(), random.random()
    return 1 if x*x + y*y < 1 else 0

sc = SparkContext("local", "Test App")
count = sc.parallelize(xrange(0, N)).map(sample).reduce(lambda a, b: a + b)
print "Pi is roughly %f" % (4.0 * count / N)

When I use top to watch CPU consumption, only one core is being utilized. Why is that? Secondly, the Spark documentation says that the default parallelism is contained in the property spark.default.parallelism. How can I read this property from within my Python program?

Answer

As none of the above really worked for me (maybe because I didn't really understand them), here is my two cents.

I was starting my job with spark-submit program.py, and inside the file I had sc = SparkContext("local", "Test"). I tried to verify the number of cores Spark sees with sc.defaultParallelism. It turned out to be 1. When I changed the context initialization to sc = SparkContext("local[*]", "Test"), it became 16 (the number of cores on my system) and my program was using all the cores.

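I am quite new to Spark, but my understanding is that local by default means a single core, and because it is set inside the program it overrides the other settings (in my case it certainly overrides those from configuration files and environment variables). That would also explain the behaviour in the question: the "local" master passed to SparkContext takes precedence over the --master local[*] given to spark-submit.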
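For what it's worth, here is a minimal sketch of how I would write pi.py so that the master is not hardcoded and the parallelism can be inspected from inside the program. It is a Python 3 port, and the helper names describe_parallelism and run are my own, not part of Spark. As far as I can tell, spark.default.parallelism is only present in the configuration when someone sets it explicitly, so the sketch falls back to "not set" and relies on sc.defaultParallelism for the effective value.

```python
import random


def sample(_):
    """Return 1 if a random point in the unit square lies inside the quarter circle."""
    x, y = random.random(), random.random()
    return 1 if x * x + y * y < 1 else 0


def describe_parallelism(sc):
    """Collect the master URL and parallelism settings from a SparkContext.

    Hypothetical helper for illustration; it only reads attributes
    that SparkContext and SparkConf expose.
    """
    return {
        "master": sc.master,
        "defaultParallelism": sc.defaultParallelism,
        # Assumption: the property is absent unless set explicitly.
        "spark.default.parallelism": sc.getConf().get(
            "spark.default.parallelism", "not set"
        ),
    }


def run(n=12500000):
    """Estimate pi with n samples, taking the master from spark-submit."""
    # Imported here so the helpers above can be used without Spark installed.
    from pyspark import SparkConf, SparkContext

    # No setMaster() call, so --master on the command line decides.
    conf = SparkConf().setAppName("Test App")
    sc = SparkContext(conf=conf)
    try:
        for key, value in describe_parallelism(sc).items():
            print(key, "=", value)
        count = sc.parallelize(range(n)).map(sample).reduce(lambda a, b: a + b)
        print("Pi is roughly %f" % (4.0 * count / n))
    finally:
        sc.stop()
```

To use it, add a final run() call at the bottom of pi.py and submit it with spark-submit --master local[*] pi.py. On a 16-core machine I would expect the reported defaultParallelism to match the core count, as in my experiment above.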

