How to run a script in PySpark and drop into an IPython shell when done?

2024/11/15 8:48:16

I want to run a Spark script and then drop into an IPython shell to examine the data interactively.

Running both:

$ IPYTHON=1 pyspark --master local[2] myscript.py

and

$ IPYTHON=1 spark-submit --master local[2] myscript.py

both exit IPython as soon as the script finishes.

This seems really simple, but I can't find how to do it anywhere.

Answer

If you launch the IPython shell without a script:

$ IPYTHON=1 pyspark --master local[2]

you can then run the script from inside the shell:

>>> %run myscript.py

and all variables the script defines will stay in the workspace once it finishes. You can also step through it in the debugger with:

>>> %run -d myscript.py
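
To see why the variables survive: %run executes the file in a fresh namespace and then copies what the script defined into the interactive one. The sketch below imitates that in plain Python with the standard library's runpy.run_path. It is an analogy, not IPython's actual implementation; the script contents and the run_and_keep helper are hypothetical stand-ins, and no Spark is involved so that the example runs anywhere.

```python
import os
import runpy
import tempfile

# Hypothetical stand-in for myscript.py; a real Spark script would
# build RDDs or DataFrames here instead of plain lists.
SCRIPT = """
numbers = list(range(5))
squares = [n * n for n in numbers]
"""


def run_and_keep(path, workspace):
    """Run the file at `path` in a fresh namespace, then copy the
    variables it defined into `workspace` (assumed helper name)."""
    result = runpy.run_path(path, run_name="__main__")
    # Skip dunder entries such as __name__ and __builtins__.
    workspace.update(
        {name: value for name, value in result.items() if not name.startswith("__")}
    )
    return workspace


with tempfile.TemporaryDirectory() as tmp:
    script_path = os.path.join(tmp, "myscript.py")
    with open(script_path, "w") as fh:
        fh.write(SCRIPT)
    workspace = run_and_keep(script_path, {})

print(sorted(workspace))     # names the script left behind
print(workspace["squares"])  # still inspectable after the script has ended
```

Because the script starts in a fresh namespace, a plain %run does not let it see names that already exist in the shell, such as the sc context that the pyspark shell creates. If myscript.py relies on that existing sc, %run -i myscript.py runs it inside the interactive namespace instead.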