Moving a Spark DataFrame from Python to Scala within Zeppelin

2024/10/15 21:14:34

I created a Spark DataFrame in a Python paragraph in Zeppelin:

sqlCtx = SQLContext(sc)
spDf = sqlCtx.createDataFrame(df)

where df is a pandas DataFrame:

print(type(df))
<class 'pandas.core.frame.DataFrame'>

What I want to do is move spDf from a Python paragraph to a Scala paragraph. Using z.put looks like a reasonable way to do it:

z.put("spDf", spDf)

but I got this error:

AttributeError: 'DataFrame' object has no attribute '_get_object_id'

Any suggestions on how to fix the error, or on another way to move spDf?

Answer

You can put the internal Java object rather than the Python wrapper:

%pyspark
df = sc.parallelize([(1, "foo"), (2, "bar")]).toDF(["k", "v"])
z.put("df", df._jdf)

and then make sure you cast it to the correct type in the Scala paragraph:

val df = z.get("df").asInstanceOf[org.apache.spark.sql.DataFrame]
// df: org.apache.spark.sql.DataFrame = [k: bigint, v: string]
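The error in the question suggests that z.put hands its argument to the JVM through Py4J, which needs a Java object reference and not the Python-side DataFrame wrapper. As a minimal sketch, the unwrapping step can be put into a small helper. The name to_jvm_handle is hypothetical, and _jdf is a private PySpark attribute that may change between releases:

```python
def to_jvm_handle(obj):
    """Return the wrapped JVM object if obj looks like a PySpark DataFrame.

    Any object without the private `_jdf` attribute (strings, numbers, ...)
    is returned unchanged, so the helper is safe to use for every z.put call.
    """
    return getattr(obj, "_jdf", obj)


# In a %pyspark paragraph, where z is Zeppelin's context object:
# z.put("df", to_jvm_handle(df))
```

Plain values such as strings and numbers pass through untouched, so only DataFrame wrappers are affected.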

However, it is better to register a temporary view (a temporary table in Spark 1.x):

%pyspark
# registerTempTable in Spark 1.x
df.createTempView("df")

and use SparkSession.table (SQLContext.table in Spark 1.x) to read it from Scala:

// sqlContext.table in Spark 1.x
val df = spark.table("df")
// df: org.apache.spark.sql.DataFrame = [k: bigint, v: string]
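When a paragraph is re-run, createTempView fails if a view with that name already exists. Spark 2.0 and later provide createOrReplaceTempView for that case. The following sketch picks whichever registration method the DataFrame offers; the helper name register_view is hypothetical and not part of Spark or Zeppelin:

```python
def register_view(df, name):
    """Register df under `name` so other interpreters can look it up by name."""
    if hasattr(df, "createOrReplaceTempView"):
        # Spark 2.0+: overwrites an existing view with the same name
        df.createOrReplaceTempView(name)
    else:
        # Spark 1.x fallback, as noted in the comment above
        df.registerTempTable(name)
    return name


# In a %pyspark paragraph:
# register_view(df, "df")
```

The Scala paragraph then reads the view by name exactly as before.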

To convert in the opposite direction, see "Zeppelin: Scala Dataframe to python".
