PySpark - Create DataFrame from Numpy Matrix

2024/11/11 12:18:48

I have a numpy matrix:

arr = np.array([[2,3], [2,8], [2,3],[4,5]])

I need to create a PySpark Dataframe from arr. I can not manually input the values because the length/values of arr will be changing dynamically so I need to convert arr into a dataframe.

I tried the following code to no success.

df= sqlContext.createDataFrame(arr,["A", "B"])

However, I get the following error.

TypeError: Can not infer schema for type: <type 'numpy.ndarray'>
Answer
import numpy as np#sample data
arr = np.array([[2,3], [2,8], [2,3],[4,5]])rdd1 = sc.parallelize(arr)
rdd2 = rdd1.map(lambda x: [int(i) for i in x])
df = rdd2.toDF(["A", "B"])
df.show()

Output is:

+---+---+
|  A|  B|
+---+---+
|  2|  3|
|  2|  8|
|  2|  3|
|  4|  5|
+---+---+
https://en.xdnf.cn/q/72009.html

Related Q&A

RunTimeError during one hot encoding

I have a dataset where class values go from -2 to 2 by 1 step (i.e., -2,-1,0,1,2) and where 9 identifies the unlabelled data. Using one hot encode self._one_hot_encode(labels)I get the following error:…

Is there a Mercurial or Git version control plugin for PyScripter? [closed]

Closed. This question is seeking recommendations for books, tools, software libraries, and more. It does not meet Stack Overflow guidelines. It is not currently accepting answers.We don’t allow questi…

How to make a color map with many unique colors in seaborn

I want to make a colormap with many (in the order of hundreds) unique colors. This code: custom_palette = sns.color_palette("Paired", 12) sns.palplot(custom_palette)returns a palplot with 12 …

Swap column values based on a condition in pandas

I would like to relocate columns by condition. In case country is Japan, I need to relocate last_name and first_name reverse.df = pd.DataFrame([[France,Kylian, Mbappe],[Japan,Hiroyuki, Tajima],[Japan,…

How to improve performance on a lambda function on a massive dataframe

I have a df with over hundreds of millions of rows.latitude longitude time VAL 0 -39.20000076293945312500 140.80000305175781250000 1…

How to detect if text is rotated 180 degrees or flipped upside down

I am working on a text recognition project. There is a chance the text is rotated 180 degrees. I have tried tesseract-ocr on terminal, but no luck. Is there any way to detect it and correct it? An exa…

Infinite loops using for in Python [duplicate]

This question already has answers here:Is there an expression for an infinite iterator?(7 answers)Closed 5 years ago.Why does this not create an infinite loop? a=5 for i in range(1,a):print(i)a=a+1or…

How to print the percentage of zipping a file python

I would like to get the percentage a file is at while zipping it. For instance it will print 1%, 2%, 3%, etc. I have no idea on where to start. How would I go about doing this right now I just have the…

kafka-python read from last produced message after a consumer restart

i am using kafka-python to consume messages from a kafka queue (kafka version 0.10.2.0). In particular i am using KafkaConsumer type. If the consumer stops and after a while it is restarted i would lik…

Python lib to Read a Flash swf Format File

Im interested in using Python to hack on the data in Flash swf files. There is good documentation available on the format of swf files, and I am considering writing my own Python lib to parse that dat…