pyspark row number dataframe

2024/10/16 2:30:33

I have a dataframe with columns time, a, b, c, d, val. I would like to create a dataframe with an additional column containing the row number of each row within its group, where (a, b, c, d) is the group key.

I tried this in Spark SQL with a window function. In SQL it looks like this:

select time, a, b, c, d, val, row_number() over (partition by a, b, c, d order by time) as rn from table

I would like to do this on the dataframe itself, without using Spark SQL.

Thanks

Answer

I don't know the Python API very well, but I'll give it a try. You can try something like:

from pyspark.sql import functions as F
from pyspark.sql.window import Window

df.withColumn("row_number", F.row_number().over(Window.partitionBy("a", "b", "c", "d").orderBy("time"))).show()
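
To make the expected result concrete, here is a small plain-Python sketch (not Spark) of what a row number partitioned by a, b, c, d and ordered by time should compute. The helper name add_row_numbers and the sample rows are made up for illustration. It only mirrors the semantics on a toy list of dicts, so it is a way to check your expectations rather than a replacement for the window function.

```python
from collections import defaultdict


def add_row_numbers(rows, keys=("a", "b", "c", "d"), order_by="time"):
    """Return copies of `rows` with a 1-based 'row_number' within each key group."""
    # Bucket the rows by their group key (the "partition by" step).
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(row)

    numbered = []
    for group_rows in groups.values():
        # Sort each bucket (the "order by" step) and number from 1.
        ordered = sorted(group_rows, key=lambda r: r[order_by])
        for n, row in enumerate(ordered, start=1):
            numbered.append({**row, "row_number": n})
    return numbered


# Hypothetical sample data: three rows in group (1, 1, 1, 1), one in (2, 1, 1, 1).
rows = [
    {"time": 3, "a": 1, "b": 1, "c": 1, "d": 1, "val": 10},
    {"time": 1, "a": 1, "b": 1, "c": 1, "d": 1, "val": 20},
    {"time": 2, "a": 2, "b": 1, "c": 1, "d": 1, "val": 30},
    {"time": 2, "a": 1, "b": 1, "c": 1, "d": 1, "val": 40},
]

result = add_row_numbers(rows)
for r in result:
    print(r)
```

Numbering restarts at 1 for every distinct (a, b, c, d) combination. One difference to keep in mind: if two rows in a group have the same time, this sketch keeps their input order, whereas Spark's row_number() makes no such guarantee for ties unless you add more columns to the ordering.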
