PySpark DataFrame pivot and groupBy count

2024/10/4 23:31:00

I am working with a PySpark DataFrame that looks like this:

id	category
1	A
1	A
1	B
2	B
2	A
3	B
3	B
3	B

I want to unstack the category column and count the occurrences of each category per id. The result I want is shown below:

id	A	B
1	2	1
2	1	1
3	Null	3

I searched online but couldn't find anything that produces this specific result.

Answer

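Short version: you don't need multiple groupBy calls. Group by id, pivot on category, and count. Any id/category combination with no rows (such as id 3 with category A) comes out as null: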

df.groupBy("id").pivot("category").count().show()
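To make it clearer what that one-liner computes, here is a plain-Python sketch of the same group, pivot, and count logic on the sample data from the question. This is not Spark code: the function name pivot_count and the list-of-tuples input are assumptions made for illustration only, and None stands in for Spark's null. The sketch sorts ids and categories for readability; Spark itself does not guarantee row order.

```python
from collections import Counter

# Sample rows from the question as (id, category) pairs.
rows = [
    (1, "A"), (1, "A"), (1, "B"),
    (2, "B"), (2, "A"),
    (3, "B"), (3, "B"), (3, "B"),
]


def pivot_count(rows):
    """Count rows per (id, category) and spread the categories into columns.

    Hypothetical helper that mimics the shape of the result of
    groupBy("id").pivot("category").count(); it is not part of PySpark.
    """
    counts = Counter(rows)                      # (id, category) -> occurrences
    ids = sorted({i for i, _ in rows})          # one output row per id
    categories = sorted({c for _, c in rows})   # one output column per category
    # dict.get returns None for missing keys, which models Spark's null
    # for id/category combinations that never occur.
    table = [[i] + [counts.get((i, c)) for c in categories] for i in ids]
    return ["id"] + categories, table


header, table = pivot_count(rows)
print(header)
for row in table:
    print(row)
```

Each output row is one id followed by its count per category, which matches the table requested in the question, including the missing value for id 3 under A.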

https://en.xdnf.cn/q/70551.html
