How to expand out a Pyspark dataframe based on column?

2024/10/5 18:25:32

How do I expand a dataframe based on column values? I intend to go from this dataframe:

+---------+----------+----------+
|DEVICE_ID|  MIN_DATE|  MAX_DATE|
+---------+----------+----------+
|        1|2019-08-29|2019-08-31|
|        2|2019-08-27|2019-09-02|
+---------+----------+----------+

To one that looks like this:

+---------+----------+
|DEVICE_ID|      DATE|
+---------+----------+
|        1|2019-08-29|
|        1|2019-08-30|
|        1|2019-08-31|
|        2|2019-08-27|
|        2|2019-08-28|
|        2|2019-08-29|
|        2|2019-08-30|
|        2|2019-08-31|
|        2|2019-09-01|
|        2|2019-09-02|
+---------+----------+

Any help would be much appreciated.

Answer
from datetime import timedelta, date
from pyspark.sql.functions import udf
from pyspark.sql.types import ArrayType# Create a sample data row.
df = sqlContext.sql("""
select 'dev1' as device_id, 
to_date('2020-01-06') as start, 
to_date('2020-01-09') as end""")# Define a UDf to return a list of dates
@udf
def datelist(start, end):return ",".join([str(start + datetime.timedelta(days=x)) for x in range(0, 1+(end-start).days)])# explode the list of dates into rows
df.select("device_id", F.explode(F.split(datelist(df["start"], df["end"]), ",")).alias("date")).show(10, False)
https://en.xdnf.cn/q/119042.html

Related Q&A

How can I trigger my python script to automatically run via a ping?

I wrote a script that recurses through a set of Cisco routers in a network, and gets traffic statistics. On the router itself, I have it ping to the loopback address of my host PC, after a traffic thre…

How do I make my bot delete a message when it contains a certain word?

Okay so Im trying to make a filter for my bot, but one that isnt too complicated. Ive got this:@bot.event async def on_message(ctx,message):if fuck in Message.content.lower:Message.delete()But it gives…

pyinstaller cant find package Tix

I am trying to create an executable with pyinstaller for a python script with tix from tkinter. The following script also demonstrates the error: from tkinter import * from tkinter import tixroot = ti…

form.validate_on_submit() doesnt work(nothing happen when I submit a form)

Im creating a posting blog function for social media website and Im stuck on a problem: when I click on the "Post" button(on create_post.html), nothing happens.In my blog_posts/views.py, when…

How to find determinant of matrix using python

New at python and rusty on linear Algebra. However, I am looking for guidance on the correct way to create a determinant from a matrix in python without using Numpy. Please see the snippet of code belo…

How do I pass variables around in Python?

I want to make a text-based fighting game, but in order to do so I need to use several functions and pass values around such as damage, weapons, and health.Please allow this code to be able to pass &qu…

How to compare an item within a list of list in python

I am a newbie to python and just learning things as I do my project and here I have a list of lists which I need to compare between the second and last column and get the output for the one which has t…

Make for loop execute parallely with Pandas columns

Please convert below code to execute parallel, Here Im trying to map nested dictionary with pandas column values. The below code works perfectly but consumes lot of time. Hence looking to parallelize t…

Pre-calculate Excel formulas when exporting data with python?

The code is pulling and then putting the excel formulas and not the calculated data of the formula. xxtab.write(8, 3, "=H9+I9")When this is read in and stored in that separate file, it is sto…

Validating Tkinter Entry Box

I am trying to validate my entry box, that only accepts floats, digits, and operators (+-, %). But my program only accepts numbers not symbols. I think it is a problem with my conditions or Python Rege…