How to split data from a merged cell into other cells in its same row of a Python data frame?

2024/10/9 12:30:40

I have a sample of a data frame which looks like this:

+---+--------------------------------------------------------------------------------------+---------------+--------------------------------------------+
|   | Date                                                                                 | Professional  | Description                                |
+---+--------------------------------------------------------------------------------------+---------------+--------------------------------------------+
| 0 | 2019-12-19 00:00:00                                                                  | Katie Cool    | Travel to Space ...                        |
+---+--------------------------------------------------------------------------------------+---------------+--------------------------------------------+
| 1 | 2019-12-20 00:00:00                                                                  | Jenn Blossoms | Review stuff; prepare cancellations of ... |
+---+--------------------------------------------------------------------------------------+---------------+--------------------------------------------+
| 2 | 2019-12-27 00:00:00                                                                  | Jenn Blossoms | Review lots of stuff/o...                  |
+---+--------------------------------------------------------------------------------------+---------------+--------------------------------------------+
| 3 | 2019-12-27 00:00:00                                                                  | Jenn Blossoms | Draft email to world leader...             |
+---+--------------------------------------------------------------------------------------+---------------+--------------------------------------------+
| 4 | 2019-12-30 00:00:00                                                                  | Jenn Blossoms | Review this thing.                         |
+---+--------------------------------------------------------------------------------------+---------------+--------------------------------------------+
| 5 | 12-30-2019 Jenn Blossoms Telephone   Call   to   A.   Bell   return   her   multiple | NaN           | NaN                                        |
|   | voicemails.                                                                          |               |                                            |
+---+--------------------------------------------------------------------------------------+---------------+--------------------------------------------+

In row 5, the date, the professional's name, and the description have all ended up in the Date cell.

I would like for the sample to look like this:

+---+---------------------+---------------+-------------------------------------------------------------+
|   | Date                | Professional  | Description                                                 |
+---+---------------------+---------------+-------------------------------------------------------------+
| 0 | 2019-12-19 00:00:00 | Katie Cool    | Travel to Space ...                                         |
+---+---------------------+---------------+-------------------------------------------------------------+
| 1 | 2019-12-20 00:00:00 | Jenn Blossoms | Review stuff; prepare cancellations of ...                  |
+---+---------------------+---------------+-------------------------------------------------------------+
| 2 | 2019-12-27 00:00:00 | Jenn Blossoms | Review lots of stuff/o...                                   |
+---+---------------------+---------------+-------------------------------------------------------------+
| 3 | 2019-12-27 00:00:00 | Jenn Blossoms | Draft email to world leader...                              |
+---+---------------------+---------------+-------------------------------------------------------------+
| 4 | 2019-12-30 00:00:00 | Jenn Blossoms | Review this thing.                                          |
+---+---------------------+---------------+-------------------------------------------------------------+
| 5 | 12-30-2019          | Jenn Blossoms | Telephone   Call   to   A.   Bell   return   her   multiple |
|   |                     |               | voicemails.                                                 |
+---+---------------------+---------------+-------------------------------------------------------------+

I have tried this code:

date = dftopdata['Date'].str.extract('(\d{2}-\d{2}-\d{4})(\s\w+\s\w+)\s(\w+.*)')[0]
name = dftopdata['Date'].str.extract('(\d{2}-\d{2}-\d{4})(\s\w+\s\w+)\s(\w+.*)')[1]
description = dftopdata['Date'].str.extract('(\d{2}-\d{2}-\d{4})(\s\w+\s\w+)\s(\w+.*)')[2]
dftopdata.loc[pd.to_datetime(dftopdata['Date'],errors='coerce').isnull(),'Professional'] = name
dftopdata.loc[pd.to_datetime(dftopdata['Date'],errors='coerce').isnull(),'Description'] = description
dftopdata.loc[pd.to_datetime(dftopdata['Date'],errors='coerce').isnull(),'Date'] = date

But when I run the above code, the data frame sample looks like this:

+---+------------+---------------+--------------------------------------------+
|   | Date       | Professional  | Description                                |
+---+------------+---------------+--------------------------------------------+
| 0 | 12/19/2019 | Katie Cool    | Travel to space ...                        |
+---+------------+---------------+--------------------------------------------+
| 1 | 12/20/2019 | Jenn Blossoms | Review stuff; prepare cancellations of ... |
+---+------------+---------------+--------------------------------------------+
| 2 | 12/27/2019 | Jenn Blossoms | Review lots of stuff/o…                    |
+---+------------+---------------+--------------------------------------------+
| 3 | 12/27/2019 | Jenn Blossoms | Draft email to world leader...             |
+---+------------+---------------+--------------------------------------------+
| 4 | 12/30/2019 | Jenn Blossoms | Review this thing.                         |
+---+------------+---------------+--------------------------------------------+
| 5 | NaN        | NaN           | NaN                                        |
+---+------------+---------------+--------------------------------------------+

Answer

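You can use the str.split method to split the merged string into "words". Restrict it to the rows whose Date cannot be parsed as a date, so that the rows that are already correct are left untouched.

bad = pd.to_datetime(dftopdata['Date'], errors='coerce').isnull()
words = dftopdata.loc[bad, 'Date'].str.split()

If there is a pattern that separates the Date, Professional and Description parts within this list of words, you can use it. For instance, if the first word is the date and the next 2 words make up the name of the professional, then you can do:

dftopdata.loc[bad, 'Professional'] = words.str[1:3].str.join(' ')
dftopdata.loc[bad, 'Description'] = words.str[3:].str.join(' ')
dftopdata.loc[bad, 'Date'] = words.str[0]

Note that joining the words back with a single space collapses the repeated spaces in the description.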
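
As an alternative to splitting on whitespace, a regular expression with named groups can pull the three parts out in one pass and keeps the description's original spacing. The following is only a minimal sketch, assuming a merged cell always has the form MM-DD-YYYY, then a two-word name, then the description. The small frame is a hypothetical reconstruction of the sample in the question, and the names pattern, is_text, parts and rows are illustrative.

```python
import re

import pandas as pd

# Hypothetical reconstruction of two rows from the question's sample.
dftopdata = pd.DataFrame({
    "Date": [
        pd.Timestamp("2019-12-30"),
        "12-30-2019 Jenn Blossoms Telephone   Call   to   A.   Bell   return   her   multiple voicemails.",
    ],
    "Professional": ["Jenn Blossoms", None],
    "Description": ["Review this thing.", None],
})

# Assumed layout of a merged cell: date, two-word name, free-text description.
pattern = (
    r"^(?P<Date>\d{2}-\d{2}-\d{4})\s+"
    r"(?P<Professional>\w+\s+\w+)\s+"
    r"(?P<Description>.+)$"
)

# Only string cells can be merged rows; datetime cells are already clean.
is_text = dftopdata["Date"].map(lambda v: isinstance(v, str))

# Named groups become the column names of the extracted frame.
# DOTALL lets the description span a line break inside the cell.
parts = dftopdata.loc[is_text, "Date"].str.extract(pattern, flags=re.DOTALL)

# Write back only the rows where the pattern actually matched.
rows = parts.index[parts["Date"].notna()]
for col in ["Date", "Professional", "Description"]:
    dftopdata.loc[rows, col] = parts.loc[rows, col]

print(dftopdata)
```

Rows that do not match the assumed layout are left as they were, so they can be inspected separately.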