Split datetime64 column into a date and time column in pandas dataframe

2024/11/18 7:46:08

I have a dataframe whose first column is a datetime64 column. How do I split this column into two new columns, a date column and a time column? Here is my data and code so far:

DateTime,Actual,Consensus,Previous
20140110 13:30:00,74000,196000,241000
20131206 13:30:00,241000,180000,200000
20131108 13:30:00,200000,125000,163000
20131022 12:30:00,163000,180000,193000
20130906 12:30:00,193000,180000,104000
20130802 12:30:00,104000,184000,188000
20130705 12:30:00,188000,165000,176000
20130607 12:30:00,176000,170000,165000
20130503 12:30:00,165000,145000,138000
20130405 12:30:00,138000,200000,268000
...

import pandas as pd
nfp = pd.read_csv("NFP.csv", parse_dates=[0])
nfp

Gives:

Out[10]:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 83 entries, 0 to 82
Data columns (total 4 columns):
DateTime     82  non-null values
Actual       82  non-null values
Consensus    82  non-null values
Previous     82  non-null values
dtypes: datetime64[ns](1), float64(3)

All good but not sure what to do from here.

Two points specifically I am unsure about:

  1. Is it possible to do this when I read the csv file in the first place? If so, how?
  2. Can anyone show me how to do the split once I have called read_csv?

Also is there anywhere I can look up this kind of information?

I am having a hard time finding a detailed reference for the class libraries. Thanks!

Answer

How to parse the CSV directly into the desired DataFrame:

Pass a dict of functions to pandas.read_csv's converters keyword argument:

import pandas as pd
import datetime as DT
nfp = pd.read_csv("NFP.csv", sep=r'[\s,]',              # 1
                  header=None, skiprows=1,
                  converters={                          # 2
                      0: lambda x: DT.datetime.strptime(x, '%Y%m%d'),
                      1: lambda x: DT.time(*map(int, x.split(':')))},
                  names=['Date', 'Time', 'Actual', 'Consensus', 'Previous'])

print(nfp)

yields

        Date      Time  Actual  Consensus  Previous
0 2014-01-10  13:30:00   74000     196000    241000
1 2013-12-06  13:30:00  241000     180000    200000
2 2013-11-08  13:30:00  200000     125000    163000
3 2013-10-22  12:30:00  163000     180000    193000
4 2013-09-06  12:30:00  193000     180000    104000
5 2013-08-02  12:30:00  104000     184000    188000
6 2013-07-05  12:30:00  188000     165000    176000
7 2013-06-07  12:30:00  176000     170000    165000
8 2013-05-03  12:30:00  165000     145000    138000
9 2013-04-05  12:30:00  138000     200000    268000
  1. sep=r'[\s,]' tells read_csv to split lines of the csv on the regex pattern r'[\s,]' -- a whitespace or a comma.
  2. The converters parameter tells read_csv to apply the given functions to certain columns. The keys (e.g. 0 and 1) refer to the column index, and the values are the functions to be applied.
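To make the two notes above concrete, here is a minimal sketch that mimics, on a single line of the sample data, what the separator regex and the converters dict do. It uses only the standard library and is not how pandas implements parsing internally. The fallback to int for the remaining columns is an assumption made for this illustration; pandas infers those types itself.

```python
import datetime as DT
import re

line = "20140110 13:30:00,74000,196000,241000"

# Step 1: split on one whitespace character or one comma, like sep=r'[\s,]'.
fields = re.split(r"[\s,]", line)
print(fields)

# Step 2: apply a function per column position, like the converters dict.
converters = {
    0: lambda x: DT.datetime.strptime(x, "%Y%m%d"),
    1: lambda x: DT.time(*map(int, x.split(":"))),
}

# Columns without a converter fall back to int here (illustration only).
row = [converters.get(i, int)(value) for i, value in enumerate(fields)]
print(row)
```

Because the space between the date and the time is treated as a separator, each line yields five fields, which is why five names are passed to read_csv.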

How to split the DataFrame after calling read_csv:

import pandas as pd
nfp = pd.read_csv("NFP.csv", parse_dates=[0], infer_datetime_format=True)
temp = pd.DatetimeIndex(nfp['DateTime'])
nfp['Date'] = temp.date
nfp['Time'] = temp.time
del nfp['DateTime']

print(nfp)
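As an alternative sketch (not part of the original answer), the same split can be written with the Series `.dt` accessor instead of building a DatetimeIndex. The example below reads two rows of the sample data from an in-memory string so it runs without NFP.csv, and it parses the column with an explicit format string; both of those choices are assumptions made for the illustration.

```python
import io

import pandas as pd

# Two rows of the sample data, held in memory instead of NFP.csv.
csv_text = """DateTime,Actual,Consensus,Previous
20140110 13:30:00,74000,196000,241000
20131206 13:30:00,241000,180000,200000
"""

nfp = pd.read_csv(io.StringIO(csv_text))

# Parse the string column explicitly so no format guessing is involved.
stamps = pd.to_datetime(nfp["DateTime"], format="%Y%m%d %H:%M:%S")

# .dt.date and .dt.time return columns of datetime.date / datetime.time objects.
nfp["Date"] = stamps.dt.date
nfp["Time"] = stamps.dt.time
nfp = nfp.drop(columns="DateTime")

print(nfp)
```

Note that the resulting Date and Time columns hold plain Python objects, so they have object dtype rather than datetime64.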

Which is faster?

It depends on the size of the CSV. (Thanks to Jeff for pointing this out.)

For tiny CSVs, parsing the CSV into the desired form directly is faster than using a DatetimeIndex after parsing with parse_dates=[0]:

def using_converter():
    nfp = pd.read_csv("NFP.csv", sep=r'[\s,]', header=None, skiprows=1,
                      converters={
                          0: lambda x: DT.datetime.strptime(x, '%Y%m%d'),
                          1: lambda x: DT.time(*map(int, x.split(':')))},
                      names=['Date', 'Time', 'Actual', 'Consensus', 'Previous'])
    return nfp

def using_index():
    nfp = pd.read_csv("NFP.csv", parse_dates=[0], infer_datetime_format=True)
    temp = pd.DatetimeIndex(nfp['DateTime'])
    nfp['Date'] = temp.date
    nfp['Time'] = temp.time
    del nfp['DateTime']
    return nfp

In [114]: %timeit using_index()
100 loops, best of 3: 1.71 ms per loop

In [115]: %timeit using_converter()
1000 loops, best of 3: 914 µs per loop

However, for CSVs of just a few hundred lines or more, using a DatetimeIndex is faster.

N = 20
filename = '/tmp/data'
content = '''\
DateTime,Actual,Consensus,Previous
20140110 13:30:00,74000,196000,241000
20131206 13:30:00,241000,180000,200000
20131108 13:30:00,200000,125000,163000
20131022 12:30:00,163000,180000,193000
20130906 12:30:00,193000,180000,104000
20130802 12:30:00,104000,184000,188000
20130705 12:30:00,188000,165000,176000
20130607 12:30:00,176000,170000,165000
20130503 12:30:00,165000,145000,138000
20130405 12:30:00,138000,200000,268000'''

def setup(n):
    header, remainder = content.split('\n', 1)
    with open(filename, 'w') as f:
        f.write('\n'.join([header]+[remainder]*n))

In [304]: setup(50)

In [305]: %timeit using_converter()
100 loops, best of 3: 9.78 ms per loop

In [306]: %timeit using_index()
100 loops, best of 3: 9.3 ms per loop
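For readers without IPython, the comparison above could be reproduced roughly with the standard timeit module. This is only a sketch: the helper names `make_csv` and `benchmark` are hypothetical, the data is read from an in-memory string rather than a file, and the DatetimeIndex variant parses the column with an explicit format. Absolute timings will differ from the figures quoted above and depend on the machine and the pandas version.

```python
import datetime as DT
import io
import timeit

import pandas as pd

HEADER = "DateTime,Actual,Consensus,Previous"
ROWS = [
    "20140110 13:30:00,74000,196000,241000",
    "20131206 13:30:00,241000,180000,200000",
]


def make_csv(n):
    """Return CSV text: the header plus the sample rows repeated n times."""
    return "\n".join([HEADER] + ROWS * n)


def using_converter(text):
    # Regex separators need the Python parsing engine.
    return pd.read_csv(
        io.StringIO(text), sep=r"[\s,]", engine="python",
        header=None, skiprows=1,
        converters={
            0: lambda x: DT.datetime.strptime(x, "%Y%m%d"),
            1: lambda x: DT.time(*map(int, x.split(":"))),
        },
        names=["Date", "Time", "Actual", "Consensus", "Previous"],
    )


def using_index(text):
    nfp = pd.read_csv(io.StringIO(text))
    parsed = pd.to_datetime(nfp["DateTime"], format="%Y%m%d %H:%M:%S")
    temp = pd.DatetimeIndex(parsed)
    nfp["Date"] = temp.date
    nfp["Time"] = temp.time
    del nfp["DateTime"]
    return nfp


def benchmark(n, repeats=3, number=5):
    """Return the best per-call time in seconds for each approach."""
    text = make_csv(n)
    results = {}
    for func in (using_converter, using_index):
        times = timeit.repeat(lambda: func(text), repeat=repeats, number=number)
        results[func.__name__] = min(times) / number
    return results


if __name__ == "__main__":
    for n in (1, 50, 500):
        print(n, benchmark(n))
```

Running it for a few values of n shows how the two approaches scale with file size on a given setup.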

Where can I look up this kind of information?

  1. Sometimes you can find examples in the Pandas Cookbook.
  2. Sometimes a web search or a search on Stack Overflow suffices.
  3. Spending a weekend snowed in with nothing to do but reading the pandas documentation will surely help too.
  4. Install IPython. It has tab completion and if you type a ? after a function, it gives you the function's docstring. Those two features really help you introspect Python objects quickly. It also tells you in what file the function is defined (if defined in pure Python) -- which leads me to...
  5. Reading the source code
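The introspection described in the list above can also be approximated in plain Python, without IPython. The sketch below uses the standard inspect module as a rough stand-in for the `?` suffix and for tab completion; it is not what IPython does internally, and the reported file path may vary between pandas versions.

```python
import inspect

import pandas as pd

# Roughly what `pd.read_csv?` displays: the function's docstring.
doc = inspect.getdoc(pd.read_csv)
print(doc.splitlines()[0])

# The file in which the function is defined. unwrap() steps past decorator
# wrappers so the path points at the underlying function where possible.
source_file = inspect.getsourcefile(inspect.unwrap(pd.read_csv))
print(source_file)

# A rough stand-in for tab completion: the public attribute names.
public_names = [name for name in dir(pd.DatetimeIndex) if not name.startswith("_")]
print("date" in public_names, "time" in public_names)
```

Listing the public names of DatetimeIndex this way is one route to discovering attributes such as date and time.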

Just keep at it. The more you know the easier it gets.

If you give it your best shot and still can't find the answer, post a question on Stack Overflow. You'll hopefully get an answer quickly, and help others searching for the same thing.
