Pandas read csv dateint columns to datetime

2024/4/15 2:02:55

I'm new to both StackOverflow and pandas. I am trying to read in a large CSV file with stock market bin data in the following format:

date,time,open,high,low,close,volume,splits,earnings,dividends,sym
20130625,715,49.2634,49.2634,49.2634,49.2634,156.293,1,0,0,JPM
20130625,730,49.273,49.273,49.273,49.273,208.39,1,0,0,JPM
20130625,740,49.1866,49.1866,49.1866,49.1866,224.019,1,0,0,JPM
20130625,745,49.321,49.321,49.321,49.321,208.39,1,0,0,JPM
20130625,750,49.3306,49.369,49.3306,49.369,4583.54,1,0,0,JPM
20130625,755,49.369,49.369,49.369,49.369,416.78,1,0,0,JPM
20130625,800,49.369,49.369,49.3594,49.3594,1715.05,1,0,0,JPM
20130625,805,49.369,49.369,49.3306,49.3306,1333.7,1,0,0,JPM
20130625,810,49.3306,49.3786,49.3306,49.3786,1567.09,1,0,0,JPM

I have the following code to read it into a DataFrame in Pandas

import numpy as np
import scipy as sp
import pandas as pd
import datetime as dt
fname  = 'bindat.csv'
df     = pd.read_csv(fname, header=0, sep=',')

The problem is that the date and time columns are read in as int64. I would like to merge these two to a single timestamp such as: 2013-06-25 07:15:00.

I am struggling to even get the time read in properly using:

df['date'] = pd.to_datetime(df['date'].astype(str))
df['time'] = pd.to_datetime(df['time'].astype(str))

The first command works to convert, but the time seems weird.

df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 9999 entries, 0 to 9998
Data columns (total 11 columns):
date         9999 non-null datetime64[ns]
time         9999 non-null object
open         9999 non-null float64
high         9999 non-null float64
low          9999 non-null float64
close        9999 non-null float64
volume       9999 non-null float64
splits       9999 non-null float64
earnings     9999 non-null int64
dividends    9999 non-null float64
sym          9999 non-null object
dtypes: datetime64[ns](1), float64(7), int64(1), object(2)None

And then I'll want to merge into a single DatetimeIndex.

Any suggestions are greatly appreciated.

Cheers!

Answer

There are quite a few ways to do this. One way to do it during read_csv would be to use the parse_dates and date_parser arguments, telling parse_dates to combine the date and time columns and defining an inline function to parse the dates:

>>> df = pd.read_csv("bindat.csv", parse_dates=[["date", "time"]],
date_parser=lambda x: pd.to_datetime(x, format="%Y%m%d %H%M"), 
index_col="date_time")
>>> dfopen     high      low    close    volume  splits  earnings  dividends  sym
date_time                                                                                          
2013-06-25 07:15:00  49.2634  49.2634  49.2634  49.2634   156.293       1         0          0  JPM
2013-06-25 07:30:00  49.2730  49.2730  49.2730  49.2730   208.390       1         0          0  JPM
2013-06-25 07:40:00  49.1866  49.1866  49.1866  49.1866   224.019       1         0          0  JPM
2013-06-25 07:45:00  49.3210  49.3210  49.3210  49.3210   208.390       1         0          0  JPM
2013-06-25 07:50:00  49.3306  49.3690  49.3306  49.3690  4583.540       1         0          0  JPM
2013-06-25 07:55:00  49.3690  49.3690  49.3690  49.3690   416.780       1         0          0  JPM
2013-06-25 08:00:00  49.3690  49.3690  49.3594  49.3594  1715.050       1         0          0  JPM
2013-06-25 08:05:00  49.3690  49.3690  49.3306  49.3306  1333.700       1         0          0  JPM
2013-06-25 08:10:00  49.3306  49.3786  49.3306  49.3786  1567.090       1         0          0  JPM
2013-06-25 16:10:00  49.3306  49.3786  49.3306  49.3786  1567.090       1         0          0  JPM

where I added an extra row at the end to make sure that hours were behaving.

https://en.xdnf.cn/q/73181.html

Related Q&A

Pydantic - Dynamically create a model with multiple base classes?

From the pydantic docs I understand this: import pydanticclass User(pydantic.BaseModel):id: intname: strclass Student(pydantic.BaseModel):semester: int# this works as expected class Student_User(User, …

Handling nested elements with Python lxml

Given the simple XML data below:<book><title>My First Book</title><abstract><para>First paragraph of the abstract</para><para>Second paragraph of the abstract&…

Easiest way to plot data on country map with python

Could not delete question. Please refer to question: Shade states of a country according to dictionary values with Basemap I want to plot data (number of sick people for a certain year) on each state o…

How to resize QMainWindow after removing all DockWidgets?

I’m trying to make an application consisting of a QMainWindow, the central widget of which is a QToolBar (it may not be usual, but for my purpose the toolbar’s well suited). Docks are allowed below o…

Python: sorting a list by column [duplicate]

This question already has answers here:How to sort a list/tuple of lists/tuples by the element at a given index(11 answers)Closed 8 years ago.How can I sort a list-of-lists by "column", i.e. …

How to make setuptools clone git dependencies recursively?

I want to let setuptools install Phoenix in my project and thus addedsetup(...dependency_links = ["git+https://github.com/wxWidgets/Phoenix.git#egg=Phoenix"],install_requires = ["Phoenix…

Stable sorting in Jinja2

It is possible to apply the sort filter in Jinja2 successively to sort a list first by one attribute, then by another? This seems like a natural thing to do, but in my testing, the preceeding sort is …

Factor to complex roots using sympy

I cant figure out how to factor an polynomial expression to its complex roots.>>> from sympy import * >>> s = symbol(s) >>> factor(s**2+1)2 s + 1

Using multiple custom classes with Pipeline sklearn (Python)

I try to do a tutorial on Pipeline for students but I block. Im not an expert but Im trying to improve. So thank you for your indulgence. In fact, I try in a pipeline to execute several steps in prepar…

Python equivalent of pointers

In python everything works by reference:>>> a = 1 >>> d = {a:a} >>> d[a] 1 >>> a = 2 >>> d[a] 1I want something like this>>> a = 1 >>> d =…