manipulating value of pandas dataframe cell based on value in previous row without iteration

2024/10/12 10:21:53

I have a pandas dataframe with~3900 rows and 6 columns compiled from Google Finance . One of these columns defines a time in unix format, specifically defining a time during the trading day for a market. In this case the DJIA from 930A EST to 4P EST. However, only the cell for the beginning of each day (930A) has the complete unix time stamp (prefixed with an 'a') and the others are the minutes after the first time of the day.

Here is an example of the raw data:

          Date   Close    High     Low    Open  Volume
0  a1450449000  173.87  173.87  173.83  173.87   46987
1            1  173.61  173.83  173.55  173.78   19275
2            2  173.37  173.63  173.37  173.60   16014
3            3  173.50  173.59  173.31  173.34   14198
4            4  173.50  173.57  173.46  173.52    7010Date   Close    High     Low    Open  Volume
388          388  171.16  171.27  171.15  171.26   11809
389          389  171.11  171.23  171.07  171.18   30449
390          390  170.89  171.16  170.89  171.09  163937
391  a1450708200  172.28  172.28  172.28  172.28   23880
392            1  172.27  172.27  172.00  172.06    2719

The change at index 391 is not contiguous such that a solution like @Stefan's would unfortunately not correctly adjust the Date value.

I can easily enough go through with a lambda and line by line remove the 'a' (if necessary) convert the values to an integer and convert the minutes past 930A into seconds with the following code:

import pandas as pd
import numpy as np
import datetimebars = pd.read_csv(r'http://www.google.com/finance/getprices?i=60&p=10d&f=d,o,h,l,c,v&df=cpct&q=DIA', skiprows=7, header=None, names=['Date', 'Close', 'High', 'Low', 'Open', 'Volume'])bars['Date'] = bars['Date'].map(lambda x: int(x[1:]) if x[0] == 'a' else int(x))    
bars['Date'] = bars['Date'].map(lambda u: u * 60 if u < 400 else u)

Now what I would like to do is, without iterating over the dataframe, determine if the value of bars['Date'] is not a unix time stamp (e.g. < 24000 in the terms of this data set). If so I want to add that value to the time stamp for that particular day to create a complete unix time stamp for each entry.

I know that I can compare the previous row via:

bars['Date'][:-1]>bars['Date'][1:]

I feel like that would be the way to go but I cant figure out a way to use this in a function as it returns a series.

Thanks in advance for any help!

Answer

You could add a new column that always contains the latest Timestamp and then add to the Date where necessary.

threshold = 24000
bars['Timestamp'] = bars[bars['Date']>threshold].loc[:, 'Date']
bars['Timestamp'] = bars['Timestamp'].fillna(method='ffill')
bars['Date'] = bars.apply(lambda x: x.Date + x.Timestamp if x.Date < threshold else x.Date, axis=1)
bars.drop('Timestamp', axis=1, inplace=True)

to get:

            Date   Close     High     Low    Open  Volume
0     1450449000  173.87  173.870  173.83  173.87   46987
1     1450449060  173.61  173.830  173.55  173.78   19275
2     1450449120  173.37  173.630  173.37  173.60   16014
3     1450449180  173.50  173.590  173.31  173.34   14198
4     1450449240  173.50  173.570  173.46  173.52    7010
5     1450449300  173.66  173.680  173.44  173.45   10597
6     1450449360  173.40  173.670  173.34  173.67   14270
7     1450449420  173.36  173.360  173.13  173.32   22485
8     1450449480  173.29  173.480  173.25  173.36   18542
https://en.xdnf.cn/q/118209.html

Related Q&A

Convert ctypes code to cython

Id like to convert some ctypes code to use cython instead, but Im struggling. Essentially, the ctypes code:copies the contents (floats) of two lists into C-compatible structs sends the structs to my b…

Enable PyROOT Ubuntu 14.04

I downloaded madpgraph5, but when I run it I get the following error:ERROR: ROOT file called ROOT.py or ROOT.pyc is not foundERROR: Please check that ROOT is properly installed.When I try locate ROOT.p…

pygal on windows - cannot access classes from pygal

I have such short script:import pygal if __name__ == __main__:bar_chart = pygal.Bar()and following error: AttributeError: module object has no attribute BarDo you have any idea what is wrong? Shall I …

Parsing table for a link

Ive been able to isolate a row in a html table using Beautiful Soup in Python 2.7. Been a learning experience, but happy to get that far. Unfortunately Im a bit stuck on this next bit.I need to get t…

build matrix from blocks

I have an object which is described by two quantities, A and B (in real case they can be more than two). Objects are correlated depending on the value of A and B. In particular I know the correlation m…

How to convert string with UTC offset

I have date as In [1]: a = "Sun 10 May 2015 13:34:36 -0700"When I try to convert it using strptime, its giving error.In [3]: datetime.strptime(a, "%a %d %b %Y %H:%M:%S %Z"...: ) ---…

regex multiline not working on repeated patterns

I am trying to get a regex expression to match multiple patterns with multilines but it keeps matching everything. For instance I want to match two times this code:STDMETHOD(MyFunc)( D2D1_SIZE_U size, …

Django Rest Framework slug_field error

I have this serializer and model. I want to show in my API the field username of User model, but I receive this error.AttributeError at /api/v1/client_share_requests/1/Profile object has no attribute u…

Django - 500 internal server error no module named django

django return 500 internal server error (apache 2.4.10, ubuntu 15.04, django 1.9.6)apache log:[wsgi:warn] mod_wsgi: Compiled for Python/3.4.2. [wsgi:warn] mod_wsgi: Runtime using Python/3.4.3. [mpm_eve…

Unable to connect to Google Bigtable using HBase REST api

Following this example, running the test script "python put_get_with_client.py" results in a 400 error (Bad Request).Bad requestjava.lang.ClassCastException: org.apache.hadoop.hbase.client.Bi…