Adding specific days in python table

2024/9/22 13:43:17

I have a dataset (Product_ID,date_time, Sold) which has products sold on various dates. The dates are not consistent and are given for 9 months with random 13 days or more from a month. I have to segregate the data in a such a way that the for each product how many products were sold on 1-3 given days, 4-7 given days, 8-15 given days and >16 given days. . So how can I code this in python using pandas and other packages

PRODUCT_ID DATE_LOCATION Sold 0E4234 01-08-16 0:00 2 0E4234 02-08-16 0:00 7 0E4234 04-08-16 0:00 3 0E4234 08-08-16 0:00 1 0E4234 09-08-16 0:00 2 . . (same product for 9 months sold data) . 0G2342 02-08-16 0:00 1 0G2342 03-08-16 0:00 2 0G2342 06-08-16 0:00 1 0G2342 09-08-16 0:00 1 0G2342 11-08-16 0:00 3 0G2342 15-08-16 0:00 3 . . .(goes for 64 products each with 9 months of data) .

I don't know even how to code for this in python The output needed is

PRODUCT_ID      Days   Sold
0E4234          1-3      94-7      38-15     16>16     (remaing values sum)
0G2342          1-3      34-7      18-15     7>16    (remaing values sum)
.
.(for 64 products)
.

Would be happy if at least someone posted a link to where to start

Answer

You can first convert dates to dtetimes and get days by dt.day:

df['DATE_LOCATION'] = pd.to_datetime(df['DATE_LOCATION'], dayfirst=True)
days = df['DATE_LOCATION'].dt.day

Then binning by cut:

rng = pd.cut(days, bins=[0,3,7,15,31], labels=['1-3', '4-7','8-15', '>=16'])
print (rng)
0      1-3
1      1-3
2      4-7
3     8-15
4     8-15
5      1-3
6      1-3
7      4-7
8     8-15
9     8-15
10    8-15
Name: DATE_LOCATION, dtype: category
Categories (4, object): [1-3 < 4-7 < 8-15 < >=16]

And aggregate sum by product and binned Series:

df = df.groupby(["PRODUCT_ID",rng])['Sold'].sum()
print (df)
PRODUCT_ID  DATE_LOCATION
0E4234      1-3              94-7              38-15             3
0G2342      1-3              34-7              18-15             7
Name: Sold, dtype: int64

If need also count per years:

df = df.groupby([df['DATE_LOCATION'].dt.year.rename('YEAR'), "PRODUCT_ID",rng])['Sold'].sum()
print (df)YEAR  PRODUCT_ID  DATE_LOCATION
2016  0E4234      1-3              94-7              38-15             30G2342      1-3              34-7              18-15             7
Name: Sold, dtype: int64
https://en.xdnf.cn/q/119129.html

Related Q&A

django how to following relationships backwards?

I am having some issue with following relationships backwards. From the parent page i want to be able to see what children belong to that parent. Heres what i got so farmodel.pyclass Parents(models.Mod…

Python File handling: Seaching for specific numbers

Im creating a document in which I need to record license plates of vehicles (its a practice exercise, nothing illegal) and calculate the speed they travel at and display all the vehicles that are trave…

How to convert token list into wordnet lemma list using nltk?

I have a list of tokens extracted out of a pdf source. I am able to pre process the text and tokenize it but I want to loop through the tokens and convert each token in the list to its lemma in the wor…

Script throws an error when it is made to run using multiprocessing

Ive written a script in python in combination with BeautifulSoup to extract the title of books which get populated upon providing some ISBN numbers in amazon search box. Im providing those ISBN numbers…

Efficiently pair random elements of list

I have a list of n elements say: foo = [a, b, c, d, e] I would like to randomly pair elements of this list to receive for example: bar = [[a, c], [b, e]] where the last element will be discarded if the…

ALL permutations of a list with repetition but not doubles

I have seen similar but not the same: here. I definitely want the permutations, not combinations, of all list elements. Mine is different because itertools permutation of a,b,c returns abc but not aba …

NameError: name current_portfolio is not defined

I am getting NameError: name current_portfolio is not defineddef initialize(context): context.sym = symbol(xxx) context.i = 0def handle_data(context, data):context.i += 1 if context.i < 60:returnsma…

Scrape an Ajax form with .submit() with Python and Selenium

I am trying to get the link from a web page. The web page sends the request using javascript, then the server sends a response which goes directly to download a PDF. This new PDF is automatically downl…

How to process break an array in Python?

I would like to use a double array. But I still fail to do it. This what I did. Folder = "D:\folder" Name = [gadfg5, 546sfdgh] Ver = [None, hhdt5463]for dn in Name :for dr in Ver :if dr is No…

Why am I getting replacement index 1 out of range for positional args tuple error

I keep getting this error: Replacement index 1 out of range for positional args tuple on this line of code: print("{1}, {2}, {3}, {4}".format(question[3]), question[4], question[5], question[…