Pandas Random Weighted Choice

2024/10/4 3:27:50

I would like to randomly select a value in consideration of weightings using Pandas.

df:

    0  1   2   3   4   5
0  40  5  20  10  35  25
1  24  3  12   6  21  15
2  72  9  36  18  63  45
3   8  1   4   2   7   5
4  16  2   8   4  14  10
5  48  6  24  12  42  30

I am aware of using np.random.choice, e.g.:

x = np.random.choice(['0-0', '0-1', ...], 1, p=[0.4, 0.24, ...])

I would like to get an output similar to what np.random.choice gives, but taken from df using Pandas, and in a more efficient way than manually typing in the values as I have done above.

I am aware that with np.random.choice all the probabilities must add up to 1. I'm not sure how to handle that, nor how to randomly select a value based on weightings using Pandas.

As for the output: if the randomly selected weight was, for example, 40, then the output would be 0-0, since it is located in column 0, row 0, and so on.

Answer

Stack the DataFrame:

stacked = df.stack()

Normalize the weights (so that they add up to 1):

weights = stacked / stacked.sum()
# As GeoMatt22 pointed out, this part is not necessary. See the other comment.

And then use sample:

stacked.sample(1, weights=weights)
Out: 
1  2    12
dtype: int64

# Or without normalization, stacked.sample(1, weights=stacked)
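
To get the "row-col" style of output described in the question, the index of the sampled entry can be unpacked into its row and column parts. The following is a minimal sketch, assuming df is rebuilt from the values shown in the question with default integer labels; the names picked and label are only illustrative.

```python
import pandas as pd

# The example frame from the question (default integer row/column labels).
df = pd.DataFrame(
    [
        [40, 5, 20, 10, 35, 25],
        [24, 3, 12, 6, 21, 15],
        [72, 9, 36, 18, 63, 45],
        [8, 1, 4, 2, 7, 5],
        [16, 2, 8, 4, 14, 10],
        [48, 6, 24, 12, 42, 30],
    ]
)

# One entry per cell, indexed by (row, column).
stacked = df.stack()

# Draw one cell, using the cell values themselves as weights.
# sample() rescales weights that do not sum to 1.
picked = stacked.sample(1, weights=stacked)

# The index of the drawn entry is a (row, column) tuple.
row, col = picked.index[0]
label = f"{row}-{col}"
print(label, picked.iloc[0])
```

The printed label changes from run to run; passing random_state=... to sample makes the draw reproducible.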

The DataFrame.sample method allows you to sample either from rows or from columns. Consider this:

df.sample(1, weights=[0.4, 0.3, 0.1, 0.1, 0.05, 0.05])
Out: 
    0  1   2  3   4   5
1  24  3  12  6  21  15

It selects one row (the first row with a 40% chance, the second with a 30% chance, etc.).

This is also possible:

df.sample(1, weights=[0.4, 0.3, 0.1, 0.1, 0.05, 0.05], axis=1)
Out: 
   1
0  5
1  3
2  9
3  1
4  2
5  6

This is the same process, but now the 40% chance is associated with the first column, because we are selecting from columns. However, your question seems to imply that you don't want to select rows or columns; you want to select the cells inside. Therefore, I changed the dimension from 2D to 1D.

df.stack()
Out: 
0  0    40
   1     5
   2    20
   3    10
   4    35
   5    25
1  0    24
   1     3
   2    12
   3     6
   4    21
   5    15
2  0    72
   1     9
   2    36
   3    18
   4    63
   5    45
3  0     8
   1     1
   2     4
   3     2
   4     7
   5     5
4  0    16
   1     2
   2     8
   3     4
   4    14
   5    10
5  0    48
   1     6
   2    24
   3    12
   4    42
   5    30
dtype: int64

So if I now sample from this, I will both sample a row and a column. For example:

df.stack().sample()
Out: 
1  0    24
dtype: int64

selects row 1 and column 0.
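
For comparison with the np.random.choice call in the question, the same stacked Series can supply both the labels and the probabilities, so nothing has to be typed by hand. This is only a sketch under the assumption that df holds the values from the question; labels, probs, and draws are illustrative names, and here the normalization is needed because NumPy expects probabilities that sum to 1.

```python
import numpy as np
import pandas as pd

# The example frame from the question (default integer row/column labels).
df = pd.DataFrame(
    [
        [40, 5, 20, 10, 35, 25],
        [24, 3, 12, 6, 21, 15],
        [72, 9, 36, 18, 63, 45],
        [8, 1, 4, 2, 7, 5],
        [16, 2, 8, 4, 14, 10],
        [48, 6, 24, 12, 42, 30],
    ]
)

stacked = df.stack()

# NumPy requires probabilities that add up to 1, so normalize explicitly.
probs = (stacked / stacked.sum()).to_numpy()

# "row-col" labels in the same order as the probabilities.
labels = [f"{r}-{c}" for r, c in stacked.index]

rng = np.random.default_rng()  # pass a seed here for reproducible draws
draws = rng.choice(labels, size=5, p=probs)
print(draws)
```

Each element of draws is a label such as 2-0; cells with larger weights (for example 72 at row 2, column 0) should turn up more often over many draws.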
