Question 1

I would like to randomly select a value from a DataFrame, taking weights into account, using Pandas.

df:

    0  1   2   3   4   5
0  40  5  20  10  35  25
1  24  3  12   6  21  15
2  72  9  36  18  63  45
3   8  1   4   2   7   5
4  16  2   8   4  14  10
5  48  6  24  12  42  30

I am aware of using np.random.choice, e.g:

x = np.random.choice(['0-0', '0-1', ...], 1, p=[0.4, 0.24, ...])

I would like to get output like np.random.choice produces, but drawn directly from df using Pandas (or a similar method), rather than typing out the labels and probabilities by hand as I have done above.

I know that the probabilities passed to np.random.choice must add up to 1, but I'm not sure how to get there from df, or how to randomly select a value based on weights using Pandas.

By "output" I mean a label: if, for example, the randomly selected weight were 40, the output would be 0-0, since that value is in column 0, row 0, and so on.
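For reference, here is a minimal NumPy-only sketch of what the question describes, with the labels and probabilities built from df instead of typed by hand. It assumes the labels are column-row strings, which is how '0-1' pairs with 0.24 (column 0, row 1) in the snippet above; the names labels, values and p are illustrative.

```python
import numpy as np
import pandas as pd

# The weights from the question: one cell per possible outcome
df = pd.DataFrame([[40, 5, 20, 10, 35, 25],
                   [24, 3, 12, 6, 21, 15],
                   [72, 9, 36, 18, 63, 45],
                   [8, 1, 4, 2, 7, 5],
                   [16, 2, 8, 4, 14, 10],
                   [48, 6, 24, 12, 42, 30]])

# Assumed label format: 'column-row', visited in the same row-major order as ravel()
labels = [f"{c}-{r}" for r in df.index for c in df.columns]
values = df.to_numpy().ravel()

# np.random.choice needs probabilities that sum to 1, so divide by the total (702)
p = values / values.sum()

x = np.random.choice(labels, 1, p=p)
print(x)  # a one-element array holding one of the 36 labels
```

Larger cells are proportionally more likely: the 72 at column 0, row 2 ('0-2') is drawn with probability 72/702, roughly 10%.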

Answer

Stack the DataFrame:

stacked = df.stack()

Normalize the weights (so that they add up to 1):

weights = stacked / stacked.sum()
# As GeoMatt22 pointed out, this step is not necessary: sample normalizes the weights itself.
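A quick check of what the division does, using the same df: the normalized weights sum to 1, and each cell keeps its share of the total. Per the pandas documentation, sample rescales weights that do not sum to 1 on its own, which is why this step is optional.

```python
import pandas as pd

df = pd.DataFrame([[40, 5, 20, 10, 35, 25],
                   [24, 3, 12, 6, 21, 15],
                   [72, 9, 36, 18, 63, 45],
                   [8, 1, 4, 2, 7, 5],
                   [16, 2, 8, 4, 14, 10],
                   [48, 6, 24, 12, 42, 30]])

stacked = df.stack()
weights = stacked / stacked.sum()  # stacked.sum() is 702 for this df

# Sums to 1, up to floating-point rounding
print(weights.sum())
# Cell (row 2, column 0) holds 72 of the 702 total
print(weights.loc[(2, 0)], 72 / 702)
```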

And then use sample:

stacked.sample(1, weights=weights)
Out: 
1  2    12
dtype: int64

# Or, without normalization:
stacked.sample(1, weights=stacked)

The DataFrame.sample method lets you sample either rows or columns. Consider this:

df.sample(1, weights=[0.4, 0.3, 0.1, 0.1, 0.05, 0.05])
Out:
    0  1   2  3   4   5
1  24  3  12  6  21  15

It selects one row: the first row with a 40% chance, the second with a 30% chance, and so on.
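To see those percentages in action, here is a sketch that draws many rows with replacement (replace=True lets the same row be picked repeatedly) and compares the observed frequencies with the weights. The 10,000 draws and random_state=0 are arbitrary choices for illustration.

```python
import pandas as pd

df = pd.DataFrame([[40, 5, 20, 10, 35, 25],
                   [24, 3, 12, 6, 21, 15],
                   [72, 9, 36, 18, 63, 45],
                   [8, 1, 4, 2, 7, 5],
                   [16, 2, 8, 4, 14, 10],
                   [48, 6, 24, 12, 42, 30]])
row_weights = [0.4, 0.3, 0.1, 0.1, 0.05, 0.05]

# Many draws with replacement, so observed frequencies can be compared with the weights
draws = df.sample(10_000, weights=row_weights, replace=True, random_state=0)
freq = draws.index.value_counts(normalize=True).sort_index()
print(freq)  # close to 0.40, 0.30, 0.10, 0.10, 0.05, 0.05
```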

This is also possible:

df.sample(1, weights=[0.4, 0.3, 0.1, 0.1, 0.05, 0.05], axis=1)
Out:
   1
0  5
1  3
2  9
3  1
4  2
5  6
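A small sketch for reading off which column was drawn: with axis=1 the result is still a DataFrame, just with a single column, so its label is in picked.columns. random_state=1 only makes the draw repeatable, and the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame([[40, 5, 20, 10, 35, 25],
                   [24, 3, 12, 6, 21, 15],
                   [72, 9, 36, 18, 63, 45],
                   [8, 1, 4, 2, 7, 5],
                   [16, 2, 8, 4, 14, 10],
                   [48, 6, 24, 12, 42, 30]])
col_weights = [0.4, 0.3, 0.1, 0.1, 0.05, 0.05]

# axis=1 samples column labels; the result is a DataFrame with one column
picked = df.sample(1, weights=col_weights, axis=1, random_state=1)
col = picked.columns[0]
print(col, picked.shape)  # the chosen column label, then (6, 1)
```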

This is the same process, but the 40% chance is associated with the first column, and we are selecting columns. However, your question seems to imply that you don't want to select rows or columns; you want to select the cells inside. Therefore, I converted the DataFrame from 2D to 1D.

df.stack()
Out:
0  0    40
   1     5
   2    20
   3    10
   4    35
   5    25
1  0    24
   1     3
   2    12
   3     6
   4    21
   5    15
2  0    72
   1     9
   2    36
   3    18
   4    63
   5    45
3  0     8
   1     1
   2     4
   3     2
   4     7
   5     5
4  0    16
   1     2
   2     8
   3     4
   4    14
   5    10
5  0    48
   1     6
   2    24
   3    12
   4    42
   5    30
dtype: int64
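Here is a sketch of what the stacked Series looks like from code, using the same df: a two-level (row, column) index over all 36 cells, listed row by row so that the values line up with the flattened 2D array.

```python
import pandas as pd

df = pd.DataFrame([[40, 5, 20, 10, 35, 25],
                   [24, 3, 12, 6, 21, 15],
                   [72, 9, 36, 18, 63, 45],
                   [8, 1, 4, 2, 7, 5],
                   [16, 2, 8, 4, 14, 10],
                   [48, 6, 24, 12, 42, 30]])
stacked = df.stack()

# A two-level (row, column) index over all 36 cells
print(stacked.index.nlevels, len(stacked))  # 2 36
# Cells are listed row by row, so the values match the flattened 2D array
print((stacked.to_numpy() == df.to_numpy().ravel()).all())  # True
# Any cell can be looked up by its (row, column) pair
print(stacked.loc[(2, 0)])  # 72
```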

So sampling from this Series selects a row and a column at the same time. For example:

df.stack().sample()
Out: 
1  0    24
dtype: int64

This selects row 1, column 0.
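Putting the pieces together, here is a minimal sketch of a reusable helper. weighted_cells is a hypothetical name, not a pandas function. It returns 'row-col' labels in this answer's row-then-column order (swap r and c for the question's column-row labels), and it uses replace=True so that n can exceed the 36 cells and a cell can be drawn more than once. The frequency check at the end compares how often cell (2, 0) comes up with its expected share, 72/702.

```python
import pandas as pd


def weighted_cells(df, n=1, random_state=None):
    """Hypothetical helper: draw n cells with probability proportional to their
    value and return them as 'row-col' labels."""
    stacked = df.stack()
    # sample normalizes the weights, so the raw values can be passed directly
    draws = stacked.sample(n, weights=stacked, replace=True, random_state=random_state)
    # Each index entry of the stacked Series is a (row, column) tuple
    return [f"{r}-{c}" for r, c in draws.index]


df = pd.DataFrame([[40, 5, 20, 10, 35, 25],
                   [24, 3, 12, 6, 21, 15],
                   [72, 9, 36, 18, 63, 45],
                   [8, 1, 4, 2, 7, 5],
                   [16, 2, 8, 4, 14, 10],
                   [48, 6, 24, 12, 42, 30]])

print(weighted_cells(df))  # a list with a single 'row-col' label

labels = weighted_cells(df, n=20_000, random_state=0)
freq = pd.Series(labels).value_counts(normalize=True)
print(freq["2-0"], 72 / 702)  # empirical vs. expected share of cell (2, 0)
```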
