How to get a random (bootstrap) sample from pandas multiindex

2024/10/3 4:38:33

I'm trying to create a bootstrapped sample from a multiindex dataframe in Pandas. Below is some code to generate the kind of data I need.

from itertools import product
import pandas as pd
import numpy as np

df = pd.DataFrame({'group1': [1, 1, 1, 2, 2, 3],
                   'group2': [13, 18, 20, 77, 109, 123],
                   'value1': [1.1, 2, 3, 4, 5, 6],
                   'value2': [7.1, 8, 9, 10, 11, 12]})
df = df.set_index(['group1', 'group2'])
print(df)

The df dataframe looks like:

               value1  value2
group1 group2                
1      13         1.1     7.1
       18         2.0     8.0
       20         3.0     9.0
2      77         4.0    10.0
       109        5.0    11.0
3      123        6.0    12.0

I want to draw a random sample, with replacement, from the first index level (group1). For example, suppose np.random.randint(1, 4, size=3) produces [3, 2, 2]. I'd like the resulting dataframe to look like:

               value1  value2
group1 group2                
3      123        6.0    12.0
2      77         4.0    10.0
       109        5.0    11.0
2      77         4.0    10.0
       109        5.0    11.0

I've spent a lot of time researching this, but I haven't found a similar example where the multiindex values are integers, the secondary index has a variable number of entries per group, and the sampled primary index values can repeat. As far as I understand, that is how a bootstrap sample should be built.

Answer

Try:

df.unstack().sample(3, replace=True).stack()
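Note that the one-liner relies on stack() dropping the all-NaN rows that unstack() introduces for missing group2 values. Depending on the pandas version, stack() may keep those rows, in which case a trailing .dropna(how='all') would probably be needed. As an alternative, here is a minimal sketch that samples the group1 labels directly and concatenates the matching blocks. The helper names take_groups and bootstrap_groups are made up for illustration, and the seed is arbitrary.

```python
import numpy as np
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({
    'group1': [1, 1, 1, 2, 2, 3],
    'group2': [13, 18, 20, 77, 109, 123],
    'value1': [1.1, 2, 3, 4, 5, 6],
    'value2': [7.1, 8, 9, 10, 11, 12],
}).set_index(['group1', 'group2'])


def take_groups(frame, drawn):
    """Stack the full block of rows for each drawn group1 label, in order.

    Hypothetical helper: frame.loc[[g]] keeps both index levels, so
    repeated labels produce repeated blocks.
    """
    return pd.concat([frame.loc[[g]] for g in drawn])


def bootstrap_groups(frame, n, rng):
    """Draw n group1 labels with replacement and return their rows."""
    labels = frame.index.get_level_values('group1').unique().to_numpy()
    drawn = rng.choice(labels, size=n, replace=True)
    return take_groups(frame, drawn)


rng = np.random.default_rng(0)  # arbitrary seed, only for repeatability
sample = bootstrap_groups(df, 3, rng)
print(sample)
```

Calling take_groups(df, [3, 2, 2]) reproduces the desired frame from the question. This version never reshapes the data, so no NaN values are introduced and the original dtypes are kept.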


