Python/Pandas - partitioning a pandas DataFrame into 10 disjoint, equally-sized subsets

2024/9/28 23:28:17

I want to partition a pandas DataFrame into ten disjoint, equally-sized, randomly composed subsets.

I know I can randomly sample one tenth of the original pandas DataFrame using:

partition_1 = df.sample(frac=1/10)  # df is the DataFrame to partition

However, how can I obtain the other nine partitions? If I call df.sample(frac=1/10) again, the new sample may overlap with the first one, so the subsets would not be disjoint.
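One way to make the repeated-sampling idea from the question disjoint is to remove each drawn sample from the pool before drawing the next one. This is a minimal sketch, assuming the DataFrame has a unique index (rows are removed by label); the function name `disjoint_samples` and its parameters are hypothetical. It samples a fixed row count with `n=` instead of `frac=1/10`, because a tenth of the shrinking remainder is not a tenth of the original frame.

```python
import pandas as pd


def disjoint_samples(df: pd.DataFrame, k: int = 10, seed: int = 0) -> list:
    """Draw k disjoint random subsets by sampling only from rows not yet used.

    Assumes df has a unique index, because used rows are dropped by label.
    """
    n = len(df)
    # Sizes differ by at most one row; the first n % k subsets get the extra row.
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    remaining = df
    parts = []
    for i, size in enumerate(sizes):
        part = remaining.sample(n=size, random_state=seed + i)
        parts.append(part)
        # Dropped rows can never be drawn again, which keeps the subsets disjoint.
        remaining = remaining.drop(part.index)
    return parts
```

For example, `partitions = disjoint_samples(df, k=10)` returns a list of ten DataFrames that together contain every row of `df` exactly once.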

Thanks for the help!

Answer

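Starting with this DataFrame:

    import numpy as np
    import pandas as pd

    dfm = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'] * 2,
                        'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'] * 2})

          A      B
    0   foo    one
    1   bar    one
    2   foo    two
    3   bar  three
    4   foo    two
    5   bar    two
    6   foo    one
    7   foo  three
    8   foo    one
    9   bar    one
    10  foo    two
    11  bar  three
    12  foo    two
    13  bar    two
    14  foo    one
    15  foo  three

Usage: shuffle the row order with np.random.permutation, then cut the shuffled frame into equal parts with np.array_split. The example below uses 4 parts; change "4" to "10" for ten partitions, and use [i] to get the i-th slice.

    np.random.seed(32)  # for reproducible results
    np.array_split(dfm.reindex(np.random.permutation(dfm.index)), 4)[1]

         A    B
    2   foo  two
    5   bar  two
    10  foo  two
    12  foo  two

    np.array_split(dfm.reindex(np.random.permutation(dfm.index)), 4)[3]

          A      B
    13  bar    two
    11  bar  three
    0   foo    one
    7   foo  three

Note that each call above draws a new permutation, so slices taken from separate calls are not guaranteed to be disjoint. To get disjoint partitions, shuffle once, store the list returned by np.array_split, and index into that list.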
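A variant of the answer's approach, as a minimal sketch: shuffle row positions once, split the positions with `np.array_split`, and select each part with `df.iloc`. Splitting an array of positions instead of the DataFrame itself avoids passing a DataFrame to `np.array_split` (some recent pandas versions emit a FutureWarning when that happens) and does not need a unique index the way `reindex` does. The helper name `random_partitions` is hypothetical, and seeding `np.random.default_rng` will not reproduce the exact rows shown above.

```python
import numpy as np
import pandas as pd


def random_partitions(df: pd.DataFrame, k: int = 10, seed=None) -> list:
    """Split df into k disjoint, randomly composed parts.

    Part sizes differ by at most one row when len(df) is not divisible by k.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(df))   # shuffle row positions exactly once
    chunks = np.array_split(order, k)  # k nearly equal groups of positions
    # iloc selects by position, so duplicate index labels are not a problem.
    return [df.iloc[chunk] for chunk in chunks]
```

With the 16-row `dfm` above, `parts = random_partitions(dfm, k=10, seed=32)` gives six parts of 2 rows and four parts of 1 row, and `parts[i]` is the i-th partition.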