generating correlated numbers in numpy / pandas

2024/9/25 7:13:25

I’m trying to generate simulated student grades in 4 subjects, where a student record is a single row of data. The code shown here will generate normally distributed random numbers with a mean of 60 and a standard deviation of 15.

df = pd.DataFrame(15 * np.random.randn(5, 4) + 60, columns=['Math', 'Science', 'History', 'Art'])

What I can’t figure out is how to make it so that a student’s Science mark is highly correlated to their Math mark, and that their History and Art marks are less so, but still somewhat correlated to the Math mark.

I’m neither a statistician nor an expert programmer, so a less sophisticated but more easily understood solution is what I’m hoping for.

Answer

Let's put what has been suggested by @Daniel into code.

Step 1

Let's import multivariate_normal:

import numpy as np
import pandas as pd  # needed later for pd.DataFrame
from scipy.stats import multivariate_normal as mvn

Step 2

Let's construct the covariance matrix:

cov = np.array([[1.0, 0.8, 0.7, 0.6],
                [0.8, 1.0, 0.5, 0.5],
                [0.7, 0.5, 1.0, 0.5],
                [0.6, 0.5, 0.5, 1.0]])
cov

array([[ 1. ,  0.8,  0.7,  0.6],
       [ 0.8,  1. ,  0.5,  0.5],
       [ 0.7,  0.5,  1. ,  0.5],
       [ 0.6,  0.5,  0.5,  1. ]])

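This is the key step. Note that the covariance matrix has 1's on the diagonal, so every subject has unit variance (a standard deviation of 1, not the 15 from the question) and the off-diagonal entries are the correlations themselves. The first row sets each subject's correlation with Math: 0.8 for Science, 0.7 for History and 0.6 for Art.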

Now we are ready to generate data, let's say 1,000 points:

scores = mvn.rvs(mean = [60.,60.,60.,60.], cov=cov, size = 1000)
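Because the diagonal of cov is 1, the marks drawn this way have a standard deviation of about 1 rather than the 15 asked for in the question. One possible way to get there, sketched below, is to treat the matrix as a correlation matrix and scale it by the desired standard deviations. The names corr, stds and cov_scaled and the seed are illustrative assumptions, and the sketch swaps in NumPy's Generator.multivariate_normal for mvn.rvs so that it needs nothing beyond NumPy.

```python
import numpy as np

# Correlation matrix: same numbers as `cov` in the answer.
corr = np.array([[1.0, 0.8, 0.7, 0.6],
                 [0.8, 1.0, 0.5, 0.5],
                 [0.7, 0.5, 1.0, 0.5],
                 [0.6, 0.5, 0.5, 1.0]])

# Desired standard deviation per subject (assumption: 15 for all four).
stds = np.array([15.0, 15.0, 15.0, 15.0])

# cov[i, j] = corr[i, j] * stds[i] * stds[j]
cov_scaled = corr * np.outer(stds, stds)

rng = np.random.default_rng(0)  # fixed seed only for reproducibility
scores_15 = rng.multivariate_normal(mean=[60.0] * 4, cov=cov_scaled, size=1000)

print(scores_15.std(axis=0))  # each value should be close to 15
```

The correlations are unchanged by the scaling, so np.corrcoef on scores_15 should still come out near the values in corr.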

Sanity check (from covariance matrix to simple correlations):

np.corrcoef(scores.T)

array([[ 1.        ,  0.78886583,  0.70198586,  0.56810058],
       [ 0.78886583,  1.        ,  0.49187904,  0.45994833],
       [ 0.70198586,  0.49187904,  1.        ,  0.4755558 ],
       [ 0.56810058,  0.45994833,  0.4755558 ,  1.        ]])

Note that np.corrcoef by default treats each row as a variable and each column as an observation, which is why scores is transposed here.
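As a small illustration of that row/column convention, the sketch below compares transposing the data with passing rowvar=False, which tells np.corrcoef that the variables are in columns. The array data is a hypothetical stand-in for scores (independent random numbers of the same shape), used only to show the shapes involved.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.standard_normal((1000, 4))  # stand-in for `scores`: 1000 students, 4 subjects

by_transpose = np.corrcoef(data.T)           # variables moved into rows
by_rowvar = np.corrcoef(data, rowvar=False)  # variables left in columns

print(by_transpose.shape)                    # 4 x 4: one entry per pair of subjects
print(np.allclose(by_transpose, by_rowvar))  # both calls give the same matrix

# Forgetting the transpose correlates students with each other instead.
wrong = np.corrcoef(data)
print(wrong.shape)                           # 1000 x 1000
```

Once the data is in a DataFrame, df.corr() gives the same pairwise correlations with the column names attached.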

Finally, let's put your data into Pandas' DataFrame:

df = pd.DataFrame(data=scores, columns=["Math", "Science", "History", "Art"])
df.head()

        Math    Science    History        Art
0  60.629673  61.238697  61.805788  61.848049
1  59.728172  60.095608  61.139197  61.610891
2  61.205913  60.812307  60.822623  59.497453
3  60.581532  62.163044  59.277956  60.992206
4  61.408262  59.894078  61.154003  61.730079

Step 3

Let's visualize some data that we've just generated:

ax = df.plot(x = "Math",y="Art", kind="scatter", color = "r", alpha = .5, label = "Art, $corr_{Math}$ = .6")
df.plot(x = "Math",y="Science", kind="scatter", ax = ax, color = "b", alpha = .2, label = "Science, $corr_{Math}$ = .8")
ax.set_ylabel("Art and Science");

[Figure: scatter plot of Art (red) and Science (blue) marks against Math marks]
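For readers who would rather stay close to the np.random.randn one-liner from the question, here is a rough sketch of the same idea done by hand: draw independent standard normals, mix them with the Cholesky factor of the correlation matrix, then apply the 15 * ... + 60 scaling. This is an alternative to the scipy call, not what the answer itself does; the names corr, chol, z and marks and the seed are illustrative.

```python
import numpy as np

corr = np.array([[1.0, 0.8, 0.7, 0.6],
                 [0.8, 1.0, 0.5, 0.5],
                 [0.7, 0.5, 1.0, 0.5],
                 [0.6, 0.5, 0.5, 1.0]])

# Lower-triangular factor with chol @ chol.T == corr.
chol = np.linalg.cholesky(corr)

rng = np.random.default_rng(0)        # fixed seed only for reproducibility
z = rng.standard_normal((1000, 4))    # independent columns, like np.random.randn(1000, 4)

# Mixing the columns with chol induces the target correlations;
# the scaling then gives each subject mean 60 and standard deviation 15.
marks = 60 + 15 * (z @ chol.T)

print(np.corrcoef(marks, rowvar=False).round(2))
```

The result can be wrapped in pd.DataFrame(marks, columns=['Math', 'Science', 'History', 'Art']) exactly as in the question.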
