Multiple linear regression with python

2024/10/7 4:06:13

I would like to calculate multiple linear regression with python. I found this code for simple linear regression

import numpy as npfrom matplotlib.pyplot import *x = np.array([1, 2, 3, 4, 5])y = np.array([2, 3, 4, 4, 5])n = np.max(x.shape)    X = np.vstack([np.ones(n), x]).Ta = np.linalg.lstsq(X, y)[0]

So, a is the coefficient, but I don't see what [0] means ?

And how can I change the code to obtain multiple linear regressions ?

Answer

To implement multiple linear regression with python you can use any of the following options:

1) Use normal equation method (that uses matrix inverse)
2) Numpy's least-squares numpy.linalg.lstsq tool
3) Numpy's np.linalg.solve tool

For normal equations method you can use this formula: enter image description here
In above formula X is feature matrix and y is label vector.

As for Numpy's numpy.linalg.lstsq or np.linalg.solve tools you just use them out of the box.

Here I provide a link for sample data that you can use for tests: https://drive.google.com/file/d/0BzzUvSbpsTAvN1UxTkxXd2U0eVE/view

Alternative: https://www.dropbox.com/s/e3pd7fp0rfm1cfs/DB2.csv?dl=0

Data preparation code:

import pandas as pd
import numpy as nppath = 'DB2.csv'  
data = pd.read_csv(path, header=None, delimiter=";")data.insert(0, 'Ones', 1)
cols = data.shape[1]X = data.iloc[:,0:cols-1]  
y = data.iloc[:,cols-1:cols] IdentitySize = X.shape[1]
IdentityMatrix= np.zeros((IdentitySize, IdentitySize))
np.fill_diagonal(IdentityMatrix, 1)

For least squares method you use Numpy's numpy.linalg.lstsq. Here is Python code:

lamb = 1
th = np.linalg.lstsq(X.T.dot(X) + lamb * IdentityMatrix, X.T.dot(y))[0]            

Also you can use np.linalg.solve tool of numpy:

lamb = 1
XtX_lamb = X.T.dot(X) + lamb * IdentityMatrix
XtY = X.T.dot(y)
x = np.linalg.solve(XtX_lamb, XtY);

For normal equation method use:

lamb = 1
xTx = X.T.dot(X) + lamb * IdentityMatrix
XtX = np.linalg.inv(xTx)
XtX_xT = XtX.dot(X.T)
theta = XtX_xT.dot(y)

In all methods regularization is used. Here is results (theta coefficients) to see difference between these three approaches:

Normal equation:        np.linalg.lstsq         np.linalg.solve
[-27551.99918303]       [-27551.95276154]       [-27551.9991855]
[-940.27518383]         [-940.27520138]         [-940.27518383]
[-9332.54653964]        [-9332.55448263]        [-9332.54654461]
[-3149.02902071]        [-3149.03496582]        [-3149.02900965]
[-1863.25125909]        [-1863.2631435]         [-1863.25126344]
[-2779.91105618]        [-2779.92175308]        [-2779.91105347]
[-1226.60014026]        [-1226.61033117]        [-1226.60014192]
[-920.73334259]         [-920.74331432]         [-920.73334194]
[-6278.44238081]        [-6278.45496955]        [-6278.44237847]
[-2001.48544938]        [-2001.49566981]        [-2001.48545349]
[-715.79204971]         [-715.79664124]         [-715.79204921]
[ 4039.38847472]        [ 4039.38302499]        [ 4039.38847515]
[-2362.54853195]        [-2362.55280478]        [-2362.54853139]
[-12730.8039209]        [-12730.80866036]       [-12730.80392076]
[-24872.79868125]       [-24872.80203459]       [-24872.79867954]
[-3402.50791863]        [-3402.5140501]         [-3402.50793382]
[ 253.47894001]         [ 253.47177732]         [ 253.47892472]
[-5998.2045186]         [-5998.20513905]        [-5998.2045184]
[ 198.40560401]         [ 198.4049081]          [ 198.4056042]
[ 4368.97581411]        [ 4368.97175688]        [ 4368.97581426]
[-2885.68026222]        [-2885.68154407]        [-2885.68026205]
[ 1218.76602731]        [ 1218.76562838]        [ 1218.7660275]
[-1423.73583813]        [-1423.7369068]         [-1423.73583793]
[ 173.19125007]         [ 173.19086525]         [ 173.19125024]
[-3560.81709538]        [-3560.81650156]        [-3560.8170952]
[-142.68135768]         [-142.68162508]         [-142.6813575]
[-2010.89489111]        [-2010.89601322]        [-2010.89489092]
[-4463.64701238]        [-4463.64742877]        [-4463.64701219]
[ 17074.62997704]       [ 17074.62974609]       [ 17074.62997723]
[ 7917.75662561]        [ 7917.75682048]        [ 7917.75662578]
[-4234.16758492]        [-4234.16847544]        [-4234.16758474]
[-5500.10566329]        [-5500.106558]          [-5500.10566309]
[-5997.79002683]        [-5997.7904842]         [-5997.79002634]
[ 1376.42726683]        [ 1376.42629704]        [ 1376.42726705]
[ 6056.87496151]        [ 6056.87452659]        [ 6056.87496175]
[ 8149.0123667]         [ 8149.01209157]        [ 8149.01236827]
[-7273.3450484]         [-7273.34480382]        [-7273.34504827]
[-2010.61773247]        [-2010.61839251]        [-2010.61773225]
[-7917.81185096]        [-7917.81223606]        [-7917.81185084]
[ 8247.92773739]        [ 8247.92774315]        [ 8247.92773722]
[ 1267.25067823]        [ 1267.24677734]        [ 1267.25067832]
[ 2557.6208133]         [ 2557.62126916]        [ 2557.62081337]
[-5678.53744654]        [-5678.53820798]        [-5678.53744647]
[ 3406.41697822]        [ 3406.42040997]        [ 3406.41697836]
[-8371.23657044]        [-8371.2361594]         [-8371.23657035]
[ 15010.61728285]       [ 15010.61598236]       [ 15010.61728304]
[ 11006.21920273]       [ 11006.21711213]       [ 11006.21920284]
[-5930.93274062]        [-5930.93237071]        [-5930.93274048]
[-5232.84459862]        [-5232.84557665]        [-5232.84459848]
[ 3196.89304277]        [ 3196.89414431]        [ 3196.8930428]
[ 15298.53309912]       [ 15298.53496877]       [ 15298.53309919]
[ 4742.68631183]        [ 4742.6862601]         [ 4742.68631172]
[ 4423.14798495]        [ 4423.14765013]        [ 4423.14798546]
[-16153.50854089]       [-16153.51038489]       [-16153.50854123]
[-22071.50792741]       [-22071.49808389]       [-22071.50792408]
[-688.22903323]         [-688.2310229]          [-688.22904006]
[-1060.88119863]        [-1060.8829114]         [-1060.88120546]
[-101.75750066]         [-101.75776411]         [-101.75750831]
[ 4106.77311898]        [ 4106.77128502]        [ 4106.77311218]
[ 3482.99764601]        [ 3482.99518758]        [ 3482.99763924]
[-1100.42290509]        [-1100.42166312]        [-1100.4229119]
[ 20892.42685103]       [ 20892.42487476]       [ 20892.42684422]
[-5007.54075789]        [-5007.54265501]        [-5007.54076473]
[ 11111.83929421]       [ 11111.83734144]       [ 11111.83928704]
[ 9488.57342568]        [ 9488.57158677]        [ 9488.57341883]
[-2992.3070786]         [-2992.29295891]        [-2992.30708529]
[ 17810.57005982]       [ 17810.56651223]       [ 17810.57005457]
[-2154.47389712]        [-2154.47504319]        [-2154.47390285]
[-5324.34206726]        [-5324.33913623]        [-5324.34207293]
[-14981.89224345]       [-14981.8965674]        [-14981.89224973]
[-29440.90545197]       [-29440.90465897]       [-29440.90545704]
[-6925.31991443]        [-6925.32123144]        [-6925.31992383]
[ 104.98071593]         [ 104.97886085]         [ 104.98071152]
[-5184.94477582]        [-5184.9447972]         [-5184.94477792]
[ 1555.54536625]        [ 1555.54254362]        [ 1555.5453638]
[-402.62443474]         [-402.62539068]         [-402.62443718]
[ 17746.15769322]       [ 17746.15458093]       [ 17746.15769074]
[-5512.94925026]        [-5512.94980649]        [-5512.94925267]
[-2202.8589276]         [-2202.86226244]        [-2202.85893056]
[-5549.05250407]        [-5549.05416936]        [-5549.05250669]
[-1675.87329493]        [-1675.87995809]        [-1675.87329255]
[-5274.27756529]        [-5274.28093377]        [-5274.2775701]
[-5424.10246845]        [-5424.10658526]        [-5424.10247326]
[-1014.70864363]        [-1014.71145066]        [-1014.70864845]
[ 12936.59360437]       [ 12936.59168749]       [ 12936.59359954]
[ 2912.71566077]        [ 2912.71282628]        [ 2912.71565599]
[ 6489.36648506]        [ 6489.36538259]        [ 6489.36648021]
[ 12025.06991281]       [ 12025.07040848]       [ 12025.06990358]
[ 17026.57841531]       [ 17026.56827742]       [ 17026.57841044]
[ 2220.1852193]         [ 2220.18531961]        [ 2220.18521579]
[-2886.39219026]        [-2886.39015388]        [-2886.39219394]
[-18393.24573629]       [-18393.25888463]       [-18393.24573872]
[-17591.33051471]       [-17591.32838012]       [-17591.33051834]
[-3947.18545848]        [-3947.17487999]        [-3947.18546459]
[ 7707.05472816]        [ 7707.05577227]        [ 7707.0547217]
[ 4280.72039079]        [ 4280.72338194]        [ 4280.72038435]
[-3137.48835901]        [-3137.48480197]        [-3137.48836531]
[ 6693.47303443]        [ 6693.46528167]        [ 6693.47302811]
[-13936.14265517]       [-13936.14329336]       [-13936.14267094]
[ 2684.29594641]        [ 2684.29859601]        [ 2684.29594183]
[-2193.61036078]        [-2193.63086307]        [-2193.610366]
[-10139.10424848]       [-10139.11905454]       [-10139.10426049]
[ 4475.11569903]        [ 4475.12288711]        [ 4475.11569421]
[-3037.71857269]        [-3037.72118246]        [-3037.71857265]
[-5538.71349798]        [-5538.71654224]        [-5538.71349794]
[ 8008.38521357]        [ 8008.39092739]        [ 8008.38521361]
[-1433.43859633]        [-1433.44181824]        [-1433.43859629]
[ 4212.47144667]        [ 4212.47368097]        [ 4212.47144686]
[ 19688.24263706]       [ 19688.2451694]        [ 19688.2426368]
[ 104.13434091]         [ 104.13434349]         [ 104.13434091]
[-654.02451175]         [-654.02493111]         [-654.02451174]
[-2522.8642551]         [-2522.88694451]        [-2522.86424254]
[-5011.20385919]        [-5011.22742915]        [-5011.20384655]
[-13285.64644021]       [-13285.66951459]       [-13285.64642763]
[-4254.86406891]        [-4254.88695873]        [-4254.86405637]
[-2477.42063206]        [-2477.43501057]        [-2477.42061727]
[ 0.]                   [  1.23691279e-10]      [ 0.]
[-92.79470071]          [-92.79467095]          [-92.79470071]
[ 2383.66211583]        [ 2383.66209637]        [ 2383.66211583]
[-10725.22892185]       [-10725.22889937]       [-10725.22892185]
[ 234.77560283]         [ 234.77560254]         [ 234.77560283]
[ 4739.22119578]        [ 4739.22121432]        [ 4739.22119578]
[ 43640.05854156]       [ 43640.05848841]       [ 43640.05854157]
[ 2592.3866707]         [ 2592.38671547]        [ 2592.3866707]
[-25130.02819215]       [-25130.05501178]       [-25130.02819515]
[ 4966.82173096]        [ 4966.7946407]         [ 4966.82172795]
[ 14232.97930665]       [ 14232.9529959]        [ 14232.97930363]
[-21621.77202422]       [-21621.79840459]       [-21621.7720272]
[ 9917.80960029]        [ 9917.80960571]        [ 9917.80960029]
[ 1355.79191536]        [ 1355.79198092]        [ 1355.79191536]
[-27218.44185748]       [-27218.46880642]       [-27218.44185719]
[-27218.04184348]       [-27218.06875423]       [-27218.04184318]
[ 23482.80743869]       [ 23482.78043029]       [ 23482.80743898]
[ 3401.67707434]        [ 3401.65134677]        [ 3401.67707463]
[ 3030.36383274]        [ 3030.36384909]        [ 3030.36383274]
[-30590.61847724]       [-30590.63933424]       [-30590.61847706]
[-28818.3942685]        [-28818.41520495]       [-28818.39426833]
[-25115.73726772]       [-25115.7580278]        [-25115.73726753]
[ 77174.61695995]       [ 77174.59548773]       [ 77174.61696016]
[-20201.86613672]       [-20201.88871113]       [-20201.86613657]
[ 51908.53292209]       [ 51908.53446495]       [ 51908.53292207]
[ 7710.71327865]        [ 7710.71324194]        [ 7710.71327865]
[-16206.9785119]        [-16206.97851993]       [-16206.9785119]

As you can see normal equation, least squares and np.linalg.solve tool methods give to some extent different results.

https://en.xdnf.cn/q/70166.html

Related Q&A

How to find source of error in Python Pickle on massive object

Ive taken over somebodys code for a fairly large project. Im trying to save program state, and theres one massive object which stores pretty much all the other objects. Im trying to pickle this object,…

What is the most idiomatic way to index an object with a boolean array in pandas?

I am particularly talking about Pandas version 0.11 as I am busy replacing my uses of .ix with either .loc or .iloc. I like the fact that differentiating between .loc and .iloc communicates whether I a…

Error: Command failed with rc=65536 python and mod_wsgi

im having this problem: im runing pythonbrew to get python2.7, and so i re-compiled mod_wsgi to use the 2.7 python. to that end, i followed this tutorial: code.google.com/p/modwsgi/wiki/QuickInstallati…

Why cant I freeze_panes on the xlsxwriter object pandas is creating for me?

I have a class whose objects contain pandas dataframes (self.before, and self.after below) and a save() method which uses xlsxwriter to export the data (which has two worksheets, "before" and…

Python multiprocessing pipe recv() doc unclear or did I miss anything?

I have been learning how to use the Python multiprocessing module recently, and reading the official doc. In 16.6.1.2. Exchanging objects between processes there is a simple example about using pipe t…

How would you unit test this SQLAlchemy Core query/function?

Im working on learning how to unit test properly. Given this function...def get_user_details(req_user_id):users = sa.Table(users, db.metadata, autoload=True)s = sa.select([users.c.username,users.c.favo…

Retrieve wall-time in Python using the standard library?

How can I retrieve wall-time in Python using the standard library?This question, and this question would suggest that something like clock_gettime(CLOCK_MONOTONIC_RAW) or /proc/uptime are most appropr…

NLTK: Package Errors? punkt and pickle?

Basically, I have no idea why Im getting this error. Just to have more than an image, here is a similar message in code format. As it is more recent, the answer of this thread has already been mentione…

Is there a bit-wise trick for checking the divisibility of a number by 2 or 3?

I am looking for a bit-wise test equivalent to (num%2) == 0 || (num%3) == 0.I can replace num%2 with num&1, but Im still stuck with num%3 and with the logical-or.This expression is also equivalent …

Check image urls using python-markdown

On a website Im creating Im using Python-Markdown to format news posts. To avoid issues with dead links and HTTP-content-on-HTTPS-page problems Im requiring editors to upload all images to the site and…