I am trying to do multivariate linear regression, but I find that sklearn.linear_model is behaving very strangely. Here's my code:
import numpy as np
from sklearn import linear_model

b = np.array([3, 5, 7]).transpose()  ## the right answer I am expecting
x = np.array([[1, 6, 9],   ## 1*3 + 6*5 + 7*9 = 96
              [2, 7, 7],   ## 2*3 + 7*5 + 7*7 = 90
              [3, 4, 5]])  ## 3*3 + 4*5 + 5*7 = 64
y = np.array([96, 90, 64]).transpose()

clf = linear_model.LinearRegression()
clf.fit(x, y)
print(clf.coef_)             ## <== it gives me [-2.2 5 4.4], NOT [3, 5, 7]
print(np.dot(x, clf.coef_))  ## <== it gives me [ 67.4 61.4 35.4]
In order to get your initial coefficients back, you need to pass the keyword fit_intercept=False when constructing the linear regression.
import numpy as np
from sklearn import linear_model

b = np.array([3, 5, 7])
x = np.array([[1, 6, 9], [2, 7, 7], [3, 4, 5]])
y = np.array([96, 90, 64])

clf = linear_model.LinearRegression(fit_intercept=False)
clf.fit(x, y)
print(clf.coef_)             ## recovers b: [3. 5. 7.]
print(np.dot(x, clf.coef_))  ## reproduces y: [96. 90. 64.]
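As a sanity check (a sketch of my own, not part of the fix): this particular system is square with a nonsingular x, so the no-intercept least-squares fit coincides with the exact solution of x @ b = y, which you can compute directly:

import numpy as np

x = np.array([[1, 6, 9], [2, 7, 7], [3, 4, 5]])
y = np.array([96, 90, 64])

# x is invertible (det = -44), so the system has the unique exact solution
print(np.linalg.solve(x, y))  # [3. 5. 7.]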
Using fit_intercept=False prevents the LinearRegression object from working with x - x.mean(axis=0), which it would otherwise do, capturing the mean with a constant offset y = xb + c (or, equivalently, by adding a column of 1s to x).
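To make that equivalence concrete, here is a minimal sketch. The random data is my own, chosen so the least-squares solution is unique; with only the 3 samples above, the problem would be underdetermined once an intercept is added. Appending a column of ones and fitting with fit_intercept=False reproduces both coef_ and intercept_ of the default fit:

import numpy as np
from sklearn import linear_model

rng = np.random.default_rng(0)
x = rng.normal(size=(20, 3))
y = x @ np.array([3, 5, 7]) + 2.0  # known slopes [3, 5, 7] and offset 2

# Default: sklearn centers the data and fits the intercept separately
clf = linear_model.LinearRegression()
clf.fit(x, y)

# Equivalent: append a column of ones and fit without an intercept
x1 = np.hstack([x, np.ones((x.shape[0], 1))])
clf1 = linear_model.LinearRegression(fit_intercept=False)
clf1.fit(x1, y)

print(clf.coef_, clf.intercept_)  # approx [3. 5. 7.] and 2.0
print(clf1.coef_)                 # approx [3. 5. 7. 2.] -- the offset is the last coefficient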
As a side remark, calling transpose on a 1D array doesn't have any effect (it reverses the order of your axes, and you only have one).
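A quick illustration, with reshape shown for contrast in case a column vector was actually intended:

import numpy as np

b = np.array([3, 5, 7])
print(b.shape)                 # (3,)
print(b.transpose().shape)     # (3,) -- no effect: there is only one axis to reverse
print(b.reshape(-1, 1).shape)  # (3, 1) -- reshape does produce a column vector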