Question 1

I get a scikit-learn error when I use the following estimator:

my_estimator = LassoLarsCV(fit_intercept=False, normalize=False, positive=True, max_n_alphas=1e5)

Note that if I decrease max_n_alphas from 1e5 down to 1e4 I do not get this error any more.

Does anyone have an idea what's going on?

The error happens when I call

my_estimator.fit(x, y)

I have 40k data points in 40 dimensions.

The full stack trace looks like this

  File "/usr/lib64/python2.7/site-packages/sklearn/linear_model/least_angle.py", line 1113, in fit
    axis=0)(all_alphas)
  File "/usr/lib64/python2.7/site-packages/scipy/interpolate/polyint.py", line 79, in __call__
    y = self._evaluate(x)
  File "/usr/lib64/python2.7/site-packages/scipy/interpolate/interpolate.py", line 498, in _evaluate
    out_of_bounds = self._check_bounds(x_new)
  File "/usr/lib64/python2.7/site-packages/scipy/interpolate/interpolate.py", line 525, in _check_bounds
    raise ValueError("A value in x_new is below the interpolation "
ValueError: A value in x_new is below the interpolation range.

Answer

There must be something particular to your data. LassoLarsCV() seems to be working correctly with this synthetic example of fairly well-behaved data:

import numpy
import sklearn.linear_model

# create 40000 x 40 sample data from linear model with a bit of noise
npoints = 40000
ndims = 40
numpy.random.seed(1)
X = numpy.random.random((npoints, ndims))
w = numpy.random.random(ndims)
y = X.dot(w) + numpy.random.random(npoints) * 0.1

clf = sklearn.linear_model.LassoLarsCV(fit_intercept=False, normalize=False, max_n_alphas=1e6)
clf.fit(X, y)

# coefficients are almost exactly recovered, this prints 0.00377
print(max(abs(clf.coef_ - w)))

# alphas actually used are 41 or ndims+1
print(clf.alphas_.shape)

This is with sklearn 0.16, which does not have the positive=True option.

I'm not sure why you would want a very large max_n_alphas anyway. I don't know why 1e+4 works and 1e+5 doesn't in your case, but I suspect that for well-behaved data the paths you get from max_n_alphas=ndims+1 and max_n_alphas=1e+4 (or any larger value) would be identical, and so would the optimal alpha estimated by cross-validation in clf.alpha_. See the scikit-learn example "Lasso path using LARS" for what alpha does.
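What the trace does show is where the exception comes from: the failing call in least_angle.py ends in (all_alphas), and the error itself is raised by scipy's interp1d, which refuses to evaluate at points outside the range it was built on. Here is a minimal sketch of that behaviour on its own, without scikit-learn. The arrays are made-up numbers, and the names fold_alphas and fold_values are only illustrative, not taken from the library.

```python
import numpy as np
from scipy.interpolate import interp1d

# Hypothetical values for a single fold: the interpolator is only
# defined on [0.2, 1.0], the range of fold_alphas.
fold_alphas = np.array([0.2, 0.5, 1.0])
fold_values = np.array([3.0, 2.0, 1.0])
interp = interp1d(fold_alphas, fold_values)

# Hypothetical common grid; 0.1 lies below the fold's smallest alpha.
all_alphas = np.array([0.1, 0.5, 1.0])

try:
    interp(all_alphas)
    message = None
except ValueError as exc:
    # By default interp1d raises instead of extrapolating.
    message = str(exc)

print(message)

# Points inside the range are interpolated linearly without complaint.
inside = interp(np.array([0.2, 0.35, 1.0]))
print(inside)
```

So the message in the question means that at least one value passed to the interpolator was smaller than the smallest point it was constructed from. The trace alone does not show why that depends on max_n_alphas.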

Also, from the LassoLars documentation

alphas_ : array, shape (n_alphas + 1,)
Maximum of covariances (in absolute value) at each iteration. n_alphas is either max_iter, n_features, or the number of nodes in the path with correlation greater than alpha, whichever is smaller.

so it makes sense that we end up with alphas_ of size ndims+1 (i.e. n_features+1) above.

P.S. I also tested with sklearn 0.17.1 and positive=True, and with a mix of positive and negative coefficients. Same result: alphas_ has ndims+1 entries or fewer.
