ValueError: A value in x_new is below the interpolation range

2024/10/2 12:26:17

This is a scikit-learn error that I get when I do

my_estimator = LassoLarsCV(fit_intercept=False, normalize=False, positive=True, max_n_alphas=1e5)

Note that if I decrease max_n_alphas from 1e5 down to 1e4 I do not get this error any more.

Does anyone have an idea what's going on?

The error happens when I call

my_estimator.fit(x, y)

I have 40k data points in 40 dimensions.

The full stack trace looks like this

  File "/usr/lib64/python2.7/site-packages/sklearn/linear_model/least_angle.py", line 1113, in fit
    axis=0)(all_alphas)
  File "/usr/lib64/python2.7/site-packages/scipy/interpolate/polyint.py", line 79, in __call__
    y = self._evaluate(x)
  File "/usr/lib64/python2.7/site-packages/scipy/interpolate/interpolate.py", line 498, in _evaluate
    out_of_bounds = self._check_bounds(x_new)
  File "/usr/lib64/python2.7/site-packages/scipy/interpolate/interpolate.py", line 525, in _check_bounds
    raise ValueError("A value in x_new is below the interpolation "
ValueError: A value in x_new is below the interpolation range.
Answer

There must be something particular to your data. LassoLarsCV() seems to be working correctly with this synthetic example of fairly well-behaved data:

import numpy
import sklearn.linear_model

# create 40000 x 40 sample data from linear model with a bit of noise
npoints = 40000
ndims = 40
numpy.random.seed(1)
X = numpy.random.random((npoints, ndims))
w = numpy.random.random(ndims)
y = X.dot(w) + numpy.random.random(npoints) * 0.1

clf = sklearn.linear_model.LassoLarsCV(fit_intercept=False, normalize=False, max_n_alphas=1e6)
clf.fit(X, y)

# coefficients are almost exactly recovered, this prints 0.00377
print(max(abs(clf.coef_ - w)))

# alphas actually used are 41 or ndims+1
print(clf.alphas_.shape)

This is with sklearn 0.16, which does not have the positive=True option.

I'm not sure why you would want such a large max_n_alphas anyway. I don't know why 1e+4 works and 1e+5 doesn't in your case, but I suspect that for well-behaved data the paths you get from max_n_alphas=ndims+1 and max_n_alphas=1e+4 (or any larger value) would be identical. I would also expect the optimal alpha estimated by cross-validation, clf.alpha_, to come out the same. See the "Lasso path using LARS" example in the scikit-learn documentation for what alpha is doing.
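One way to check this on your own data is to fit the estimator with a small and a large max_n_alphas and compare the results. The following is only a sketch under a few assumptions of mine: it uses a smaller synthetic data set (2000 x 10) so that it runs quickly, it omits the normalize argument because, as far as I know, recent scikit-learn releases no longer accept it, and it passes max_n_alphas as an integer. It prints the values side by side rather than claiming they must match.

```python
import numpy as np
from sklearn.linear_model import LassoLarsCV

# Synthetic, well-behaved data (assumption: smaller than the 40000 x 40 case above).
rng = np.random.RandomState(1)
npoints, ndims = 2000, 10
X = rng.random_sample((npoints, ndims))
w = rng.random_sample(ndims)
y = X.dot(w) + rng.random_sample(npoints) * 0.1

results = {}
for max_n_alphas in (ndims + 1, 10000):
    clf = LassoLarsCV(fit_intercept=False, max_n_alphas=max_n_alphas)
    clf.fit(X, y)
    results[max_n_alphas] = {
        "alpha": float(clf.alpha_),            # alpha selected by cross-validation
        "n_path_alphas": len(clf.alphas_),     # alphas along the final path
        "n_cv_alphas": len(clf.cv_alphas_),    # alpha grid used during cross-validation
        "max_coef_error": float(np.max(np.abs(clf.coef_ - w))),
    }

for max_n_alphas, summary in results.items():
    print(max_n_alphas, summary)
```

If the two rows differ noticeably on your real data, that would point to something in the data rather than in the setting itself.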

Also, from the LassoLars documentation

alphas_ : array, shape (n_alphas + 1,)

Maximum of covariances (in absolute value) at each iteration. n_alphas is either max_iter, n_features, or the number of nodes in the path with correlation greater than alpha, whichever is smaller.

so it makes sense that we end up with alphas_ of size ndims+1 (i.e. n_features+1) above.

P.S. I also tested with sklearn 0.17.1 and positive=True, and with a mix of positive and negative coefficients. Same result: alphas_ has size ndims+1 or less.
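For reference, the exception at the bottom of the stack trace comes from the bounds check in scipy's interp1d, which by default refuses to evaluate points outside the range of x it was built from. The toy sketch below reproduces that kind of error in isolation. The arrays are made up for illustration and have nothing to do with what LassoLarsCV passes internally.

```python
import numpy as np
from scipy.interpolate import interp1d

# Made-up sample points; the interpolation range is [1.0, 3.0].
x = np.array([1.0, 2.0, 3.0])
y = np.array([10.0, 20.0, 30.0])
f = interp1d(x, y)  # linear interpolation, out-of-range inputs raise by default

# A point inside the range is interpolated normally.
inside = float(f(2.5))

# A point below the smallest x triggers the bounds check.
try:
    f(0.5)
    message = None
except ValueError as exc:
    message = str(exc)

print(inside)
print(message)
```

So the error means that at least one of the values being evaluated was smaller than the smallest x the interpolator was built with.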

