Curve fitting: why are small numbers better?

2024/10/18 15:15:18

I have spent some time these days on a problem. I have a set of data:

y = f(t), where y is a very small concentration (on the order of 10^-7) and t is in seconds. t varies from 0 to around 12000.

The measurements follow an established model:

y = Vs * t - ((Vs - Vi) * (1 - np.exp(-k * t)) / k)

I need to find Vs, Vi, and k, so I used curve_fit, which returns the best-fit parameters, and I plotted the resulting curve.
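A minimal sketch of that fitting step (the data here are synthetic stand-ins with assumed parameter magnitudes, not the real measurements):

import numpy as np
from scipy.optimize import curve_fit

def model(t, Vs, Vi, k):
    # Model from above: t in seconds, y in raw concentration units.
    return Vs * t - (Vs - Vi) * (1 - np.exp(-k * t)) / k

# Synthetic stand-in for the measurements (assumed magnitudes, illustration only).
t_data = np.linspace(0, 12000, 200)
true_params = (2e-11, 8e-11, 5e-4)        # Vs, Vi, k -- made-up values
y_data = model(t_data, *true_params) + np.random.normal(0, 1e-9, t_data.size)

# With no p0 given, curve_fit starts its search from (1, 1, 1).
popt, pcov = curve_fit(model, t_data, y_data)
print(popt)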

And then I used a similar model:

y = (Vs * t/3600 - ((Vs - Vi) * (1 - np.exp(-k * t/3600)) / k)) * 10**7

With this version, t is expressed in hours and y is a number between 0 and about 10. The returned parameters are of course different, but when I plot each curve, here is what I get:

https://i.stack.imgur.com/AQNpI.png

The green fit uses the first model, the blue one the "normalized" model, and the red dots are the experimental values.

The fitted curves are different. I don't think this is expected, and I don't understand why. Are the calculations more accurate if the numbers are "reasonable"?

Answer

The docstring for optimize.curve_fit says,

p0 : None, scalar, or M-length sequence
    Initial guess for the parameters. If None, then the initial
    values will all be 1 (if the number of parameters for the function
    can be determined using introspection, otherwise a ValueError
    is raised).

Thus, to begin with, the initial guess for the parameters is by default 1.
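In other words, unless you pass p0 yourself, the fit of the first model starts from Vs = Vi = k = 1, many orders of magnitude away from the true values. A minimal sketch of supplying an explicit starting guess instead (model, t_data, and y_data are the names from the sketch in the question, and the guess values are assumed orders of magnitude, not known ones):

from scipy.optimize import curve_fit

# Rough, assumed orders of magnitude for Vs, Vi, k -- any sensible estimate
# on the right scale gives the solver a usable starting point.
p0_guess = (1e-11, 1e-10, 1e-3)
popt, pcov = curve_fit(model, t_data, y_data, p0=p0_guess)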

Moreover, curve-fitting algorithms have to sample the function for various values of the parameters. The "various values" are initially chosen with an initial step size on the order of 1. The algorithm will work better if your data varies somewhat smoothly with changes in the parameter values that are on the order of 1.

If the function varies wildly with parameter changes on the order of 1, then the algorithm may tend to miss the optimum parameter values.

Note that even if the algorithm uses an adaptive step size when it tweaks the parameter values, if the initial tweak is so far off the mark as to produce a big residual, and if tweaking in some other direction happens to produce a smaller residual, then the algorithm may wander off in the wrong direction and miss the local minimum. It may find some other (undesired) local minimum, or simply fail to converge. So using an algorithm with an adaptive step size won't necessarily save you.

The moral of the story is that scaling your data can improve the algorithm's chances of finding the desired minimum.
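For the question's model, that rescaling can be done once on the data and then undone on the fitted parameters. A rough sketch, assuming hypothetical arrays t_seconds and y_conc that hold the raw measurements:

import numpy as np
from scipy.optimize import curve_fit

def scaled_model(t_hr, Vs, Vi, k):
    # Same functional form, but t is in hours and y has been multiplied by 1e7,
    # so data, parameters, and residuals are all roughly of order 1.
    return Vs * t_hr - (Vs - Vi) * (1 - np.exp(-k * t_hr)) / k

t_hr = t_seconds / 3600.0      # placeholder array: measured times, seconds -> hours
y_scaled = y_conc * 1e7        # placeholder array: ~1e-7 concentrations -> order 1

popt, pcov = curve_fit(scaled_model, t_hr, y_scaled)
Vs_h, Vi_h, k_h = popt

# Convert the fitted parameters back to the original units (raw y, per second):
Vs = Vs_h / (3600.0 * 1e7)
Vi = Vi_h / (3600.0 * 1e7)
k  = k_h / 3600.0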


Numerical algorithms in general tend to work better when applied to data whose magnitude is on the order of 1. This bias enters the algorithm in numerous ways. For instance, optimize.curve_fit relies on optimize.leastsq, and the call signature for optimize.leastsq is:

def leastsq(func, x0, args=(), Dfun=None, full_output=0,
            col_deriv=0, ftol=1.49012e-8, xtol=1.49012e-8,
            gtol=0.0, maxfev=0, epsfcn=None, factor=100, diag=None):

Thus, by default, the tolerances ftol and xtol are on the order of 1e-8. If finding the optimum parameter values requires much smaller tolerances, then these hard-coded defaults will cause optimize.curve_fit to miss the optimum parameter values.
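If rescaling the data is not convenient, these defaults can be overridden: curve_fit forwards extra keyword arguments to leastsq (for the default, unconstrained 'lm' method), so something along these lines should tighten them (reusing the model, t_data, y_data, and p0_guess names from the earlier sketches):

# Tighter termination tolerances than the ~1.49e-8 defaults.
popt, pcov = curve_fit(model, t_data, y_data, p0=p0_guess,
                       ftol=1e-12, xtol=1e-12)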

To make this more concrete, suppose you were trying to minimize f(x) = 1e-100*x**2. The factor of 1e-100 squashes the y-values so much that a wide range of x-values (the parameter values mentioned above) fits within the tolerance of 1e-8. So, with such poor scaling, leastsq will not do a good job of finding the minimum.
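The effect is easy to reproduce with a general-purpose minimizer (scipy.optimize.minimize is used below instead of leastsq, purely as an illustration): the gradient of 1e-100 * x**2 is so small everywhere that the default gradient tolerance is already satisfied at the starting point, so the solver never moves.

from scipy.optimize import minimize

def f(x):
    # Badly scaled objective: function values are ~1e-100 everywhere.
    return 1e-100 * float(x[0]) ** 2

res = minimize(f, x0=[5.0])   # default BFGS, gradient tolerance gtol ~ 1e-5
print(res.x)                  # stays at (or extremely near) 5.0 instead of
                              # finding the true minimum at x = 0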


Another reason to use floats on the order of 1 is because there are many more (IEEE754) floats in the interval [-1,1] than there are far away from 1. For example,

import struct
def floats_between(x, y):
    """http://stackoverflow.com/a/3587987/190597 (jsbueno)"""
    a = struct.pack("<dd", x, y)
    b = struct.unpack("<qq", a)
    return b[1] - b[0]

In [26]: floats_between(0, 1) / float(floats_between(1e6, 1e7))
Out[26]: 311.4397707054894

This shows there are over 300 times as many floats representing numbers between 0 and 1 as there are in the interval [1e6, 1e7]. Thus, all else being equal, you'll typically get a more accurate answer working with small numbers than with very large numbers.
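A quick way to see the same effect is to compare the gap between adjacent representable floats at different magnitudes, for example with numpy.spacing (math.ulp does the same in Python 3.9+):

import numpy as np

print(np.spacing(1.0))   # ~2.2e-16: gap to the next float above 1.0
print(np.spacing(1e7))   # ~1.9e-9:  gap between adjacent floats near 1e7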
