Suppose we want to fit the power model y = a t^b to our small four-point data set from Part 3. Let's use the notation
f(t; a,b) = a t^b
in order to emphasize explicitly the dependence of the model on the parameters a and b, given some fixed value of t.
We want to find the optimal values of the parameters a and b yielding the best-fitting power function. Suppose we have an initial guess (a0, b0) for the parameters. For a fixed value of t, we can expand f in a Taylor series about (a0, b0):
f(t; a,b) = f(t; a0,b0) + fa(t; a0,b0) (a - a0) + fb(t; a0,b0) (b - b0) + higher-order terms.
Here the partial derivatives are
fa(t; a,b) = t^b
fb(t; a,b) = a t^b ln(t).
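As a concrete check, the model and its two partial derivatives can be coded directly. This is a minimal sketch using NumPy; the function names are our own, not part of the worksheet:

```python
import numpy as np

def f(t, a, b):
    """Power model f(t; a, b) = a * t^b."""
    return a * t**b

def f_a(t, a, b):
    """Partial derivative of f with respect to a: t^b."""
    return t**b

def f_b(t, a, b):
    """Partial derivative of f with respect to b: a * t^b * ln(t)."""
    return a * t**b * np.log(t)
```

Because these are NumPy expressions, t may be a scalar or an array of sample times.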
If we let da = a - a0 and db = b - b0 and drop the higher-order terms in da and db from our Taylor expansion, we get the first-order approximation
f(t; a,b) - f(t; a0,b0) ≈ fa(t; a0,b0) da + fb(t; a0,b0) db.
Our data set is
(T1,Y1), (T2,Y2), (T3,Y3), (T4,Y4).
We would like to solve for a and b so that Yi = f(Ti; a,b) for i = 1, ..., 4. Thus, we would like to solve the following system for da and db (and hence a = a0 + da and b = b0 + db):
Y1 - f(T1; a0,b0) = fa(T1; a0,b0) da + fb(T1; a0,b0) db
Y2 - f(T2; a0,b0) = fa(T2; a0,b0) da + fb(T2; a0,b0) db
Y3 - f(T3; a0,b0) = fa(T3; a0,b0) da + fb(T3; a0,b0) db
Y4 - f(T4; a0,b0) = fa(T4; a0,b0) da + fb(T4; a0,b0) db
There are four equations in the two unknowns da and db. In general we can't solve such an overdetermined system exactly, but we can solve it in the least squares sense. Form the sum of squares of the differences between the left and right sides of each equation. Then solve this linear least squares problem for the values of da and db that minimize the sum of squares by solving the normal equations.
Form the following vectors in R^4:
y* = ( Y1 - f(T1; a0,b0), Y2 - f(T2; a0,b0), ... , Y4 - f(T4; a0,b0) )^T
fa = ( fa(T1; a0,b0), fa(T2; a0,b0), ... , fa(T4; a0,b0) )^T
fb = ( fb(T1; a0,b0), fb(T2; a0,b0), ... , fb(T4; a0,b0) )^T
Our least squares problem is equivalent to finding the closest vector
to y* that lies within the two-dimensional subspace W = span(fa,fb).
Solve the normal equations to find the least squares solution values da
and db.
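One pass through this step can be sketched as follows. The data here are a hypothetical stand-in generated from a known power law (a = 0.9, b = 2.8), since the worksheet's actual four data points are not reproduced; everything else follows the construction above:

```python
import numpy as np

# Hypothetical stand-in data from a known power law (a = 0.9, b = 2.8);
# the worksheet's actual four (Ti, Yi) values are not reproduced here.
T = np.array([1.0, 2.0, 3.0, 4.0])
Y = 0.9 * T**2.8

a0, b0 = 1.0, 2.5                      # initial guess
f0 = a0 * T**b0                        # f(Ti; a0, b0)
fa = T**b0                             # vector of partials w.r.t. a
fb = a0 * T**b0 * np.log(T)            # vector of partials w.r.t. b

X = np.column_stack([fa, fb])          # least squares matrix
y_star = Y - f0

# Normal equations: (X^T X) [da, db]^T = X^T y*
da, db = np.linalg.solve(X.T @ X, X.T @ y_star)
a1, b1 = a0 + da, b0 + db              # updated estimates
```

A single step does not land on the optimal parameters, but it should shrink the residual vector, which is what the iteration below exploits.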
Use your solution values da and db to update a and b:
a = a0 + da
b = b0 + db
Repeat steps 2 and 3 again using your newest estimate of a and
b as your initial guess. Do your values of a and b seem to be converging
to the claimed optimal values a = 0.848 and b = 2.935?
Since steps 2 and 3 should be repeated until convergence is achieved,
your helper application worksheet has a looping structure set up to do
this automatically. Execute the loop and watch the convergence.
How much accuracy is achieved in the optimal values of a and b? How
many iterations are required? Compute the residuals
Yi - f(Ti; a,b), for i = 1, ..., 4,
corresponding to the optimal fit. Also compute the sum of
squares of these residuals.
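The full loop, including the residuals and their sum of squares at convergence, can be sketched like this. As before, the data are synthetic, generated from a known power law (a = 0.9, b = 2.8), so the loop should recover those values rather than the worksheet's a = 0.848, b = 2.935:

```python
import numpy as np

# Synthetic stand-in data from a known power law (a = 0.9, b = 2.8), since
# the worksheet's actual four data points are not reproduced here.
T = np.array([1.0, 2.0, 3.0, 4.0])
Y = 0.9 * T**2.8

a, b = 1.0, 2.5                        # initial guess
for _ in range(100):
    X = np.column_stack([T**b, a * T**b * np.log(T)])
    y_star = Y - a * T**b
    da, db = np.linalg.solve(X.T @ X, X.T @ y_star)
    a, b = a + da, b + db
    if max(abs(da), abs(db)) < 1e-10:  # stop when the updates are negligible
        break

residuals = Y - a * T**b               # Yi - f(Ti; a, b)
ssr = np.sum(residuals**2)             # sum of squares of the residuals
```

With noise-free synthetic data the residuals shrink essentially to zero; with real data the loop converges to a small but nonzero sum of squares.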
Let's now take up the ambitious task of fitting the logistic growth model
to the U.S. population data of Part 1.
Use your helper application to find the partial derivative
of f with respect to each parameter: P0, K, and r.
Using the initial guess P0 = 78, K = 700, and r = 0.0168, form the following vectors in R^10:
y* = ( Y1 - f(T1; P0,K,r), Y2 - f(T2; P0,K,r), ... , Y10 - f(T10; P0,K,r) )^T
fP0 = ( fP0(T1; P0,K,r), fP0(T2; P0,K,r), ... , fP0(T10; P0,K,r) )^T
fK = ( fK(T1; P0,K,r), fK(T2; P0,K,r), ... , fK(T10; P0,K,r) )^T
fr = ( fr(T1; P0,K,r), fr(T2; P0,K,r), ... , fr(T10; P0,K,r) )^T
Form the least squares matrix X and solve the normal equations for dP0,
dK, and dr. Use these values to update P0, K, and r.
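The same Gauss-Newton loop carries over to the logistic fit. This sketch assumes the standard logistic form P(t) = K P0 e^(rt) / (K + P0 (e^(rt) - 1)), the solution of dP/dt = r P (1 - P/K) with P(0) = P0; check it against the model defined in Part 1. The partials are approximated here by central differences rather than computed symbolically, and the data are a synthetic stand-in, since the Part 1 census values are not reproduced:

```python
import numpy as np

def logistic(t, P0, K, r):
    # Assumed standard logistic form: solution of dP/dt = r P (1 - P/K)
    # with P(0) = P0; verify against the model from Part 1.
    return K * P0 * np.exp(r * t) / (K + P0 * (np.exp(r * t) - 1.0))

def jacobian(t, params, h=1e-6):
    # Numeric partial derivatives (central differences): the vectors
    # fP0, fK, and fr stacked as the columns of X.
    cols = []
    for j in range(len(params)):
        hi, lo = params.copy(), params.copy()
        hi[j] += h
        lo[j] -= h
        cols.append((logistic(t, *hi) - logistic(t, *lo)) / (2.0 * h))
    return np.column_stack(cols)

# Synthetic stand-in for the ten (Ti, Yi) census points; the actual U.S.
# population data from Part 1 is not reproduced here.
T = np.linspace(0.0, 180.0, 10)
Y = logistic(T, 80.0, 650.0, 0.02)     # generated from known parameters

p = np.array([78.0, 700.0, 0.0168])    # initial guess from the text
for _ in range(100):
    X = jacobian(T, p)                  # least squares matrix
    y_star = Y - logistic(T, *p)
    dp = np.linalg.solve(X.T @ X, X.T @ y_star)
    p = p + dp
    if np.max(np.abs(dp) / np.abs(p)) < 1e-9:
        break
```

With noise-free synthetic data the loop recovers the generating parameters (80, 650, 0.02); on the real census data it should converge to the optimal fit instead.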
Now use the looping structure in your helper application worksheet to iteratively
solve for the optimal P0, K, and r. How many iterations
are needed?
Plot the least squares logistic curve that you just found
together with a scatter plot of the U.S. population data. How good
is the fit?
Make a residual plot for your optimal logistic fit. Compare
the logistic fit to the quadratic fit from Part 1 and the exponential fit
from Part 2.