G22.2750/G63.2031....... Nonlinear Optimization ....... M. Overton
Homework 1
Assigned: Thurs Jan 27......Due: Thu Feb 10
Homework is due at midnight on the date indicated. Homework will be accepted
up to one week late with a 20% penalty. Homework is not
normally accepted more a week late. Homework should be given to me,
to Celeste in Rm 524, or put under my office door, not left in my mailbox.
- Exercises from the text
- Ex 15 on p.30, Chapter 2
- Ex 16 on p.30, Chapter 2.
- Ex 7 on p.65, Chapter 3.
- The line search algorithm (Algorithms 3.6 and 3.7)
of Chapter 3 is, in some respects, the most
unappealing algorithm we will see in this course.
It takes quite some effort to understand it in detail
and one is left with the lingering feeling that it should be possible to
write one that is just as good but less complicated. This assignment asks
you to examine the algorithm carefully and decide for yourself whether it
could be improved, and whether its complexity is justified.
- Carefully review the Matlab codes
that I have written. The first two are my implementations of the line search
algorithm, and these call a third (to do cubic interpolation) and fourth (to check
if the points generated lie inside an interval).
The fifth is a sample test function, and the sixth is a demo script.
You should particularly notice and adopt
the crucial idea of writing a function (not a script, except for
the demo) to do a specific job,
carefully documenting all input and output parameters, and writing it in
a way that will be generally useful subsequently. Notice in particular the
use of structures as parameters, so lots of things can be
collected in the fields of a single structure without making for a
complicated list of parameters. These fields include the name
of the function, which is a Matlab string. The objective function
is called inside "linesch" and "zoom" by "feval(pars.fgname, x, pars)"
(type "help feval" for more info).
Either convince yourself that linesch.m and lszoom.m are
correct implementations of Algorithms 3.6 and 3.7, or find fault with them,
document it and correct it.
- Write and document another Matlab function to
implement the steepest descent algorithm for minimizing a function of
several variables, using the line search code. The function should have
two parameters, "pars" and "options". The "pars" parameter should
be a structure defining the objective function, including the
name in the string field pars.fgname and any other parameters needed
to evaluate the function, just as in "lineschdemo". The
"options" parameter should be a structure used to pass options to
your steepest descent code, such as "gradnormtol", "maxiter",
"wolfeparamc1", "wolfeparamc2", with obvious meanings (and perhaps
others too).
An important part of the assignment
is to write readable, commented code, so that someone else (such
as the grader) can look at the code and understand it. Don't forget to
use indentation of code appropriately, such as inside a for loop.
Look at my codes as examples of hopefully good programming practice.
Test the algorithm on the quadratic function coded
in samplefun.m, perhaps with a smaller n.
Are the results consistent with what was predicted by the theorem
we proved on convergence of steepest descent on a quadratic?
Does the line search automatically do an exact line search in one
step, and if so why? Note that different calls to lineschdemo will
result in different matrices in pars.A and therefore different
condition numbers (the ratio of largest to smallest eigenvalue).
-
Extend your experiments to two non-quadratic functions, namely
- Rosenbrock's function (see Exercise 2.1 on p.29; you should implement this
as another function file, following the model in samplefun.m,
returning both function and gradient values)
- a third function of two or more variables that you make up yourself
What happens now? Does the choice
of Wolfe parameters have much effect? In particular, does choosing the
second Wolfe parameter c2 to be small, e.g. 1e-6, reduce
the number of steepest descent iterations required to achieve a given accuracy
(or, if you prefer, the gradient norm or function value achieved in a fixed
number of iterations)?
Try a number of different choices of the parameters and compare the results,
starting from the same starting point.
- Is it possible that the cubic interpolation routine might compute the
square root of a negative number or divide by 0? What would happen if in
either case as a result? If you are not sure, try it.
- How important is the cubic interpolation? Add an option (this will
require editing the line search code) that replaces
it by simple bisection, and see if this does the job just as well. Compare the
two on a case where you again make the second Wolfe parameter c2
very small, e.g. 1.0e-6. This has the effect of making the
line search closely approximate the one-dimensional minimizer
along the line at each step of steepest descent.
The code with cubic interpolation should converge much more quickly than
bisection, and so should require signficantly less function evaluations
per line search. Verify whether or not this is the case.
- Add code to print information that tells us the
Q-rate of convergence of the cubic interpolation procedure to the
one-dimensional minimizer
along the line, again forcing it be closely approximated by choosing
the second Wolfe parameter to be very small. What rate do you observe, experimentally?
- Code a much simpler line search that just starts with a step of
one and repeatedly bisects until a reduction in f is achieved, ignoring
the Wolfe conditions. Does this work just as well? You may find to your
surprise that it does. If so, try to cook up examples where it does not.
Hand in written answers to all these questions along with supporting output from
Matlab, selected judiciously. Be sure to include listings of any functions
you write, particularly your steepest descent code, but please do not
submit a lot of computer output. You may find ``diary" to be helpful.