Limited-memory BFGS (L-BFGS) Method
Slide: Starting with the classical Newton's method, we will examine its limitations and the improvements introduced by quasi-Newton methods. Finally, we will delve into the L-BFGS algorithm, which is particularly suited for large-scale problems. Inspired by Dr. Zhepei Wang's Lecture "Numerical Optimization for Robotics".
Introduction
Unconstrained Optimization
Consider a smooth, twice-differentiable unconstrained optimization problem:
$$\min_{x \in \mathbb{R}^n} f(x).$$
Descent methods provide an iterative solution:
$$x_{k+1} = x_k + \alpha_k d_k,$$
where $\alpha_k > 0$ is the step size and $d_k$ is a descent direction satisfying $\nabla f(x_k)^T d_k < 0$.
Newton's Method
By second-order Taylor expansion,
$$f(x_k + d) \approx f(x_k) + \nabla f(x_k)^T d + \tfrac{1}{2} d^T \nabla^2 f(x_k)\, d.$$
Minimizing the quadratic approximation over $d$ gives the Newton direction:
$$d_k = -\left[\nabla^2 f(x_k)\right]^{-1} \nabla f(x_k).$$
For $\nabla^2 f(x_k) \succ 0$, this is a descent direction, and the full step $\alpha_k = 1$ converges quadratically near a minimizer.

[Figure omitted. Courtesy: Ardian Umam]
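To make the step concrete, here is a minimal NumPy sketch (the function names and the Rosenbrock test problem are our illustrative choices, not from the lecture) that computes one full Newton step by solving the linear system rather than inverting the Hessian:

```python
import numpy as np

def f(x):
    # Rosenbrock function, a standard nonconvex test problem.
    return (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2

def grad(x):
    return np.array([
        -2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0]**2),
        200 * (x[1] - x[0]**2),
    ])

def hess(x):
    return np.array([
        [2 - 400 * x[1] + 1200 * x[0]**2, -400 * x[0]],
        [-400 * x[0], 200.0],
    ])

x = np.array([-1.2, 1.0])
# Newton direction: solve H d = -g instead of forming H^{-1}.
d = np.linalg.solve(hess(x), -grad(x))
x_next = x + d  # full Newton step (alpha = 1)
```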
Damped Newton Method
For a starting point far from the solution, the full Newton step may overshoot, so the damped Newton method adds a line search along the Newton direction:
$$x_{k+1} = x_k - \alpha_k \left[\nabla^2 f(x_k)\right]^{-1} \nabla f(x_k).$$
If the function is convex, $\nabla^2 f(x_k) \succeq 0$ and the Newton system is solvable (with a tiny regularization when it is singular).
If the function is nonconvex, the Hessian may be indefinite, so we replace it with a positive definite modification, e.g. $M_k = \nabla^2 f(x_k) + \epsilon I$ with $\epsilon > 0$ large enough that $M_k \succ 0$.
Finally, the direction is solved from the linear system $M_k d_k = -\nabla f(x_k)$; a code sketch follows.
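A hedged sketch of one damped Newton step under these rules (the helper name `damped_newton_step` and the backtracking constants are ours): regularize the Hessian until it is positive definite, solve for the direction, then damp the step with Armijo backtracking.

```python
import numpy as np

def damped_newton_step(f, grad, hess, x, eps=1e-6, c1=1e-4):
    """One damped Newton step: solve M d = -g with M = H + eps*I."""
    g, H = grad(x), hess(x)
    # Shift the spectrum until M is positive definite (nonconvex case).
    M = H + eps * np.eye(len(x))
    while np.min(np.linalg.eigvalsh(M)) <= 0:
        eps *= 10
        M = H + eps * np.eye(len(x))
    d = np.linalg.solve(M, -g)
    # Damping: backtrack until the Armijo condition holds.
    alpha = 1.0
    while f(x + alpha * d) > f(x) + c1 * alpha * (g @ d):
        alpha *= 0.5
    return x + alpha * d
```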
Practical Newton Method
Moreover, we can select the regularization adaptively:
$$M_k = \nabla^2 f(x_k) + \epsilon_k I,$$
where $\epsilon_k \ge 0$ is increased until a Cholesky factorization $M_k = L L^T$ succeeds, which certifies $M_k \succ 0$ at the cost of a few factorization attempts.

[Figure omitted. Courtesy: Cornell University]
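One common way to "select $\epsilon_k$ just large enough" is the Cholesky-with-added-multiple-of-identity idea (in the spirit of Nocedal & Wright, Alg. 3.3); the sketch below, with our own helper name `regularize_hessian`, retries the factorization with a growing shift:

```python
import numpy as np

def regularize_hessian(H, beta=1e-3, max_tries=60):
    """Return (M, L) with M = H + eps*I positive definite, certified by a
    successful Cholesky factorization M = L L^T."""
    eps = 0.0 if np.all(np.diag(H) > 0) else beta
    for _ in range(max_tries):
        try:
            M = H + eps * np.eye(H.shape[0])
            L = np.linalg.cholesky(M)  # raises LinAlgError if M is not PD
            return M, L
        except np.linalg.LinAlgError:
            eps = max(10 * eps, beta)  # grow the shift and retry
    raise RuntimeError("could not make Hessian positive definite")
```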
Quasi-Newton Methods
Newton's Method: Limitations
- High Cost: Computing the Hessian and its inverse requires $O(n^2)$ storage and $O(n^3)$ operations, impractical for large problems.
- Indefinite Hessian: In nonconvex cases, the Hessian may be indefinite, so the Newton step can lead toward saddle points.
- Ill-Conditioning: Poorly conditioned Hessians amplify errors and hinder convergence.
- Inaccurate Model: Local quadratic approximations may fail for complex functions, causing inefficiency or divergence.

Quasi-Newton Approximation
Newton approximation:
$$d_k = -\left[\nabla^2 f(x_k)\right]^{-1} \nabla f(x_k).$$
Quasi-Newton approximation:
$$d_k = -B_k^{-1} \nabla f(x_k), \qquad B_k \approx \nabla^2 f(x_k).$$
The matrix $B_k$ is designed to:
- Avoid full second-order derivatives.
- Have a closed-form solution for linear equations.
- Retain first-order curvature information.
- Preserve the descent direction.
BFGS Method
Descent Direction
Search direction: $d_k = -B_k^{-1} \nabla f(x_k)$. If $B_k \succ 0$, then $\nabla f(x_k)^T d_k = -\nabla f(x_k)^T B_k^{-1} \nabla f(x_k) < 0$, so $d_k$ is always a descent direction.
[Figure omitted. Courtesy: Active Calculus]
Curvature Information
At the point $x_{k+1}$, define $\Delta x_k = x_{k+1} - x_k$ and $\Delta g_k = \nabla f(x_{k+1}) - \nabla f(x_k)$. First-order curvature information along the step is captured by requiring the new approximation to satisfy the secant condition:
$$B_{k+1} \Delta x_k = \Delta g_k.$$

The Optimal $B_{k+1}$?
Infinitely many matrices $B_{k+1}$ satisfy the secant condition: it imposes only $n$ equations on the $n(n+1)/2$ degrees of freedom of a symmetric matrix. BFGS resolves the ambiguity by picking the solution closest to the current approximation in a weighted Frobenius norm, subject to symmetry and the secant condition. In BFGS, the weight matrix is selected to be the average Hessian:
$$\bar G_k = \int_0^1 \nabla^2 f\left(x_k + \tau \Delta x_k\right) d\tau,$$
which satisfies $\bar G_k \Delta x_k = \Delta g_k$ by the fundamental theorem of calculus.
Closed-form Solution for $B_{k+1}$
To derive the optimal $B_{k+1}$, form the Lagrangian of the weighted least-change problem, with a multiplier enforcing the secant condition. Taking the derivative of the Lagrangian with respect to the unknown matrix and setting it to zero yields a stationarity condition that is linear in the unknown. Rearranging the terms, we express the unknown matrix in terms of the previous approximation, the weight, and the multiplier. Substituting this expression into the secant condition gives a linear equation for the multiplier. Solving for the multiplier and, finally, substituting it back (using $\bar G_k \Delta x_k = \Delta g_k$ to eliminate the weight) produces the closed-form BFGS update
$$B_{k+1} = B_k + \frac{\Delta g_k \Delta g_k^T}{\Delta g_k^T \Delta x_k} - \frac{B_k \Delta x_k \Delta x_k^T B_k}{\Delta x_k^T B_k \Delta x_k},$$
or, equivalently, on the inverse $H_k = B_k^{-1}$,
$$H_{k+1} = \left(I - \rho_k \Delta x_k \Delta g_k^T\right) H_k \left(I - \rho_k \Delta g_k \Delta x_k^T\right) + \rho_k \Delta x_k \Delta x_k^T, \qquad \rho_k = \frac{1}{\Delta g_k^T \Delta x_k}.$$
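Since the two forms must agree, a quick numeric sanity check is easy. This sketch (toy data and variable names are ours) verifies that the inverse-form update satisfies the secant condition $H_{k+1} \Delta g_k = \Delta x_k$ and stays symmetric:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
H = np.eye(n)                            # current inverse approximation H_k
dx = rng.standard_normal(n)              # Δx_k
dg = dx + 0.1 * rng.standard_normal(n)   # Δg_k, chosen so Δgᵀ Δx > 0
assert dg @ dx > 0                       # curvature condition

rho = 1.0 / (dg @ dx)
V = np.eye(n) - rho * np.outer(dx, dg)
H_next = V @ H @ V.T + rho * np.outer(dx, dx)

# Secant condition and symmetry hold up to rounding.
assert np.allclose(H_next @ dg, dx)
assert np.allclose(H_next, H_next.T)
```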
Broyden-Fletcher-Goldfarb-Shanno (BFGS)
Given the initial value $B_0 = I$ (any symmetric positive definite matrix works), iterate
$$x_{k+1} = x_k + \alpha_k d_k, \qquad B_k d_k = -\nabla f(x_k),$$
with $\alpha_k$ chosen by a line search and $B_{k+1}$ given by the update above.
For computational efficiency, we often work with the inverse of $B_k$: with $H_k = B_k^{-1}$, the direction is the matrix-vector product $d_k = -H_k \nabla f(x_k)$, and the inverse-form update replaces an $O(n^3)$ linear solve with $O(n^2)$ work per iteration.
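Below is a minimal NumPy sketch of this inverse-form BFGS iteration. It borrows `scipy.optimize.line_search` (a strong-Wolfe search) for the step size; the names `bfgs`, `f`, and `grad` are our own, not from the lecture.

```python
import numpy as np
from scipy.optimize import line_search  # strong-Wolfe line search

def bfgs(f, grad, x0, tol=1e-8, max_iter=200):
    """Minimal BFGS on the inverse approximation H (a sketch, not production code)."""
    n = len(x0)
    x, H = np.asarray(x0, float), np.eye(n)
    g = grad(x)
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        d = -H @ g
        alpha = line_search(f, grad, x, d, gfk=g)[0]
        if alpha is None:        # line search failed; fall back to a tiny step
            alpha = 1e-3
        x_new = x + alpha * d
        g_new = grad(x_new)
        dx, dg = x_new - x, g_new - g
        if dg @ dx > 1e-12:      # curvature condition: keeps H positive definite
            rho = 1.0 / (dg @ dx)
            V = np.eye(n) - rho * np.outer(dx, dg)
            H = V @ H @ V.T + rho * np.outer(dx, dx)
        x, g = x_new, g_new
    return x
```

Called as `bfgs(f, grad, np.array([-1.2, 1.0]))` with the Rosenbrock functions from the earlier sketch, this typically reaches the minimizer $(1, 1)$ in a few dozen iterations.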
Guaranteeing PD of $B_{k+1}$
To ensure that $B_{k+1} \succ 0$ (equivalently $H_{k+1} \succ 0$) given $B_k \succ 0$, it suffices that the curvature condition $\Delta g_k^T \Delta x_k > 0$ holds, so that $\rho_k > 0$.
For any nonzero vector $z$, let $w = z - \rho_k (\Delta x_k^T z)\, \Delta g_k$; then
$$z^T H_{k+1} z = w^T H_k w + \rho_k \left(\Delta x_k^T z\right)^2 \ge 0.$$
Equalities hold only when $\Delta x_k^T z = 0$ and $w^T H_k w = 0$ simultaneously; the former forces $w = z$, and then $H_k \succ 0$ forces $z = 0$. Hence $H_{k+1} \succ 0$.
Guaranteeing the Curvature Condition
The Armijo condition (AC) alone cannot guarantee $\Delta g_k^T \Delta x_k > 0$; we also need the curvature condition (CC):
$$\text{AC:}\quad f(x_k + \alpha d_k) \le f(x_k) + c_1 \alpha\, \nabla f(x_k)^T d_k,$$
$$\text{CC:}\quad \nabla f(x_k + \alpha d_k)^T d_k \ge c_2\, \nabla f(x_k)^T d_k.$$
Typically, $c_1 = 10^{-4}$ and $c_2 = 0.9$, with $0 < c_1 < c_2 < 1$. Together, AC and CC form the (weak) Wolfe conditions, and for a descent direction CC already implies $\Delta g_k^T \Delta x_k \ge \alpha (c_2 - 1)\, \nabla f(x_k)^T d_k > 0$.

[Figure omitted. Courtesy: Ján Kopačka]
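Both conditions are one-liners to test; a hedged sketch follows (the helper names `armijo` and `curvature` are ours):

```python
def armijo(f, x, d, g, alpha, c1=1e-4):
    """AC: sufficient decrease of f along d."""
    return f(x + alpha * d) <= f(x) + c1 * alpha * (g @ d)

def curvature(grad, x, d, g, alpha, c2=0.9):
    """CC (weak form): the directional derivative has increased enough."""
    return grad(x + alpha * d) @ d >= c2 * (g @ d)
```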
L-BFGS Method
Lewis-Overton Line Search
The Lewis-Overton line search is a bracketing line search designed for quasi-Newton methods (it remains effective even on nonsmooth objectives): it maintains a bracket $[l, u]$ on the step size and bisects or expands until the weak Wolfe conditions hold:
- Given a search direction $d$, the current point $x$, and its gradient $\nabla f(x)$.
- Initialize the trial step $\alpha = 1$, lower bound $l = 0$, and upper bound $u = +\infty$.
- Repeat:
  - If the Armijo condition $f(x + \alpha d) \le f(x) + c_1 \alpha\, \nabla f(x)^T d$ fails, update the bounds: set $u = \alpha$ (the step is too long).
  - Else if the weak curvature condition $\nabla f(x + \alpha d)^T d \ge c_2\, \nabla f(x)^T d$ fails, update the bounds: set $l = \alpha$ (the step is too short).
  - Otherwise, return $\alpha$ (both weak Wolfe conditions hold).
  - Update the trial step: if $u < +\infty$, set $\alpha = (l + u)/2$ (bisect the bracket); otherwise, set $\alpha = 2l$ (expand).
- Ensure a maximum iteration count so the search terminates in finite precision. (A Python sketch follows.)
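A minimal Python sketch of this bracketing scheme, assuming the standard weak-Wolfe constants (the name `lewis_overton` and the iteration cap are our choices):

```python
import numpy as np

def lewis_overton(f, grad, x, d, c1=1e-4, c2=0.9, max_iter=64):
    """Weak-Wolfe bisection/expansion line search, as outlined above."""
    alpha, lo, hi = 1.0, 0.0, np.inf
    fx = f(x)
    slope = grad(x) @ d               # must be negative for a descent direction
    for _ in range(max_iter):
        if f(x + alpha * d) > fx + c1 * alpha * slope:
            hi = alpha                # Armijo fails: shrink from above
        elif grad(x + alpha * d) @ d < c2 * slope:
            lo = alpha                # curvature fails: grow from below
        else:
            return alpha              # both weak Wolfe conditions hold
        alpha = 0.5 * (lo + hi) if np.isfinite(hi) else 2.0 * lo
    return alpha                      # give up and return the last trial
```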
Cautious Update
Sometimes, when the line search is inexact or the function is poorly conditioned, the curvature $\Delta g_k^T \Delta x_k$ can be tiny or nonpositive, and a naive update destroys positive definiteness. Two standard safeguards (sketched below):
Skip update condition: If the cautious condition $\Delta g_k^T \Delta x_k > \epsilon\, \|\nabla f(x_k)\|\, \|\Delta x_k\|^2$ (for a small $\epsilon > 0$, e.g. $10^{-6}$) fails, skip the update and keep $B_{k+1} = B_k$.
Powell's Damping: If the curvature condition fails, replace $\Delta g_k$ by the damped vector $\theta \Delta g_k + (1 - \theta) B_k \Delta x_k$, with $\theta \in (0, 1]$ chosen so that the damped curvature is at least $\mu\, \Delta x_k^T B_k \Delta x_k$ (commonly $\mu = 0.2$), then update as usual.
Cautious updates are guaranteed to produce iterates that converge to a critical point if the function has bounded sublevel sets and a Lipschitz continuous gradient.
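A hedged sketch of both safeguards (the helper names `cautious_push` and `powell_damp` and the constants are our illustrative choices; the skip rule follows Li & Fukushima's cautious condition):

```python
def cautious_push(history, dx, dg, g_norm, m=8, eps=1e-6):
    """Skip-update rule: store (Δx, Δg) only if the cautious condition holds."""
    if dg @ dx > eps * g_norm * (dx @ dx):
        history.append((dx, dg))
        if len(history) > m:       # retain only the m most recent pairs
            history.pop(0)

def powell_damp(dx, dg, B_dx, mu=0.2):
    """Powell's damping (for full-matrix BFGS/SQP settings, where B_dx = B_k @ dx):
    blend Δg with B_k Δx so the damped pair always has positive curvature."""
    s_Bs = dx @ B_dx
    if dg @ dx >= mu * s_Bs:
        return dg                   # enough curvature; no damping needed
    theta = (1.0 - mu) * s_Bs / (s_Bs - dg @ dx)
    return theta * dg + (1.0 - theta) * B_dx
```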
Two-Loop Recursion
L-BFGS uses a two-loop recursion to compute the search direction without explicitly forming the Hessian approximation. The algorithm maintains a history of the most recent $m$ pairs $(\Delta x_i, \Delta g_i)$, with $m$ typically small (e.g. 5 to 20).
- Initialize $q = \nabla f(x_k)$ and an empty array $a$ of length $m$; let $\rho_i = 1 / (\Delta g_i^T \Delta x_i)$.
- For $i = k-1, \dots, k-m$ (newest to oldest): $a_i = \rho_i\, \Delta x_i^T q$, then $q \leftarrow q - a_i \Delta g_i$.
- Scale by the initial approximation $H_0^k = \gamma_k I$: $r = \gamma_k q$, with $\gamma_k = \dfrac{\Delta x_{k-1}^T \Delta g_{k-1}}{\Delta g_{k-1}^T \Delta g_{k-1}}$.
- For $i = k-m, \dots, k-1$ (oldest to newest): $b = \rho_i\, \Delta g_i^T r$, then $r \leftarrow r + (a_i - b)\, \Delta x_i$.
- Return $d_k = -r$.
This approach reduces the storage requirement from $O(n^2)$ to $O(mn)$ and the cost of forming the direction from $O(n^2)$ to $O(mn)$ per iteration; a sketch follows.
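A minimal Python sketch of the recursion (the name `two_loop` and the history layout, oldest pair first, are our choices):

```python
def two_loop(g, history):
    """L-BFGS two-loop recursion: returns d = -H_k g without forming H_k.
    history holds the m most recent (Δx_i, Δg_i) pairs, oldest first."""
    q = g.copy()
    alphas = []
    for dx, dg in reversed(history):          # first loop: newest to oldest
        rho = 1.0 / (dg @ dx)
        a = rho * (dx @ q)
        alphas.append(a)
        q -= a * dg
    if history:                               # initial scaling H_0 = gamma * I
        dx, dg = history[-1]
        q *= (dx @ dg) / (dg @ dg)
    for (dx, dg), a in zip(history, reversed(alphas)):  # second loop: oldest to newest
        rho = 1.0 / (dg @ dx)
        b = rho * (dg @ q)
        q += (a - b) * dx
    return -q
```

With an empty history the recursion returns $-\nabla f(x_k)$, i.e. the method gracefully degrades to gradient descent on the first iteration.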
Algorithm Summary
The complete L-BFGS algorithm with cautious update and Lewis-Overton line search:
- Initialize $x_0$ and an empty pair history; choose $m$, $c_1$, $c_2$, and the cautious threshold $\epsilon$.
- For $k = 0, 1, 2, \dots$ until convergence (e.g. $\|\nabla f(x_k)\| \le \delta$):
  - Compute the search direction $d_k = -H_k \nabla f(x_k)$ using the L-BFGS two-loop recursion.
  - Find the step size $\alpha_k$ using the Lewis-Overton line search.
  - Update: $x_{k+1} = x_k + \alpha_k d_k$.
  - Compute $\Delta x_k = x_{k+1} - x_k$ and $\Delta g_k = \nabla f(x_{k+1}) - \nabla f(x_k)$.
  - Apply the cautious update to the pair history: store $(\Delta x_k, \Delta g_k)$ only if the cautious condition holds, evicting the oldest pair once more than $m$ are kept. (A full sketch follows this list.)
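Tying the pieces together, here is a compact driver that reuses the `two_loop`, `lewis_overton`, and `cautious_push` sketches defined above (the name `lbfgs` and the default constants are ours; this is an illustrative sketch, not the lecture's reference implementation):

```python
import numpy as np

def lbfgs(f, grad, x0, m=8, tol=1e-6, max_iter=500):
    """L-BFGS with cautious updates and a weak-Wolfe (Lewis-Overton) line search."""
    x = np.asarray(x0, float)
    g = grad(x)
    history = []                                  # recent (Δx, Δg) pairs, oldest first
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        d = two_loop(g, history)                  # search direction
        alpha = lewis_overton(f, grad, x, d)      # weak-Wolfe step size
        x_new = x + alpha * d
        g_new = grad(x_new)
        cautious_push(history, x_new - x, g_new - g, np.linalg.norm(g), m=m)
        x, g = x_new, g_new
    return x

# Usage with the Rosenbrock functions from the first sketch:
# x_star = lbfgs(f, grad, np.array([-1.2, 1.0]))
```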