Linear Quadratic Regulator in 3 Ways

Slide: Explore the Linear Quadratic Regulator (LQR) through the Indirect Shooting Method (Pontryagin's Principle), the Optimization Approach (Quadratic Programming), and the Recursive Solution (Riccati Equation). Inspired by Zachary Manchester's Spring 2024-25 course "Optimal Control and Reinforcement Learning".

Introduction

Problem Formulation

Consider a discrete-time linear system:

$$x_{n+1} = A_n x_n + B_n u_n$$

Quadratic cost function:

$$\min_{x_{1:N},\, u_{1:N-1}} J = \sum_{n=1}^{N-1} \underbrace{\left[ \frac{1}{2} x_n^\top Q_n x_n + \frac{1}{2} u_n^\top R_n u_n \right]}_{\text{running cost}} + \underbrace{\frac{1}{2} x_N^\top Q_N x_N}_{\text{terminal cost}}$$

Assumptions:

  • $(A_n, B_n)$ is controllable and $(A_n, C_n)$ is observable, where $C_n$ is any factor with $Q_n = C_n^\top C_n$ (see the check below)
  • $Q_n \succeq 0,\; R_n \succ 0,\; Q_N \succeq 0$
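
For time-invariant systems these assumptions are straightforward to verify numerically. A minimal numpy sketch (the helper names are my own, not from the lecture):

```python
import numpy as np

def is_controllable(A, B):
    """(A, B) is controllable iff [B, AB, ..., A^{n-1}B] has full row rank n."""
    n = A.shape[0]
    blocks = [B]
    for _ in range(n - 1):
        blocks.append(A @ blocks[-1])
    return np.linalg.matrix_rank(np.hstack(blocks)) == n

def is_observable(A, C):
    """By duality, (A, C) is observable iff (A^T, C^T) is controllable."""
    return is_controllable(A.T, C.T)
```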

Indirect Shooting: PMP Perspective

Problem Formulation and Optimality Conditions

Consider the deterministic discrete-time optimal control problem:

$$\min_{x_{1:N},\, u_{1:N-1}} \; \sum_{n=1}^{N-1} \ell(x_n, u_n) + \ell_F(x_N) \qquad \text{s.t.} \quad x_{n+1} = f(x_n, u_n), \quad u_n \in \mathcal{U}$$

The first-order necessary conditions for optimality can be derived using:

  • The Lagrangian framework (special case of KKT conditions)
  • Pontryagin's Minimum Principle (PMP)

Lagrangian Formulation

Form the Lagrangian:

$$L = \sum_{n=1}^{N-1} \Big[ \ell(x_n, u_n) + \lambda_{n+1}^\top \big( f(x_n, u_n) - x_{n+1} \big) \Big] + \ell_F(x_N)$$

Define the Hamiltonian:

$$H(x_n, u_n, \lambda_{n+1}) = \ell(x_n, u_n) + \lambda_{n+1}^\top f(x_n, u_n)$$

Rewrite the Lagrangian using the Hamiltonian:

$$L = H(x_1, u_1, \lambda_2) + \left[ \sum_{n=2}^{N-1} H(x_n, u_n, \lambda_{n+1}) - \lambda_n^\top x_n \right] + \ell_F(x_N) - \lambda_N^\top x_N$$

Optimality Conditions

Take derivatives with respect to $x$ and $\lambda$:

$$\frac{\partial L}{\partial \lambda_{n+1}} = \frac{\partial H}{\partial \lambda_{n+1}} - x_{n+1} = f(x_n, u_n) - x_{n+1} = 0$$

$$\frac{\partial L}{\partial x_n} = \frac{\partial H}{\partial x_n} - \lambda_n^\top = \frac{\partial \ell}{\partial x_n} + \lambda_{n+1}^\top \frac{\partial f}{\partial x_n} - \lambda_n^\top = 0$$

$$\frac{\partial L}{\partial x_N} = \frac{\partial \ell_F}{\partial x_N} - \lambda_N^\top = 0$$

For $u$, we write the minimization explicitly to handle constraints:

$$u_n = \arg\min_{u} H(x_n, u, \lambda_{n+1}) \quad \text{s.t. } u \in \mathcal{U}$$

Summary of Necessary Conditions

The first-order necessary conditions can be summarized as:

$$\begin{aligned}
x_{n+1} &= \nabla_\lambda H(x_n, u_n, \lambda_{n+1}) \\
\lambda_n &= \nabla_x H(x_n, u_n, \lambda_{n+1}) \\
u_n &= \arg\min_{u} H(x_n, u, \lambda_{n+1}), \quad \text{s.t. } u \in \mathcal{U} \\
\lambda_N &= \left( \frac{\partial \ell_F}{\partial x_N} \right)^\top
\end{aligned}$$

In continuous time, these become:

$$\begin{aligned}
\dot{x} &= \nabla_\lambda H(x, u, \lambda) \\
\dot{\lambda} &= -\nabla_x H(x, u, \lambda) \\
u &= \arg\min_{\tilde{u}} H(x, \tilde{u}, \lambda), \quad \text{s.t. } \tilde{u} \in \mathcal{U} \\
\lambda(t_F) &= \left( \frac{\partial \ell_F}{\partial x} \right)^\top
\end{aligned}$$

Application to LQR Problems

For LQR problems with quadratic cost and linear dynamics:

$$\ell(x_n, u_n) = \frac{1}{2} \left( x_n^\top Q_n x_n + u_n^\top R_n u_n \right), \qquad \ell_F(x_N) = \frac{1}{2} x_N^\top Q_N x_N, \qquad f(x_n, u_n) = A_n x_n + B_n u_n$$

The necessary conditions simplify to:

$$\begin{aligned}
x_{n+1} &= A_n x_n + B_n u_n \\
\lambda_n &= Q_n x_n + A_n^\top \lambda_{n+1} \\
\lambda_N &= Q_N x_N \\
u_n &= -R_n^{-1} B_n^\top \lambda_{n+1}
\end{aligned}$$

(For unconstrained $u$, the last condition follows from $\nabla_u H = R_n u_n + B_n^\top \lambda_{n+1} = 0$.)

This forms a linear two-point boundary value problem.

Indirect Shooting Algorithm for LQR

Procedure:

  1. Make an initial guess for the control sequence $u_{1:N-1}$
  2. Forward pass: simulate the dynamics to get the state trajectory $x_{1:N}$
  3. Backward pass:
    • Set the terminal costate: $\lambda_N = Q_N x_N$
    • Compute the costate trajectory: $\lambda_n = Q_n x_n + A_n^\top \lambda_{n+1}$
    • Compute the control correction: $\Delta u_n = -R_n^{-1} B_n^\top \lambda_{n+1} - u_n$
  4. Line search: update the controls $u_n \leftarrow u_n + \alpha \Delta u_n$
  5. Iterate until convergence (see the sketch below)
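
Below is a minimal numpy sketch of this procedure (my own illustration, not code from the lecture), assuming time-invariant $A, B, Q, R$ and using a small fixed step $\alpha$ in place of a proper line search:

```python
import numpy as np

def lqr_indirect_shooting(A, B, Q, R, QN, x1, N, alpha=0.2, tol=1e-8, max_iters=2000):
    """Indirect shooting for discrete-time LQR (time-invariant case for brevity)."""
    n, m = B.shape
    u = np.zeros((N - 1, m))                    # step 1: initial guess for u_{1:N-1}
    x = np.zeros((N, n))
    for _ in range(max_iters):
        # Step 2, forward pass: roll out the dynamics from x1.
        x[0] = x1
        for k in range(N - 1):
            x[k + 1] = A @ x[k] + B @ u[k]
        # Step 3, backward pass: costates and control corrections.
        lam = QN @ x[-1]                        # terminal costate lambda_N = Q_N x_N
        du = np.zeros_like(u)
        for k in reversed(range(N - 1)):
            du[k] = -np.linalg.solve(R, B.T @ lam) - u[k]
            lam = Q @ x[k] + A.T @ lam          # costate recursion
        if np.max(np.abs(du)) < tol:            # step 5: converged
            break
        u += alpha * du                         # step 4: fixed-step update
    return x, u
```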

Direct Approach: QP Perspective

LQR as Quadratic Programming Problem

Assuming $x_1$ is given, define the decision variable vector $z$ and the block-diagonal cost matrix $H$:

$$z = \begin{bmatrix} u_1 \\ x_2 \\ u_2 \\ \vdots \\ x_N \end{bmatrix}, \qquad H = \begin{bmatrix} R_1 & & & & \\ & Q_2 & & & \\ & & R_2 & & \\ & & & \ddots & \\ & & & & Q_N \end{bmatrix}$$

The dynamics constraints can be expressed as

$$\underbrace{\begin{bmatrix} B_1 & -I & & & & \\ & A_2 & B_2 & -I & & \\ & & & \ddots & & \\ & & & A_{N-1} & B_{N-1} & -I \end{bmatrix}}_{C} \begin{bmatrix} u_1 \\ x_2 \\ \vdots \\ x_N \end{bmatrix} = \underbrace{\begin{bmatrix} -A_1 x_1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}}_{d}$$

QP Formulation and KKT Conditions

The LQR problem becomes the QP:

$$\min_z J = \frac{1}{2} z^\top H z \quad \text{subject to} \quad Cz = d$$

The Lagrangian of this QP is:

$$L(z, \lambda) = \frac{1}{2} z^\top H z + \lambda^\top (Cz - d)$$

The KKT conditions are:

$$\nabla_z L = Hz + C^\top \lambda = 0, \qquad \nabla_\lambda L = Cz - d = 0$$

This leads to the linear system:

$$\begin{bmatrix} H & C^\top \\ C & 0 \end{bmatrix} \begin{bmatrix} z \\ \lambda \end{bmatrix} = \begin{bmatrix} 0 \\ d \end{bmatrix}$$

We get the exact solution by solving one linear system!
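
To make the structure concrete, here is a dense numpy sketch (my own illustration) that assembles $H$, $C$, and $d$ for a time-invariant system and solves the whole KKT system at once; a serious implementation would exploit the sparsity instead of forming dense matrices:

```python
import numpy as np

def lqr_qp_solve(A, B, Q, R, QN, x1, N):
    """Solve LQR as one equality-constrained QP via its KKT system."""
    n, m = B.shape
    nz = (N - 1) * (m + n)                     # z = [u_1, x_2, u_2, ..., x_N]
    H = np.zeros((nz, nz))
    C = np.zeros(((N - 1) * n, nz))
    d = np.zeros((N - 1) * n)
    for k in range(N - 1):
        iu = k * (m + n)                       # column offset of u_{k+1}
        ix = iu + m                            # column offset of x_{k+2}
        H[iu:iu + m, iu:iu + m] = R
        H[ix:ix + n, ix:ix + n] = QN if k == N - 2 else Q
        r = k * n                              # row offset of the k-th constraint
        C[r:r + n, iu:iu + m] = B              # B u_{k+1}
        C[r:r + n, ix:ix + n] = -np.eye(n)     # -x_{k+2}
        if k > 0:
            C[r:r + n, iu - n:iu] = A          # A x_{k+1} (x_1 is not a variable)
    d[:n] = -A @ x1                            # first row: B u_1 - x_2 = -A x_1
    KKT = np.block([[H, C.T],
                    [C, np.zeros((C.shape[0], C.shape[0]))]])
    rhs = np.concatenate([np.zeros(nz), d])
    sol = np.linalg.solve(KKT, rhs)            # one linear solve gives everything
    return sol[:nz]                            # optimal [u_1, x_2, ..., x_N]
```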

Riccati Equation Solution

KKT System Structure for LQR

The KKT system for LQR has a highly structured, sparse form. Consider the $N = 4$ case:

$$\begin{bmatrix}
R_1 & & & & & & B_1^\top & & \\
& Q_2 & & & & & -I & A_2^\top & \\
& & R_2 & & & & & B_2^\top & \\
& & & Q_3 & & & & -I & A_3^\top \\
& & & & R_3 & & & & B_3^\top \\
& & & & & Q_4 & & & -I \\
B_1 & -I & & & & & & & \\
& A_2 & B_2 & -I & & & & & \\
& & & A_3 & B_3 & -I & & &
\end{bmatrix}
\begin{bmatrix} u_1 \\ x_2 \\ u_2 \\ x_3 \\ u_3 \\ x_4 \\ \lambda_2 \\ \lambda_3 \\ \lambda_4 \end{bmatrix}
=
\begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ -A_1 x_1 \\ 0 \\ 0 \end{bmatrix}$$

Deriving the Riccati Recursion

Start from the terminal condition (the $x_4$ row of the KKT system):

$$Q_4 x_4 - \lambda_4 = 0 \quad \Longrightarrow \quad \lambda_4 = Q_4 x_4$$

Move to the previous equation (the $u_3$ row):

$$R_3 u_3 + B_3^\top \lambda_4 = R_3 u_3 + B_3^\top Q_4 x_4 = 0$$

Substitute $x_4 = A_3 x_3 + B_3 u_3$:

$$R_3 u_3 + B_3^\top Q_4 (A_3 x_3 + B_3 u_3) = 0$$

Solve for $u_3$:

$$u_3 = -\underbrace{(R_3 + B_3^\top Q_4 B_3)^{-1} B_3^\top Q_4 A_3}_{K_3}\, x_3$$

Deriving the Riccati Recursion (Cont'd)

Now consider the $x_3$ row:

$$Q_3 x_3 - \lambda_3 + A_3^\top \lambda_4 = 0$$

Substitute $\lambda_4 = Q_4 x_4$ and $x_4 = A_3 x_3 + B_3 u_3$:

$$Q_3 x_3 - \lambda_3 + A_3^\top Q_4 (A_3 x_3 + B_3 u_3) = 0$$

Substitute $u_3 = -K_3 x_3$:

$$Q_3 x_3 - \lambda_3 + A_3^\top Q_4 (A_3 x_3 - B_3 K_3 x_3) = 0$$

Solve for $\lambda_3$:

$$\lambda_3 = \underbrace{\left( Q_3 + A_3^\top Q_4 (A_3 - B_3 K_3) \right)}_{P_3} x_3$$

Riccati Recursion Formula

We now have a recursive relationship. Generalizing:

$$\begin{aligned}
P_N &= Q_N \\
K_k &= (R_k + B_k^\top P_{k+1} B_k)^{-1} B_k^\top P_{k+1} A_k \\
P_k &= Q_k + A_k^\top P_{k+1} (A_k - B_k K_k)
\end{aligned}$$

This is the celebrated Riccati equation.

The solution process involves:

  1. A backward Riccati pass to compute $P_k$ and $K_k$ for $k = N-1, \dots, 1$
  2. A forward rollout to compute $x_{1:N}$ and $u_{1:N-1}$ using the feedback law $u_k = -K_k x_k$ (both passes are sketched below)
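
A minimal numpy sketch of the two passes (my own illustration), again assuming time-invariant matrices:

```python
import numpy as np

def lqr_riccati(A, B, Q, R, QN, x1, N):
    """Backward Riccati pass, then forward rollout with u_k = -K_k x_k."""
    n, m = B.shape
    P = QN                                      # P_N = Q_N
    K = np.zeros((N - 1, m, n))
    for k in reversed(range(N - 1)):            # backward pass
        K[k] = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K[k])
    x = np.zeros((N, n))
    u = np.zeros((N - 1, m))
    x[0] = x1
    for k in range(N - 1):                      # forward rollout
        u[k] = -K[k] @ x[k]
        x[k + 1] = A @ x[k] + B @ u[k]
    return x, u, K
```

Note that the gains $K_k$ can be cached and reused for any initial state, which is exactly the advantage discussed below.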

Computational Complexity

Naive QP Solution: Treats the problem as one monolithic linear system.

  • Computational cost: $O[N^3(n+m)^3]$, where $n$ and $m$ are the state and input dimensions
  • Must be re-solved from scratch for any change.

Riccati Recursion: Exploits the temporal structure.

  • Computational cost: $O[N(n+m)^3]$
  • Dramatically faster for long horizons (large $N$): the speedup grows as $N^2$.

The Riccati Solution is More Than Just Fast:

  • It provides a ready-to-use feedback policy: $u_k = -K_k x_k$
  • This policy is adaptive: optimal for any initial state $x_1$, not just a single one.
  • It enables real-time control by naturally rejecting disturbances.
  • And it delivers the exact same optimal solution as the QP.

Conclusion

Summary

Finite-Horizon Problems

  • Use the Riccati recursion backward in time
  • Store the gain matrices $K_n$
  • Apply the time-varying feedback $u_n = -K_n x_n$

Infinite-Horizon Problems

  • Solve the algebraic Riccati equation offline
  • Use the constant gain matrix $K$
  • Implement the simple state feedback $u = -Kx$
  • Algebraic Riccati Equation (ARE), solved numerically in the sketch below:

$$P = Q + A^\top P A - A^\top P B (R + B^\top P B)^{-1} B^\top P A$$
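
A minimal numpy sketch (my own illustration) that solves the ARE by simply iterating the finite-horizon Riccati recursion until $P$ stops changing, which converges under the controllability and cost assumptions above:

```python
import numpy as np

def solve_dare_by_iteration(A, B, Q, R, tol=1e-10, max_iters=100_000):
    """Solve the discrete-time ARE by running the Riccati recursion to a fixed point."""
    P = Q.copy()
    K = np.zeros((B.shape[1], A.shape[0]))
    for _ in range(max_iters):
        # One backward Riccati step, identical to the finite-horizon recursion.
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P_next = Q + A.T @ P @ (A - B @ K)
        if np.max(np.abs(P_next - P)) < tol:   # converged to the stationary P
            return P_next, K
        P = P_next
    return P, K                                # best effort if not converged
```

As a cross-check, `scipy.linalg.solve_discrete_are(A, B, Q, R)` returns the same $P$, from which $K = (R + B^\top P B)^{-1} B^\top P A$.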