
Kalman Filter in 3 Ways

Slide: Discover the Kalman Filter through the Geometric perspective of orthogonal projection, the Probabilistic perspective of Bayesian filtering, and the Optimization perspective of weighted least squares. Inspired by Ling Shi's 2024-25 Spring lecture "Networked Sensing, Estimation and Control".

Introduction

System Model and Assumptions

Consider a discrete-time linear Gaussian system with initial state $x_0$ and initial covariance $P_0$:

$$
\begin{aligned}
x_{k+1} &= A_k x_k + B_k u_k + \omega_k, \qquad \omega_k \sim \mathcal{N}(0, Q_k) \\
y_k &= C_k x_k + \nu_k, \qquad \nu_k \sim \mathcal{N}(0, R_k)
\end{aligned}
$$

Assumptions:

  • $(A_k, B_k)$ is controllable and $(A_k, C_k)$ is observable
  • $Q_k \succeq 0$, $R_k \succ 0$, $P_0 \succeq 0$
  • $\omega_k$, $\nu_k$ and $x_0$ are mutually uncorrelated
  • The future state of the system is conditionally independent of the past states given the current state

Goal: Find $\hat{x}_{k|k} = \mathbb{E}[x_k \mid y_{1:k}]$ (the MMSE estimator).
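To make the notation concrete, here is a minimal NumPy sketch that simulates one trajectory of such a system. The specific matrices A, B, C, Q, R below are illustrative placeholders of my own choosing, not values from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative time-invariant system matrices (placeholders, not from the lecture)
A = np.array([[1.0, 1.0], [0.0, 1.0]])   # state transition A
B = np.array([[0.5], [1.0]])             # control input B
C = np.array([[1.0, 0.0]])               # observation C
Q = 0.01 * np.eye(2)                     # process noise covariance Q
R = np.array([[0.25]])                   # measurement noise covariance R

x = np.array([0.0, 1.0])                 # initial state x_0
xs, ys = [], []
for k in range(50):
    u = np.array([0.1])                             # known input u_k
    w = rng.multivariate_normal(np.zeros(2), Q)     # omega_k ~ N(0, Q)
    v = rng.multivariate_normal(np.zeros(1), R)     # nu_k ~ N(0, R)
    y = C @ x + v                                   # y_k = C x_k + nu_k
    x = A @ x + B @ u + w                           # x_{k+1} = A x_k + B u_k + omega_k
    xs.append(x)
    ys.append(y)
```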

Geometric Perspective: Orthogonal Projection

Hilbert Space of Random Variables

Key Idea:

  • View random variables as vectors in Hilbert space
  • Inner product: $\langle \xi, \eta \rangle = \mathbb{E}[\xi \eta]$
  • Orthogonality: $\xi \perp \eta \iff \mathbb{E}[\xi \eta] = 0$
  • Optimal estimate is orthogonal projection onto observation space

Geometric Interpretation:

Figure: orthogonal projection of the state onto the observation subspace. Courtesy: https://math.stackexchange.com/users/117818/qqo

Time Update

State Prediction:

$$
\begin{aligned}
\hat{x}_{k|k-1} &= \mathbb{E}[x_k \mid y_{1:k-1}] \\
&= \mathbb{E}[A_{k-1}x_{k-1} + B_{k-1}u_{k-1} + \omega_{k-1} \mid y_{1:k-1}] \\
&= A_{k-1}\hat{x}_{k-1|k-1} + B_{k-1}u_{k-1} \qquad (\text{since } \omega_{k-1} \perp y_{1:k-1})
\end{aligned}
$$

Covariance Prediction:

$$
\begin{aligned}
P_{k|k-1} &= \operatorname{cov}(x_k - \hat{x}_{k|k-1}) \\
&= \operatorname{cov}\!\big[A_{k-1}(x_{k-1} - \hat{x}_{k-1|k-1}) + \omega_{k-1}\big] \\
&= A_{k-1}\operatorname{cov}(x_{k-1} - \hat{x}_{k-1|k-1})A_{k-1}^{\top} + 2\,A_{k-1}\operatorname{cov}(x_{k-1} - \hat{x}_{k-1|k-1},\, \omega_{k-1}) + \operatorname{cov}(\omega_{k-1}) \\
&= A_{k-1}P_{k-1|k-1}A_{k-1}^{\top} + Q_{k-1}
\end{aligned}
$$

(The cross term vanishes because $\omega_{k-1}$ is uncorrelated with the estimation error at time $k-1$.)
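The two prediction equations translate directly into code; a minimal sketch (function name and arguments are mine):

```python
import numpy as np

def time_update(x_post, P_post, u, A, B, Q):
    """Time update: x_{k|k-1} = A x_{k-1|k-1} + B u_{k-1},  P_{k|k-1} = A P_{k-1|k-1} A^T + Q."""
    x_prior = A @ x_post + B @ u
    P_prior = A @ P_post @ A.T + Q
    return x_prior, P_prior
```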

Innovation Process

Definition:

$$
\begin{aligned}
e_k &= y_k - \hat{y}_{k|k-1} = y_k - \operatorname{proj}_{\mathcal{Y}_{k-1}}(y_k) \\
&= y_k - \operatorname{proj}_{\mathcal{Y}_{k-1}}(C_k x_k + \nu_k) \\
&= y_k - C_k\operatorname{proj}_{\mathcal{Y}_{k-1}}(x_k) - \operatorname{proj}_{\mathcal{Y}_{k-1}}(\nu_k) \\
&= y_k - C_k\hat{x}_{k|k-1}
\end{aligned}
$$

where $\mathcal{Y}_{k-1}$ denotes the subspace spanned by the past observations $y_1, \dots, y_{k-1}$.

Properties:

  • Zero Mean: $\mathbb{E}[e_k] = 0$
  • White Sequence: $\mathbb{E}[e_k e_j^{\top}] = 0$ for $k \neq j$
  • Orthogonality Principle: $\mathbb{E}[e_k y_j^{\top}] = 0$ for $j < k$
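These quantities are straightforward to compute; the sketch below returns the innovation together with its covariance $C_k P_{k|k-1} C_k^{\top} + R_k$, which reappears in the Kalman gain below (function and argument names are mine):

```python
import numpy as np

def innovation(y, x_prior, P_prior, C, R):
    """Innovation e_k = y_k - C x_{k|k-1} and its covariance C P_{k|k-1} C^T + R."""
    e = y - C @ x_prior
    S = C @ P_prior @ C.T + R
    return e, S
```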

Measurement Update

State Update:

$$
\hat{x}_{k|k} = \operatorname{proj}_{\mathcal{Y}_{k}}(x_k) = \hat{x}_{k|k-1} + K_k e_k = \hat{x}_{k|k-1} + K_k\left(y_k - C_k\hat{x}_{k|k-1}\right)
$$

Covariance Update:

$$
\begin{aligned}
P_{k|k} &= \operatorname{cov}(x_k - \hat{x}_{k|k}) = \operatorname{cov}\!\left(x_k - \hat{x}_{k|k-1} - K_k e_k\right) \\
&= \operatorname{cov}(x_k - \hat{x}_{k|k-1}) - K_k\operatorname{cov}(e_k,\, x_k - \hat{x}_{k|k-1}) - \operatorname{cov}(x_k - \hat{x}_{k|k-1},\, e_k)\,K_k^{\top} + K_k\operatorname{cov}(e_k)\,K_k^{\top} \\
&= P_{k|k-1} - K_k C_k P_{k|k-1} - P_{k|k-1}C_k^{\top}K_k^{\top} + K_k\!\left(C_k P_{k|k-1}C_k^{\top} + R_k\right)K_k^{\top}
\end{aligned}
$$

where the last line substitutes $e_k = y_k - C_k\hat{x}_{k|k-1}$.

Kalman Gain Derivation

Optimal Kalman Gain:

$$
\frac{\partial\,\operatorname{tr}(P_{k|k})}{\partial K_k} = -2\,P_{k|k-1}C_k^{\top} + 2\,K_k\!\left(C_k P_{k|k-1}C_k^{\top} + R_k\right) = 0
\;\;\Longrightarrow\;\;
K_k = P_{k|k-1}C_k^{\top}\!\left(C_k P_{k|k-1}C_k^{\top} + R_k\right)^{-1}
$$

Covariance Derivation:

$$
P_{k|k} = P_{k|k-1} - K_k C_k P_{k|k-1} = \left(P_{k|k-1}^{-1} + C_k^{\top}R_k^{-1}C_k\right)^{-1}
$$
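Putting the optimal gain and the covariance update together gives the full measurement-update step. A minimal sketch, using the simple form $P_{k|k} = P_{k|k-1} - K_k C_k P_{k|k-1}$ rather than the numerically more robust Joseph form (names are mine):

```python
import numpy as np

def measurement_update(x_prior, P_prior, y, C, R):
    """Measurement update with the optimal gain K_k = P C^T (C P C^T + R)^{-1}."""
    S = C @ P_prior @ C.T + R                     # innovation covariance
    K = P_prior @ C.T @ np.linalg.inv(S)          # Kalman gain
    x_post = x_prior + K @ (y - C @ x_prior)      # state update
    P_post = P_prior - K @ C @ P_prior            # covariance update (simple form)
    return x_post, P_post
```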

Probabilistic Perspective: Bayesian Filtering

Bayesian Filtering Framework

$$
\begin{aligned}
p(x_k \mid y_{1:k}, u_{1:k}) &= p(x_k \mid y_k, y_{1:k-1}, u_{1:k}) \\
&= \frac{p(y_k \mid x_k, y_{1:k-1}, u_{1:k})\, p(x_k \mid y_{1:k-1}, u_{1:k})}{p(y_k \mid y_{1:k-1}, u_{1:k})} \\
&= \eta\, p(y_k \mid x_k)\, p(x_k \mid y_{1:k-1}, u_{1:k}) \\
&= \eta\, p(y_k \mid x_k) \int p(x_k, x_{k-1} \mid y_{1:k-1}, u_{1:k})\, \mathrm{d}x_{k-1} \\
&= \eta\, p(y_k \mid x_k) \int p(x_k \mid x_{k-1}, y_{1:k-1}, u_{1:k})\, p(x_{k-1} \mid y_{1:k-1}, u_{1:k})\, \mathrm{d}x_{k-1} \\
&= \eta \underbrace{p(y_k \mid x_k)}_{\text{observation model}} \int \underbrace{p(x_k \mid x_{k-1}, u_k)}_{\text{motion model}}\, \underbrace{p(x_{k-1} \mid y_{1:k-1}, u_{1:k-1})}_{\text{previous belief}}\, \mathrm{d}x_{k-1}
\end{aligned}
$$

where $\eta = 1/p(y_k \mid y_{1:k-1}, u_{1:k})$ is a normalizing constant and the last step uses the Markov property of the state.

Prediction Step: Gaussian Propagation

$$
p(x_k \mid y_{1:k}, u_{1:k}) = \eta\, \mathcal{N}(y_k;\, C_k x_k,\, R_k) \int \mathcal{N}\!\left(x_k;\, A_{k-1}x_{k-1} + B_{k-1}u_{k-1},\, Q_{k-1}\right) \mathcal{N}\!\left(x_{k-1};\, \hat{x}_{k-1},\, P_{k-1}\right) \mathrm{d}x_{k-1}
$$

Predicted Mean:

$$
\begin{aligned}
\hat{x}_{k|k-1} &= \mathbb{E}[A_{k-1}x_{k-1} + B_{k-1}u_{k-1} + \omega_{k-1}] \\
&= A_{k-1}\mathbb{E}[x_{k-1}] + B_{k-1}u_{k-1} + \mathbb{E}[\omega_{k-1}] \\
&= A_{k-1}\hat{x}_{k-1} + B_{k-1}u_{k-1}
\end{aligned}
$$

Predicted Covariance:

$$
\begin{aligned}
P_{k|k-1} &= \operatorname{cov}\!\left[A_{k-1}x_{k-1} + B_{k-1}u_{k-1} + \omega_{k-1}\right] \\
&= \operatorname{cov}[A_{k-1}x_{k-1}] + \operatorname{cov}[\omega_{k-1}] \\
&= A_{k-1}\operatorname{cov}[x_{k-1}]A_{k-1}^{\top} + Q_{k-1} \\
&= A_{k-1}P_{k-1}A_{k-1}^{\top} + Q_{k-1}
\end{aligned}
$$

Update Step: Gaussian Product

$$
p(x_k \mid y_{1:k}, u_{1:k}) = \eta\, \mathcal{N}(y_k;\, C_k x_k,\, R_k)\, \mathcal{N}(x_k;\, \hat{x}_{k|k-1},\, P_{k|k-1})
$$

Gaussian Product:

$$
\begin{aligned}
\mathcal{N}(x;\, \mu, \Sigma) &\propto \mathcal{N}(x;\, \mu_1, \Sigma_1)\, \mathcal{N}(x;\, \mu_2, \Sigma_2) \\
\Sigma^{-1} &= \Sigma_1^{-1} + \Sigma_2^{-1} \\
\mu &= \Sigma\left(\Sigma_1^{-1}\mu_1 + \Sigma_2^{-1}\mu_2\right)
\end{aligned}
$$
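This product rule is easy to implement in information form; the sketch below fuses two Gaussians over the same variable and can be used to verify the posterior result that follows (function and variable names are mine):

```python
import numpy as np

def gaussian_product(mu1, Sigma1, mu2, Sigma2):
    """Fuse two Gaussian densities over the same variable in information form."""
    info1, info2 = np.linalg.inv(Sigma1), np.linalg.inv(Sigma2)
    Sigma = np.linalg.inv(info1 + info2)          # Sigma^{-1} = Sigma1^{-1} + Sigma2^{-1}
    mu = Sigma @ (info1 @ mu1 + info2 @ mu2)      # mu = Sigma (Sigma1^{-1} mu1 + Sigma2^{-1} mu2)
    return mu, Sigma
```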

Posterior Result:

$$
\begin{aligned}
\hat{x}_{k|k} &= \hat{x}_{k|k-1} + K_k\left(y_k - C_k\hat{x}_{k|k-1}\right) \\
K_k &= P_{k|k-1}C_k^{\top}\left(C_k P_{k|k-1}C_k^{\top} + R_k\right)^{-1} \\
P_{k|k} &= (I - K_k C_k)\,P_{k|k-1}
\end{aligned}
$$

Optimization Perspective: MAP Estimation

Maximum A Posteriori Formulation

MAP Estimation:

$$
\hat{x}_{k|k} = \arg\max_{x_k}\, p(x_k \mid y_{1:k}) = \arg\min_{x_k}\,\left[-\log p(x_k \mid y_{1:k})\right]
$$

Weighted Least Square:

$$
\begin{aligned}
E(x) &= \|Mx - n\|_{\Sigma^{-1}}^{2} = x^{\top}M^{\top}\Sigma^{-1}Mx - 2n^{\top}\Sigma^{-1}Mx + n^{\top}\Sigma^{-1}n \\
\frac{\partial E}{\partial x} &= 2M^{\top}\Sigma^{-1}Mx - 2M^{\top}\Sigma^{-1}n = 0 \\
\hat{x} &= \left(M^{\top}\Sigma^{-1}M\right)^{-1}M^{\top}\Sigma^{-1}n
\end{aligned}
$$
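A direct implementation of this closed-form solution, assuming $\Sigma$ is invertible (names are mine):

```python
import numpy as np

def weighted_least_squares(M, n, Sigma):
    """Solve argmin_x ||M x - n||^2_{Sigma^{-1}} = (M^T Sigma^{-1} M)^{-1} M^T Sigma^{-1} n."""
    W = np.linalg.inv(Sigma)                      # weight matrix Sigma^{-1}
    return np.linalg.solve(M.T @ W @ M, M.T @ W @ n)
```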

MAP as Weighted Least Squares

Posterior Distribution:

$$
p(x_k \mid y_{1:k}) \propto p(y_k \mid x_k)\, p(x_k \mid y_{1:k-1})
$$

Assume Gaussian Distributions:

$$
\begin{aligned}
p(x_k \mid y_{1:k-1}) &= \mathcal{N}\!\left(x_k;\, \hat{x}_{k|k-1},\, P_{k|k-1}\right) \\
p(y_k \mid x_k) &= \mathcal{N}\!\left(y_k;\, C_k x_k,\, R_k\right)
\end{aligned}
$$

Negative Log-Posterior:

$$
\begin{aligned}
-\log p(x_k \mid y_{1:k}) &= \tfrac{1}{2}\left\|y_k - C_k x_k\right\|_{R_k^{-1}}^{2} + \tfrac{1}{2}\left\|x_k - \hat{x}_{k|k-1}\right\|_{P_{k|k-1}^{-1}}^{2} + \text{const} \\
&= \tfrac{1}{2}\left\|\begin{bmatrix} C_k \\ I \end{bmatrix} x_k - \begin{bmatrix} y_k \\ \hat{x}_{k|k-1} \end{bmatrix}\right\|_{\Sigma^{-1}}^{2} + \text{const}
\end{aligned}
$$

where $\Sigma = \begin{bmatrix} R_k & 0 \\ 0 & P_{k|k-1} \end{bmatrix}$.

MAP Solution

Weighted Least Squares Form:

$$
M = \begin{bmatrix} C_k \\ I \end{bmatrix}, \qquad n = \begin{bmatrix} y_k \\ \hat{x}_{k|k-1} \end{bmatrix}, \qquad \Sigma = \begin{bmatrix} R_k & 0 \\ 0 & P_{k|k-1} \end{bmatrix}
$$

MAP Estimate:

$$
\begin{aligned}
\hat{x}_{k|k} &= \left(M^{\top}\Sigma^{-1}M\right)^{-1}M^{\top}\Sigma^{-1}n \\
&= \left(C_k^{\top}R_k^{-1}C_k + P_{k|k-1}^{-1}\right)^{-1}\left(C_k^{\top}R_k^{-1}y_k + P_{k|k-1}^{-1}\hat{x}_{k|k-1}\right)
\end{aligned}
$$
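The same estimate can be obtained by stacking the measurement and the prior into one weighted least-squares problem, exactly as in the definitions of $M$, $n$, $\Sigma$ above. A sketch (names are mine):

```python
import numpy as np

def map_update(x_prior, P_prior, y, C, R):
    """MAP update as stacked weighted least squares with M = [C; I], n = [y; x_prior]."""
    nx, ny = x_prior.shape[0], y.shape[0]
    M = np.vstack([C, np.eye(nx)])
    n = np.concatenate([y, x_prior])
    Sigma = np.block([[R, np.zeros((ny, nx))],
                      [np.zeros((nx, ny)), P_prior]])
    W = np.linalg.inv(Sigma)
    info = M.T @ W @ M                            # C^T R^{-1} C + P_prior^{-1}
    x_post = np.linalg.solve(info, M.T @ W @ n)
    P_post = np.linalg.inv(info)                  # posterior covariance
    return x_post, P_post
```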

Equivalence Proof

Using Matrix Inversion Lemma:

$$
\begin{aligned}
\hat{x}_{k|k} &= \left(C_k^{\top}R_k^{-1}C_k + P_{k|k-1}^{-1}\right)^{-1}\left(C_k^{\top}R_k^{-1}y_k + P_{k|k-1}^{-1}\hat{x}_{k|k-1}\right) \\
&= \hat{x}_{k|k-1} + P_{k|k-1}C_k^{\top}\left(C_k P_{k|k-1}C_k^{\top} + R_k\right)^{-1}\left(y_k - C_k\hat{x}_{k|k-1}\right)
\end{aligned}
$$

Proof:

$$
\left(C_k^{\top}R_k^{-1}C_k + P_{k|k-1}^{-1}\right)^{-1}C_k^{\top}R_k^{-1} = P_{k|k-1}C_k^{\top}\left(C_k P_{k|k-1}C_k^{\top} + R_k\right)^{-1}
$$

This shows the equivalence between the MAP solution and the Kalman update.
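The identity is easy to sanity-check numerically with random positive-definite matrices; a throwaway script (all names and dimensions here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
nx, ny = 3, 2                                     # illustrative dimensions

X = rng.standard_normal((nx, nx))
P_prior = X @ X.T + np.eye(nx)                    # random positive-definite "P_{k|k-1}"
Y = rng.standard_normal((ny, ny))
R = Y @ Y.T + np.eye(ny)                          # random positive-definite "R_k"
C = rng.standard_normal((ny, nx))                 # random "C_k"

lhs = np.linalg.inv(C.T @ np.linalg.inv(R) @ C + np.linalg.inv(P_prior)) @ C.T @ np.linalg.inv(R)
rhs = P_prior @ C.T @ np.linalg.inv(C @ P_prior @ C.T + R)
print(np.allclose(lhs, rhs))                      # expected: True
```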

Conclusion

Theoretical Insights and Extensions

Key Insights:

  • Geometric: Reveals orthogonality principle and innovation process
  • Probabilistic: Shows optimality under Gaussian assumptions
  • Optimization: Connects to weighted least squares and regularization

Unified Algorithm: All approaches yield the same recursive equations:

$$
\text{time update}
\begin{cases}
\hat{x}_{k|k-1} = A_{k-1}\hat{x}_{k-1|k-1} + B_{k-1}u_{k-1} \\
P_{k|k-1} = A_{k-1}P_{k-1|k-1}A_{k-1}^{\top} + Q_{k-1}
\end{cases}
\qquad
\text{measurement update}
\begin{cases}
K_k = P_{k|k-1}C_k^{\top}\left(C_k P_{k|k-1}C_k^{\top} + R_k\right)^{-1} \\
\hat{x}_{k|k} = \hat{x}_{k|k-1} + K_k\left(y_k - C_k\hat{x}_{k|k-1}\right) \\
P_{k|k} = (I - K_k C_k)\,P_{k|k-1}
\end{cases}
$$
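For completeness, here is the unified recursion as a single loop, assuming time-invariant matrices for brevity; a minimal sketch, not the lecture's reference implementation:

```python
import numpy as np

def kalman_filter(x0, P0, A, B, C, Q, R, us, ys):
    """Run the time/measurement-update recursion over sequences of inputs and measurements."""
    x, P = x0, P0
    estimates = []
    for u, y in zip(us, ys):
        # time update
        x = A @ x + B @ u
        P = A @ P @ A.T + Q
        # measurement update
        K = P @ C.T @ np.linalg.inv(C @ P @ C.T + R)
        x = x + K @ (y - C @ x)
        P = (np.eye(P.shape[0]) - K @ C) @ P
        estimates.append(x)
    return estimates
```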

Extensions:

  • Nonlinear systems: EKF, UKF, particle filters
  • Non-Gaussian noise: robust Kalman filters