
Kalman Filter in 3 Ways

Slide: Discover the Kalman Filter through the Geometric perspective of orthogonal projection, the Probabilistic perspective of Bayesian filtering, and the Optimization perspective of weighted least squares. Inspired by Ling Shi's 2024-25 Spring lecture "Networked Sensing, Estimation and Control".

Introduction

System Model and Assumptions

Consider a discrete-time linear Gaussian system with initial state $x_0$ and initial covariance $P_0$:

$$
\begin{aligned}
x_{k+1} &= A_k x_k + B_k u_k + \omega_k, \qquad \omega_k \sim \mathcal{N}(0, Q_k) \\
y_k &= C_k x_k + \nu_k, \qquad \nu_k \sim \mathcal{N}(0, R_k)
\end{aligned}
$$

Assumptions:

  • $(A_k, B_k)$ is controllable and $(A_k, C_k)$ is observable
  • $Q_k \succeq 0$, $R_k \succ 0$, $P_0 \succeq 0$
  • $\omega_k$, $\nu_k$ and $x_0$ are mutually uncorrelated
  • The future state of the system is conditionally independent of the past states given the current state

Goal: Find $\hat{x}_{k|k} = \mathbb{E}[x_k \mid y_{1:k}]$ (the MMSE estimator).
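To make the notation concrete, here is a minimal NumPy sketch that simulates one trajectory of such a system. The specific matrices A, B, C, Q, R below are illustrative placeholders of my own choosing, not values from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative time-invariant system matrices (placeholders, not from the lecture)
A = np.array([[1.0, 1.0], [0.0, 1.0]])   # state transition A
B = np.array([[0.5], [1.0]])             # control input B
C = np.array([[1.0, 0.0]])               # observation C
Q = 0.01 * np.eye(2)                     # process noise covariance Q
R = np.array([[0.25]])                   # measurement noise covariance R

x = np.array([0.0, 1.0])                 # initial state x_0
xs, ys = [], []
for k in range(50):
    u = np.array([0.1])                             # known input u_k
    w = rng.multivariate_normal(np.zeros(2), Q)     # omega_k ~ N(0, Q)
    v = rng.multivariate_normal(np.zeros(1), R)     # nu_k ~ N(0, R)
    y = C @ x + v                                   # y_k = C x_k + nu_k
    x = A @ x + B @ u + w                           # x_{k+1} = A x_k + B u_k + omega_k
    xs.append(x)
    ys.append(y)
```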

Geometric Perspective: Orthogonal Projection

Hilbert Space of Random Variables

Key Idea:

  • View random variables as vectors in Hilbert space
  • Inner product: $\langle \xi, \eta \rangle = \mathbb{E}[\xi \eta]$
  • Orthogonality: $\xi \perp \eta \iff \mathbb{E}[\xi \eta] = 0$
  • Optimal estimate is orthogonal projection onto observation space

Geometric Interpretation:

Figure: orthogonal projection of the state onto the observation subspace. Courtesy: https://math.stackexchange.com/users/117818/qqo

Time Update

State Prediction:

$$
\begin{aligned}
\hat{x}_{k|k-1} &= \mathbb{E}[x_k \mid y_{1:k-1}] \\
&= \mathbb{E}[A_{k-1}x_{k-1} + B_{k-1}u_{k-1} + \omega_{k-1} \mid y_{1:k-1}] \\
&= A_{k-1}\hat{x}_{k-1|k-1} + B_{k-1}u_{k-1} \qquad (\text{since } \omega_{k-1} \perp y_{1:k-1})
\end{aligned}
$$

Covariance Prediction:

$$
\begin{aligned}
P_{k|k-1} &= \operatorname{cov}(x_k - \hat{x}_{k|k-1}) \\
&= \operatorname{cov}\!\big[A_{k-1}(x_{k-1} - \hat{x}_{k-1|k-1}) + \omega_{k-1}\big] \\
&= A_{k-1}\operatorname{cov}(x_{k-1} - \hat{x}_{k-1|k-1})A_{k-1}^{\top} + 2\,A_{k-1}\operatorname{cov}(x_{k-1} - \hat{x}_{k-1|k-1},\, \omega_{k-1}) + \operatorname{cov}(\omega_{k-1}) \\
&= A_{k-1}P_{k-1|k-1}A_{k-1}^{\top} + Q_{k-1}
\end{aligned}
$$

(The cross term vanishes because $\omega_{k-1}$ is uncorrelated with the estimation error at time $k-1$.)
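The two prediction equations translate directly into code; a minimal sketch (function name and arguments are mine):

```python
import numpy as np

def time_update(x_post, P_post, u, A, B, Q):
    """Time update: x_{k|k-1} = A x_{k-1|k-1} + B u_{k-1},  P_{k|k-1} = A P_{k-1|k-1} A^T + Q."""
    x_prior = A @ x_post + B @ u
    P_prior = A @ P_post @ A.T + Q
    return x_prior, P_prior
```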

Innovation Process

Definition:

$$
\begin{aligned}
e_k &= y_k - \hat{y}_{k|k-1} = y_k - \operatorname{proj}_{\mathcal{Y}_{k-1}}(y_k) \\
&= y_k - \operatorname{proj}_{\mathcal{Y}_{k-1}}(C_k x_k + \nu_k) \\
&= y_k - C_k\operatorname{proj}_{\mathcal{Y}_{k-1}}(x_k) - \operatorname{proj}_{\mathcal{Y}_{k-1}}(\nu_k) \\
&= y_k - C_k\hat{x}_{k|k-1}
\end{aligned}
$$

where $\mathcal{Y}_{k-1}$ denotes the subspace spanned by the past observations $y_1, \dots, y_{k-1}$.

Properties:

  • Zero Mean: $\mathbb{E}[e_k] = 0$
  • White Sequence: $\mathbb{E}[e_k e_j^{\top}] = 0$ for $k \neq j$
  • Orthogonality Principle: $\mathbb{E}[e_k y_j^{\top}] = 0$ for $j < k$
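These quantities are straightforward to compute; the sketch below returns the innovation together with its covariance $C_k P_{k|k-1} C_k^{\top} + R_k$, which reappears in the Kalman gain below (function and argument names are mine):

```python
import numpy as np

def innovation(y, x_prior, P_prior, C, R):
    """Innovation e_k = y_k - C x_{k|k-1} and its covariance C P_{k|k-1} C^T + R."""
    e = y - C @ x_prior
    S = C @ P_prior @ C.T + R
    return e, S
```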

Measurement Update

State Update:

$$
\hat{x}_{k|k} = \operatorname{proj}_{\mathcal{Y}_{k}}(x_k) = \hat{x}_{k|k-1} + K_k e_k = \hat{x}_{k|k-1} + K_k\left(y_k - C_k\hat{x}_{k|k-1}\right)
$$

Covariance Update:

$$
\begin{aligned}
P_{k|k} &= \operatorname{cov}(x_k - \hat{x}_{k|k}) = \operatorname{cov}\!\left(x_k - \hat{x}_{k|k-1} - K_k e_k\right) \\
&= \operatorname{cov}(x_k - \hat{x}_{k|k-1}) - K_k\operatorname{cov}(e_k,\, x_k - \hat{x}_{k|k-1}) - \operatorname{cov}(x_k - \hat{x}_{k|k-1},\, e_k)\,K_k^{\top} + K_k\operatorname{cov}(e_k)\,K_k^{\top} \\
&= P_{k|k-1} - K_k C_k P_{k|k-1} - P_{k|k-1}C_k^{\top}K_k^{\top} + K_k\!\left(C_k P_{k|k-1}C_k^{\top} + R_k\right)K_k^{\top}
\end{aligned}
$$

where the last line substitutes $e_k = y_k - C_k\hat{x}_{k|k-1}$.

Kalman Gain Derivation

Optimal Kalman Gain:

$$
\frac{\partial\,\operatorname{tr}(P_{k|k})}{\partial K_k} = -2\,P_{k|k-1}C_k^{\top} + 2\,K_k\!\left(C_k P_{k|k-1}C_k^{\top} + R_k\right) = 0
\;\;\Longrightarrow\;\;
K_k = P_{k|k-1}C_k^{\top}\!\left(C_k P_{k|k-1}C_k^{\top} + R_k\right)^{-1}
$$

Covariance Derivation:

$$
P_{k|k} = P_{k|k-1} - K_k C_k P_{k|k-1} = \left(P_{k|k-1}^{-1} + C_k^{\top}R_k^{-1}C_k\right)^{-1}
$$
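Putting the optimal gain and the covariance update together gives the full measurement-update step. A minimal sketch, using the simple form $P_{k|k} = P_{k|k-1} - K_k C_k P_{k|k-1}$ rather than the numerically more robust Joseph form (names are mine):

```python
import numpy as np

def measurement_update(x_prior, P_prior, y, C, R):
    """Measurement update with the optimal gain K_k = P C^T (C P C^T + R)^{-1}."""
    S = C @ P_prior @ C.T + R                     # innovation covariance
    K = P_prior @ C.T @ np.linalg.inv(S)          # Kalman gain
    x_post = x_prior + K @ (y - C @ x_prior)      # state update
    P_post = P_prior - K @ C @ P_prior            # covariance update (simple form)
    return x_post, P_post
```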

Probabilistic Perspective: Bayesian Filtering

Bayesian Filtering Framework

$$
\begin{aligned}
p(x_k \mid y_{1:k}, u_{1:k}) &= p(x_k \mid y_k, y_{1:k-1}, u_{1:k}) \\
&= \frac{p(y_k \mid x_k, y_{1:k-1}, u_{1:k})\, p(x_k \mid y_{1:k-1}, u_{1:k})}{p(y_k \mid y_{1:k-1}, u_{1:k})} \\
&= \eta\, p(y_k \mid x_k)\, p(x_k \mid y_{1:k-1}, u_{1:k}) \\
&= \eta\, p(y_k \mid x_k) \int p(x_k, x_{k-1} \mid y_{1:k-1}, u_{1:k})\, \mathrm{d}x_{k-1} \\
&= \eta\, p(y_k \mid x_k) \int p(x_k \mid x_{k-1}, y_{1:k-1}, u_{1:k})\, p(x_{k-1} \mid y_{1:k-1}, u_{1:k})\, \mathrm{d}x_{k-1} \\
&= \eta \underbrace{p(y_k \mid x_k)}_{\text{observation model}} \int \underbrace{p(x_k \mid x_{k-1}, u_k)}_{\text{motion model}}\, \underbrace{p(x_{k-1} \mid y_{1:k-1}, u_{1:k-1})}_{\text{previous belief}}\, \mathrm{d}x_{k-1}
\end{aligned}
$$

where $\eta = 1/p(y_k \mid y_{1:k-1}, u_{1:k})$ is a normalizing constant and the last step uses the Markov property of the state.

Prediction Step: Gaussian Propagation

$$
p(x_k \mid y_{1:k}, u_{1:k}) = \eta\, \mathcal{N}(y_k;\, C_k x_k,\, R_k) \int \mathcal{N}\!\left(x_k;\, A_{k-1}x_{k-1} + B_{k-1}u_{k-1},\, Q_{k-1}\right) \mathcal{N}\!\left(x_{k-1};\, \hat{x}_{k-1},\, P_{k-1}\right) \mathrm{d}x_{k-1}
$$

Predicted Mean:

$$
\begin{aligned}
\hat{x}_{k|k-1} &= \mathbb{E}[A_{k-1}x_{k-1} + B_{k-1}u_{k-1} + \omega_{k-1}] \\
&= A_{k-1}\mathbb{E}[x_{k-1}] + B_{k-1}u_{k-1} + \mathbb{E}[\omega_{k-1}] \\
&= A_{k-1}\hat{x}_{k-1} + B_{k-1}u_{k-1}
\end{aligned}
$$

Predicted Covariance:

$$
\begin{aligned}
P_{k|k-1} &= \operatorname{cov}\!\left[A_{k-1}x_{k-1} + B_{k-1}u_{k-1} + \omega_{k-1}\right] \\
&= \operatorname{cov}[A_{k-1}x_{k-1}] + \operatorname{cov}[\omega_{k-1}] \\
&= A_{k-1}\operatorname{cov}[x_{k-1}]A_{k-1}^{\top} + Q_{k-1} \\
&= A_{k-1}P_{k-1}A_{k-1}^{\top} + Q_{k-1}
\end{aligned}
$$

Update Step: Gaussian Product

$$
p(x_k \mid y_{1:k}, u_{1:k}) = \eta\, \mathcal{N}(y_k;\, C_k x_k,\, R_k)\, \mathcal{N}(x_k;\, \hat{x}_{k|k-1},\, P_{k|k-1})
$$

Gaussian Product:

$$
\begin{aligned}
\mathcal{N}(x;\, \mu, \Sigma) &\propto \mathcal{N}(x;\, \mu_1, \Sigma_1)\, \mathcal{N}(x;\, \mu_2, \Sigma_2) \\
\Sigma^{-1} &= \Sigma_1^{-1} + \Sigma_2^{-1} \\
\mu &= \Sigma\left(\Sigma_1^{-1}\mu_1 + \Sigma_2^{-1}\mu_2\right)
\end{aligned}
$$
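This product rule is easy to implement in information form; the sketch below fuses two Gaussians over the same variable and can be used to verify the posterior result that follows (function and variable names are mine):

```python
import numpy as np

def gaussian_product(mu1, Sigma1, mu2, Sigma2):
    """Fuse two Gaussian densities over the same variable in information form."""
    info1, info2 = np.linalg.inv(Sigma1), np.linalg.inv(Sigma2)
    Sigma = np.linalg.inv(info1 + info2)          # Sigma^{-1} = Sigma1^{-1} + Sigma2^{-1}
    mu = Sigma @ (info1 @ mu1 + info2 @ mu2)      # mu = Sigma (Sigma1^{-1} mu1 + Sigma2^{-1} mu2)
    return mu, Sigma
```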

Posterior Result:

$$
\begin{aligned}
\hat{x}_{k|k} &= \hat{x}_{k|k-1} + K_k\left(y_k - C_k\hat{x}_{k|k-1}\right) \\
K_k &= P_{k|k-1}C_k^{\top}\left(C_k P_{k|k-1}C_k^{\top} + R_k\right)^{-1} \\
P_{k|k} &= (I - K_k C_k)\,P_{k|k-1}
\end{aligned}
$$

Optimization Perspective: MAP Estimation

Maximum A Posteriori Formulation

MAP Estimation:

$$
\hat{x}_{k|k} = \arg\max_{x_k}\, p(x_k \mid y_{1:k}) = \arg\min_{x_k}\,\left[-\log p(x_k \mid y_{1:k})\right]
$$

Weighted Least Square:

$$
\begin{aligned}
E(x) &= \|Mx - n\|_{\Sigma^{-1}}^{2} = x^{\top}M^{\top}\Sigma^{-1}Mx - 2n^{\top}\Sigma^{-1}Mx + n^{\top}\Sigma^{-1}n \\
\frac{\partial E}{\partial x} &= 2M^{\top}\Sigma^{-1}Mx - 2M^{\top}\Sigma^{-1}n = 0 \\
\hat{x} &= \left(M^{\top}\Sigma^{-1}M\right)^{-1}M^{\top}\Sigma^{-1}n
\end{aligned}
$$
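A direct implementation of this closed-form solution, assuming $\Sigma$ is invertible (names are mine):

```python
import numpy as np

def weighted_least_squares(M, n, Sigma):
    """Solve argmin_x ||M x - n||^2_{Sigma^{-1}} = (M^T Sigma^{-1} M)^{-1} M^T Sigma^{-1} n."""
    W = np.linalg.inv(Sigma)                      # weight matrix Sigma^{-1}
    return np.linalg.solve(M.T @ W @ M, M.T @ W @ n)
```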

MAP as Weighted Least Squares

Posterior Distribution:

$$
p(x_k \mid y_{1:k}) \propto p(y_k \mid x_k)\, p(x_k \mid y_{1:k-1})
$$

Assume Gaussian Distributions:

$$
\begin{aligned}
p(x_k \mid y_{1:k-1}) &= \mathcal{N}\!\left(x_k;\, \hat{x}_{k|k-1},\, P_{k|k-1}\right) \\
p(y_k \mid x_k) &= \mathcal{N}\!\left(y_k;\, C_k x_k,\, R_k\right)
\end{aligned}
$$

Negative Log-Posterior:

$$
\begin{aligned}
-\log p(x_k \mid y_{1:k}) &= \tfrac{1}{2}\left\|y_k - C_k x_k\right\|_{R_k^{-1}}^{2} + \tfrac{1}{2}\left\|x_k - \hat{x}_{k|k-1}\right\|_{P_{k|k-1}^{-1}}^{2} + \text{const} \\
&= \tfrac{1}{2}\left\|\begin{bmatrix} C_k \\ I \end{bmatrix} x_k - \begin{bmatrix} y_k \\ \hat{x}_{k|k-1} \end{bmatrix}\right\|_{\Sigma^{-1}}^{2} + \text{const}
\end{aligned}
$$

where $\Sigma = \begin{bmatrix} R_k & 0 \\ 0 & P_{k|k-1} \end{bmatrix}$.

MAP Solution

Weighted Least Squares Form:

$$
M = \begin{bmatrix} C_k \\ I \end{bmatrix}, \qquad n = \begin{bmatrix} y_k \\ \hat{x}_{k|k-1} \end{bmatrix}, \qquad \Sigma = \begin{bmatrix} R_k & 0 \\ 0 & P_{k|k-1} \end{bmatrix}
$$

MAP Estimate:

$$
\begin{aligned}
\hat{x}_{k|k} &= \left(M^{\top}\Sigma^{-1}M\right)^{-1}M^{\top}\Sigma^{-1}n \\
&= \left(C_k^{\top}R_k^{-1}C_k + P_{k|k-1}^{-1}\right)^{-1}\left(C_k^{\top}R_k^{-1}y_k + P_{k|k-1}^{-1}\hat{x}_{k|k-1}\right)
\end{aligned}
$$
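The same estimate can be obtained by stacking the measurement and the prior into one weighted least-squares problem, exactly as in the definitions of $M$, $n$, $\Sigma$ above. A sketch (names are mine):

```python
import numpy as np

def map_update(x_prior, P_prior, y, C, R):
    """MAP update as stacked weighted least squares with M = [C; I], n = [y; x_prior]."""
    nx, ny = x_prior.shape[0], y.shape[0]
    M = np.vstack([C, np.eye(nx)])
    n = np.concatenate([y, x_prior])
    Sigma = np.block([[R, np.zeros((ny, nx))],
                      [np.zeros((nx, ny)), P_prior]])
    W = np.linalg.inv(Sigma)
    info = M.T @ W @ M                            # C^T R^{-1} C + P_prior^{-1}
    x_post = np.linalg.solve(info, M.T @ W @ n)
    P_post = np.linalg.inv(info)                  # posterior covariance
    return x_post, P_post
```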

Equivalence Proof

Using Matrix Inversion Lemma:

$$
\begin{aligned}
\hat{x}_{k|k} &= \left(C_k^{\top}R_k^{-1}C_k + P_{k|k-1}^{-1}\right)^{-1}\left(C_k^{\top}R_k^{-1}y_k + P_{k|k-1}^{-1}\hat{x}_{k|k-1}\right) \\
&= \hat{x}_{k|k-1} + P_{k|k-1}C_k^{\top}\left(C_k P_{k|k-1}C_k^{\top} + R_k\right)^{-1}\left(y_k - C_k\hat{x}_{k|k-1}\right)
\end{aligned}
$$

Proof:

$$
\left(C_k^{\top}R_k^{-1}C_k + P_{k|k-1}^{-1}\right)^{-1}C_k^{\top}R_k^{-1} = P_{k|k-1}C_k^{\top}\left(C_k P_{k|k-1}C_k^{\top} + R_k\right)^{-1}
$$

This shows the equivalence between the MAP solution and the Kalman update.
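The identity is easy to sanity-check numerically with random positive-definite matrices; a throwaway script (all names and dimensions here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
nx, ny = 3, 2                                     # illustrative dimensions

X = rng.standard_normal((nx, nx))
P_prior = X @ X.T + np.eye(nx)                    # random positive-definite "P_{k|k-1}"
Y = rng.standard_normal((ny, ny))
R = Y @ Y.T + np.eye(ny)                          # random positive-definite "R_k"
C = rng.standard_normal((ny, nx))                 # random "C_k"

lhs = np.linalg.inv(C.T @ np.linalg.inv(R) @ C + np.linalg.inv(P_prior)) @ C.T @ np.linalg.inv(R)
rhs = P_prior @ C.T @ np.linalg.inv(C @ P_prior @ C.T + R)
print(np.allclose(lhs, rhs))                      # expected: True
```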

Conclusion

Theoretical Insights and Extensions

Key Insights:

  • Geometric: Reveals orthogonality principle and innovation process
  • Probabilistic: Shows optimality under Gaussian assumptions
  • Optimization: Connects to weighted least squares and regularization

Unified Algorithm: All approaches yield the same recursive equations:

$$
\text{time update}
\begin{cases}
\hat{x}_{k|k-1} = A_{k-1}\hat{x}_{k-1|k-1} + B_{k-1}u_{k-1} \\
P_{k|k-1} = A_{k-1}P_{k-1|k-1}A_{k-1}^{\top} + Q_{k-1}
\end{cases}
\qquad
\text{measurement update}
\begin{cases}
K_k = P_{k|k-1}C_k^{\top}\left(C_k P_{k|k-1}C_k^{\top} + R_k\right)^{-1} \\
\hat{x}_{k|k} = \hat{x}_{k|k-1} + K_k\left(y_k - C_k\hat{x}_{k|k-1}\right) \\
P_{k|k} = (I - K_k C_k)\,P_{k|k-1}
\end{cases}
$$
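For completeness, here is the unified recursion as a single loop, assuming time-invariant matrices for brevity; a minimal sketch, not the lecture's reference implementation:

```python
import numpy as np

def kalman_filter(x0, P0, A, B, C, Q, R, us, ys):
    """Run the time/measurement-update recursion over sequences of inputs and measurements."""
    x, P = x0, P0
    estimates = []
    for u, y in zip(us, ys):
        # time update
        x = A @ x + B @ u
        P = A @ P @ A.T + Q
        # measurement update
        K = P @ C.T @ np.linalg.inv(C @ P @ C.T + R)
        x = x + K @ (y - C @ x)
        P = (np.eye(P.shape[0]) - K @ C) @ P
        estimates.append(x)
    return estimates
```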

Extensions:

  • Nonlinear systems: EKF, UKF, particle filters
  • Non-Gaussian noise: robust Kalman filters