Condition number of a matrix: basics

March 24, 2021

A special kind of norm for $n\times n$ square matrices $\mathrm{A}$ is called the subordinate (or induced) norm. If we have a norm $\lVert\cdot\rVert_\alpha$ on the input space, and a second norm $\lVert\cdot\rVert_\beta$ on the output space, the subordinate norm is

$$ \lVert\mathrm{A}\rVert_{\alpha, \beta} = \sup_{x \neq 0}\dfrac{\;\lVert\mathrm{A}x\rVert_{\beta}}{\;\lVert x\rVert_\alpha} $$

Submultiplicative property

An important property of these induced norms is that for any third norm $\lVert\cdot\rVert_{\gamma}$, we have:

$$ \lVert\mathrm{AB}\rVert_{\alpha, \beta} \leq \lVert\mathrm{A}\rVert_{\gamma, \beta} \lVert\mathrm{B}\rVert_{\alpha, \gamma} $$

This is easily proven via the computations

$$ \dfrac{\;\lVert\mathrm{ABx}\rVert_{\beta}}{\;\lVert x\rVert_\alpha} = \dfrac{\;\lVert\mathrm{A}y\rVert_{\beta}}{\;\lVert y\rVert_\gamma} \dfrac{\;\lVert\mathrm{B}x\rVert_{\gamma}}{\;\lVert x\rVert_\gamma} $$ for $y = \mathrm{B}x$. As each supremum on the right is an upper bound, we get the inequality

$$ \dfrac{\;\lVert\mathrm{ABx}\rVert_{\beta}}{\;\lVert x\rVert_\alpha} \leq \lVert\mathrm{A}\rVert_{\gamma, \beta} \lVert\mathrm{B}\rVert_{\alpha, \gamma} $$

we get the final equality because the supremum is the least upper bound. A special case is when taking $\alpha = \beta$, which yields that $$\lVert\mathrm{AB}\rVert_{\alpha, \alpha} \leq \lVert\mathrm{A}\rVert_{\alpha, \alpha} \lVert\mathrm{B}\rVert_{\alpha, \alpha} $$ we denote this norm $\lVert\cdot\rVert_\alpha$.

Condition number

The condition number $\kappa_\alpha(\mathrm{A})$ of a matrix is the quantity $$\kappa_\alpha(\mathrm{A}) = \lVert\mathrm{A}\rVert_\alpha\lVert\mathrm{A}^{-1}\rVert_\alpha $$ which is larger than $1$ because $$ 1 = \lVert\mathrm{I}\rVert_\alpha = \lVert\mathrm{A}\mathrm{A}^{-1}\rVert_\alpha \leq \lVert\mathrm{A}\rVert_\alpha\lVert\mathrm{A}^{-1}\rVert_\alpha $$ this number is large when either $\mathrm{A}$ or its inverse creates large distortions as the input varies on the unit circle.

Stability of a similarity

In linear algebra, a matrix $\mathrm{B}$ is said to be similar to $\mathrm{A}$ when there is an invertible transformation $\mathrm{X}$ such that $\mathrm{B} = \mathrm{X}\mathrm{A}\mathrm{X}^{-1}$. In numerical analyses, it is often that we wish to know the robustness of an operation $f\colon x \mapsto f(x)$ with respect to a small perturbation of its input: if $y = f(x)$, what is $f(x+\varepsilon)$ ? For real functions, continuity tells us that as long as $\lvert \varepsilon \rvert < \delta$, the perturbed output $f(x+\varepsilon)$ is $\tilde{\varepsilon}$-close to $y$ for some small $\tilde{\varepsilon} > 0$. What is the equivalent bound on the error of the matrix function $f_{\mathrm{X}}(\mathrm{A}) = \mathrm{X}\mathrm{A}\mathrm{X}^{-1}$ ? Suppose $\mathrm{A}$ is perturbed by a matrix $\mathrm{E}$ with small magnitude: $\lVert\mathrm{E}\rVert_\alpha < \delta$ Then, the value of the function at this perturbed output is

$$ f_{\mathrm{X}}(\mathrm{A}+ \mathrm{E}) = \mathrm{X}(\mathrm{A}+ \mathrm{E})\mathrm{X}^{-1} = f_{\mathrm{X}}(\mathrm{A}) + f_{\mathrm{X}}(\mathrm{E}) $$

thus, the norm of the error $\mathrm{B}\, – f_{\mathrm{X}}(\mathrm{A}+ \mathrm{E})$ is given by

$$ \lVert\mathrm{B}\,– f_{\mathrm{X}}(\mathrm{A}+ \mathrm{E})\rVert_\alpha = \lVert\mathrm{X}\mathrm{E}\mathrm{X}^{-1}\rVert_\alpha \leq \kappa_\alpha(\mathrm{X})\,\lVert\mathrm{E}\rVert_\alpha $$

therefore, the condition number $\kappa_\alpha$ serves as a measure of continuity for the similarity operation $f_\mathrm{X}(\cdot)$. If the condition number is small, the matrix is said to be “well-behaved”, so that for small input errors $\mathrm{E}$ the output is only perturbed at most by the small amout $\kappa_\alpha(\mathrm{X})\,\lVert\mathrm{E}\rVert_\alpha$. If the condition number is large, even small input errors cause large deviations in the output.

Orthogonal matrices are stable

The norm induced by setting $\alpha = 2$ reveals that orthogonal matrices (the matrices $\mathrm{Q}$ for which $\mathrm{Q}^\mathsf{T}\mathrm{Q} = \mathrm{I}$) are computationally very stable. Indeed, the perturbation created by the similarity $\mathrm{A}\mapsto \mathrm{Q}\mathrm{A}\mathrm{Q}^\mathsf{T}$ is equal to $\lVert\mathrm{Q}\mathrm{E}\mathrm{Q}^\mathsf{T}\rVert_2 = \lVert\mathrm{E}\rVert_2$

These transformations introduce no extra perturbations !

Stability of a time-varying linear system

Consider the time-varying $n\times n$ system $\mathrm{A}(t)\, x(t) = b(t)$. Here, the notion of “stable” is different: a time-varying linear system is unstable if small durations cause important variations in the state $x$. A stable system has therefore a small relative variation $\lVert \dot{x} \rVert / \lVert x \rVert$.

Differentiating this equality yields

$$ \dot{x}(t) =\mathrm{A}(t)^{-1}\dot{b}(t)\, – \mathrm{A}(t)^{-1}\dot{\mathrm{A}}(t)\,x(t) $$

Therefore the stability of the system can be upper-bounded (dropping the $t$ for simplicity)

$$ \dfrac{\lVert\dot{x}\rVert}{\lVert x \rVert} \leq \kappa(\mathrm{A}) \left(\dfrac{\lVert \dot{b} \rVert}{\lVert \mathrm{A}\rVert \lVert x\rVert } + \dfrac{\lVert \dot{\mathrm{A}}\rVert}{\lVert \mathrm{A}\rVert}\right) $$

therefore, a system whose matrix has a small condition number at time $t$ will stay relatively close to its current position in the near future.