Multivariate Gaussian integral approximation

A fairly important result in statistics and machine learning concerns the evaluation of the Gaussian integral:

$$ I(\pmb{\Sigma}) = \int_{\mathcal{X}}\exp\left(-\frac{1}{2}\mathbf{x}^\mathsf{T}\pmb{\Sigma}^{-1}\mathbf{x}\right)\,\mathrm{d}\mathbf{x} $$

where the domain of integration is $\mathcal{X} = \mathbb{R}^d \ni \mathbf{x}$. The integral has a closed-form solution:

$$ I(\pmb{\Sigma}) = (2\pi)^{d/2} \sqrt{\det(\pmb{\Sigma})}$$

The goal of this post is to estimate $I(\pmb{\Sigma})$ for different randomly generated matrices $\pmb{\Sigma}$ using numerical methods.
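For later comparison, the closed-form value is straightforward to compute with NumPy. This is a minimal sketch; the function name is ours:

```python
import numpy as np

def gaussian_integral_exact(Sigma):
    """Closed-form value I(Sigma) = (2*pi)^(d/2) * sqrt(det(Sigma))."""
    d = Sigma.shape[0]
    return (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(Sigma))
```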

High-dimensional integral approximation

Our goal is to compute an integral $\int_\mathcal{X} g(\mathbf{x})\,\mathrm{d}\mathbf{x}$ over a multi-dimensional domain $\mathbb{R}^d \supseteq \mathcal{X} \ni \mathbf{x}$. To keep things simple, we'll use naïve Monte Carlo integration. More specifically, we will compute the integral on subsets $\Omega_k$ of the domain $\mathcal{X}$ whose volume $\mathrm{Vol}(\Omega_k)$ tends to $\mathrm{Vol}(\mathcal{X}) = \infty$. For each index $k$, we set $\Omega_k$ to be the $d$-dimensional cuboid $\Omega_k = [-k, +k]^d$ and sample $N = 10^4$ points using NumPy's uniform generator np.random.rand. Since rand outputs a uniform random variable $u \in [0, 1]$, we shift and scale it as $v = k(2u - 1) \in [-k, +k]$. Below is a scatter of $10^3$ points for $d=2$ and $k=12$.

[Figure: scatter of uniformly sampled points in two dimensions]
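A minimal sketch of this sampling step (the helper name sample_cuboid is ours):

```python
import numpy as np

def sample_cuboid(N, d, k):
    """Sample N points uniformly from the cuboid [-k, +k]^d."""
    u = np.random.rand(N, d)   # uniform on [0, 1]^d
    return k * (2 * u - 1)     # shift and scale to [-k, +k]^d

points = sample_cuboid(10**3, d=2, k=12)  # points shown in the scatter above
```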

The volume of the subsets is simple to compute: $\mathrm{Vol}(\Omega_k) = (2k)^d$. The naïve Monte Carlo approximation of the integral $\int_{\Omega_k} g(\mathbf{x})\,\mathrm{d}\mathbf{x}$ is performed in two steps. First, sample $N$ points $\mathbf{x}_1, \dots, \mathbf{x}_N$ uniformly on $\Omega_k$, then compute the empirical average of the function values $g(\mathbf{x}_i)$:

$$\int_{\Omega_k} g(\mathbf{x})\,\mathrm{d}\mathbf{x} \approx \mathrm{Vol}(\Omega_k)\, \dfrac{1}{N}\sum_{i=1}^N g(\mathbf{x}_i) = \dfrac{(2k)^d}{N}\sum_{i=1}^N g(\mathbf{x}_i)$$
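As a sketch, the estimator could be written as follows, reusing the sample_cuboid helper from above:

```python
def naive_mc_integral(g, d, k, N=10**4):
    """Naive Monte Carlo estimate of the integral of g over [-k, +k]^d."""
    x = sample_cuboid(N, d, k)   # N uniform samples in the cuboid
    volume = (2 * k) ** d        # Vol(Omega_k)
    return volume * np.mean(g(x))
```

Here g is assumed to take an (N, d) array of points and return the N function values.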

Let's apply this to a randomly generated $2\times 2$ matrix

$$ \mathbf{\Lambda} = \pmb{\Sigma}^{-1} = \begin{bmatrix} \lambda_{11} & \lambda_{12} \\ \lambda_{21} & \lambda_{22} \\ \end{bmatrix} = \begin{bmatrix} 7.722 & 2.774\\ 2.774 & 1.078 \\ \end{bmatrix} $$

by integrating the function $g(\mathbf{x}) = \exp\bigg(-\frac12 \mathbf{x}^\mathsf{T}\mathbf{\Lambda} \mathbf{x}\,\bigg)$.
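One way this experiment could be run, using the helpers sketched above (the range of $k$ is illustrative and the plotting code is omitted):

```python
Lambda = np.array([[7.722, 2.774],
                   [2.774, 1.078]])

def g(x):
    """g(x) = exp(-0.5 * x^T Lambda x), evaluated row-wise on an (N, d) array."""
    return np.exp(-0.5 * np.einsum('ni,ij,nj->n', x, Lambda, x))

estimates = [naive_mc_integral(g, d=2, k=k) for k in range(1, 31)]
```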

The results for different values of $k$ are plotted below:

[Figure: naïve Monte Carlo estimates of the integral for increasing $k$]

We can see that the approximation oscillates around the true value (the dashed red line) for small $k$, and then becomes increasingly close to $0$ for large $k$: with a fixed budget of $N$ samples spread over an ever larger cuboid, almost none of them land where $g$ is non-negligible, so the empirical average collapses towards zero. This is because the function $g$ has most of its mass around the origin, as the contour plot below indicates:

[Figure: contour plot of the example function $g$]

We will therefore proceed differently.

Subset sampling for naïve Monte Carlo

Discarding the points sampled from $\Omega_{k-1}$ when estimating the integral on $\Omega_k$ isn't very efficient, all the more so because $\Omega_{k-1} \subset \Omega_{k}$. We therefore change our approach: instead of discarding the points sampled from $\Omega_{k-1}$ and sampling uniformly from $\Omega_{k}$, we keep the $N$ points from $\Omega_{k-1}$ and sample $N$ new points from $\Omega_{k}\setminus \Omega_{k-1}$.

At step $k$ we thus have $(k-1)N$ stored points plus $N$ new ones, instead of just $N$ points. To be added.
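Until that part is written up, here is one possible way to realise the idea: keep a running estimate and, at each step, only add the contribution of the new shell $\Omega_k \setminus \Omega_{k-1}$. This is a sketch, not the post's implementation; the helper names and the rejection-sampling step are ours.

```python
def shell_sample(N, d, k):
    """Sample N points uniformly from the shell Omega_k \\ Omega_{k-1} by rejection."""
    points = np.empty((0, d))
    while len(points) < N:
        x = sample_cuboid(N, d, k)                     # candidates in Omega_k
        outside = np.max(np.abs(x), axis=1) > (k - 1)  # keep those outside Omega_{k-1}
        points = np.vstack([points, x[outside]])
    return points[:N]

def incremental_mc_integrals(g, d, k_max, N=10**4):
    """Running estimates of the integral over Omega_1, ..., Omega_{k_max}."""
    estimates = []
    total = (2 * 1) ** d * np.mean(g(sample_cuboid(N, d, 1)))  # Omega_1 = [-1, 1]^d
    estimates.append(total)
    for k in range(2, k_max + 1):
        shell_vol = (2 * k) ** d - (2 * (k - 1)) ** d
        total += shell_vol * np.mean(g(shell_sample(N, d, k)))  # contribution of the shell
        estimates.append(total)
    return estimates
```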

Random symmetric PSD matrices

In order to generate random PSD matrices, we first define a distribution $P$ for the matrix entries. We then build a PSD matrix $\mathbf{\Lambda}$ in two steps:

\begin{align} \mathbf{A} &= \left[ a_{ij} \right]_{ij} \sim P^{n\times n} \\ \mathbf{\Lambda} &\gets \mathbf{A}\mathbf{A}^\mathsf{T} \end{align}

As an example, we plot the eigenvalues of $2\times 2$ matrices obtained by sampling from a normal distribution $P=\mathcal{N}$.
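A sketch of this procedure, assuming standard normal entries for $P$ (the function name and the number of sampled matrices are ours; the plotting code is omitted):

```python
def random_psd_matrix(n, rng=np.random):
    """Build a random symmetric PSD matrix Lambda = A A^T, entries of A drawn from P."""
    A = rng.standard_normal((n, n))  # here P is assumed to be the standard normal
    return A @ A.T

# eigenvalues of many random 2x2 PSD matrices (all non-negative by construction)
eigenvalues = np.array([np.linalg.eigvalsh(random_psd_matrix(2)) for _ in range(1000)])
```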