Non-Asymptotic Monte Carlo Approximations

With a little bit of rigor

By Christian | August 02, 2020

$\newcommand{\Expect}[2][]{\mathbb{E}_{#1}\left(#2\right)}$Suppose we are given some region $\Omega \subseteq \mathbb{R}^d$ over which we would like to integrate a function $f(\bv{x})$. Suppose there exists a $d$-dimensional hyperrectangle $\mathcal{H}$, also referred to as a $d$-orthotope, such that $\Omega \subseteq \mathcal{H}$ and $\mathcal{H} = \bigtimes_{i=1}^d [s_i, t_i]$. If we define $\bv{1}_{A}(\bv{x})$ as an indicator function that returns 1 if and only if $\bv{x} \in A$, then we can write our integral in the following manner

\begin{align*} I &= \int_{\Omega} f(\bv{x}) d\bv{x} \\ &= \int_{\Omega} 1 \cdot f(\bv{x}) d\bv{x} + \int_{\mathcal{H} \setminus \Omega} 0 \cdot f(\bv{x}) d\bv{x} \\ &= \int_{\Omega} \bv{1}_{\Omega}(\bv{x}) f(\bv{x}) d\bv{x} + \int_{\mathcal{H} \setminus \Omega} \bv{1}_{\Omega}(\bv{x}) f(\bv{x}) d\bv{x} \\ &= \int_{\mathcal{H}} \bv{1}_{\Omega}(\bv{x}) f(\bv{x}) d\bv{x} \\ &= V(\mathcal{H}) \int_{\mathcal{H}} \bv{1}_{\Omega}(\bv{x}) f(\bv{x}) V(\mathcal{H})^{-1} d\bv{x} \tag{use $1 = \frac{V(\mathcal{H})}{V(\mathcal{H})}$}\\ &= V(\mathcal{H}) \Expect[\bv{x} \sim U(\mathcal{H})]{\bv{1}_{\Omega}(\bv{x}) f(\bv{x}) } \end{align*}

where $p(\bv{x}) := V(\mathcal{H})^{-1}$ is the uniform density function corresponding to the uniform distribution $U(\mathcal{H})$. Now consider the random variable $Y := \bv{1}_{\Omega}(\bv{x}) f(\bv{x})$ where $\bv{x} \sim U(\mathcal{H})$; that is, $Y = f(\bv{x})$ when $\bv{x} \in \Omega$ and $Y = 0$ otherwise. This gives us

\begin{align*} I &= V(\mathcal{H}) \Expect[\bv{x} \sim p]{\bv{1}_{\Omega}(\bv{x}) f(\bv{x}) } \\ &= V(\mathcal{H}) \Expect{Y} \end{align*}
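As a concrete example, take $\Omega$ to be the unit disk in $\mathbb{R}^2$, $f(\bv{x}) = 1$, and $\mathcal{H} = [-1, 1]^2$, so that $V(\mathcal{H}) = 4$. Then

\[ I = \int_{\Omega} 1 \, d\bv{x} = \pi = 4 \, \Expect[\bv{x} \sim U(\mathcal{H})]{\bv{1}_{\Omega}(\bv{x})} \]

which is precisely the classic "throw darts at a square" estimator for $\pi$. We will return to this example below.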

Now suppose we have a set of $n$ random variables $X_1, \cdots, X_n$ that are independent copies of $Y$ and define $X := \frac{1}{n} \sum_{i=1}^n X_i$. Notice that

\[\Expect{X} = \frac{1}{n} \sum_{i=1}^n \Expect{X_i} = \frac{n \Expect{Y}}{n} = \Expect{Y}\]

This result gives us some insight into a randomized algorithm that can approximate the integral in question! In particular, suppose we randomly sample $n$ points $\lbrace \bv{x}_i \rbrace_{i=1}^n$ from $\mathcal{H}$ uniformly and independently. We then maintain a sum where for each $\bv{x}_i \in \Omega$, we add $f(\bv{x}_i)$ to that sum and otherwise add nothing. After doing this for each point, we divide the result by $n$ and multiply by $V(\mathcal{H})$ to get our integral estimate $\hat{I}$. This algorithm is compactly described below:

Input: $\left(f(\bv{x}), \bv{1}_{\Omega}(\bv{x}), \mathcal{H}, n\right)$
Output: $\hat{I}$, estimate for the integral value

$\hat{I} := 0$ (initialize the integral estimate to $0$)

For $i$ from $1$ to $n$:
  • Randomly sample $\bv{x}$ from $U(\mathcal{H})$
  • $\hat{I} \leftarrow \hat{I} + \frac{\bv{1}_{\Omega}(\bv{x})f(\bv{x})}{n}$

$\hat{I} \leftarrow V(\mathcal{H}) \hat{I}$ (scale by the volume of $\mathcal{H}$)
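
As a minimal sketch of this procedure, here is one possible implementation in Python with NumPy. The function name `monte_carlo_integrate` and its signature are illustrative choices, not anything prescribed above:

```python
import numpy as np

def monte_carlo_integrate(f, indicator, lower, upper, n, rng=None):
    """Estimate the integral of f over Omega, where Omega is described by
    `indicator` and enclosed in the hyperrectangle with corners lower, upper.

    f, indicator : callables accepting an array of shape (n, d)
    lower, upper : arrays of shape (d,) giving the sides [s_i, t_i] of H
    n            : number of uniform samples to draw
    """
    rng = np.random.default_rng() if rng is None else rng
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)

    # Volume of the enclosing hyperrectangle, V(H) = prod_i (t_i - s_i)
    volume = np.prod(upper - lower)

    # Draw n points uniformly from H
    x = rng.uniform(lower, upper, size=(n, lower.size))

    # Average of 1_Omega(x) * f(x), scaled by V(H)
    return volume * np.mean(indicator(x) * f(x))

# Example: estimate pi as the area of the unit disk inside [-1, 1]^2
estimate = monte_carlo_integrate(
    f=lambda x: np.ones(len(x)),
    indicator=lambda x: (np.sum(x**2, axis=1) <= 1.0).astype(float),
    lower=[-1.0, -1.0],
    upper=[1.0, 1.0],
    n=100_000,
)
print(estimate)  # roughly 3.14
```

Vectorizing over all $n$ samples at once is just a convenience; the per-sample loop in the pseudocode above is mathematically identical.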


Now, as one should ask: how does this randomized algorithm stack up against the true integral value? Can we guarantee a desired accuracy with any sort of confidence? To answer this, we need to state and make use of the following well-known result, Hoeffding's inequality.

Suppose we have a collection of independent random variables $X_1, X_2, \cdots, X_n$ where $a_i \leq X_i \leq b_i$ for all $i$. Then for $X = \frac{1}{n} \sum_{i=1}^n X_i$, we have that \[ \Prob{\left| X - \Expect{X} \right| \geq t} \leq 2 \exp\left(-\frac{2 n^2 t^2}{\sum_{i=1}^n (b_i - a_i)^2}\right)\]
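
To build some intuition for how this bound behaves, here is a small simulation sketch (my own construction, using $n$ i.i.d. $\mathrm{Uniform}(0,1)$ variables so that $a_i = 0$ and $b_i = 1$) comparing the empirical deviation probability of the sample mean against Hoeffding's bound:

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials, t = 100, 20_000, 0.1

# X_1, ..., X_n i.i.d. Uniform(0, 1), so a_i = 0 and b_i = 1
samples = rng.uniform(0.0, 1.0, size=(trials, n))
means = samples.mean(axis=1)              # X for each trial
deviation = np.abs(means - 0.5)           # |X - E(X)|, since E(X) = 1/2

empirical = np.mean(deviation >= t)       # empirical tail probability
hoeffding = 2.0 * np.exp(-2.0 * n * t**2) # bound with b_i - a_i = 1

print(f"empirical: {empirical:.4f}, Hoeffding bound: {hoeffding:.4f}")
```

The bound holds for any bounded, independent variables, so for a well-behaved distribution like this one the empirical tail probability typically sits far below it.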

In our case, suppose $M_L$ and $M_U$ are, respectively, the smallest and largest values $f$ attains on the bounded domain $\Omega$, and assume for simplicity that $M_L \leq 0 \leq M_U$, so that the value $0$ taken by $\bv{1}_{\Omega}(\bv{x}) f(\bv{x})$ outside of $\Omega$ also lies in $[M_L, M_U]$. This implies that $M_L \leq X_i \leq M_U$ for every $i$, which gives us by the above version of Hoeffding's inequality that

\begin{align*} \Prob{|\hat{I} - I| \geq \epsilon} &= \Prob{|V(\mathcal{H}) X - V(\mathcal{H}) \Expect{X}| \geq \epsilon } \\ &= \Prob{|X - \Expect{X}| \geq \frac{\epsilon}{V(\mathcal{H})} } \\ &\leq 2 \exp{\left(- \frac{2 n^2 \left(\epsilon / V(\mathcal{H})\right)^2}{n \left(M_U - M_L\right)^2}\right)} \tag{$\sum_{i=1}^n (b_i - a_i)^2 = n (M_U - M_L)^2$} \\ &= 2 \exp{\left(- \frac{2 n \epsilon^2}{V(\mathcal{H})^2 \left(M_U - M_L\right)^2}\right)} \end{align*}

The bound above can be viewed as an error probability, telling us the worst-case chance of incurring an error larger than $\epsilon$ between the true integral value and our estimate. Suppose we wish for our error probability to be at most $\delta$. It suffices to set the bound equal to $\delta$, so that $n$, $\epsilon$, and $\delta$ satisfy the following equality

\[2 \exp{\left(- \frac{2 n \epsilon^2}{V(\mathcal{H})^2 \left(M_U - M_L\right)^2}\right)} = \delta \]

If we assume $n$ and $\delta$ are fixed, we can solve the above equality for $\epsilon$, finding

\[\epsilon = V(\mathcal{H}) (M_U - M_L)\sqrt{\frac{1}{2n}\log{\left(\frac{2}{\delta}\right)}}\]

which tells us that with probability at least $1 - \delta$, the absolute error between the exact and estimated integral values satisfies

\[|\hat{I} - I| < V(\mathcal{H}) (M_U - M_L)\sqrt{\frac{1}{2n}\log{\left(\frac{2}{\delta}\right)}}\]
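
As a quick sanity check, here is a small sketch that evaluates this error radius for given $n$ and $\delta$; the function name `error_radius` is my own, and the example call plugs in the unit-disk setup from earlier, where $\bv{1}_{\Omega}(\bv{x}) f(\bv{x})$ ranges over $[0, 1]$:

```python
import numpy as np

def error_radius(n, delta, volume, m_lower, m_upper):
    """Epsilon such that |I_hat - I| < epsilon with probability >= 1 - delta.

    volume           : V(H), volume of the enclosing hyperrectangle
    m_lower, m_upper : bounds M_L <= 1_Omega(x) f(x) <= M_U
    """
    return volume * (m_upper - m_lower) * np.sqrt(np.log(2.0 / delta) / (2.0 * n))

# Unit-disk example: V(H) = 4, 1_Omega * f ranges over [0, 1]
print(error_radius(n=100_000, delta=0.01, volume=4.0, m_lower=0.0, m_upper=1.0))
# roughly 0.02
```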

This error bound gives us some interesting insights. We can see that while the error decreases at a rate of $1/\sqrt{n}$, we also have a $\sqrt{\log\left(1/\delta\right)}$ dependence in the error that grows as we shrink our maximum allowable error probability $\delta$. Now if we assume $\epsilon$ and $\delta$ are fixed, we can instead solve the equality for $n$, giving us

\[n = \frac{1}{2}\epsilon^{-2}V(\mathcal{H})^2 \left(M_U - M_L\right)^2 \log{\left(\frac{2}{\delta}\right)}\]

This value for $n$ (rounded up to the nearest integer) corresponds to how large a sample we would need to ensure our maximum estimation error $\epsilon$ and maximum error probability $\delta$ are both satisfied. You can view this result as the sample complexity of our algorithm, which has a polynomial dependence on both $\frac{1}{\epsilon}$ and $\log(1/\delta)$. This dependence in turn tells us that the smaller we choose $\epsilon$ and $\delta$, the more samples we should expect to need to satisfy our two error requirements.
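
To close the loop, here is a small sketch of this sample-complexity calculation; again, the function name `required_samples` and the example values are my own choices for illustration:

```python
import numpy as np

def required_samples(epsilon, delta, volume, m_lower, m_upper):
    """Smallest n guaranteeing |I_hat - I| < epsilon with probability >= 1 - delta."""
    n = 0.5 * epsilon**-2 * (volume * (m_upper - m_lower))**2 * np.log(2.0 / delta)
    return int(np.ceil(n))  # n must be an integer number of samples

# Unit-disk example at accuracy 0.01 with 99% confidence
print(required_samples(epsilon=0.01, delta=0.01, volume=4.0, m_lower=0.0, m_upper=1.0))
# roughly 424,000 samples
```

Note the quadratic blow-up in $1/\epsilon$: halving the target error roughly quadruples the required sample size, while tightening $\delta$ is comparatively cheap thanks to the logarithm.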