Non-Asymptotic Monte Carlo Approximations
With a little bit of rigor
By Christian | August 02, 2020
$\newcommand{\Expect}[2][]{\mathbb{E}_{#1}\left(#2\right)}\newcommand{\Prob}[1]{\mathbb{P}\left(#1\right)}\newcommand{\bv}[1]{\mathbf{#1}}$Suppose we are given some region $\Omega \subseteq \mathbb{R}^d$ over which we would like to integrate a function $f(\bv{x})$. Suppose there exists a $d$-dimensional hyperrectangle $\mathcal{H}$, also referred to as a $d$-orthotope, such that $\Omega \subseteq \mathcal{H}$ and $\mathcal{H} = \bigtimes_{i=1}^d [s_i, t_i]$. If we define $\bv{1}_{A}(\bv{x})$ as the indicator function that returns $1$ if $\bv{x} \in A$ and $0$ otherwise, then we can write our integral in the following manner
\begin{align*} I &= \int_{\Omega} f(\bv{x}) d\bv{x} \\ &= \int_{\Omega} 1 \cdot f(\bv{x}) d\bv{x} + \int_{\mathcal{H} \setminus \Omega} 0 \cdot f(\bv{x}) d\bv{x} \\ &= \int_{\Omega} \bv{1}_{\Omega}(\bv{x}) f(\bv{x}) d\bv{x} + \int_{\mathcal{H} \setminus \Omega} \bv{1}_{\Omega}(\bv{x}) f(\bv{x}) d\bv{x} \\ &= \int_{\mathcal{H}} \bv{1}_{\Omega}(\bv{x}) f(\bv{x}) d\bv{x} \\ &= V(\mathcal{H}) \int_{\mathcal{H}} \bv{1}_{\Omega}(\bv{x}) f(\bv{x}) V(\mathcal{H})^{-1} d\bv{x} \tag{use $1 = \frac{V(\mathcal{H})}{V(\mathcal{H})}$}\\ &= V(\mathcal{H}) \Expect[\bv{x} \sim U(\mathcal{H})]{\bv{1}_{\Omega}(\bv{x}) f(\bv{x}) } \end{align*}where $p(\bv{x}) := V(\mathcal{H})^{-1}$ is the uniform density function corresponding to the uniform distribution $U(\mathcal{H})$, and $V(\mathcal{H}) = \prod_{i=1}^d (t_i - s_i)$ is the volume of $\mathcal{H}$. Now consider the random variable $Y = Y(\bv{x})$ where $\bv{x} \sim U(\mathcal{H})$, $Y(\bv{x}) = f(\bv{x})$ if $\bv{x} \in \Omega$, and $Y(\bv{x}) = 0$ otherwise. This random variable has the same value as $\bv{1}_{\Omega}(\bv{x}) f(\bv{x})$, implying that
\begin{align*} I &= V(\mathcal{H}) \Expect[\bv{x} \sim p]{\bv{1}_{\Omega}(\bv{x}) f(\bv{x}) } \\ &= V(\mathcal{H}) \Expect{Y} \end{align*}Now suppose we have a set of $n$ random variables $X_1, \cdots, X_n$ that are independent copies of $Y$ and define $X := \frac{1}{n} \sum_{i=1}^n X_i$. Notice that
\[\Expect{X} = \frac{1}{n} \sum_{i=1}^n \Expect{X_i} = \frac{n \Expect{Y}}{n} = \Expect{Y}\]This result suggests a randomized algorithm for approximating the integral in question! In particular, suppose we randomly sample $n$ points $\lbrace \bv{x}_i \rbrace_{i=1}^n$ from $\mathcal{H}$ uniformly and independently. We then maintain a running sum: for each $\bv{x}_i \in \Omega$ we add $f(\bv{x}_i)$ to the sum, and otherwise we add nothing. After doing this for every point, we divide the result by $n$ and multiply by $V(\mathcal{H})$ to get our integral estimate $\hat{I}$. This algorithm is compactly described below:
- $\hat{I} := 0$ (initialize the integral estimate to $0$)
- For $i = 1, \ldots, n$:
  - Randomly sample $\bv{x}_i$ from $U(\mathcal{H})$
  - $\hat{I} \leftarrow \hat{I} + \frac{\bv{1}_{\Omega}(\bv{x}_i)f(\bv{x}_i)}{n}$
- $\hat{I} \leftarrow V(\mathcal{H}) \hat{I}$
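To make the procedure concrete, here is a minimal sketch in Python using NumPy. The function names `mc_integrate`, `f`, and `in_omega`, along with the unit-disk example at the end, are illustrative choices of mine rather than anything prescribed by the algorithm above.

```python
import numpy as np

def mc_integrate(f, in_omega, lower, upper, n, rng=None):
    """Monte Carlo estimate of the integral of f over Omega, where Omega sits
    inside the hyperrectangle H = [lower_1, upper_1] x ... x [lower_d, upper_d]."""
    rng = np.random.default_rng() if rng is None else rng
    lower, upper = np.asarray(lower, dtype=float), np.asarray(upper, dtype=float)
    vol_H = np.prod(upper - lower)                        # V(H), volume of the hyperrectangle
    x = rng.uniform(lower, upper, size=(n, lower.size))   # x_1, ..., x_n drawn i.i.d. from U(H)
    inside = x[in_omega(x)]                               # keep samples with 1_Omega(x_i) = 1
    total = f(inside).sum() if len(inside) else 0.0       # sum of 1_Omega(x_i) f(x_i)
    return vol_H * total / n                              # I_hat = V(H) * (1/n) * sum

# Example: area of the unit disk, i.e. f = 1, Omega = {x : ||x|| <= 1}, H = [-1, 1]^2
estimate = mc_integrate(
    f=lambda x: np.ones(len(x)),
    in_omega=lambda x: np.sum(x**2, axis=1) <= 1.0,
    lower=[-1.0, -1.0],
    upper=[1.0, 1.0],
    n=100_000,
)
print(estimate)  # should land near pi ≈ 3.14159
```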
Naturally, one should ask: how does this randomized algorithm stack up against the true integral value? Can we guarantee a desired accuracy with any sort of confidence? To answer this, we need to state and make use of the following well-known result, Hoeffding's Inequality: if $X_1, \ldots, X_n$ are independent random variables satisfying $a \leq X_i \leq b$ for every $i$, and $X := \frac{1}{n} \sum_{i=1}^n X_i$ denotes their average, then for any $t > 0$,
\[\Prob{|X - \Expect{X}| \geq t} \leq 2 \exp{\left(- \frac{2 n t^2}{(b - a)^2}\right)}\]
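As a quick sanity check on this inequality, the short sketch below compares the empirical tail probability of a sample mean against the Hoeffding bound; the particular distribution ($U(0, 1)$) and the values of $n$, the threshold, and the number of trials are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, t, trials = 200, 0.05, 50_000      # sample size, deviation threshold, number of repetitions
a, b = 0.0, 1.0                       # each X_i lies in [a, b]; here X_i ~ U(0, 1), so E(X) = 1/2

sample_means = rng.uniform(a, b, size=(trials, n)).mean(axis=1)
empirical = np.mean(np.abs(sample_means - 0.5) >= t)     # estimate of P(|X - E(X)| >= t)
bound = 2.0 * np.exp(-2.0 * n * t**2 / (b - a)**2)       # Hoeffding's upper bound
print(f"empirical tail probability: {empirical:.4f} <= Hoeffding bound: {bound:.4f}")
```

The bound is typically quite conservative, which is consistent with the worst-case flavor of the guarantees derived below.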
In our case, suppose $f$ is bounded on $\Omega$, and let $M_L$ be the smaller of $0$ and the smallest value $f$ attains on $\Omega$, and $M_U$ the larger of $0$ and the largest value $f$ attains on $\Omega$ (the $0$ accounts for samples that land outside $\Omega$, where $X_i = 0$). This implies that $M_L \leq X_i \leq M_U$ for every $i$, which gives us by the above version of Hoeffding's Inequality that
\begin{align*} \Prob{|\hat{I} - I| \geq \epsilon} &= \Prob{|V(\mathcal{H}) X - V(\mathcal{H}) \Expect{X}| \geq \epsilon } \\ &= \Prob{|X - \Expect{X}| \geq \frac{\epsilon}{V(\mathcal{H})} } \\ &\leq 2 \exp{\left(- \frac{2 n \epsilon^2}{V(\mathcal{H})^2 \left(M_U - M_L\right)^2}\right)} \end{align*}The above can be viewed as an error probability, telling us the worst-case chance of incurring an error larger than $\epsilon$ between the true integral value and our estimate. Suppose we wish for our error probability to be at most $\delta$. It suffices to choose $n$, $\epsilon$, and $\delta$ so that the bound above equals $\delta$, i.e.
\[2 \exp{\left(- \frac{2 n \epsilon^2}{V(\mathcal{H})^2 \left(M_U - M_L\right)^2}\right)} = \delta \]If we assume $n$ and $\delta$ are fixed, we can solve the above equality for $\epsilon$, finding
\[\epsilon = V(\mathcal{H}) (M_U - M_L)\sqrt{\frac{1}{2n}\log{\left(\frac{2}{\delta}\right)}}\]which tells us that with probability at least $1 - \delta$, the absolute error between the exact and estimated integral values satisfies
\[|\hat{I} - I| < V(\mathcal{H}) (M_U - M_L)\sqrt{\frac{1}{2n}\log{\left(\frac{2}{\delta}\right)}}\]This inequality gives us some interesting insights. We can see that while the error decreases at a rate of $1/\sqrt{n}$, we also have a $\sqrt{\log\left(1/\delta\right)}$ dependence in the error that grows as we shrink our maximum allowable error probability $\delta$. Now if we assume $\epsilon$ and $\delta$ are fixed, we can solve the equality for $n$, giving us
\[n = \frac{1}{2}\epsilon^{-2}V(\mathcal{H})^2 \left(M_U - M_L\right)^2 \log{\left(\frac{2}{\delta}\right)}\]This value of $n$ corresponds to how large a sample we would need to ensure our maximum estimation error $\epsilon$ and maximum error probability $\delta$ are both satisfied. You can view this result as the sample complexity of our algorithm: it scales quadratically with $\frac{1}{\epsilon}$ and linearly with $\log(1/\delta)$. This in turn tells us that the smaller we choose $\epsilon$ and $\delta$, the more samples we should expect to need to satisfy our two error requirements.
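Both rearrangements are easy to put to work numerically. The sketch below continues the earlier unit-disk example, where $V(\mathcal{H}) = 4$ and the summand $\bv{1}_{\Omega}(\bv{x}) f(\bv{x})$ lies in $[0, 1]$, so $M_U - M_L = 1$; the function names and parameter values are my own illustrative choices.

```python
import numpy as np

def error_bound(n, delta, vol_H, M_U, M_L):
    """Epsilon such that |I_hat - I| < epsilon with probability at least 1 - delta."""
    return vol_H * (M_U - M_L) * np.sqrt(np.log(2.0 / delta) / (2.0 * n))

def sample_complexity(eps, delta, vol_H, M_U, M_L):
    """Smallest integer n guaranteeing error at most eps with probability at least 1 - delta."""
    return int(np.ceil(0.5 * eps**-2 * vol_H**2 * (M_U - M_L)**2 * np.log(2.0 / delta)))

# Unit-disk example from earlier: H = [-1, 1]^2, so V(H) = 4 and M_U - M_L = 1
print(error_bound(n=100_000, delta=0.01, vol_H=4.0, M_U=1.0, M_L=0.0))       # ≈ 0.021
print(sample_complexity(eps=0.01, delta=0.01, vol_H=4.0, M_U=1.0, M_L=0.0))  # ≈ 424,000 samples
```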