May 2023 – H. Paul Keeler

In a previous post, I covered a simple but much-used method for simulating random variables or, rather, generating random variates. To simulate a random variable, the method requires an easy way to calculate the inverse of its cumulative distribution function. But you cannot always do that.

In lieu of this, the great John von Neumann wrote in a 1951 paper that you can sample a sequence of values from another probability distribution, accepting only the values that meet a certain condition based on this other distribution and the desired distribution, while rejecting all the others. The accepted values will follow the desired probability distribution. This method of simulation or sampling is called the rejection method or the acceptance method, and it even has the double-barrelled name the acceptance-rejection (AR) method.

Details

Let $X$ be a continuous random variable with a (probability) density $p(x)$, which is the derivative of its cumulative probability distribution $P(X\leq x)$. The density $p(x)$ corresponds to the desired or target distribution from which we want to sample. For whatever reason, we cannot directly simulate the random variable $X$. (Maybe we cannot use the inverse method because $P(X\leq x)$ is too complicated.)

The idea that von Neumann had was to assume that we can easily simulate another random variable, say, $Y$ with the (probability) density $q(x)$. The density $q(x)$ corresponds to a proposal distribution that we can sample (by using, for example, the inverse method).

Now we further assume that there exists some finite constant $M>0$ such that we can bound $p(x)$ by $Mq(x)$, meaning

$$ p(x) \leq M q(x), \text{ for all } x . $$

Provided this, we can then sample the random variable $Y$ and accept a value of it (for a value of $X$) with probability

$$\alpha = \frac{p(Y)}{Mq(Y)}.$$

If the sampled value of $Y$ is not accepted (which happens with probability $1-\alpha$), then we must repeat this random experiment until a sampled value of $Y$ is accepted.

Algorithm

We give the pseudo-code for the acceptance-rejection method suggested by von Neumann.

Random variable $X$ with density $p(x)$

1. Sample a random variable $Y$ with density $q(x)$, giving a sample value $y$.

2. Calculate the acceptance probability $\alpha = \frac{p(y)}{Mq(y)}$.

3. Sample a uniform random variable $U\sim U(0,1)$, giving a sample value $u$.

4. Return the value $y$ (for the value of $X$) if $u\leq \alpha$, otherwise go to Step 1 and repeat.

As covered in a previous post, Steps 3 and 4 are equivalent to accepting the value $y$ with probability $\alpha$.
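The four steps can be sketched in Python. This is a minimal sketch, not a definitive implementation: the function name `sample_target`, the target density $p(x)=6x(1-x)$ on $[0,1]$, the uniform proposal and the constant $M=1.5$ are all illustrative assumptions rather than anything from von Neumann's paper.

```python
import random


def sample_target(p, sample_q, q, M):
    """Draw one value with density p by the acceptance-rejection method.

    p and q are density functions, sample_q draws a value from q,
    and M is a constant satisfying p(x) <= M * q(x) for all x.
    """
    while True:
        y = sample_q()              # Step 1: propose a value from q
        alpha = p(y) / (M * q(y))   # Step 2: acceptance probability
        u = random.random()         # Step 3: uniform value on [0, 1)
        if u <= alpha:              # Step 4: accept, otherwise repeat
            return y


# Illustrative target (an assumption): p(x) = 6x(1 - x) on [0, 1].
def p(x):
    return 6.0 * x * (1.0 - x) if 0.0 <= x <= 1.0 else 0.0


# Illustrative proposal (an assumption): the uniform density on [0, 1].
def q(x):
    return 1.0 if 0.0 <= x <= 1.0 else 0.0


M = 1.5  # the maximum of p is p(1/2) = 1.5, so p(x) <= M * q(x)

random.seed(1)
samples = [sample_target(p, random.random, q, M) for _ in range(20_000)]
sample_mean = sum(samples) / len(samples)  # the true mean of this target is 1/2
```

Passing the densities and the proposal sampler as arguments keeps the loop itself independent of any particular pair of distributions.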

Point process application

In the context of point processes, this method is akin to thinning point processes independently. This gives a method for positioning points non-uniformly by first placing the points uniformly. The method then thins points based on the desired intensity function. As I covered in a previous post, this is one way to simulate an inhomogeneous (or nonhomogeneous) Poisson point process.
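Here is a rough sketch of the thinning idea on the interval $[0,1]$. The intensity function, the bound `rate_max` and the function names are hypothetical choices for illustration, and generating the homogeneous points from exponential gaps is just one convenient way to do it in one dimension.

```python
import math
import random


def poisson_points(rate, length):
    """Points of a homogeneous Poisson process with the given rate on [0, length]."""
    points = []
    x = random.expovariate(rate)   # exponential gaps between successive points
    while x <= length:
        points.append(x)
        x += random.expovariate(rate)
    return points


def thin(points, intensity, rate_max):
    """Keep each point x independently with probability intensity(x) / rate_max."""
    return [x for x in points if random.random() <= intensity(x) / rate_max]


# Hypothetical intensity function on [0, 1], bounded above by rate_max = 200.
def intensity(x):
    return 200.0 * math.exp(-3.0 * x)


random.seed(2)
rate_max = 200.0
uniform_points = poisson_points(rate_max, 1.0)
thinned_points = thin(uniform_points, intensity, rate_max)

# Integral of the intensity over [0, 1], the mean number of retained points.
expected_count = 200.0 * (1.0 - math.exp(-3.0)) / 3.0
```

The retention probability `intensity(x) / rate_max` plays the same role here as the acceptance probability $\alpha$ in the algorithm.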

Efficiency

Basic probability theory tells us that the number of experiment runs (Steps 1 to 4) until acceptance is a geometric variable. Its parameter is the overall acceptance probability, which is the average of $\alpha$ over the proposal distribution and equals $1/M$ when $p(x)$ and $q(x)$ are both normalized densities. On average the acceptance(-rejection) method will then take $M$ simulations to sample one value of the random variable $X$ of the target distribution. The key then is to choose the proposal density $q(x)$ so that the constant $M$ is as small as possible, while still keeping the inequality $p(x) \leq M q(x)$.
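The probability of accepting in a single run follows from averaging the acceptance probability over the proposed values. A short calculation, assuming that $p(x)$ and $q(x)$ are both normalized densities and writing $U$ for the uniform variable from the algorithm:

$$\mathbb{P}\left(U \leq \frac{p(Y)}{Mq(Y)}\right) = \int \frac{p(y)}{Mq(y)}\, q(y)\, dy = \frac{1}{M}\int p(y)\, dy = \frac{1}{M}.$$

For example, a target density bounded by $1.5$ on $[0,1]$ with a uniform proposal allows $M=1.5$, so about two in every three proposed values are accepted.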

Higher dimensions

The difficulty of the acceptance(-rejection) method is finding a good proposal distribution such that the product $Mq(x)$ is not much larger than the target density $p(x)$. In one dimension, this can often be done, but in higher dimensions it becomes increasingly difficult. Consequently, this method is typically not used in higher dimensions.

Another approach with an acceptance step is the Metropolis-Hastings method, which is the quintessential Markov chain Monte Carlo (MCMC) method. This method and its cousins have become exceedingly popular, as they give ways to simulate collections of dependent random variables that have complicated (joint) distributions.

The Box-Muller method for simulating normal variables

Proof outline

The joint probability density of two independent variables is simply the product of the two individual probability densities. Then the joint density of two standard normal variables is

$$\begin{align}f_{X,Y}(x,y)&=\left[\frac{1}{\sqrt{2\pi}}e^{-x^2/2}\right]\left[\frac{1}{\sqrt{2\pi}}e^{-y^2/2}\right]\\&=\frac{1}{{2\pi}}e^{-(x^2+y^2)/2}\,.\end{align}$$

Now it requires a change of coordinates in two dimensions (from Cartesian to polar) using a Jacobian determinant, which in this case is $|J(\theta,r)|=r$, giving a new joint probability density

$$f_{\Theta,R}(\theta,r)=\left[\frac{1}{2\pi}\right]\left[ r\,e^{-r^2/2}\right]\,.$$

Now we just identify the two probability densities. The first probability density corresponds to a uniform variable on $[0, 2\pi]$, whereas the second is that of a Rayleigh variable with parameter $\sigma=1$. Of course the proof works in the opposite direction because the transformation (between Cartesian and polar coordinates) is a one-to-one function.

Algorithm

Here’s the Box-Muller method for simulating two (independent) standard normal variables with two (independent) uniform random variables.

Two (independent) standard normal random variables $Z_1$ and $Z_2$

Generate two (independent) uniform random variables $U_1\sim U(0,1)$ and $U_2\sim U(0,1)$.

Return $Z_1=\sqrt{-2\ln U_1}\cos(2\pi U_2)$ and $Z_2=\sqrt{-2\ln U_1}\sin(2\pi U_2)$.

The method effectively samples a uniform angular variable $\Theta=2\pi U_2$ on the interval $[0,2\pi]$ and a radial variable $R=\sqrt{-2\ln U_1}$ with a Rayleigh distribution.
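The radial formula can be recovered with the inverse method. A short derivation, assuming the Rayleigh density $r\,e^{-r^2/2}$ with parameter $\sigma=1$ from the proof outline:

$$P(R\leq r)=\int_0^r s\,e^{-s^2/2}\,ds = 1-e^{-r^2/2}, \quad r\geq 0 .$$

Setting $1-e^{-r^2/2}=u$ and solving for $r$ gives $r=\sqrt{-2\ln(1-u)}$. Since $1-U_1$ is also uniform on $(0,1)$ whenever $U_1$ is, we can equally use $R=\sqrt{-2\ln U_1}$.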

The algorithm produces two independent standard normal variables. Of course, as many of us learn in high school, if $Z$ is a standard normal variable, then the random variable $X=\sigma Z +\mu$ is a normal variable with mean $\mu$ and standard deviation $\sigma>0$.
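The two steps, together with the scaling by $\sigma$ and $\mu$, can be sketched in Python. This is only an illustration using the standard library; the function name `box_muller` and its default arguments are my own choices, and using $1-U_1$ in place of $U_1$ is a small safeguard against taking the logarithm of zero.

```python
import math
import random


def box_muller(mu=0.0, sigma=1.0):
    """Return two independent normal values with mean mu and standard deviation sigma."""
    u1 = 1.0 - random.random()           # in (0, 1], which avoids log(0)
    u2 = random.random()                 # in [0, 1)
    r = math.sqrt(-2.0 * math.log(u1))   # Rayleigh radial variable
    theta = 2.0 * math.pi * u2           # uniform angle on [0, 2*pi)
    z1 = r * math.cos(theta)
    z2 = r * math.sin(theta)
    return sigma * z1 + mu, sigma * z2 + mu


random.seed(3)
pairs = [box_muller() for _ in range(50_000)]
values = [z for pair in pairs for z in pair]
mean = sum(values) / len(values)                                # close to 0
variance = sum((z - mean) ** 2 for z in values) / len(values)   # close to 1
```

Note that the function has no loops or branches, which is the property that matters for the GPU discussion later in the post.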

The fall of the Box-Muller method

Sadly this method was typically not used, as historically computer processors were slow at doing calculations involving the necessary mathematical functions. To avoid these functions researchers developed and employed other methods such as the ziggurat algorithm.

Also, although processors can now do such calculations much faster, many languages, not just scientific ones, come with functions for generating normal variables. Consequently, there has not been much need to implement this method.

Update: The return of the Box-Muller method

The above conventional wisdom has changed in recent years as processors can now (on a hardware level) readily evaluate such functions. (I had been waiting to see if certain libraries would be re-written by using the Box-Muller method, but why bother if the old ones work so well?) When I used the term “processors”, I had central processing units (CPUs) in mind, but in recent years graphics processing units (GPUs) have become widely popular.

In a comment on this post, it was pointed out that the Box-Muller method is the preferred choice for GPUs, as evidenced by its implementation in Nvidia’s CUDA library. The reason is that GPUs do not handle loops and branches in algorithms well, so you should use methods that avoid these algorithmic steps. And the Box-Muller method is one that does just that.

The Nvidia website says:

Because GPUs are so sensitive to looping and branching, it turns out that the best choice for the Gaussian transform is actually the venerable Box-Muller transform
