The acceptance(-rejection) method for simulating random variables

In a previous post, I covered a simple but much used method for simulating random variables or, rather, generating random variates. To simulate a random variable, the method requires, in an easy fashion, calculating the inverse of its cumulative distribution function. But you cannot always do that.

In lieu of this, the great John von Neumann wrote in a 1951 paper that you can sample a sequence of values from another probability distribution, accepting only the values that meet a certain condition based on this other distribution and the desired distribution, while rejecting all the others. The accepted values will follow the desired probability distribution. This method of simulation or sampling is called the rejection method, the acceptance method, and it has even the double-barrelled name the acceptance-rejection (AR) method.

Details

Let \(X\) be a continuous random variable with a (probability) density \(p(x)\), which is the derivative of its cumulative probability distribution \(P(X\leq x)\). The density \(p(x)\) corresponds to the desired or target distribution from which we want to sample. For whatever reason, we cannot directly simulate the random variable \(X\). (Maybe we cannot use the inverse method because \(P(X\leq x)\) is too complicated.)

The idea that von Newman had was to assume that we can easily simulate another random variable, say, \(Y\) with the (probability) density \(q(x)\). The density \(q(x)\) corresponds to a proposal distribution that we can sample (by using, for example, the inverse method).

Now we further assume that there exists some finite constant \(M>0\) such that we can bound \(p(x)\) by \(Mq(x)\), meaning

$$ p(x) \leq M q(x), \text{ for all } x . $$

Provided this, we can then sample the random variable \(Y\) and accept a value of it (for a value of \(X\)) with probability

$$\alpha = \frac{p(Y)}{Mq(Y)}.$$

If the sampled value of \(Y\) is not accepted (which happens with probability \(1-\alpha\)), then we must repeat this random experiment until a sampled value of \(Y\) is accepted.

Algorithm

We give the pseudo-code for the acceptance-rejection method suggested by von Neumann.

Random variable \(X\) with density \(p(x)\)

  1. Sample a random variable \(Y\) with density \(q(x)\), giving a sample value \(y\).
  2. Calculate the acceptance probability \(\alpha = \frac{p(y)}{Mq(y)}\).
  3. Sample a uniform random variable \(U\sim U(0,1)\), giving a sample value \(u\).
  4. Return the value \(y\) (for the value of \(X\)) if \(u\leq \alpha\), otherwise go to Step 1 and repeat.

As covered in a previous post, Steps 3 and 4 are equivalent to accepting the value \(y\) with probability \(\alpha\).

Point process application

In the context of point processes, this method is akin to thinning point processes independently. This gives a method for positioning points non-uniformly by first placing the points uniformly. The method then thins points based on the desired intensity function. As I covered in a previous post, this is one way to simulate an inhomogeneous (or nonhomogeneous) Poisson point process.

Efficiency

Basic probability theory tells us that the number of experiment runs (Steps 1 to 3) until acceptance is a geometric variable with parameter \(\alpha\). On average the acceptance(-rejection) method will take \(1/\alpha\) number of simulations to sample one value of the random \(X\) of the target distribution. The key then is to make the proposal density \(q(x)\) as small as possible (and adjust \(M\) accordingly), while still keeping the inequality \(p(x) \leq M q(x)\).

Higher dimensions

The difficulty of the acceptance(-rejection) method is finding a good proposal distribution such that the product \(Mq(x)\) is not much larger than the target density \(p(x)\). In one-dimension, this can be often done, but in higher dimensions this becomes increasingly difficult. Consequently, this method is typically not used in higher dimensions.

Another approach with an acceptance step is the Metropolis-Hastings method, which is the quintessential Markov chain Monte Carlo (MCMC) method. This method and its cousins have become exceedingly popular, as they give ways to simulate collections of dependent random variables that have complicated (joint) distributions.

Further reading

The original paper where the acceptance(-rejection) method appears (on page 769 in the right-hand column) is:

  • von Neumann, Various techniques used in connection with random digits, 1951.

The usual books on stochastic simulations and Monte Carlo methods will detail this method. For example, see the book by Devroye (Section II.3) or the more recent Handbook of Monte Carlo Methods (Section 3.1.5) by Kroese, Taimre and Botev. The book Stochastic Simulation: Algorithms and Analysis by Asmussen and Glynn also covers the method in Section 2b.

Other books include those by Fishman (Section 8.5) and Gentle (Section 4.5) respectively.

The Box-Muller method for simulating normal variables

In the previous post, I covered a simple but much used method for simulating random variables or, rather, generating random variates. To simulate a random variable, the method requires writing down, in a tractable manner, the inverse of its cumulative distribution function.

But in the case of the normal (or Gaussian) distribution, there is no closed-form expression for its cumulative distribution function nor its inverse. This means you cannot, in an elegant and fast way at least, generate with the inverse method a single normal random variable using a single uniform random variable.

Interestingly, however, you can generate two (independent) normal variables with two (independent) uniform variables using the Box-Muller method, originally proposed by George Box and Mervin E. Muller. This approach uses the inverse method, but in practice it’s not used much (see below). I detail this method because I find it neat and it highlights the connection between the normal distribution and rotational symmetry, which has been the subject of some recent 3Blue1Brown videos on YouTube.

(This method was also used to simulate the Thomas point process, which I covered in a previous post.)

Incidentally, this connection is also mentioned in a previous post on simulating a Poisson point process on the surface of a sphere.  In that method post, Method 2 uses an observation by the Muller that normal random variables can be used to position points uniformly on spheres.

I imagine this method was first observed by transforming two normal variables, instead of guessing various distribution pairs that would work.  Then I’ll sketch the proof in the opposite direction, though it works in both directions.

Proof outline

The joint probability density of two independent variables is simply the product of the two individual probabilities densities. Then the joint density of two standard normal variables is

$$\begin{align}f_{X,Y}(x,y)&=\left[\frac{1}{\sqrt{2\pi}}e^{-x^2/2}\right]\left[\frac{1}{\sqrt{2\pi}}e^{-y^2/2}\right]\\&=\frac{1}{{2\pi}}e^{-(x^2+y^2)/2}\,.\end{align}$$

Now it requires a change of coordinates in two dimensions (from Cartesian to polar) using a Jacobian determinant, which in this case is \(|J(\theta,r)=r|\).1Alternatively, you can simply recall the so-called area element \(dA=r\,dr\,d\theta\).  giving a new joint probability density

$$f_{\Theta,R}(\theta,r)=\left[\frac{1}{\sqrt{2\pi}}\right]\left[ r\,e^{-r^2/2}\right]\,.$$

Now we just identify the two probability densities. The first probability density corresponds to a uniform variable on \([0, 2\pi]\), whereas the second is that of a Rayleigh variable with parameter \(\sigma=1\). Of course the proof works in the opposite direction because the transformation (between Cartesian and polar coordinates) is a one-to-one function.

Algorithm

Here’s the Box-Muller method for simulating two (independent) standard normal variables with two (independent) uniform random variables.

Two (independent) standard normal random variable \(Z_1\) and \(Z_2\)

  1. Generate two (independent) uniform random variables \(U_1\sim U(0,1)\) and \(U_2\sim U(0,1)\).
  2. Return \(Z_1=\sqrt{-2\ln U_1}\cos(2\pi U_2)\) and \(Z_2=\sqrt{-2\ln U_1}\sin(2\pi U_2)\).

The method effectively samples a uniform angular variable \(\Theta=2\pi U_2\) on the interval \([0,2\pi]\) and a radial variable \(R=\sqrt{-2\ln U_1}\) with a Rayleigh distribution.

The algorithm produces two independent standard normal variables. Of course, as many of us learn in high school, if \(Z\) is a standard normal variable, then the random variable \(X=\sigma Z +\mu\) is a normal variable with mean \(\mu\) and standard deviation \(\sigma>0\) .

The Box-Muller method is rarely used

Sadly this method isn’t typically used, as historically computer processors were slow at doing the calculations, so other methods were employed such as the ziggurat algorithm. Also, although processors can now do such calculations much faster, many languages, not just scientific ones, come with functions for generating normal variables. Consequently, there’s not much need in implementing this method.

Further reading

Websites

Many websites detail this method. Here’s a couple:

Papers

The original paper (which is freely available here) is:

  • 1958 – Box and Muller, A Note on the Generation of Random Normal Deviates.

Another paper by Muller connects normal variables and the (surface of a) sphere:

  • 1959 – Muller, A note on a method for generating points uniformly on n-dimensional spheres.

Books

Many books on stochastic simulation cover the Box-Muller method. The classic book by Devroye with the descriptive title Non-Uniform Random Variate Generation covers this method; see Section 4.1. There’s also the Handbook of Monte Carlo Methods by Kroese, Taimre and Botev; see Section 3.1.2.7. Ripley also covers the method (and he makes a remark with some snark that many people incorrectly spell it the Box-Müller method); see Section 3.1. The book Stochastic Simulation: Algorithms and Analysis by Asmussen and Glynn also mention the method and a variation by Marsaglia; see Examples 2.11 and 2.12.

The inverse method for simulating random variables

We will cover a simple but much used method for simulating random variables or, rather, random variates. Although the material here is found in introductory probability courses, it frequently forms the foundation of more advance stochastic simulation techniques, such as Markov chain Monte Carlo methods.

Details

The basics of probability theory tell us that any random variable can, in theory, be written as a function of a uniform random variable \(U\) distributed on the interval \((0,1)\), which is usually written as \(U\sim U(0,1)\). All one needs is the inverse of the cumulative distribution function of the desired random variable.

More specifically, let \(X\) be a random variable with a cumulative distribution function \(F(x)=\mathbb{P}(X\leq x)\). The function \(F\) is nondecreasing in \(x\), so its inverse can be defined as \(F^{-1}(y)=\inf\{x:F(x)\geq y\}\), which is known as the generalized inverse of \(F(x)\).

Some authors assume the minimum is attained so the infimum is replaced with the minimum, giving \(F^{-1}(y)=\min\{x:F(x)\geq y\}\).

In short, the following result is all that we need.

Transform of a uniform variable \(U\sim U(0,1)\)

For a uniform random variable \(U\sim U(0,1)\), the random variable \(F^{-1}(U)\) has the cumulative distribution function \(\mathbb{P}(F^{-1}(U)\leq x)=P(U\leq F(x))=F(x)\).

Algorithm

The above observation gives a method, which I like to call the direct method, for exactly simulating a random variable \(X\) with the (cumulative) distribution (function) \(F\).

Random variable \(X\) with distribution \(F\)

  1. Sample a uniform random variable \(U\sim U(0,1)\), giving a value \(u\).
  2. Return the value \(x=F^{-1}(u)\) as the sampled value of \(U\).

But this approach only works if we can write down (in a relatively straightforward way) the inverse \(F^{-1}\), which is usually not the case. This means you cannot generate, for example, simulate a normal random variable with a single uniform random variable by using just the inverse method, as we cannot write down the inverse of its cumulative distribution function.

(Interestingly, with two (independent) uniform random variables, we can use the transform method to simulate two (independent) normal (or Gaussian) random variables. This approach is called the Box-Muller transform, which I’ll cover in another post.)

Nevertheless, we can apply the inverse method to some useful distributions.

Examples

Warning: The following examples are only for illustration purposes. Except for the Bernoulli example, you would never use them in standard scientific languages such as MATLAB, Python (with NumPy), R or Julia, because those languages already have much better functions for simulating these and many other random variables (or variates). If you are writing a function in a language that lacks such functions, I would consult one of the references mentioned below. Although the inverse method is usually intuitive and elegant, it is often not the fastest method.

Bernoulli distribution

The simplest random variable is that with the Bernoulli distribution. With probability \(p\), a Bernoulli random variable \(X\) takes the value one. Otherwise, \(X\) takes the value zero (with probability \(1-p\)). This gives the (cumulative) distribution (function):

$$ F_B(x)=\begin{cases}
0 & \text{if } x < 0 \\
1 – p & \text{if } 0 \leq x < 1 \\
1 & \text{if } x \geq 1
\end{cases}$$

This gives a very simple way to simulate (or sample) a Bernoulli variable \(X\) with parameter \(p\).

Bernoulli random variable \(X\) with parameter \(p\)

  1. Sample a uniform random variable \(U\sim U(0,1)\), giving a value \(u\).
  2. If \(u\leq p\), return \(x=1\); otherwise return \(x=0\).
Application: Acceptance simulation methods

In random simulation code, whenever you do something (or not) with some probability \(p\) (or probability \(1-p\)), then the code will perform the above step. Consequently, you see this in the (pseudo-)code of many stochastic simulations with random binary choices, particularly schemes that have an acceptance step such the Metropolis-Hastings method and other Markov chain Monte Carlo (MCMC) methods.

In MCMC schemes, a random (binary) choice is proposed and it is accepted with a certain probability, say, \(\alpha \). This is the equivalent of accepting the proposed choice if some uniform random variable \(U\) meets the condition \(U\leq \alpha\).

This explains why the pseudo-code of the same algorithm can vary. Some pseudo-code will say accept with probability \(\alpha\), while other pseudo-code will say do if \(U\leq \alpha\). It’s two equivalent formulations.

Exponential distribution

The cumulative distribution function of an exponential variable with mean \(1/\lambda\) is \(F_E(x)= 1-e^{-\lambda x}\), which has the inverse \(F^{-1}_E(y)=-(1/\mu)\ln[1-y]\). We can use the fact that on the interval \((0,1)\), a uniform variable \(U\sim U(0,1)\) and \(1-U\) have the same distribution. Consequently, the random variables \(\ln [1-U]\) and \(\ln U\) are equal in distribution.

This gives a method for simulating exponential random variables.

Exponential random variable \(X\) with mean \(1/\lambda\)

  1. Sample a uniform random variable \(U\sim U(0,1)\), giving a value \(u\).
  2. Return \(x=-(1/\lambda)\ln u\).
Application: Poisson simulation method

Of course you can use this method to simulate exponential random variables, but it has another application. In a previous post on simulating Poisson variables, I mentioned that exponential random variables can be used to simulate a Poisson random variable in a direct (or exact) manner. That method is based on the distances between the points of a homogeneous Poisson point process (on the real line) being exponential random variables.

But this method is only suitable for low values of \(\lambda\), less than say fifteen.

Rayleigh distribution

The Rayleigh distribution is \(\mathbb{P}(X\leq x)= (x/\sigma^2)e^{-x^2/(2\sigma^2)}\), where \(\sigma>0\) is its scale parameter. The square root of an exponential variable with mean \(1/\lambda\) has a Rayleigh distribution with scale parameter \(\sigma=1/\sqrt{2\lambda}\).

Consequently, the generation method is similar to the previous example.

Rayleigh random variable \(Y\) with scale parameter \(\sigma>0\)

  1. Sample a uniform random variable \(U\sim U(0,1)\), giving a value \(u\).
  2. Return \(y=\sigma\sqrt{-2\ln u}\).

Other methods

The inverse method is intuitive and often succinct. But most functions for simulating random variables (or, more correctly, generating random variates) do not use these methods, as they are not fast under certain parameter regimes, such as large means. Consequently, other method are used such as approximations (with, say, normal random variables), such as the ones I outlined in this post on simulating Poisson random variables.

More complicated random systems, such as collections of dependent variables, can be simulated using Markov chain Monte Carlo methods, which is the direction we’ll take in a couple posts after this one.

Further reading

The inverse technique is in your favourite introductory book on probability theory. The specific examples here are covered in books on stochastic simulations and Monte Carlo methods. The classic book by Devroye covers these topics; see Section 2.1 and the examples (inverse method) in Chapter 2.

For a modern take, there’s the extensive Handbook of Monte Carlo Methods by Kroese, Taimre and Botev; see Section 3.1.1, Algorithm 4.1 (Bernoulli) and Algorithm 4.29 (exponential), and Algorithm 4.66 (Rayleigh). There’s also the book Stochastic Simulation: Algorithms and Analysis by Asmussen and Gynn; in Chapter 2, see Example 2.1 (Bernoulli) and Exampe 2.3 (exponential).

Other books include those by Fishman (Section 8.1) and Gentle (Section 4.1) respectively. (Warning: the book by Gentle has a mistake on page 105 in algorithm for sampling Bernoulli variables, as noted by the author. It should be \(1-\pi\) and not \(\pi\) when zero is returned for the sampled value of the Bernoulli variable.)

The Newman-Ziff algorithm for simulating percolation models

Imagine you’re trying to estimate some statistics about a certain percolation system. For example, estimating the probability of a given bond being open, which means connected, when a giant component forms. The system or model is too complicated for analytic results, which is true for all percolation models with the exception of a handful.

So you code up the percolation model in your favourite programming language, run a large number of stochastic simulations, collect the statistics, and then look at the results. Percolation systems are, by definition, large, but with fast computers, the simulation method works. You get your statistics.

Easy.

But that’s just for one parameter. What if you want to see what happens with a range of parameter values? Well, you do the same thing, but now just run the large number of simulations for a range of different parameter values, right?

No.

You could do that, and it would work. But it would be slow. How to make it faster?

Newman and Ziff proposed and implemented a fast percolation algorithm based on the simple ideas of keeping old simulations and using conditional probabilities.

Algorithm

The algorithm was presented in this paper:

  • 2000 – Newman, Ziff – Efficient Monte Carlo Algorithm and High-Precision Results for Percolation

There exists a preprint with a slightly different title.

  • 2001 – Newman, Ziff – A fast Monte Carlo algorithm for site or bond percolation.

Overview

The simulation method consists of three components.

Components of the Newman-Ziff algorithm

  1. Re-use parts of previous simulation outcomes by shuffling.
  2. Use fast union-find algorithms.
  3. Find a smooth curve of results by using conditional probabilities.

None of them are entirely revolutionary, but put together they form a fast (and popular) simulation method for studying (monotonic) percolation models.

1. Randomly sampling sequences

The main insight was to sample in an incremental fashion, adding an open bond to the percolation model, such as a lattice, during each simulation run, without  throwing away the previous simulation. More specifically, let’s say they sample a bond model with \(k\) open bonds during one simulation run. Then in the next simulation, use the same configuration, and simply add one bond.

This is a nice trick, but there’s no free lunch here. On one hand, they don’t waste results. But on the other hand, they do throw away independence (and ergodicity) by doing this. But perhaps the numerical results don’t care.

To do this step, Newman and Ziff use a random shuffling algorithm called the Fisher-Yates shuffle. But it’s not clear if Newmand and Ziff thought of this shuffling algorithm independently or not. They never call it by its name nor do they cite any work on this well-known shuffling method. Both MATLAB and Python (SciPy) have functions for doing random shuffles.

2. Using union-find

Imagine you have a collection of things. Some of these things are connected to other things in your collection. You want to understand how these clusters of connected things behave.

To find the clusters, looking for ultimately the biggest one, Newman and Ziff use a union-find method. There are different types of these alogrithms, often hinging upon the use of recursion. They were developed mostly in the 1980s, particularly in work by Robert Tarjan.

These methods are very efficient at finding clusters. The algorithm speed or, rather, complexity is given in relation to the inverse of an Ackermann function, which is a famously fast growing function. These methods rely upon the monotonicity of growing unions.

(If you have a non-monotonic percolating system, then unfortunately you can’t use these algorithms, which is the case for percolation based on signal-to-interference-plus-noise ratio (SINR). )

3. Filling in the parameter gaps

For any random sample of a percolation model, the number of open bonds or sites is a natural number. But the parameter of the percolation model, such as the probability of a site or bond being open, is a real number.

Newman and Ziff use a simple conditioning argument to fill the gaps.

For any statistic \(S\) of of a given percolation model, the Newman-Ziff algorithm finds the expected statistic conditioned on \(n\) number of open bonds (or equivalent objects in other models). In other words, the algorithm finds the expected conditional statistics \(E[S_n|N=n]\). Then we just need the probability of \(N\) being \(n\) as a function of the parameter we’re trying to vary, such as the bond probability \(p\). With this probability \(P(N=n)\), we immediately arrive at the expression for the statistic \(S\) with introductory probability

$$ E(S_n)=E_N [E(S_n|N=n)].$$

For a nice discrete model with \(m\) total, we get the expression

$$ E(S_n)=\sum _{n=1}^{m}[P(N=n)S_n].$$

But we know the probability (distribution) of \(N\), as it’s simply the binomial variable. Assuming there are \(m\) total bonds (or equivalent objects), then we arrive at

$$P(N=n)={m\choose n} p^n(1-p)^{m-n} .$$

Then the final statistic or result is simply

$$E(S_n)=\sum _{n=1}^{m} {m\choose n} p^n(1-p)^{m-n} S_n.$$

Code

I wrote the algorithm in MATLAB and Python. The code can be found here. I’ve included a script for plotting the results (for small square lattices).

MATLAB

It turns out that MATLAB doesn’t like recursion too much. Well, at least that’s my explanation for the slow results when I use the union-find method with recursion. So I used the union-find method without recursion.

Python

My Python code is mostly for illustration purposes. Check out this Python library. That website warns of the traps that multithreading in NumPy can pose.

C

In the other original paper, Newman and Ziff included C code of their algorithm. You can find it on Newman’s website, but I have also a copy of it (with added comments from me) located here.

Future work

I plan to implement this algorithm using the NVIDIA CUDA library. This is exactly the type of algorithm that we can implement and run using graphical processing units (GPUs), which is the entire point of the CUDA library.

At first glance, the spatial dependence of the problem suggests using texture memory to speed up the memory accessing on the GPUs. But I will explain why that’s probably not a good idea for this specific algorithm

Further reading

Newman wrote the very readable book:

  • Newman, Networks: An introduction, Oxford Academic.

Using more words than equations (not a criticism), Newman explains the logic behind the algorithm.

The Newman-Ziff algorithm exists in the Python library PyPercolate:

https://pypercolate.readthedocs.io/en/stable/newman-ziff.html

Union-find algorithms are now standard fare in computer science books. I recommend the book by Tarjan himself:

  • Tarjan, Data structures and network algorithms, SIAM.

Some of these algorithms are covered in standard computer science textbooks on algorithms. For example:

  • Cormen, Leiserson, Rivest, and Stein,  Introduction to Algorithms, The MIT Press;
  • Sedgewick and Wayne,  Algorithms, Addison-Wesley.

This website gives a quick introduction to union-find in Python:

https://www.delftstack.com/howto/python/union-find-in-python/

Summary: Poisson simulations

Here’s a lazy summary post where I list all the posts on various Poisson simulations. I’ve also linked to code, which is found on this online repository. The code is typically written in MATLAB and Python.

Posts

Poisson point processes

Some simulations of Poisson point processes are also covered in this post on the Julia programming language:

Checking simulations

Poisson line process

Poisson random variables

Simulating Poisson random variables in Fortran

The hint’s in the title. I wrote a simple function in Fortran for simulating (or sampling) Poisson random variables. (More precisely, I should say that the function generates Poisson variates.) I used the simple direct method. This method is based on the exponential inter-arrival times of the Poisson (stochastic) process.

My code should not be used for large Poisson parameter values (larger than, say, 20 or 30), as the code may be too slow. Other methods exist for larger parameter values, which I’ve discussed previously.

I just use the standard Fortran function random_number for generating (pseudo-)random numbers. I am not an expert in Fortran, but my Poisson function seems to work fine. I wrote and ran a simple test that estimates the first and second moments, which should match for Poisson variables.

My Fortran code is very similar to the code that I wrote in C and C#, which is located here. You should be able to run it on this website or similar ones that can compile Fortran (95) code.

Further reading

For various Poisson simulation methods, see the stochastic simulation books by Devroye (Section X.3) or Fishman (Section 8.16). There’s a free online version of Devroye’s book here. The book by Gentle (Section 5.2.8) also briefly covers Poisson variables.

In this post on generating Poisson variates, John D. Cook briefly discusses the direct method (for small Poisson parameter values), as well as a rejection method from a 1979 paper by Atkinson.

I wrote the Poisson code using Fortran 95. There are various books and websites on Fortran. The website tutorialspoint.com gives a good introduction to Fortran. You can also edit, compile and run your Fortran code there with its online Fortran editor. I found this short summary a good start. For alternative Fortran code of a Poisson generator, consult the classic book Numerical Recipes, though I believe the book versions only exist for Fortran 77 and Fortran 90.

Code

On this page I’ve only included the code of the functions for generating uniform and Poisson variates. The rest of the code, including the test, is located here.

!Poisson function -- returns a single Poisson random variable
function funPoissonSingle(lambda) result(randPoisson)
real, intent(in) :: lambda !input
real :: exp_lambda !constant for terminating loop
real :: randUni !uniform variable
real :: prodUni !product of uniform variables
integer :: randPoisson !Poisson variable

exp_lambda= exp(-lambda) 

!initialize variables
randPoisson = -1;
prodUni = 1;
do while (prodUni > exp_lambda)
   randUni = funUniformSingle() !generate uniform variable
   prodUni = prodUni * randUni; !update product
   randPoisson = randPoisson + 1 !increase Poisson variable
end do
end function

!Uniform function -- returns a standard uniform random variable
function funUniformSingle() result(randUni)
real randUni;
call random_seed
call random_number(randUni)

end function

Simulating a Poisson point process on a n-dimensional sphere

In the previous post I outlined how to simulate or sample a homogeneous Poisson point process on the surface of a sphere. Now I will consider a homogeneous Poisson point process on the \((n-1)-\) sphere, which is the surface of the Euclidean ball in \(n\) dimensions.

This is a short post because it immediately builds off the previous post. For positioning the points uniformly, I will use Method 2 from that post, which uses normal random variables, as it immediately gives a fast method in \(n\) dimensions.

I wrote this post and the code more for curiosity than any immediate application. But simulating a Poisson point process in this setting requires placing points uniformly on a sphere. And there are applications in that, such as Monte Carlo integration methods, as mentioned in this post, which nicely details different sampling methods.

Steps

As is the case for other shapes, simulating a Poisson point process requires two steps.

Number of points

The number of points of a Poisson point process on the surface of a sphere of radius \(r>0\) is a Poisson random variable. The mean of this random variable is \(\lambda S_{n-1}\), where \(S_{n-1}\) is the surface area of the sphere.  For a ball embedded in \(n\) dimension, the area of the corresponding sphere is given by

$$S_{n-1} = \frac{2 \pi ^{n/2}  }{\Gamma(n/2)} r^{n-1},$$

where \(\Gamma\) is the gamma function, which is a natural generalization of the factorial. In MATLAB, we can simply use the function gamma.  In Python, we need to use the SciPy function scipy.special. gamma.

Locations of points

For each point on the sphere, we generate \(n\) standard normal or Gaussian random variables, say, \(W_1, \dots, W_n\), which are independent of each other. These random variables are the Cartesian components of the random point. We rescale the components by the Euclidean norm, then multiply by the radius \(r\).

For \(i=1,\dots, n\), we obtain

$$X_i=\frac{rW_i}{(W_1^2+\cdots+W_n^2)^{1/2}}.$$

These are the Cartesian coordinates of a point uniformly scattered on a  sphere with radius \(r\) and a centre at the origin.

How does it work?

In the post on the circle setting, I gave a more detailed outline of the proof, where I said the method is like the Box-Muller transform in reverse. The joint density of the normal random variables is from a multivariate normal distribution with zero correlation. This joint density a function of the Cartesian equation for a sphere. This means the density is constant on the sphere, implying that the angle of the point \((W_1,\dots, W_n)\) will be uniformly distributed.

The vector formed from the normal variables \((W_1,\dots,W_n)\) is a random variable with a chi distribution.  But the final vector, which stretches from the origin to the point \((X_1,\dots,X_n)\), has length one, because we rescaled it with the Euclidean norm.

Code

The code for all my posts is located online here. For this post, the code in MATLAB and Python is here.

Further reading

I recommend this blog post, which discusses different methods for randomly placing points on spheres and inside spheres (or, rather, balls) in a uniform manner.  (Embedded in two dimensions, a sphere is a circle and a ball is a disk.)

Our Method 2 for positioning points uniformly, which uses normal variables, comes from the paper:

  • 1959, Muller, A note on a method for generating points uniformly on n-dimensional spheres.

Two recent works on this approach are the papers:

  • 2010, Harman and Lacko, On decompositional algorithms for uniform sampling from -spheres and -balls;
  • 2017, Voelker, Gosman, Stewart, Efficiently sampling vectors and coordinates.

Simulating a Poisson point process on a sphere

In this post I’ll describe how to simulate or sample a homogeneous Poisson point process on the surface of a sphere. I have already simulated this point process on a rectangle, triangle disk, and circle.

Of course, by sphere, I mean the everyday object that is the surface of a three-dimensional ball, where this two-dimensional object is often denoted by \(S^2\).  Mathematically, this is a generalization from a Poisson point process on a circle, which is slightly simpler than randomly positioning points on a disk.  I recommend reading those two posts first, as a lot of the material presented here builds off them.

I have not needed such a simulation in my own work, but I imagine there are many reasons why you would want to simulate a Poisson point process on a sphere. As some motivation, we can imagine these points on a sphere representing, say, meteorites or lightning hitting the Earth.

The generating the number of points is not difficult. The trick is positioning the points on the sphere in a uniform way.  As is often the case, there are various ways to do this, and I recommend this post, which lists the main ones.  I will use two methods that I consider the most natural and intuitive ones, namely using spherical coordinates and normal random variables, which is what I did in the post on the circle.

Incidentally, a simple modification allows you to scatter the points uniformly inside the sphere, but you would typically say ball in mathematics, giving a Poisson point process inside a ball; see below for details.

Steps

As always, simulating a Poisson point process requires two steps.

Number of points

The number of points of a Poisson point process on the surface of a sphere of radius \(r>0\) is a Poisson random variable with mean \(\lambda S_2\), where \(S_2=4\pi r^2\) is the surface area of the sphere. (In this post I give some details for simulating or sampling Poisson random variables or, more accurately, variates.)

Locations of points

For any homogeneous Poisson point process, we need to position the points uniformly on the underlying space, which is in this case is the sphere. I will outline two different methods for positioning the points randomly and uniformly on a sphere.

Method 1: Spherical coordinates

The first method is based on spherical coordinates \((\rho, \theta,\phi)\), where the radial coordinate \(\rho\geq 0\), and the angular coordinates \(0 \leq \theta\leq 2\pi\) and \(0\leq \phi \leq \pi\). The change of coordinates gives \(x=\rho\sin(\theta)\cos(\phi)\), \(y=\rho\sin(\theta)\sin(\phi)\), and \(z=\rho\cos(\phi)\).

Now we use Proposition 1.1 in this paper. For each point, we generate two uniform variables \(V\) and \(\Theta\) on the respective intervals \((-1,1)\) and \((0,2\pi)\). Then we place the point with the Cartesian coordinates

$$X =  r  \sqrt{1-V^2} \cos\Theta, $$

$$Y =  r  \sqrt{1-V^2}\sin\Theta, $$

$$ Z=r V. $$

This method places a uniform point on a sphere with a radius \(r\).

How does it work?

I’ll skip the precise details, but give some interpretation of this method. The random variable \(\Phi := \arccos V\) is the \(\phi\)-coordinate of the uniform point, which implies \(\sin \Phi=\sqrt{1-V^2}\), due to basic trigonometric identities.  The area element in polar coordinates is \(dA = \rho^2 \sin\phi d\phi d\theta \), which is constant with respect to \(\theta\). After integrating with respect to \(\phi\),  we see that the random variable \(V=\cos\Phi\) needs to be uniform (instead of \(\Phi\)) to ensure the point is uniformly located on the surface.

Method 2: Normal random variables

For each point, we generate three standard normal or Gaussian random variables, say, \(W_x\), \(W_y\), and \(W_z\), which are independent of each other. (The term standard here means the normal random variables have mean \(\mu =0\) and standard deviation \(\sigma=1\).)  The three random variables are the Cartesian components of the random point. We rescale the components by the Euclidean norm, then multiply by the radius \(r\), giving

$$X=\frac{rW_x}{(W_x^2+W_y^2+W_z^2)^{1/2}},$$

$$Y=\frac{rW_y}{(W_x^2+W_y^2+W_z^2)^{1/2}},$$

$$Z=\frac{rW_z}{(W_x^2+W_y^2+W_z^2)^{1/2}}.$$

These are the Cartesian coordinates of a point uniformly scattered on a  sphere with radius \(r\) and a centre at the origin.

How does it work?

The procedure is somewhat like the Box-Muller transform in reverse. In the post on the circle setting,  I gave an outline of the proof, which I recommend reading. The joint density of the normal random variables is from a multivariate normal distribution with zero correlation. This joint density is constant on the sphere, implying that the angle of the point \((W_x, W_y, W_z)\) will be uniformly distributed.

The vector formed from the normal variables \((W_x, W_y,W_z)\) is a random variable with a chi distribution.  We can see that the vector from the origin to the point \((X,Y,Z)\) has length one, because we rescaled it with the Euclidean norm.

Plotting

Depending on your plotting software, the points may more resemble points on an ellipsoid than a sphere due to the different scaling of the x, y and z axes. To fix this in MATLAB, run the command: axis square. In Python, it’s not straightforward to do this, as it seems to lack an automatic function, so I have skipped it.

Results

I have presented some results produced by code written in MATLAB and Python. The blue points are the Poisson points on the sphere. I have used a surface plot (with clear faces) to illustrate the underling sphere in black.

MATLAB

Python

Note: The aspect ratio in 3-D Python plots tends to squash the sphere slightly, but it is a sphere.

Code

The code for all my posts is located online here. For this post, the code in MATLAB and Python is here.  In Python I used the library mpl_toolkits for doing 3-D plots.

Poisson point process inside the sphere

Perhaps you want to simulate a Poisson point process inside the ball.  There are different ways we can do this, but I will describe just one way, which builds off Method 1 for positioning the points uniformly. (In a later post, I will modify Method 2, giving a way to uniformly position points inside the ball.)

For this simulation method, you need to make two simple modifications to the simulation procedure.

Number of points

The number of points of a Poisson point process inside a sphere of radius \(r>0\) is a Poisson random variable with mean \(\lambda V_3\), where \(V_3=4\pi r^3\) is the volume of the sphere.

Locations of points

We will modify Method 1 as outlined above. To sample the points uniformly in the sphere, you need to generate uniform variables on the unit interval \((0,1)\), take their cubic roots, and then, multiply them by the radius \(r\). (This is akin to the step of taking the square root in the disk setting.) The random variables for the angular coordinates are generated as before.

Further reading

I recommend this blog post, which discusses different methods for randomly placing points on spheres and inside spheres (or, rather, balls) in a uniform manner.  (Embedded in two dimensions, a sphere is a circle and a ball is a disk.)

Our Method 2 for positioning points uniformly, which uses normal variables, comes from the paper:

  • 1959, Muller, A note on a method for generating points uniformly on n-dimensional spheres.

This sampling method relies upon old observations that normal variables are connected to spheres and circles. I also found this post on a similar topic. Perhaps not surprisingly, the above paper is written by the same Muller behind the Box-Muller method for sampling normal random variables.

Update: The connection between the normal distribution and rotational symmetry has been the subject of some recent 3Blue1Brown videos on YouTube.

Here is some sample Python code for creating a 3-D scatter plot.

Simulating a Poisson point process on a circle

In this post, I’ll take a break from the more theoretical posts. Instead I’ll describe how to simulate or sample a homogeneous Poisson point process on a circle.  I have already simulated this point process on a rectangle, triangle and disk. In some sense, I should have done this simulation method before the disk one, as it’s easier to simulate. I recommend reading that post first, as the material presented here builds off it.

Sampling a homogeneous Poisson point process on a circle is rather straightforward.  It just requires using a fixed radius and uniformly choose angles from interval \((0, 2\pi)\). But the circle setting gives an opportunity to employ a different method for positioning points uniformly on circles and, more generally, spheres. This approach uses Gaussian random variables, and it becomes much more efficient when the points are placed on high dimensional spheres.

Steps

Simulating a Poisson point process requires two steps: simulating the random number of points and then randomly positioning each point.

Number of points

The number of points of a Poisson point process on circle of radius \(r>0\) is a Poisson random variable with mean \(\lambda C\), where \(C=2\pi r\) is the circumference of the circle.  You just need to be able to need to produce (pseudo-)random numbers according to a Poisson distribution.

To generate Poisson variables in MATLAB,  use the poissrnd function with the argument \(\lambda C\).  In Python, use either the scipy.stats.poisson or numpy.random.poisson function from the SciPy or NumPy libraries. (If you’re curious how Poisson simulation works, I suggest seeing this post for details on sampling Poisson random variables or, more accurately, variates.)

Locations of points

For a homogeneous Poisson point process, we need to uniformly position points on the underlying space, which is this case is a circle. We will look at two different ways to position points uniformly on a circle. The first is arguably the most natural approach.

Method 1: Polar coordinates

We use polar coordinates due to the nature of the problem. To position all the points uniformly on a circle, we simple generate uniform numbers on the unit interval \((0,1)\). We then multiply these random numbers by \(2\pi\).

We have generated polar coordinates for points uniformly located on the circle. To plot the points, we have to convert the coordinates back to Cartesian form by using the change of coordinates:  \(x=\rho\cos(\theta)\) and \(y=\rho\sin(\theta)\).

Method 2: Normal random variables

For each point, we generate two standard normal or Gaussian random variables, say, \(W_x\) and \(W_y\), which are independent of each other. (The term standard here means the normal random variables have mean \(\mu =0\) and standard deviation \(\sigma=1\).) These two random variables are the Cartesian components of a random point.  We then rescale the two values by the Euclidean norm, giving

$$X=\frac{W_x}{(W_x^2+W_y^2)^{1/2}},$$

$$Y=\frac{W_y}{(W_x^2+W_y^2)^{1/2}}.$$

These are the Cartesian coordinates of points uniformly scattered around a unit circle with centre at the origin. We multiply the two random values \(X\) and \(Y\) by the \(r>0\)  for a circle with radius \(r\).

How does it work?

The procedure is somewhat like the Box-Muller transform in reverse. I’ll give an outline of the proof. The joint density of the random variables \(W_x\) and \(W_y\) is that of the bivariate normal distribution with zero correlation, meaning it has the joint density

$$ f(x,y)=\frac{1}{2\pi}e^{[-(x^2+y^2)/2]}.$$

We see that the function \(f\) is a constant when we trace around any line for which \((x^2+y^2)\) is a constant, which is simply the Cartesian equation for a circle (where the radius is the square root of the aforementioned constant). This means that the angle of the point \((W_x, W_y)\) will be uniformly distributed.

Now we just need to look at the distance of the random point. The vector formed from the pair of normal variables \((W_x, W_y)\) is a Rayleigh random variable.  We can see that the vector from the origin to the point \((X,Y)\) has length one, because we rescaled it with the Euclidean norm.

Results

I have presented some results produced by code written in MATLAB and Python. The blue points are the Poisson points on the sphere. I have used a surface plot (with clear faces) in black to illustrate the underling sphere.

MATLAB

Python

Code

The code for all my posts is located online here. For this post, the code in MATLAB and Python is here.

Further reading

I recommend this blog post, which discusses different methods for randomly placing points on spheres and inside spheres (or, rather, balls) in a uniform manner.  (Embedded in two dimensions, a sphere is a circle and a ball is a disk.) A key paper on using normal variables is the following:

  • 1959, Muller, A note on a method for generating points uniformly on n-dimensional spheres.

As I mentioned in the post on the disk, the third edition of the classic book Stochastic Geometry and its Applications by Chiu, Stoyan, Kendall and Mecke details on page 54 how to uniformly place points on a disk.  It just requires a small modification for the circle.

Simulating Poisson random variables with large means in C

When sampling Poisson random variables, the method you use depends on the Poisson parameter, which coincides with its mean. The direct method, as I call it, is fine for small Poisson parameter values, but it becomes too slow as this parameter grows.

For large Poisson parameter values, researchers have proposed many other methods that are not slow. I coded up in Python and MATLAB (for now) two of these methods from the two respective papers:

  • 1979, Atkinson, The computer generation of Poisson random variables;
  • 1993, Hörmann, The transformed rejection method for generating Poisson random variable.

I already discussed these two methods in detail in a previous post, where you can find the code here.

The code I wrote is only suitable for large parameter values, where the meaning of large depends on the procedure being used, but it’s typically around 30. In other words, any code for generating Poisson variables should have an if-statement, using the direct method for small parameter values and another method, such as the ones above, for large parameter values.

Algorithms

Both algorithms are (acceptance-)rejection methods. I discuss this general method, first proposed by Neumann, in another post. In context of generating Poisson variables, these methods are, according to the book by Devroye (page 502), known for being relatively simple to implement and fast to execute.

Both algorithms require calculating the log of a factorial or, equivalently, a log of a gamma function, which is done by using approximations, such those of Stirling or Lanczos. I used the one by Stirling.

Algorithm PA  by Atkinson (1979)

The Algorithm PA proposed by Atkinson, among other methods, is a rejection method that uses a logistic distribution as the envelope distribution. (Often such algorithms use a normal distribution as the envelope distribution.)

Algorithm PTRS by Hörmann (1993)

Hörmann refers to the Algorithm PTRS method as a transformed rejection method. It uses the inverse method, which I covered in a previous post, and a rejection method.

I have only seen one implementation of this algorithm. It’s written in C for the Python library NumPy; see the code here.  You’ll notice my C code and that C code is very similar, modulus some logarithm identities.

Possible (small) error: I noticed that in that C code, on line 591, the implementation of step 2.0 of the PTRS Algorithm has a strictly less than condition, so it’s \(k <0\), whereas in the original paper (and hence my code),  the condition is \(k\leq 0\). Perhaps this possible error is insignificant, as the procedure is for large-valued Poisson random variables, so the \(k=0\) scenario rarely happens.

Code

I used just a moment ratio to test the code. A proper test would involve a chi-squared test, for example, which is what I did when I wrote the same algorithms in Python and MATLAB; see this post here.

Warning: My C code is only for illustration purposes. If you can use an industrial library, use that.

Algorithm PA  by Atkinson (1979)

/*
This code generates Poisson variates (or simulates Poisson variables).
using a method designed for large (>30) Poisson parameter values.

The generation method is Algorithm PA, a type of rejection method, from 
the paper:

1979 - Atkinson - "The Computer Generation of Poisson Random Variables"


Author: H. Paul Keeler, 2020.
Website: hpaulkeeler.com
Repository: github.com/hpaulkeeler/posts
*/

static unsigned int funPoissonPA(double mu)
{
    // precalculate some Poisson-parameter-dependent numbers
    double c = 0.767 - 3.36 / mu;
    double beta = pi / sqrt(3.0 * mu);
    double alpha = beta * mu;
    double k = log(c) - mu - log(beta);
    double log_mu = log(mu);

    // declare variables for the loop
    double U, x, V, y, logfac_n, lhs, rhs;
    unsigned int n;
    unsigned int randPoisson = 0; // initialize the Poisson random variable (or variate)
    bool booleContinue = true;
    while (booleContinue)
    {
        U = funUniform(); // generate first uniform variable
        x = (alpha - log((1.0 - U) / U)) / beta;

        if (x < -.5)
        {
            continue;
        }
        else
        {
            V = funUniform(); // generate second uniform variable
            n = floor(x + .5);
            y = alpha - beta * x;
            logfac_n = funLogFac(n);

            // two sides of an inequality condition
            lhs = y + log(V / (1.0 + exp(y)) / (1.0 + exp(y)));
            rhs = k + n * log_mu - logfac_n; // NOTE: uses log factorial n

            if (lhs <= rhs)
            {
                randPoisson = n;
                booleContinue = false;
                return randPoisson;
            }
            else
            {
                continue;
            }
        }
    }
    return randPoisson;
}

Algorithm PTRS by Hörmann (1993)

/*
This code generates Poisson variates (or simulates Poisson variables).
using a method designed for large (>10) Poisson parameter values.

The generation method is Algorthm PTRS, a type of rejection method, from
the paper:

1993 - Hörmann - "The transformed rejection method for generating Poisson
random variables"

Author: H. Paul Keeler, 2020.
Website: hpaulkeeler.com
Repository: github.com/hpaulkeeler/posts
*/

/*WARNING:
This code uses rand(), the standard pseudo-random number generating function in C,
which is known for producing inadequately random numbers.
Replace the function rand() in the function funUniformSingle with another standard uniform number generator.
 */

static unsigned int funPoissonPTRS(double mu)
{
    // precalculate some Poisson-parameter-dep}ent numbers
    double b = 0.931 + 2.53 * sqrt(mu);
    double a = -0.059 + 0.02483 * b;
    double vr = 0.9277 - 3.6224 / (b - 2);
    double one_over_alpha = 1.1239 + 1.1328 / (b - 3.4);

    // declare variables for the loop
    double U, V, us, log_mu, logfac_n, lhs, rhs;
    unsigned int n;

    unsigned int randPoisson = 0; // initialize the Poisson random variable (or variate)
    bool booleContinue = true;
    // Steps 1 to 3.1 in Algorithm PTRS
    while (booleContinue)
    {
        // generate two uniform variables
        U = funUniform();
        V = funUniform();

        U = U - 0.5;
        us = 0.5 - fabs(U);

        n = floor((2 * a / us + b) * U + mu + 0.43);

        if1
        {
            randPoisson = n;
            return randPoisson;
        }

        if2)
        {
            continue;
        }

        log_mu = log(mu);
        logfac_n = funLogFac(n);

        // two sides of an inequality condition
        lhs = log(V * one_over_alpha / (a / us / us + b));
        rhs = -mu + n * log_mu - logfac_n; // NOTE: uses log factorial n

        if (lhs <= rhs)
        {
            randPoisson = n;
            return randPoisson;
        }
        else
        {
            continue;
        }
    }
    return randPoisson;
}

Further reading

Algorithm PA was proposed in the paper:

  • 1979, Atkinson, The computer generation of Poisson random variables.

This algorithm is also given in the book Monte Carlo Statistical Methods by Casella and Robert; see Algorithm A.6.

Algorithm PTRS was proposed in the paper:

  • 1993, Hörmann, The transformed rejection method for generating Poisson random variable.
  1. us >= 0.07) && (V <= vr []
  2. n <= 0) || ((us < 0.013) && (V > us []