November 2019 – H. Paul Keeler

Simulating Poisson random variables – Survey of methods

In this post I’ll cover the different ways for generating Poisson random variates. (I often just say simulating Poisson variables.) I’ll then list which methods different language libraries use, ranging from open source projects NumPy and R to industry-level libraries MLK (by Intel) and cuRand (CUDA) by Nvidia.

Direct method doesn’t scale well

In the previous post, I discussed how to sample or generate Poisson random variables or, more correctly, variates. I detailed a direct method that uses the fact that a Poisson stochastic process, which is directly related to a Poisson point process, has inter-arrival times that form independent and identically distributed exponential variables.

The direct method is an easy and intuitive sampling method, explaining why it is often used. I implemented the method in MATLAB, Python, C and C#, which can be found here. Later, in another post, I implemented the same Poisson sampling method in Fortran, which is located here.)

As elegant and exact as this simulation method is, it unfortunately decreases in speed as the Poisson parameter $\lambda$ increases. In a tutorial published in 1983, Brian D. Ripely, a major figure in spatial statistics, says this about the direct method:

This is simple, but has expected time proportional to $\lambda$. Some of its competitors use rejection methods with the envelope distribution that of the integer part of a continuous random variable, such as logistic, Laplace and normal mixed with exponential distributions.

We recall that acceptance-rejection or rejections methods involve simulating a random object, such as a random variable, by first simulating another random object of the same type that is easier to simulate. The simulation method then accepts or rejects these random objects based on a certain ratio. The distribution of the simpler random object that is first simulated is called the envelope distribution. Such rejection methods are one way to simulate Poisson variables.

In short, when simulating Poisson variables, the appropriate simulation algorithm should be chosen based on the Poisson parameter. Consequently, the code of most computer functions for generating Poisson variables will have an if-statement, using the direct method for small parameter values and another method for large parameter values. In addition to that, the method for large Poisson parameter values should be both fast but simple to implement.

We now consider the other methods.

Different methods

Over the years there have been different methods proposed for producing Poisson random variates. In the book Non-uniform random variate generation, Luc Devroye groups (in Section X.3 on page 502) the different methods into five categories coupled with his views. These methods are:

Direct methods based on the homogeneous Poisson stochastic process having exponential inter-arrival times. These methods are simple, but the expected time is proportional to the Poisson parameter $\lambda$.
Inversion methods that search through a table of cumulative Poisson probabilities. Examples include the papers by Fishman (1976) and Atkinson (1979)*.
Methods that use the recursive properties of the Poisson distribution. The paper by Ahrens and Dieter (1974) uses this approach, and its expected time of completion is proportional to $\log\lambda$.
Acceptance-rejection (or rejection) methods that give relatively fast but simple algorithms. Such methods are proposed in the papers by Atkinson (1979)*, Ahrens and Dieter (1980) and Devroye (1981) or the technical report by Schmeiser and Kachitvichyanukul (1981).
Acceptance-complement methods that uses a normal distribution as the starting distribution, such as the paper by Ahrens and Dieter (1982). This method is fast, but the code is rather long.

*Atkinson had (at least) two papers on generating Poisson variates published in 1979, but I believe Devroye is referring to the first paper, because in the second paper Atkinson compares methods proposed by others.

For the paper titles, see the Further reading section below.

Code

In a separate post, I present implementations in Python and MATLAB of algorithms found respectively in the papers by Atkinson (1979) and Hörmann (1991). But these are only for illustration purposes, as Python (with SciPy) and MATLAB only have good functions for generating Poisson variables.

Methods implemented in popular libraries

I’ll now state which methods are used in various programming languages and numerical methods. I won’t go into the details how the methods work, just citing the papers instead.

MATLAB

For small $\lambda$ values, the MATLAB function poissrnd uses the direct method (based on inter-arrival times) with a while-loop.

For $\lambda$ values greater than fifteen, I believe that the MATLAB function poissrnd uses Algorithm PG from the 1974 paper by Ahrens and Dieter, which uses the the generation of gamma and binomial random variates.

But to come to this conclusion, I had to do some investigating. You can skip to the next section if you’re not interested, but now I’ll explain my reasoning.

The MATLAB documentation says it uses a method proposed by Ahrens and Dieter, but these two researchers developed a number of methods for generating Poisson variables. The MATLAB code cites Volume 2 of the classic series by Knuth, who says the method is due to Ahrens and Dieter, but he doesn’t give an exact citation in that section of the book. Confusingly, Knuth cites in his book a couple papers by Ahrens and Dieter for generating different random variates. (Knuth later cites a seemingly relevant 1980 paper by Ahrens and Dieter, but that details another method.)

Both the MATLAB code and Knuth cite the book by Devroye. In his book (Exercise 3.5.2), Devroye discusses one method, among others, from a 1974 paper by Ahrens and Dieter. Another hint is given by examining the code of the MATLAB function poissrnd, which reveals that it uses the function randg to generate gamma variables. In the Ahrens and Dieter 1974 paper, their Algorithm PG (for producing Poisson variates) uses gamma random variables, and it’s suggested to use a parameter value of $7/8$. This is the same parameter used in the MATLAB code and mentioned by Knuth, confirming that this is the right paper by Ahrens and Dieter.

In summary, for large $\lambda$ the function MATLAB uses Algorithm PG from the 1974 paper by Ahrens and Dieter, whereas for small values it uses the direct method, which they refer to as the multiplication method.

R

In R, the function rpois use an algorithm outlined in the 1982 paper by Ahrens and Dieter. You can view the R source code here. The two cases for $\lambda$ (or $\mu$ in the paper) depend on whether $\lambda$ is greater than ten or not. For small $\lambda$, the R function rpois does not use the method based on inter-arrival times, but rather an inversion method based on a table of (cumulative) probabilities given by the Poisson probability distribution.

Python (NumPy)

In NumPy, the function numpy.random.poisson generates Poisson variates. The source code for the NumPy library is here, but for the Poisson function the underlying code is actually written in C; see the distributions.c file located here. For small Poisson parameter $\lambda$, the code uses the direct method; see the function random_poisson_mult in the code.

For Poisson parameter $\lambda \geq 10$, the comments in the code reveal that it uses a method from a 1993 paper by Hörmann; see Algorithm PTRS on page 43 of the paper. This is a transformation method, which for NumPy is implemented in the C code as the function random_poisson_ptrs. The method, which Hörmann calls the transformed rejection with squeeze, combines inversion and rejection methods.

Octave

Octave is intended to be a GNU clone of MATLAB, so you would suspect it uses the same methods as MATLAB for generating Poisson random variates. But the Octave function poissrnd uses different methods. The code reveals it generates the Poisson variates with a function called prand. It considers different cases depending on the value of the Poisson parameter $\lambda$ as well as whether a single variable (that is, a scalar) or vector or matrix of Poisson variates are being generated.

In total, the Octave function prand uses five different methods. For two of the methods, the documentation cites methods from the classic book Numerical Recipes in C (the 1992 edition); see next section. To generate a single Poisson variate with Poisson parameter $\lambda \leq 12$, the Octave function prand uses the direct method based on inter-arrival times.

Numerical Recipes (Fortran, C and C++)

The book Numerical Recipes is a classic by Press, Teukolsky, Vetterling and Flannery on numerical methods. The books comes in different editions reflecting different publication years and computer languages. (In the first two editions of the book, the authors implemented the algorithms respectively in Fortran and C.)

For generating Poisson variates, the book contents seems to have not changed over the editions that I looked at, which covered the programming languages Fortran (77 and 90), C, and C++. The authors cover Poisson generation in Section 7.3 in the Fortran and C editions. In the third edition of Numerical Recipes, they implement their methods in C++ in Section 7.3.12.

For small values of Poisson parameter $\lambda$, Numerical Recipes uses the direct method. For $\lambda >12$ values, an acceptance-rejection method is used, which relies upon finding a continuous version of the discrete Poisson probability distribution.

GSL Library (C)

In the GSL library, one can use the function gsl_ran_poisson, which uses the the direct method of exponential times. The code, which can be viewed here, cites simply Knuth (presumably the second volume). But it seems to use the aforementioend Algorithm PG (for producing Poisson variates) from the 1974 paper by Ahrens and Dieter 1974; see the section above on MATLAB.

NAG Library (C)

Although I didn’t see the code, it appears that the function nag_rand_poisson (g05tjc ) in the NAG library also uses the direct method, based on the material in the second volume of series by Knuth. But in a 1979 paper Atkinson says that the NAG library uses a method from the 1974 paper by Ahrens and Dieter.

Boost library Random (C++)

The Boost library Random uses the PTRD algorithm proposed in the 1993 paper by Hörmann to generate Poisson variates; see Algorithm PTRD on page 42 of the paper. In the same paper appears the PTRS method, which is used by Python (NumPy) (though implemented in C), as mentioned above.

MKL library (C)

In the MKL C library written by Intel, there seems to be three methods in use for generating Poisson variates.

The first function is called VSL_RNG_METHOD_POISSON_PTPE, which does the following for a Poisson distribution with parameter $\Lambda$:

If Λ ≥ 27, random numbers are generated by PTPE method. Otherwise, a combination of inverse transformation and table lookup methods is used. The PTPE method is a variation of the acceptance/rejection method that uses linear (on the fraction close to the distribution mode) and exponential (at the distribution tails) functions as majorizing functions. To avoid time-consuming acceptance/rejection checks, areas with zero probability of rejection are introduced and a squeezing technique is applied.

This function uses the so-called PTPE method, which is outlined in a 1981 technical report by Schmeiser and Kachitvichyanukul.

The second function is called VSL_RNG_METHOD_POISSON_POISNORM, which does the following :

If Λ < 1, the random numbers are generated by combination of inverse transformation and table lookup methods. Otherwise, they are produced through transformation of the normally distributed random numbers.

The third function is called VSL_RNG_METHOD_POISSONV_POISNORM, which does the following:

If Λ < 0.0625, the random numbers are generated by inverse transformation method. Otherwise, they are produced through transformation of normally distributed random numbers.

cuRAND (C)

Finally, there is the CUDA Random Number Generation library (cuRAND) developed by Nvidia for their (now ubiquitous) graphical processing units (GPUs). This C/C++ library has a function for generating Poisson variates. To see the C code, copies of it can be found in various GitHub repositories, such as this one. The cuRAND function curand_poisson uses the direct function for Poisson parameter values less than 64. For parameters values greater than 4000, it uses a normal approximation (rounded to the nearest integer).

For other values, the function curand_poisson uses a rejection method based on an approximation of the incomplete gamma function; see the function curand_poisson_gammainc. The book by Fishman is cited; see Section 8.16.

Simulating Poisson random variables – Direct method

If you were to write from scratch a program that simulates a homogeneous Poisson point process, the trickiest part would be the random number of points, which requires simulating a Poisson random variable. In previous posts on simulating this point process, such as this one and this one, I’ve simply used the inbuilt functions for simulating (or generating) Poisson random variables (or variates).¹

But how would one create such a Poisson function using just a standard uniform random variate generator? In this post I present my own Poisson simulation code in MATLAB, Python, C and C#, which can be found here.

The method being used depends on the value of the Poisson parameter, denoted here by $\lambda$, which is the mean (as well as the variance) of a random variable with a Poisson distribution. If this parameter value is small, then a direct simulation method can be used to generate Poisson random variates. In practice a small Poisson parameter is a number less than some number between 10 to 30.

For large $\lambda$ values, other methods are generally used, such as rejection or (highly accurate) approximation methods. In the book Non-uniform random variate generation, the author Luc Devroye groups the methods into five categories (Section X.3.2), which I briefly describe in the next post. The first of those categories covers the method that I mentioned above. I will cover that method in this post, presenting some Poisson sampling code in C and C#. (I will also present some code in MATLAB, but you would never use it instead of the the inbuilt function poissrnd.)

In the next post, I’ll describe other types of Poisson simulation methods, and I’ll detail which simulation methods various programming libraries use.

Direct method

An elegant and natural method for simulating Poisson variates is to use a result based on the homogeneous Poisson stochastic process. The points in time when a homogeneous Poisson stochastic process increases forms a Poisson point process on the real line. ²

Using exponential random variables

Here’s the algorithm for sampling Poisson variables with exponential random variables, which I’ll explain.

Sample Poisson random variable $N$ with parameter (mean) $\lambda$ using exponential random variables

Set count variable $N=0$ and initial sum variable $S=0$;

While $S<1$:

Sample uniform random variable $U\sim U(0,1)$;

Calculate $E= -\log(U)/\lambda $ ;

Update count and sum variables by setting $N\rightarrow N+1$ and $S\rightarrow S+E$;

Return N;

The point in time when the Poisson stochastic process increases are called arrival times or occurrence times. In classic random models they represent the arrivals or occurrences of something, such as phone calls over time. The differences between consecutive times are called inter-arrival times or inter-occurrence times. The inter-arrival times of a homogeneous Poisson process form independent exponential random variables, a result known as the Interval Theorem.

Using this connection to the Poisson stochastic process, we can generate exponential variables $E_1$, $E_2, \dots $, and add them up. The smallest number of exponential variables for the resulting sum to exceeds one will give a Poisson random variable. That is, if we define $N$ to be the smallest $n$ such that
$$ \sum_{k=1}^{n+1} E_k > 1, $$
then $N$ is a random variable distributed according to a Poisson distribution.

Generating exponential variates is easily done by using the inverse method. For a uniform random variable $U$ on the unit interval $(0,1)$, the transformation $E= -\log(U)/\lambda $ gives an exponential random variable with mean $1/\lambda$.

But we can skip generating exponential random variates.

Using uniform random variables

Here’s the algorithm for sampling Poisson variables with uniform random variables.

Sample Poisson random variable $N$ with parameter (mean) $\lambda$ using uniform random variables

Set count variable $N=0$ and initial product variable $P=1$;

While $P>e^{-\lambda}$:

Sample uniform random variable $U\sim U(0,1)$;

Update count and product variables by setting $N\rightarrow N+1$ and $P\rightarrow P\times U$;

Return N;

To reduce computations, the direct method using exponential random variables is often reformulated as products of uniform random variables. We can do this, due to logarithmic identities, and work with products of uniform variables instead of sums of exponential random variables.

Then, by using standard uniform random variables $U_1, U_2,\dots$, we define $N$ to be the smallest $n$ such that
$$ \prod_{k=1}^{n+1} U_k < e^{-\lambda}. $$ These two different formulations of the same method are captured by Lemma 3.2 and Lemma 3.3 in Chapter 10 of Devroye’s book.

Example in MATLAB

Warning: My online webpage editor tends to mangle symbols like < and >, so it’s best not to copy my code straight from this website, unless you check and edit it, and download my code directly from here.

In MATLAB, we can implement this method with the first formulation in a function with a simple while-loop:

function N=funPoissonLoop(lambda)
T=0; %initialize sum of exponential variables as zero
n=-1;%initialize counting variable as negative one

while (T <1)
E=-(1/lambda)*log(rand(1));%generate exponential random variable
T=T+E; %update sum of exponential variables
n=n+1; %update number of exponential variables
end
N=n;
end

But, as I said before, don’t use this code instead of the inbuilt function poissrnd.

If you want to be a bit more tricky, you could achieve the same result by using recursion:

function N=funPoissonRecursive(lambda)
T=0; %initialize sum of exponential variables as zero
n=-1; %initialize counting variable as negative one

%run (recursive) exponential function step function
[~,N]=funStepExp(lambda,T,n);

function [T,N]=funStepExp(nu,S,m)
if (S < 1)
%run if sum of exponential variables is not high enough

%generate exponential random variable
E=(-log(rand(1)))/nu;
S=S+E; %update sum of exponential variables
m=m+1; %update nunber of exponential variables

%recursively call function again
[T,N]=funStepExp(nu,S,m);
else
T=S;
N=m;
end
end
end

Note how the code recursively calls the function funStepExp, which generates an exponential variable each time.

In the Code section below I describe my code in C and C#, using the second formulation.

Origins

Some people attribute the direct method, based on inter-arrival times, to (or, at least, cite) Donald Knuth, who details it in the second volume of his classic series of books, but I doubt that the great Knuth was the first to have this idea. For example, a quick search on Google Scholar found a paper by K. D. Tocher on computers and random sampling, where Tocher proposes the direct method in 1954, some years before Knuth started publishing his classic series.

The direct method for Poisson sampling relies upon the Interval theorem. The Poisson point process expert Günter Last studied the origins of this fundamental result. He presented its history in a recent book authored by him and Matthew Penrose; see Chapter 7 and its corresponding historical footnotes in Section C of the appendix. (A free version of the book can be found here. ) People connected to the result include Robert Ellis and William Feller who respectively lived in the 19th and 20th centuries.

Other methods

The direct method perfectly generates Poisson random variables (or I should say Poisson random variates). But it can be too slow for large values of the Poisson parameter (that, is the mean) $\lambda$. This has motivated researchers to develop other methods, which I will mention in the next post.

Code

I wrote some code that simulates Poisson random variables by employing the direct method based on exponential inter-arrival times. As always, all my the code is online, with the code from this post being located here.

I have implemented the second formulation (using just uniform variables) in the C and C# languages. In the code, I have used a while-loop to implement the method. But I could have also used a recursion method, as I did in the MATLAB example above, which I have also done in Python (with NumPy).

For an empirical test, the code also calculates the mean and variance of a collection of Poisson variables. For a large enough number of variables, the sample mean and the variance will closely agree with each other, converging to the same value.

C

Warning: My C code uses rand(), the standard pseudo-random number function in C, which is known for failing certain tests of randomness. The function is adequate for regular simulation work. But it gives poor results for large number of simulations. Replace this function with another pseudo-random number generator.

The code for generating a single Poisson variate is fairly straightforward in C. Here’s a sample of the code with just the Poisson function:

//Poisson function -- returns a single Poisson random variable
int funPoissonSingle(double lambda)
{
double exp_lambda = exp(-lambda); //constant for terminating loop
double randUni; //uniform variable
double prodUni; //product of uniform variables
int randPoisson; //Poisson variable

//initialize variables
randPoisson = -1;
prodUni = 1;
do
{
randUni = funUniformSingle(); //generate uniform variable
prodUni = prodUni * randUni; //update product
randPoisson++; //increase Poisson variable

} while (prodUni > exp_lambda);
return randPoisson;
}

For generating multiple variates, the code becomes more complicated, as one needs to use pointers, due to the memory capabilities of C. Again, the function uses the pseudo-random number generator in C.

C#

The code for generating a single Poisson variate is also straightforward in C#. Here’s the function in C#:

//Poisson function -- returns a single Poisson random variable
public int funPoissonSingle (double lambda) {
double exp_lambda = Math.Exp (-lambda); //constant for terminating loop
double randUni; //uniform variable
double prodUni; //product of uniform variables
int randPoisson; //Poisson variable

//initialize variables
randPoisson = -1;
prodUni = 1;
do {
randUni = funUniformSingle (); //generate uniform variable
prodUni = prodUni * randUni; //update product
randPoisson++; // increase Poisson variable

} while (prodUni > exp_lambda);

return randPoisson;
}

Generalizing the code so it generates multiple variates just requires a little change, compared to C, as the C# language is a much more modern language.

Fortran

After this original post, I later wrote a post about implementing the same Poisson algorithm in Fortran. My Fortran code is very similar to the code that I wrote in C and C#. You should be able to run it on this website or similar ones that can compile Fortran (95) code.

Month: November 2019

Simulating Poisson random variables – Survey of methods

Direct method doesn’t scale well

Different methods

Code

Methods implemented in popular libraries

MATLAB

R

Python (NumPy)

Octave

Numerical Recipes (Fortran, C and C++)

GSL Library (C)

NAG Library (C)

Boost library Random (C++)

MKL library (C)

cuRAND (C)

Further reading

Books

Articles

Simulating Poisson random variables – Direct method

Direct method

Using exponential random variables

Sample Poisson random variable \(N\) with parameter (mean) \(\lambda\) using exponential random variables

Using uniform random variables

Sample Poisson random variable \(N\) with parameter (mean) \(\lambda\) using uniform random variables

Example in MATLAB

Origins

Other methods

Code

C

C#

Fortran

Further reading