In the previous post, I discussed how to sample or generate Poisson random variables or (more correctly) variates. I detailed a direct method that uses the fact that a Poisson stochastic process, which is directly related to a Poisson point process, has inter-arrival times that form independent and identically distributed exponential variables. This method in turn can be easily reformulated so it only uses (standard) uniform variables to generate Poisson random variables. It is an easy and popular sampling method, explaining why it is often used.
As elegant and precise as this simulation method is, it unfortunately decreases in speed as the Poisson parameter \(\lambda\) increases. In a tutorial published in 1983, Brian D. Ripely, a major figure in spatial statistics, says this about the direct method:
This is simple, but has expected time proportional to \(\lambda\). Some of its competitors use rejection methods with the envelope distribution that of the integer part of a continuous random variable, such as logistic, Laplace and normal mixed with exponential distributions.
We recall that acceptance-rejection or rejections methods involve simulating a random object, such as a random variable, by first simulating another random object of the same type that is easier to simulate, and then accepting or rejecting these random variables or variates based on a certain ratio. The distribution of the random object that is first simulated is called the envelope distribution. Such rejection methods are one way to simulate Poisson variables.
Consequently, the code of most computer functions for generating Poisson variables will have an if-statement, using the direct method for small parameter values and another method for large parameter values.
Over the years there have been different methods proposed for producing Poisson random variates. In the book Non-uniform random variate generation, Luc Devroye groups the different methods into five categories with his views, which I’ll briefly describe here:
- Direct methods based on the homogeneous Poisson stochastic process having exponential inter-arrival times. These methods are simple, but the expected time is proportional to the Poisson parameter \(\lambda\).
- Inversion methods that search through a table of cumulative Poisson probabilities. Examples include the papers by Fishman (1976) and Atkinson (1979)*.
- Methods that use the recursive properties of the Poisson distribution. The paper by Ahrens and Dieter (1974) uses this approach, and its expected time of completion is proportional to \(\log\lambda\).
- Acceptance-rejection (or rejection) methods that give relatively fast but simple algorithms. Such methods are proposed in the papers by Atkinson (1979)*, Ahrens and Dieter (1980) and Devroye (1981) or the technical report by Schmeiser and Kachitvichyanukul (1981).
- Acceptance-complement methods that uses a normal distribution as the starting distribution, such as the paper by Ahrens and Dieter (1982). This method is fast, but the code is rather long.
*Atkinson had (at least) two papers on generating Poisson variates published in 1979, but I believe Devroye is referring to the first paper, because in the second paper Atkinson compares methods proposed by others.
For titles of the papers, see the Further reading section below.
In this section, I’ll state which proposed methods are used in various programming languages and numerical methods. I won’t go into the details how the methods work, as I’ll just cite the papers instead.
For \(\lambda\) values greater than fifteen, I believe that the MATLAB function poissrnd uses Algorithm PG from the 1974 paper by Ahrens and Dieter. But to come to this conclusion, I had to do some investigating. You can skip to the next section if you’re not interested, but now I’ll explain my reasoning.
The MATLAB documentation says it uses a method proposed by Ahrens and Dieter, but these two researchers developed a number of methods for generating Poisson variables. The MATLAB code cites Volume 2 of the classic series by Knuth, who says the method is due to Ahrens and Dieter, but he doesn’t give an exact citation in that section of the book. Confusingly, Knuth cites in his book a couple papers by Ahrens and Dieter for generating different random variates. (Knuth later cites a seemingly relevant 1980 paper by Ahrens and Dieter, but that details another method.)
Both the MATLAB code and Knuth cite the book by Devroye. In his book (Exercise 3.5.2), Devroye discusses one method, among others, from a 1974 paper by Ahrens and Dieter. Another hint is given by examining the code of the MATLAB function poissrnd, which reveals that it uses the function randg to generate gamma variables. In the Ahrens and Dieter 1974 paper, their Algorithm PG (for producing Poisson variates) uses gamma random variables, and it’s suggested to use a parameter value of \(7/8\), as used by MATLAB and mentioned by Knuth, confirming this is the right paper.
In summary, for large \(\lambda\) the function MATLAB uses Algorithm PG from the 1974 paper by Ahrens and Dieter, whereas it uses the direct method for small values.
In R, the function rpois use an algorithm outlined in the 1982 paper by Ahrens and Dieter. You can view the R source code here. The two cases for \(\lambda\) (or \(\mu\) in the paper) depend on whether \(\lambda\) is greater than ten or not. For small \(\lambda\), the R function rpois does not use the method based on inter-arrival times, but rather an inversion method based on a table of (cumulative) probabilities given by the Poisson probability distribution.
In NumPy, the function numpy.random.poisson generates Poisson variates. The source code for the NumPy library is here, but for Poisson function the underlying code is actually written in C; see the distributions.c file located here. For small Poisson parameter \(\lambda\), the code uses the direct method; see the function random_poisson_mult in the code.
For Poisson parameter \(\lambda \geq 10\), the comments in the code reveal that it uses a method from a 1993 paper by Hörmann; see Algorithm PTRS on page 43 of the paper. This is a transformation method, which for NumPy is implemented in the C code as the function random_poisson_ptrs. The method, which Hörmann calls the transformed rejection with squeeze, combines inversion and rejection methods.
Octave is intended to be a GNU clone of MATLAB, so you would suspect it uses the same methods as MATLAB for generating Poisson random variates. But the Octave function poissrnd uses different methods. The code reveals it generates the Poisson variates with a function called prand. It considers different cases depending on the value of the Poisson parameter \(\lambda\) as well as whether a single variable (that is, a scalar) or vector or matrix of Poisson variates are being generated.
In total, the Octave function prand uses five different methods. For two of the methods, the documentation cites methods from the classic book Numerical Recipes in C (the 1992 edition); see next section. To generate a single Poisson variate with Poisson parameter $\lambda \leq 12$, the Octave function prand uses the direct method based on inter-arrival times.
Numerical Recipes (Fortran, C and C++)
The Numerical Recipes is a classic book by Press, Teukolsky, Vetterling and Flannery on numerical methods. The books comes in different editions reflecting different publication years and computer languages. (In the first two editions of the book, the authors implemented the algorithms respectively in Fortran and C.)
For generating Poisson variates, the book contents seems to have not changed over the editions that I looked at, which covered the programming languages Fortran (77 and 90), C, and C++. The authors cover Poisson generation in Section 7.3 in the Fortran and C editions. In the third edition of Numerical Recipes, they implement their methods in C++ in Section 7.3.12.
For small values of Poisson parameter \(\lambda\), Numerical Recipes uses the direct method. For \(\lambda >12\) values, an acceptance-rejection method is used, which relies upon finding a continuous version of the discrete Poisson probability distribution.
GSL Library (C)
In the GSL library, one can use the function gsl_ran_poisson, which uses the the direct method of exponential times. The code, which can be viewed here, cites simply Knuth (presumably the second volume).
NAG Library (C)
Although I didn’t see the code, it appears that the function nag_rand_poisson (g05tjc ) in the NAG library also uses the direct method, based on the material in the second volume of series by Knuth. But in a 1979 paper Atkinson says that the NAG library uses a method from the 1974 paper by Ahrens and Dieter.
MKL library (C)
In the MKL C library written by Intel, they seem to have three functions or methods for generating Poisson variates.
The first function is called VSL_RNG_METHOD_POISSON_PTPE, which does the following for a Poisson distribution with parameter \(\Lambda\):
If Λ ≥ 27, random numbers are generated by PTPE method. Otherwise, a combination of inverse transformation and table lookup methods is used. The PTPE method is a variation of the acceptance/rejection method that uses linear (on the fraction close to the distribution mode) and exponential (at the distribution tails) functions as majorizing functions. To avoid time-consuming acceptance/rejection checks, areas with zero probability of rejection are introduced and a squeezing technique is applied.
This function uses the so-called PTPE method, which is outlined in a 1981 technical report by Schmeiser and Kachitvichyanukul.
The second function is called VSL_RNG_METHOD_POISSON_POISNORM, which does the following :
If Λ < 1, the random numbers are generated by combination of inverse transformation and table lookup methods. Otherwise, they are produced through transformation of the normally distributed random numbers.
The third function is called VSL_RNG_METHOD_POISSONV_POISNORM, which does the following:
If Λ < 0.0625, the random numbers are generated by inverse transformation method. Otherwise, they are produced through transformation of normally distributed random numbers.
Finally, Nvidia’s CUDA Random Number Generation library (cuRAND) has a function for generating Poisson variates. To see the C code, copies of it can be found in various GitHub repositories, such as this one. The cuRAND function curand_poisson uses the direct function for Poisson parameter values less than 64. For parameters values greater than 4000, it uses a normal approximation (rounded to the nearest integer).
For other values, the function curand_poisson uses an a rejection method based on an approximation of the incomplete gamma function; see the function curand_poisson_gammainc. The book by Fishman is cited; see Section 8.16.
For various Poisson simulation methods, see the stochastic simulation books by Devroye (Section X.3) or Fishman (Section 8.16). The book by Gentle (Section 5.2.8) also briefly covers Poisson variables. Kuhl wrote a paper on the history of generating random variates. It has some interesting historical points, but I was not entirely impressed by the quality or detail of the paper.
Of course, it’s a good idea to look at the citations that the different functions use.
Finally, here is a list of the papers I mentioned in this post:
- 1974, Ahrens and Dieter, Computer methods for sampling from gamma, beta, poisson and bionomial distributions;
- 1976, Fishman, Sampling from the Poisson distribution on a computer;
- 1979, Atkinson, The computer generation of Poisson random variables;
- 1979, Atkinson, Recent developments in the computer generation of Poisson random variables;
- 1980, Ahrens and Dieter, Sampling from binomial and Poisson distributions: a method with bounded computation times;
- 1980, Devroye, The Computer Generation of Poisson Random Variables;
- 1981, Schmeiser and Kachitvichyanukul, Poisson Random Variate Generation;
- 1982, Ahrens and Dieter, Computer generation of Poisson deviates from modified normal distributions;
- 1983, Ripley, Computer Generation of Random Variables: A Tutorial;
- 1993, Hörmann, The transformed rejection method for generating Poisson random variable.