
The Metropolis-Hastings algorithm in C

There’s an old saying, which I just made up. If you really want to understand something, code it up in C.¹

I wrote some C code that basically does what my code in this previous post does without the pretty pictures. The code performs a Metropolis-Hastings algorithm, simulating random variables (or, more correctly, generating random variates) for a joint probability density describing two random variables.

In previous posts, I have covered the topic of Markov chain Monte Carlo (MCMC) methods, particularly the central workhorse, the Metropolis(-Rosenbluth-Rosenbluth-Teller-Teller)-Hastings algorithm. These methods are frequently used in Bayesian statistics, high-dimensional integration, and optimization. For more details on how they work, I have written a couple posts, starting with this one and ending with this one, where I detail the mechanics of MCMC methods.

Let’s call this code MCMC C code².

Code considerations

The C programming language was not written for playing with random numbers. The standard uniform random number generator in C, simply called rand, is not good for research-level random simulations. But it does the job for illustration purposes.

In lieu of this generator, the Mersenne Twister is a popular algorithm for producing such numbers, which is widely recommended and used. There are implementations of this algorithm in the CUDA and MKL libraries for Nvidia GPUs and Intel CPUs; see this CUDA page and this MKL page for details. Check out this PDF file for further details on the CUDA version.

In addition to that, my C code needs to simulate normal (or Gaussian) random variables. For that I wrote my own simple code using the Box-Muller transform, which I covered in a previous post, so the code would be self-contained and less opaque. But in reality, you should always use functions from a pre-written library for generating variates according to a normal or whichever distribution.
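To illustrate the idea behind the normrand function in the C code below, here is a minimal Python sketch of the Box-Muller transform. It turns pairs of independent uniform variables into pairs of independent standard normal variables by way of polar coordinates. The function name box_muller is my own, chosen for illustration. In practice you would simply call a library routine, such as the normal method of a NumPy random Generator.

```python
import numpy as np

def box_muller(rng, numb_rand, mu=0.0, sigma=1.0):
    """Return numb_rand normal variates with mean mu and standard deviation sigma."""
    numb_pairs = (numb_rand + 1) // 2          # each uniform pair gives two normals
    u1 = rng.random(numb_pairs)                # uniform on [0, 1)
    u2 = 1.0 - rng.random(numb_pairs)          # uniform on (0, 1], so log(u2) is finite
    theta = 2.0 * np.pi * u1                   # uniform angle
    rho = np.sqrt(-2.0 * np.log(u2))           # Rayleigh-distributed radius
    # change to Cartesian coordinates: cosine and sine parts are independent normals
    z = np.concatenate((rho * np.cos(theta), rho * np.sin(theta)))
    return mu + sigma * z[:numb_rand]

rng = np.random.default_rng(42)
samples = box_muller(rng, 100_001, mu=1.0, sigma=2.0)
print(samples.mean(), samples.std())  # should be close to 1 and 2
```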

Finally, C was never intended as a scientific language, despite its wide use as one. (Historically, that was Fortran’s job, and Fortran is still the workhorse at many serious number-crunching institutes, hence why there’s a Fortran-ready version of CUDA.) So when handling sets of numbers, such as vectors and matrices, one has to use pointers and malloc more often than not, which can be tricky. This is the case here, though the use of pointers in this code is relatively simple.

Where are the pretty pictures?

Unfortunately, when number crunching in C, you don’t immediately have access to plotting libraries that are available in scientific programming languages such as Python and Julia.

But you can simply create .csv (or text) files, and then plot them using whichever library you prefer. And if you have gnuplot installed, you can plot simple one-dimensional histograms using one-line commands such as:

gnuplot -e "plot 'MCMCData_1D.csv' using 1 bins=20;" -persist

In the above command, the file MCMCData_1D.csv has the random variates (the simulated random variables) stored in a single column.

Code

I only present code for the more complicated two-dimensional case. The code can be found here.

Note that there’s an option in the code of MCMC_1D.c to plot the results using gnuplot, if it’s installed on your machine. I didn’t include code for plotting the results of the 2-D case as gnuplot doesn’t do a 2-D histogram.

Warning: The code uses the standard pseudo-random number generator in C, which is known for being bad. I only used the built-in C generator to keep my code self-contained. For that reason, I also wrote my own code for generating Gaussian (or normal) random variables, using the Box-Muller transform, but in reality one would never do that for research or industry purposes.

/***********************************************************************
 * Runs a simple Metropolis-Hastings (ie MCMC) algorithm to simulate two
 * jointly distributed random variables with probability density
 * p(x,y)=exp(-(x^4+x*y+y^2)/s^2)/consNorm, where s>0 and consNorm is a
 * normalization constant. The density is non-zero only on the square
 * [-1,1]x[-1,1] and is defined in the function pdf_single.
 *
 * NOTE: In practice, the value of the normalization constant is not needed, as it cancels out in the algorithm.
 *
 * NOTE: This code will *create* a local file (see variable strFilename) to store results. It will *overwrite* that file if it already exists.
 *
 * WARNING: This code uses the default C random number generator, which is known for failing various tests of randomness.
 * Strongly recommended to use another generator for purposes beyond simple illustration.
 *
 * Author: H. Paul Keeler, 2024.
 * Website: hpaulkeeler.com
 * Repository: github.com/hpaulkeeler/posts
 *
 ***********************************************************/

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <math.h>
#include <stdbool.h>
#include <string.h>

const long double pi = 3.14159265358979323846; // constant pi for generating polar coordinates

// helper function declarations; see below for definitions
static double *unirand(double *randValues, unsigned numbRand);                           // generate uniform random variables on (0,1)
static double *normrand(double *randValues, unsigned numbRand, double mu, double sigma); // generate normal random variables
static double pdf_single(double x, double y, double s);                                  // define probability density to be simulated
static double mean_var(double *set_sample, unsigned numbSim, double *varX);              // calculate mean and variance

int main(int argc, char *argv[])
{

    if (argc > 1)
    {
        fprintf(stderr, "This program takes no arguments...\n");
        exit(1);
    }
    else
    {

        char strFilename[] = "MCMCData_2D.csv"; // filename for storing simulated random variates

        // initializes (pseudo)-random number generator
        time_t timeCPU; // use current (calendar) time for seed
        srand((unsigned)time(&timeCPU));
        // srand(42); //to reproduce results

        bool booleWriteData = true; // write data to file
        bool booleStats = true;     // perform simple mean/std stats

        // parameters
        unsigned numbSim = 1e4;   // number of random variables simulated
        unsigned numbSteps = 200; // number of steps for the Markov process
        double sigma = 2;         // standard deviation for normal random steps

        // probability density parameters
        double s = .5; // scale parameter for distribution to be simulated

        // Metropolis-Hastings variables
        // proposal for a new position in the random walk
        double zxRandProposal;      
        double zyRandProposal;      
        double pdfProposal; // density for proposed position
        double pdfCurrent;  // density of current position
        double ratioAccept; // ratio of densities (ie acceptance probability)
        double uRand;       // uniform variable for Bernoulli trial (ie a coin flip)
        // random step (normally distributed)
        double *p_numbNormX = (double *)malloc(1 * sizeof(double));
        double *p_numbNormY = (double *)malloc(1 * sizeof(double));
        // positions of the random walk (ie the simulated random variables after numbSteps)
        double *p_xRand = (double *)malloc(numbSim * sizeof(double));
        double *p_yRand = (double *)malloc(numbSim * sizeof(double));

        (void)unirand(p_xRand, numbSim); // random initial values
        (void)unirand(p_yRand, numbSim); // random initial values

        unsigned i, j; // loop variables
        for (i = 0; i < numbSim; i++)
        {
            // loop through each random walk instance (or random variable to be simulated)

            pdfCurrent = pdf_single(*(p_xRand + i), *(p_yRand + i), s); // current probability density

            for (j = 0; j < numbSteps; j++)
            {
                // loop through each step of the random walk
                (void)normrand(p_numbNormX, 1, 0, sigma);
                (void)normrand(p_numbNormY, 1, 0, sigma);
                // take a (normally distributed) random step in x and y
                zxRandProposal = (*(p_xRand + i)) + (*p_numbNormX);
                zyRandProposal = (*(p_yRand + i)) + (*p_numbNormY);

                pdfProposal = pdf_single(zxRandProposal, zyRandProposal, s); // proposed probability density

                // acceptance rejection step
                (void)unirand(&uRand, 1);
                ratioAccept = pdfProposal / pdfCurrent;
                if (uRand < ratioAccept)
                {
                    // update state of random walk / Markov chain
                    *(p_xRand + i) = zxRandProposal;
                    *(p_yRand + i) = zyRandProposal;
                    pdfCurrent = pdfProposal;
                }
            }
        }

        free(p_numbNormX);
        free(p_numbNormY);

        if (booleStats)
        {

            // initialize statistics variables (for testing results)
            char strVariable[] = "XY";
            double *p_AllRand[] = {p_xRand, p_yRand};
            double meanTemp = 0;
            double varTemp = 0;
            double stdTemp = 0;
            char strTemp='X';
            for (i = 0; i < 2; i++)
            {
                meanTemp = mean_var(p_AllRand[i], numbSim, &varTemp);
                stdTemp = sqrt(varTemp);
                strTemp=strVariable[i];
                printf("The average of the %c random variables is %lf.\n", strTemp, meanTemp);
                printf("The standard deviation of the %c random variables is %lf.\n", strTemp, stdTemp);
            }
        }

        if (booleWriteData)
        {
            // print to file
            FILE *outputFile;
            outputFile = fopen(strFilename, "w");
            if (outputFile == NULL)
            {
                fprintf(stderr, "Could not open %s for writing.\n", strFilename);
            }
            else
            {
                for (i = 0; i < numbSim; i++)
                {
                    fprintf(outputFile, "%lf,%lf\n", *(p_xRand + i), *(p_yRand + i)); // output to file
                }
                fclose(outputFile);
                printf("Data printed to file.\n");
            }
        }
        free(p_xRand);
        free(p_yRand);

        return (0);
    }
}

static double pdf_single(double x, double y, double s)
{
    // returns the probability density of a single point (x,y) inside a simulation window defined below
    double pdf_output;

    // non-zero density window parameters
    double xMin = -1;
    double xMax = 1;
    double yMin = -1;
    double yMax = 1;

    if ((x >= xMin) && (x <= xMax) && (y >= yMin) && (y <= yMax))
    {
        pdf_output = exp(-((pow(x, 4) + x * y + pow(y, 2)) / (s * s)));
    }
    else
    {
        pdf_output = 0;
    }
    return pdf_output;
}

static double *normrand(double *randValues, unsigned numbRand, double mu, double sigma)
{
    // simulate pairs of iid normal variables using Box-Muller transform
    // https://en.wikipedia.org/wiki/Box%E2%80%93Muller_transform

    double U1, U2, thetaTemp, rhoTemp, Z1, Z2;
    unsigned i = 0;
    while (i < numbRand)
    {
        // simulate variables in polar coordinates (theta, rho)
        (void)unirand(&U1, 1);
        thetaTemp = 2 * pi * U1; // create uniform theta values
        (void)unirand(&U2, 1);
        rhoTemp = sqrt(-2 * log(U2)); // create Rayleigh rho values (U2>0, so log(U2) is finite)

        // change to Cartesian coordinates
        Z1 = rhoTemp * cos(thetaTemp);
        Z1 = sigma * Z1 + mu;
        randValues[i] = Z1; // assign first of random variable pair
        i++;
        if (i < numbRand)
        {
            // if more variables are needed, generate second value of random pair
            Z2 = rhoTemp * sin(thetaTemp);
            Z2 = sigma * Z2 + mu;
            randValues[i] = Z2; // assign second of random variable pair
            i++;
        }
        else
        {
            break;
        }
    }
    return randValues;
}

static double *unirand(double *randValues, unsigned numbRand)
{ // simulate numbRand uniform random variables on the open unit interval (0,1)
  // storing them in randValues which must be allocated by the caller
  // with enough space for numbRand doubles

    for (unsigned i = 0; i < numbRand; i++)
    {
        // shift and scale rand() so the values are never exactly 0 or 1
        // (a value of 0 would make log(U2) in normrand infinite)
        randValues[i] = ((double)rand() + 1.0) / ((double)RAND_MAX + 2.0);
    }
    return randValues;
}

static double mean_var(double *set_sample, unsigned numbSim, double *varX)
{
    // mean and variance of set_sample
    unsigned i;
    // initialize statistics variables (for testing results)
    double meanX = 0;
    double meanXSquared = 0;
    double tempX;
    for (i = 0; i < numbSim; i++)
    {
        tempX = *(set_sample + i);
        meanX += tempX / ((double)numbSim);
        meanXSquared += tempX * tempX / ((double)numbSim);
    }

    *varX = meanXSquared - meanX * meanX;
    return meanX;
}

Acknowledg(e)ments

A hat tip to C and CUDA guru Alex Stivala who pointed out a couple of minor issues in my original C code.

Quantum-enhanced Markov chain Monte Carlo

The not-so-mathematical journal Nature recently published a paper proposing a new Markov chain Monte Carlo method:

2023 – Layden, Mazzola, Mishmash, Motta, Wocjan, Kim, and Sheldon – Quantum-enhanced Markov chain Monte Carlo.

The paper appeared earlier as this preprint, and its publication in such a journal is a rare event indeed. This post notes this, as well as the fact that we can already simulate perfectly¹ the paper’s test model, the Ising or Potts model.² But this is a quantum algorithm, which is exciting and explains how it ended up in that journal.

The algorithm

The paper’s proposed algorithm adds a quantum mechanical edge or enhancement to the classic Metropolis-Hastings algorithm.³ As I covered in a recent post, the original algorithm uses a Markov chain defined on some mathematical space. Running it on a traditional or classical computer, at each time step, the algorithm consists of proposing a random jump and then accepting the proposed jump or not. Owing to the magic of Markov chains, in the long run, the algorithm simulates a desired probability distribution.
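To make the classical propose-then-accept step concrete, here is a minimal Python sketch of a random-walk Metropolis-Hastings sampler for a one-dimensional density known only up to a constant. The target density exp(-x^4), the step size, and the function name metropolis_hastings are illustrative assumptions, not taken from the paper. Because the normal proposal is symmetric, the acceptance probability reduces to min(1, p(z)/p(x)).

```python
import numpy as np

def metropolis_hastings(log_target, x0, numb_steps, sigma, rng):
    """Random-walk Metropolis-Hastings chain; returns the state after every step."""
    chain = np.empty(numb_steps)
    x = x0
    log_p_current = log_target(x)
    for i in range(numb_steps):
        z = x + sigma * rng.normal()      # propose a symmetric (normal) random jump
        log_p_proposal = log_target(z)
        # accept with probability min(1, p(z)/p(x)), compared on the log scale;
        # 1 - rng.random() lies in (0, 1], so its log is finite
        if np.log(1.0 - rng.random()) < log_p_proposal - log_p_current:
            x, log_p_current = z, log_p_proposal
        chain[i] = x                      # after a rejection, the chain repeats x
    return chain

rng = np.random.default_rng(1)
# un-normalized target p(x) = exp(-x^4), passed as log p(x) = -x^4
chain = metropolis_hastings(lambda x: -x**4, x0=0.0, numb_steps=50_000, sigma=1.0, rng=rng)
samples = chain[1_000:]                   # discard an initial burn-in period
print(samples.mean(), samples.var())      # near 0 and near Gamma(3/4)/Gamma(1/4) ≈ 0.338
```

Working with log-densities is a common design choice, as it avoids numerical underflow when the density values are tiny.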

The new quantum version of the algorithm uses a quantum computer to propose the jump, while still using a classical computer to accept the proposal or not.⁴ The quantum jump proposals are driven by a time-independent Hamiltonian, which is a central object in quantum and, in fact, all physics. This leads to a Boltzmann (or Gibbs) probability distribution for the jumping process.

Then, running the quantum part on a quantum computer, the algorithm will hopefully outperform its classical counterpart. The paper nurtures this hope by giving empirical evidence of the algorithm’s convergence speed. The researchers performed the numerical experiments on a 27-qubit quantum processor at IBM using the platform Qiskit.

Quantum is so hot right now

In recent years, researchers have been focusing on algorithms that exploit the strangeness and spookiness of quantum mechanics. You will see more and more quantum versions of algorithms that appear in statistics, machine learning, and related fields, as suggested by this survey paper, which also appeared in Nature.

Quantum lite

Sometimes quantum mechanics only loosely inspires algorithms and models. In this setting, some of my machine learning work uses determinantal point processes. This kernel-based random model draws direct inspiration from the wave function, a standard object in quantum mechanics. Under suitable simplifying conditions, the model describes the locations of particles known as fermions such as electrons and protons. Still, it’s fascinating that a quantum physics model inspired an interesting random object that has found applications in spatial statistics and machine learning.

The Metropolis-Hastings algorithm in Python, Julia, R, and MATLAB

I wrote this code a little while ago, and I thought now would be a good time to present it, as I have covered the topic of Markov chain Monte Carlo (MCMC) methods, particularly the central workhorse, the Metropolis(-Rosenbluth-Rosenbluth-Teller-Teller)-Hastings algorithm. For more details, I have written about these methods in a couple posts, starting with this one and ending with this particularly relevant one.

Update: I re-wrote this code in C, located here, where I used my own code for generating Gaussian (or normal) variables, to keep the code entirely self-contained, but in practice you should never do that.

What the code does

The code basically does the same thing in four different (scientific) programming languages, namely Python, Julia, R, and MATLAB. It performs a Metropolis-Hastings algorithm, simulating random variables (or, more correctly, generating random variates) for two probability densities, one in one dimension and the other in two dimensions.

Two examples

The one-dimensional case is particularly simple, as the code only simulates an exponential variable. But in practice, you would never simulate this random variable with any MCMC method, because you can simply simulate exponential random variables directly. I discussed in a previous post how this direct approach is used to simulate Poisson variables.
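The direct approach for the exponential case is the inverse method: if $U$ is uniform on $(0,1]$, then $-\log(U)/\lambda$ is an exponential random variable with rate $\lambda>0$. Here is a minimal NumPy sketch, where the rate value and the function name exponential_inverse are assumptions for illustration.

```python
import numpy as np

def exponential_inverse(rng, numb_rand, rate):
    """Simulate exponential variables by inverting the CDF F(x) = 1 - exp(-rate*x)."""
    u = 1.0 - rng.random(numb_rand)  # uniform on (0, 1], so log(u) is finite
    # F^{-1}(1 - u) = -log(u)/rate, and 1 - u has the same distribution as u
    return -np.log(u) / rate

rng = np.random.default_rng(7)
x = exponential_inverse(rng, 100_000, rate=2.0)
print(x.mean())  # close to 1/rate = 0.5
```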

The two-dimensional case is slightly more complicated than the classic joint Gaussian (or normal) probability density, which you would use other methods to simulate. But the idea can be extended to $n$ dimensions, which is often the case when dealing with the probability distributions and their corresponding integrals that arise in Bayesian statistics.

Implementation considerations

Variance of the walk

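To create the random walk, the code uses normal (or Gaussian) random variables, where the mean is simply the current position of the random walk. This is a standard approach due to the convenient properties of the normal distribution, notably its symmetry: proposing a jump from $x$ to $z$ is as likely as proposing a jump from $z$ to $x$, so these proposal terms cancel in the acceptance probability.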

There’s also the standard deviation $\sigma>0$ of the normal variables. In machine learning circles, this is what they call a hyperparameter. The random walk Metropolis-Hastings algorithm will, in theory, work regardless of the value, but some values give faster convergence than others. This issue is covered briefly in this question on Stats Exchange. And it raises an important question when using MCMC methods in general.
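One practical way to see the effect of $\sigma$ is to track the fraction of accepted proposals. The following sketch uses the two-dimensional density and window from the code in this post ($s=0.5$ on the square $[-1,1]\times[-1,1]$); the $\sigma$ values tried and the helper name acceptance_rate are my own choices for illustration.

```python
import numpy as np

s = 0.5                                    # scale parameter, as in the post
x_min, x_max, y_min, y_max = -1, 1, -1, 1  # simulation window

def fun_p(x, y):
    """Un-normalized density, zero outside the simulation window."""
    inside = (x >= x_min) & (x <= x_max) & (y >= y_min) & (y <= y_max)
    return np.exp(-(x**4 + x * y + y**2) / s**2) * inside

def acceptance_rate(sigma, numb_sim=10_000, numb_steps=200, seed=0):
    """Run numb_sim parallel random-walk chains; return the fraction of accepted moves."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(x_min, x_max, numb_sim)  # random initial values
    y = rng.uniform(y_min, y_max, numb_sim)
    pdf_current = fun_p(x, y)
    numb_accept = 0
    for _ in range(numb_steps):
        zx = x + sigma * rng.normal(size=numb_sim)
        zy = y + sigma * rng.normal(size=numb_sim)
        pdf_proposal = fun_p(zx, zy)
        accept = rng.uniform(size=numb_sim) < pdf_proposal / pdf_current
        x[accept], y[accept] = zx[accept], zy[accept]
        pdf_current[accept] = pdf_proposal[accept]
        numb_accept += accept.sum()
    return numb_accept / (numb_sim * numb_steps)

rates = {sigma: acceptance_rate(sigma) for sigma in (0.1, 0.5, 2.0)}
print(rates)  # smaller steps are accepted more often
```

A very high acceptance rate usually means the steps are small and the chain explores the space slowly, while a very low rate means most of the computation is spent on rejected proposals.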

Convergence tests

How many simulation steps are enough to ensure that the random variables being simulated behave closely enough to the desired random variables? In other words, how long does it take for the algorithm to converge to the target distribution?

The simulation time that elapses before the algorithm reaches a sufficient level of convergence is called the burn-in period. Due to its vital importance, it is a central topic in the development and implementation of MCMC methods.

There are tests for assessing the degree of convergence during the simulation run, such as the Gelman-Rubin test. This particular test involves taking simple empirical means and variances over the last $m$ samples, both within each simulation run and between simulation runs. Then a ratio of variances is calculated, which should be close to one if the algorithm has sufficiently converged. I haven’t implemented any such tests here, but it’s something that one should do in practice.

Perhaps I’ll cover that topic in a future post.
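As a rough illustration of the Gelman-Rubin idea, here is a sketch of the basic potential scale reduction factor, often written $\hat{R}$, computed from several parallel chains, each keeping its last numb_samples values. This is the basic (non-split) version of the statistic, and the function name gelman_rubin and the artificial test chains are assumptions for illustration.

```python
import numpy as np

def gelman_rubin(chains):
    """Basic Gelman-Rubin statistic for an array of shape (numb_chains, numb_samples)."""
    chains = np.asarray(chains, dtype=float)
    numb_chains, numb_samples = chains.shape
    chain_means = chains.mean(axis=1)
    # between-chain variance B and average within-chain variance W
    B = numb_samples * chain_means.var(ddof=1)
    W = chains.var(axis=1, ddof=1).mean()
    # pooled estimate of the variance of the target distribution
    var_pooled = (numb_samples - 1) / numb_samples * W + B / numb_samples
    return np.sqrt(var_pooled / W)

rng = np.random.default_rng(3)
mixed = rng.normal(0.0, 1.0, size=(4, 1_000))            # four chains that agree
stuck = mixed + np.array([[0.0], [0.0], [3.0], [3.0]])   # two chains offset by 3
print(gelman_rubin(mixed), gelman_rubin(stuck))          # close to 1, well above 1
```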

Testing the results

To test the results, you can empirically calculate the first two moments (that is, the mean and variance) of the simulated random variables. That acts as a very good sanity check.

But if you need more convincing, you can compute a histogram of the results, which is effectively a way to empirically estimate the probability density of the simulated random variables. Fortunately, all four programming languages have built-in histogram (counting) functions for one and two dimensions.

I already covered this topic in previous posts on checking Poisson point process simulations.
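Here is a minimal sketch of both checks in a case where the exact answers are known. As a stand-in for MCMC output, the samples are drawn directly from an exponential distribution with rate 1 (mean 1, variance 1, density $e^{-x}$); the number of bins is an arbitrary choice for illustration.

```python
import numpy as np

rng = np.random.default_rng(11)
# stand-in for MCMC output: exponential samples with rate 1 (mean 1, variance 1)
samples = rng.exponential(scale=1.0, size=100_000)

# sanity check 1: the first two moments should both be close to one
print(samples.mean(), samples.var())

# sanity check 2: histogram as an empirical density estimate;
# density=True scales the bar heights so the histogram has unit area
p_estimate, bin_edges = np.histogram(samples, bins=50, density=True)
x_mid = (bin_edges[1:] + bin_edges[:-1]) / 2  # mid-points of bins
p_exact = np.exp(-x_mid)                      # exact density at the mid-points
print(np.abs(p_estimate - p_exact).max())     # should be small
```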

Code

I’ll only include the code for Python and Julia, and refer the reader to the rest of the code found here.

Python

I used the Matplotlib library to plot the probability density and its estimate.

For the histogram section, I used the NumPy functions histogram and histogram2d (for the one- and two-dimensional cases, respectively) to empirically estimate the probability densities, setting the option density=True.

import numpy as np;  # NumPy package for arrays, random number generation, etc
import matplotlib.pyplot as plt  # for plotting
from matplotlib import cm  # for heatmap plotting
from mpl_toolkits import mplot3d  # for 3-D plots
from scipy import integrate  # for integrating

plt.close("all");  # close all previous plots

# Simulation window parameters
xMin = -1;
xMax = 1;
yMin = -1;
yMax = 1;

numbSim = 10 ** 4;  # number of random variables simulated
numbSteps = 200;  # number of steps for the Markov process
numbBins = 50;  # number of bins for histogram
sigma = 2;  # standard deviation for normal random steps

# probability density parameters
s = .5;  # scale parameter for distribution to be simulated

def fun_lambda(x, y):
    return np.exp(-(x ** 4 + x*y + y ** 2) / s ** 2);

# normalization constant
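# dblquad integrates func(y, x) with y as the inner variable, so swap the arguments
consNorm = integrate.dblquad(lambda y, x: fun_lambda(x, y), xMin, xMax, lambda x: yMin, lambda x: yMax)[0];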
#un-normalized joint density of variables to be simulated
def fun_p(x, y):
    return (fun_lambda(x, y) ) * (x >= xMin) * (y >= yMin) * (x <= xMax) * (y <= yMax);

xRand = np.random.uniform(xMin, xMax, numbSim);  # random initial values
yRand = np.random.uniform(yMin, yMax, numbSim);  # random initial values

pdfCurrent = fun_p(xRand, yRand);  # current transition (probability) densities

for jj in range(numbSteps):
    zxRand = xRand + sigma * np.random.normal(0, 1, numbSim);  # take a (normally distributed) random step
    zyRand = yRand + sigma * np.random.normal(0, 1, numbSim);  # take a (normally distributed) random step
    # Conditional random step needs to be symmetric, i.e. p(z|x)=p(x|z)
    # For example: Z|x ~ N(x,1) (or Z=x+N(0,1)) with probability density
    # p(z|x)=exp(-(z-x)^2/2)/sqrt(2*pi)
    pdfProposal = fun_p(zxRand, zyRand);  # proposed probability

    # acceptance rejection step
    booleAccept = np.random.uniform(0, 1, numbSim) < pdfProposal / pdfCurrent;
    # update state of random walk/Markov chain
    xRand[booleAccept] = zxRand[booleAccept];
    yRand[booleAccept] = zyRand[booleAccept];
    # update transition (probability) densities
    pdfCurrent[booleAccept] = pdfProposal[booleAccept];

# for histogram, need to reshape as vectors
xRand = np.reshape(xRand, numbSim);
yRand = np.reshape(yRand, numbSim);

p_Estimate, xxEdges, yyEdges = np.histogram2d(xRand, yRand, bins=numbBins, density=True);
xValues = (xxEdges[1:] + xxEdges[0:xxEdges.size - 1]) / 2;  # mid-points of bins
yValues = (yyEdges[1:] + yyEdges[0:yyEdges.size - 1]) / 2;  # mid-points of bins
X, Y = np.meshgrid(xValues, yValues, indexing="ij");  # x/y matrices matching p_Estimate[i, j] (x index first)

# analytic solution of (normalized) joint probability density
p_Exact = fun_p(X, Y) / consNorm;

# Plotting
# Plot empirical estimate
fig1 = plt.figure();
ax = plt.axes(projection="3d");
#plt.rc("text", usetex=True);
#plt.rc("font", family="serif");
surf = ax.plot_surface(X, Y, p_Estimate, cmap=plt.cm.plasma);
plt.xlabel("x");
plt.ylabel("y");
plt.title("p(x,y) Estimate");

# Plot exact expression
fig2 = plt.figure();
#plt.rc("text", usetex=True);
#plt.rc("font", family="serif")
ax = plt.axes(projection="3d");
surf = ax.plot_surface(X, Y, p_Exact, cmap=plt.cm.plasma);
plt.xlabel("x");
plt.ylabel("y");
plt.title("p(x,y) Exact Expression");

Julia

using Distributions #for random simulations
using PyPlot #for plotting
using StatsBase #for histograms etc
using Random
using LinearAlgebra
using HCubature #for numerical integration
#using LaTeXStrings #for LateX in labels/titles etc
PyPlot.close("all");  # close all PyPlot figures

#set random seed for reproducibility
#Random.seed!(1234)

# Simulation window parameters
xMin = -1;
xMax = 1;
yMin = -1;
yMax = 1;

numbSim = 10 ^ 5;  # number of random variables simulated
numbSteps = 25;  # number of steps for the Markov process
numbBins = 50;  # number of bins for histogram
sigma = 2;  # standard deviation for normal random steps

# probability density parameters
s = .5;  # scale parameter for distribution to be simulated

function fun_lambda(x,y)
    return (exp.(-(x.^4+x.*y+y.^2)./s^2));
end

#normalization constant -- UNDER CONSTRUCTION
consNorm,errorCub=hcubature(x -> fun_lambda(x[1],x[2]), [xMin, yMin], [xMax, yMax]);
#un-normalized joint density of variables to be simulated
function fun_p(x,y)
    return((fun_lambda(x,y)).*(x.>=xMin).*(y.>=yMin).*(x.<=xMax).*(y.<=yMax));
end
xRand=(xMax-xMin).*rand(numbSim).+xMin; #random initial values
yRand=(yMax-yMin).*rand(numbSim).+yMin; #random initial values

pdfCurrent=fun_p(xRand,yRand); #current transition (probability) densities
for jj=1:numbSteps
    zxRand= xRand.+sigma.*rand(Normal(),numbSim);#take a (normally distributed) random step
    zyRand= yRand.+sigma.*rand(Normal(),numbSim);#take a (normally distributed) random step

    # Conditional random step needs to be symmetric, i.e. p(z|x)=p(x|z)
    # For example: Z|x ~ N(x,1) (or Z=x+N(0,1)) with probability density
    # p(z|x)=exp(-(z-x)^2/2)/sqrt(2*pi)
    pdfProposal = fun_p(zxRand, zyRand);  # proposed probability

    # acceptance rejection step
    booleAccept=rand(numbSim) .< pdfProposal./pdfCurrent;
    # update state of random walk/Markov chain
    xRand[booleAccept] = zxRand[booleAccept];
    yRand[booleAccept] = zyRand[booleAccept];
    # update transition (probability) densities
    pdfCurrent[booleAccept] = pdfProposal[booleAccept];
end

#histogram section: empirical probability density
histXY=fit(Histogram, (xRand,yRand),nbins=numbBins); #find histogram data
histXY=normalize(histXY,mode=:pdf); #normalize histogram
binEdges=histXY.edges; #retrieve bin edges
xValues=(binEdges[1][2:end]+binEdges[1][1:end-1])./2; #mid-points of bins
yValues=(binEdges[2][2:end]+binEdges[2][1:end-1])./2; #mid-points of bins
p_Estimate=(histXY.weights)
#create a meshgrid
X=[xValues[ii] for ii=1:length(xValues), jj=1:length(yValues)];
Y=[yValues[jj] for ii=1:length(xValues), jj=1:length(yValues)];

#analytic solution of (normalized) joint probability density
p_Exact = fun_p(X, Y)./consNorm;

# Plotting
# Plot empirical estimate
fig1 = PyPlot.figure();
PyPlot.rc("text", usetex=true);
PyPlot.rc("font", family="serif");
surf(X, Y, p_Estimate, cmap=PyPlot.cm.plasma);
PyPlot.xlabel("x");
PyPlot.ylabel("y");
PyPlot.title("p(x,y) Estimate");

# Plot exact expression
fig2 = PyPlot.figure();
PyPlot.rc("text", usetex=true);
PyPlot.rc("font", family="serif")
surf(X, Y, p_Exact, cmap=PyPlot.cm.plasma);
PyPlot.xlabel("x");
PyPlot.ylabel("y");
PyPlot.title("p(x,y) Exact Expression");

Creating a reversible Markov chain using acceptance(-rejection)

The study of Markov chains is generally the study of their long-term behaviour, which, under certain conditions, is captured by them having a unique stationary distribution. Stationarity is an important property. It is, in a sense, a local property of a Markov chain.

For a Markov chain, a more global property is something called reversibility. Markov chains with this property must possess a stationary distribution, which, we see below, is an immediate consequence of reversibility. Reversible Markov chains (or processes) with discrete time¹ are the cornerstone of Markov chain Monte Carlo (MCMC) methods.

In this post we look at how a reversible Markov chain is constructed from a non-reversible (but irreducible) Markov chain by introducing an acceptance-rejection step. This post complements another post I wrote on the Metropolis-Hastings algorithm.

Reversibility

A Markov process on state space $\mathbb{X}$ with kernel $K$ is (time) reversible with respect to the distribution $\mu$ if the following holds

$$ \mu(x)K(x,y) = \mu (y) K(y,x)\quad x,y\in\mathbb{X}\,.$$

This reversibility condition is also called the detailed balance equation. If this condition is met, then the Markov process will have a stationary distribution $\mu$. By summing over $x$, we can verify this because we obtain

$$ \sum_{x\in\mathbb{X}}\mu(x)K(x,y) =\mu(y)\sum_{x\in\mathbb{X}} K(y,x)=\mu(y)\,.$$

This is just the balance equation, often written as $\mu=\mu K$ (with $\mu$ as a row vector), which says that the transition kernel $K$ has a stationary distribution $\mu$.
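The calculation above can be checked numerically on a small example. The three-state kernel below is my own choice for illustration: a chain that only jumps between neighbouring states. The sketch checks that $\mu(x)K(x,y)$ is a symmetric matrix (detailed balance) and that $\mu$ then satisfies the balance equation.

```python
import numpy as np

# transition kernel K(x, y): row x holds the jump probabilities from state x
K = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])
mu = np.array([0.25, 0.50, 0.25])  # candidate distribution

flow = mu[:, None] * K             # flow[x, y] = mu(x) K(x, y)
print(np.allclose(flow, flow.T))   # detailed balance: True
print(np.allclose(mu @ K, mu))     # balance equation mu K = mu: True
```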

First Markov chain

We consider a Markov chain with kernel $J$ defined on a finite state space $\mathbb{X}$. If the Markov chain is at state $x\in \mathbb{X}$, it visits another state $y\in \mathbb{X}$ with the probability $J(x,y)$. This is a simple time-homogeneous finite Markov chain.

Irreducibility

Here $\pi$ is the distribution that we want to be the stationary distribution of the new Markov chain constructed below. For our Markov chain, we assume that every state $y$ in $\mathbb{X}$ where $\pi(y)>0$ is reachable with positive probability in a single step. More precisely, we assume the easy-to-achieve condition $J(x,y)>0$ for all points $x,y \in \mathbb{X}$ where $\pi(x)>0$ and $\pi(y)>0$. This requirement is a stronger form of irreducibility.

Creating a new Markov chain with acceptance

We create a new Markov chain by introducing an acceptance step. For the Markov chain with kernel $J$, we assume that at each time step, after choosing the jump direction but before jumping, a biased coin is flipped. The success probability $\alpha(x,y)$ depends on the current position $x\in \mathbb{X}$ and the (potential) next position $y\in \mathbb{X}$. If the flip is a success, the chain jumps to $y$; otherwise, it stays at $x$.

Transition kernel

For our new Markov chain, we can quickly work out the transition kernel $M$. We first look at the off-diagonal elements of the kernel (matrix) $M$. To go from state $x$ to another state $y\neq x$, the probability is simply

$$ M(x,y) = \alpha(x,y) J(x,y), \quad x\neq y\,.$$

The transition matrix $M$ needs to be stochastic, meaning its rows sum to one, so $\sum_{y\in\mathbb{X}}M(x,y)=1$. That gives us the diagonal elements $M(x,x)=1-\sum_{y\neq x}\alpha(x,y)J(x,y)$, although their exact form is not needed to show reversibility.
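The construction of $M$ can be sketched in a few lines of NumPy. The proposal kernel $J$ and the acceptance probabilities $\alpha$ below are arbitrary values I made up for illustration (any $\alpha$ with entries in $[0,1]$ works for this step). The off-diagonal entries are $\alpha J$, and the diagonal absorbs the probability of staying put, so the rows sum to one.

```python
import numpy as np

def accept_kernel(J, alpha):
    """Kernel M with M(x,y) = alpha(x,y) J(x,y) off the diagonal and rows summing to one."""
    M = alpha * J                              # elementwise product (a new array)
    np.fill_diagonal(M, 0.0)                   # the diagonal is set separately below
    # staying at x: J(x,x) plus all rejected jumps, i.e. one minus the off-diagonal mass
    np.fill_diagonal(M, 1.0 - M.sum(axis=1))
    return M

J = np.array([[0.2, 0.5, 0.3],
              [0.4, 0.2, 0.4],
              [0.3, 0.3, 0.4]])
alpha = np.array([[1.0, 0.5, 0.8],
                  [1.0, 1.0, 0.6],
                  [0.9, 1.0, 1.0]])
M = accept_kernel(J, alpha)
print(M.sum(axis=1))  # each row sums to one (up to rounding)
```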

$\alpha(x,y)$ needs a symmetric function $s(x,y)$

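Reversibility compares the kernel with its transpose, weighted by $\pi$, so it concerns swapping rows and columns. The condition holds trivially on the diagonal, where $x=y$, so we only need to look at the off-diagonal entries, which gives the requirement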

$$ \pi(x)M(x,y) = \pi (y) M(y,x)\quad x\neq y\,.$$

Swapping $x$ and $y$ turns one side of this equation into the other, so their common value is some non-negative symmetric function $s(x,y)=s(y,x)$. Looking at the right-hand side, we get

$$\begin{aligned}\pi(y) M(y,x)&=\pi(y)J(y,x) \alpha(y,x)\\ &= s(y,x)\,.\end{aligned}$$

Rearranging, and then relabelling $x$ and $y$, gives the form of the acceptance function $\alpha$, which must also satisfy $0\leq\alpha\leq 1$ to be a probability:

$$ \alpha(x,y)=\frac{s(x,y)}{\pi(x)J(x,y)}\,.$$

The only task remaining now is to choose a reasonable symmetric function $s$ such that $\alpha\leq 1$, ensuring $\alpha$ is a probability. Of course, our choice for the symmetric function $s$ should also be a function of the stationary distribution $\pi$ and the underlying kernel $J$.

Examples

I’ll give two principal examples of the symmetric function $s(x,y)$. Working in reverse chronological order, I’ll give the simpler of the two examples first.

Barker

A somewhat natural example is

$$s(x,y) = \frac{\pi(x)J(x,y)\pi(y)J(y,x)}{\pi(x)J(x,y)+\pi(y)J(y,x)}\,.$$

This is clearly a symmetric function, which only has the terms $\pi$ and $J$. The acceptance probability becomes

$$\alpha(x,y) = \frac{\pi(y)J(y,x)}{\pi(x)J(x,y)+\pi(y)J(y,x)}\,.$$

A.A. Barker proposed this function in a 1965 paper as part of his PhD work in mathematical physics at the University of Adelaide. Barker had been inspired by a previous 1953 paper, which brings us to the next example.
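A quick numerical check of the Barker choice: starting from a made-up, non-reversible proposal kernel $J$ and a made-up target $\pi$, the sketch builds $M$ with the Barker acceptance probabilities and confirms that $\pi(x)M(x,y)$ is symmetric, so $M$ is reversible with respect to $\pi$.

```python
import numpy as np

J = np.array([[0.1, 0.6, 0.3],
              [0.2, 0.3, 0.5],
              [0.7, 0.2, 0.1]])      # proposal kernel (not reversible)
pi = np.array([0.2, 0.3, 0.5])      # target distribution

F = pi[:, None] * J                 # F[x, y] = pi(x) J(x, y)
# Barker: alpha(x,y) = pi(y)J(y,x) / (pi(x)J(x,y) + pi(y)J(y,x))
alpha_barker = F.T / (F + F.T)

M = alpha_barker * J                # off-diagonal entries alpha * J
np.fill_diagonal(M, 0.0)
np.fill_diagonal(M, 1.0 - M.sum(axis=1))   # diagonal makes rows sum to one

flow = pi[:, None] * M
print(np.allclose(flow, flow.T))   # detailed balance holds: True
print(np.allclose(pi @ M, pi))     # so pi is stationary: True
```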

Metropolis(-Rosenbluth-Rosenbluth-Teller-Teller)-Hastings

The most important example today is

$$s(x,y) = \min[\pi(x)J(x,y),\pi(y)J(y,x)]\,.$$

We can see that this is a symmetric function. The acceptance probability becomes

$$\alpha(x,y) = \min[1,\frac{\pi(y)J(y,x)}{\pi(x)J(x,y)}]\,.$$

This example is very famous in the world of Markov chain Monte Carlo methods. It is the main part of the so-called Metropolis-Hastings algorithm, which comes from a 1953 paper by Nicholas Metropolis, Arianna W. Rosenbluth, Marshall Rosenbluth, Augusta H. Teller, and Edward Teller (two husband-wife pairs), who looked at a special case, and a 1970 paper by W.K. Hastings, who generalized the method.
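The same kind of check works for the Metropolis-Hastings choice. The sketch below uses made-up values for $J$ and $\pi$, builds both acceptance functions, and confirms detailed balance for the Metropolis-Hastings kernel. It also shows that the Metropolis-Hastings acceptance probabilities are never smaller than Barker's, which follows from $\min[1,r]\geq r/(1+r)$ for the ratio $r=\pi(y)J(y,x)/(\pi(x)J(x,y))\geq 0$.

```python
import numpy as np

J = np.array([[0.1, 0.6, 0.3],
              [0.2, 0.3, 0.5],
              [0.7, 0.2, 0.1]])      # proposal kernel
pi = np.array([0.2, 0.3, 0.5])      # target distribution

F = pi[:, None] * J                 # F[x, y] = pi(x) J(x, y)
ratio = F.T / F                     # r(x, y) = pi(y)J(y,x) / (pi(x)J(x,y))
alpha_mh = np.minimum(1.0, ratio)   # Metropolis-Hastings acceptance
alpha_barker = ratio / (1.0 + ratio)  # Barker acceptance, written in terms of r

def kernel(alpha):
    """Off-diagonal entries alpha*J; diagonal chosen so rows sum to one."""
    M = alpha * J
    np.fill_diagonal(M, 0.0)
    np.fill_diagonal(M, 1.0 - M.sum(axis=1))
    return M

M_mh = kernel(alpha_mh)
flow = pi[:, None] * M_mh
print(np.allclose(flow, flow.T))         # detailed balance: True
print((alpha_mh >= alpha_barker).all())  # MH accepts at least as often: True
```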

The acceptance(-rejection) method for simulating random variables

In a previous post, I covered a simple but much-used method for simulating random variables or, rather, generating random variates. To simulate a random variable, the method requires being able to easily calculate the inverse of its cumulative distribution function. But you cannot always do that.

In lieu of this, the great John von Neumann wrote in a 1951 paper that you can sample a sequence of values from another probability distribution, accepting only the values that meet a certain condition based on this other distribution and the desired distribution, while rejecting all the others. The accepted values will follow the desired probability distribution. This method of simulation or sampling is called the rejection method or the acceptance method, and it even has the double-barrelled name acceptance-rejection (AR) method.

Details

Let $X$ be a continuous random variable with a (probability) density $p(x)$, which is the derivative of its cumulative probability distribution $P(X\leq x)$. The density $p(x)$ corresponds to the desired or target distribution from which we want to sample. For whatever reason, we cannot directly simulate the random variable $X$. (Maybe we cannot use the inverse method because $P(X\leq x)$ is too complicated.)

The idea that von Neumann had was to assume that we can easily simulate another random variable, say, $Y$ with the (probability) density $q(x)$. The density $q(x)$ corresponds to a proposal distribution that we can sample (by using, for example, the inverse method).

Now we further assume that there exists some finite constant $M>0$ such that we can bound $p(x)$ by $Mq(x)$, meaning

$$ p(x) \leq M q(x), \text{ for all } x . $$

Provided this, we can then sample the random variable $Y$ and accept a value of it (for a value of $X$) with probability

$$\alpha = \frac{p(Y)}{Mq(Y)}.$$

If the sampled value of $Y$ is not accepted (which happens with probability $1-\alpha$), then we must repeat this random experiment until a sampled value of $Y$ is accepted.

Algorithm

We give the pseudo-code for the acceptance-rejection method suggested by von Neumann.

Random variable $X$ with density $p(x)$

1. Sample a random variable $Y$ with density $q(x)$, giving a sample value $y$.
2. Calculate the acceptance probability $\alpha = \frac{p(y)}{Mq(y)}$.
3. Sample a uniform random variable $U\sim U(0,1)$, giving a sample value $u$.
4. Return the value $y$ (for the value of $X$) if $u\leq \alpha$; otherwise go to Step 1 and repeat.

As covered in a previous post, Steps 3 and 4 are equivalent to accepting the value $y$ with probability $\alpha$.
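A minimal Python sketch of these four steps follows, assuming you can supply the target density, a sampler and density for the proposal, and a valid bound $M$. For illustration (my own choice of example), the target is the Beta(2,2) density $p(x)=6x(1-x)$ on $(0,1)$, the proposal is uniform on $(0,1)$ so $q(x)=1$, and $M=1.5$ is the maximum of $p$. The function accept_reject is a name I made up; it also returns how many proposals were needed, and the average number of proposals per accepted value should come out close to $M=1.5$.

```python
import numpy as np


def accept_reject(p, q_sample, q_density, M, rng):
    """Acceptance-rejection: return one value with density p and the number of
    proposals used. Assumes p(x) <= M q(x) for all x."""
    n_tries = 0
    while True:
        n_tries += 1
        y = q_sample(rng)                  # Step 1: sample Y with density q
        alpha = p(y) / (M * q_density(y))  # Step 2: acceptance probability
        u = rng.uniform()                  # Step 3: sample U ~ U(0,1)
        if u <= alpha:                     # Step 4: accept y, otherwise repeat
            return y, n_tries


# Illustrative example: Beta(2,2) target with a uniform proposal on (0,1).
def p(x):
    return 6.0 * x * (1.0 - x)


rng = np.random.default_rng(0)
draws = [accept_reject(p, lambda r: r.uniform(), lambda x: 1.0, 1.5, rng)
         for _ in range(20000)]
values = np.array([value for value, _ in draws])
tries = np.array([n for _, n in draws])
print(values.mean(), tries.mean())  # roughly 0.5 and 1.5
```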

Point process application

In the context of point processes, this method is akin to thinning point processes independently. This gives a method for positioning points non-uniformly by first placing the points uniformly. The method then thins points based on the desired intensity function. As I covered in a previous post, this is one way to simulate an inhomogeneous (or nonhomogeneous) Poisson point process.
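Here is a minimal Python sketch of independent thinning on the unit square. It assumes the intensity function is bounded by a known constant lambda_max; the intensity $\lambda(x,y)=100e^{-(x+y)}$, which is at most 100, is only an illustrative choice, and the function names are my own. The sketch places a homogeneous Poisson number of points uniformly, then keeps each point independently with probability $\lambda(x,y)/\lambda_{\max}$.

```python
import numpy as np


def thin_poisson_process(intensity, lambda_max, rng):
    """Inhomogeneous Poisson point process on the unit square by independent
    thinning. Assumes intensity(x, y) <= lambda_max on the whole square."""
    area = 1.0
    n_points = rng.poisson(lambda_max * area)   # homogeneous Poisson count
    x = rng.uniform(0.0, 1.0, n_points)         # place points uniformly
    y = rng.uniform(0.0, 1.0, n_points)
    retain_prob = intensity(x, y) / lambda_max  # thinning probabilities
    keep = rng.uniform(0.0, 1.0, n_points) < retain_prob
    return x[keep], y[keep]


def intensity(x, y):
    return 100.0 * np.exp(-(x + y))  # illustrative choice, at most 100


rng = np.random.default_rng(5)
x_kept, y_kept = thin_poisson_process(intensity, 100.0, rng)
```

The number of retained points is Poisson with mean $\int_0^1\int_0^1 100e^{-(x+y)}\,dx\,dy=100(1-e^{-1})^2\approx 40$.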

Efficiency

Basic probability theory tells us that the number of experiment runs (Steps 1 to 4) until acceptance is a geometric random variable. Its parameter is the unconditional probability that a single run ends in acceptance, namely

$$\mathbb{E}\left[\frac{p(Y)}{Mq(Y)}\right]=\int \frac{p(x)}{Mq(x)}\,q(x)\,dx=\frac{1}{M}\,.$$

On average, then, the acceptance(-rejection) method takes $M$ runs to sample one value of the random variable $X$ of the target distribution. The key then is to choose the proposal density $q(x)$ so that $Mq(x)$ stays as close as possible to $p(x)$, making the constant $M$ as small as possible, while still keeping the inequality $p(x) \leq M q(x)$.

Higher dimensions

The difficulty of the acceptance(-rejection) method is finding a good proposal distribution such that the product $Mq(x)$ is not much larger than the target density $p(x)$. In one dimension, this can often be done, but in higher dimensions it becomes increasingly difficult. Consequently, this method is typically not used in higher dimensions.

Another approach with an acceptance step is the Metropolis-Hastings method, which is the quintessential Markov chain Monte Carlo (MCMC) method. This method and its cousins have become exceedingly popular, as they give ways to simulate collections of dependent random variables that have complicated (joint) distributions.

Simulating Poisson random variables with large means

There’s basically only one method for simulating Poisson variables with a small parameter value. This direct method, as I call it, uses the inter-arrival times of a homogeneous Poisson (stochastic) process, which I covered in a previous post.

But if the Poisson parameter, which coincides with its mean, is large, then this method becomes too slow, so you need to use other methods. In another post, I briefly surveyed these methods and listed different language libraries that use them, ranging from open source projects NumPy and R to industry-level libraries MKL (by Intel) and cuRAND (CUDA) by Nvidia.

To simulate large-mean Poisson random variables, I coded up in Python and MATLAB (for now) two of these methods from the two respective papers:

1979, Atkinson, The computer generation of Poisson random variables;
1993, Hörmann, The transformed rejection method for generating Poisson random variables.

You can find my code here.

Again, the code I wrote is only suitable for large parameter values, where the meaning of large depends on the procedure being used, but it’s typically around 30. Consequently, any code for generating Poisson variables should have an if-statement, using the direct method for small parameter values and another method, such as the ones above, for large parameter values.

Update: In another post I presented these two algorithms in C. You can find my code here.

Algorithms

I’ll give some light details of the two methods, which are referred to as PA and PTRS algorithms in the respective papers. I suggest reading the papers for further details.

Both algorithms are (acceptance-)rejection methods. I discuss this general method, first proposed by von Neumann, in another post. In the context of generating Poisson variables, these methods are, according to the book by Devroye (page 502), known for being relatively simple to implement and fast to execute.

Both algorithms require calculating the log of a factorial or, equivalently, the log of a gamma function, which is done by using approximations, such as those of Stirling or Lanczos. This is a common approach in general, as you'll find that computer functions, such as those in MATLAB or Python (NumPy), that give values for gamma functions and their logs are always based on approximations of the gamma function. That means you can write your own approximation for the log of a factorial or use a pre-existing function.
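The code below calls a helper getLogFac whose source isn't shown here, so as a hypothetical stand-in, here is a minimal Python sketch named log_factorial. It sums logarithms exactly for small $n$ and otherwise uses Stirling's series with two correction terms, $\log n!\approx n\log n-n+\tfrac{1}{2}\log(2\pi n)+\tfrac{1}{12n}-\tfrac{1}{360n^3}$. Its values can be compared with math.lgamma(n + 1) from the Python standard library.

```python
import math


def log_factorial(n):
    """Hypothetical helper: log(n!) for an integer n >= 0.
    Exact sum of logs for n < 10, Stirling's series otherwise."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n < 10:
        return math.fsum(math.log(k) for k in range(2, n + 1))  # log(0!) = log(1!) = 0
    return (n * math.log(n) - n + 0.5 * math.log(2.0 * math.pi * n)
            + 1.0 / (12.0 * n) - 1.0 / (360.0 * n ** 3))


print(log_factorial(100), math.lgamma(101))  # the two values agree closely
```

The next term of the series is $1/(1260n^5)$, so at the switch-over point $n=10$ the truncation error is already below $10^{-8}$.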

Algorithm PA by Atkinson (1979)

Algorithm PA, proposed by Atkinson among other methods, is a rejection method that uses a logistic distribution as the envelope distribution. (Such algorithms often use a normal distribution as the envelope distribution.)

After writing my code, I noticed that John D. Cook gave pseudo-code of the method in this post and then presented an implementation in C# in this post. Apart from that, I have not seen any implementations of this method.

Algorithm PTRS by Hörmann (1993)

Hörmann refers to the Algorithm PTRS method as a transformed rejection method. It uses the inverse method, which I covered in a previous post, and a rejection method.

I have only seen one implementation of this algorithm. It's written in C for the Python library NumPy; see the code here. You'll notice that my code and that C code are very similar, modulo some logarithm identities.

Possible (small) error: I noticed that in that C code, on line 591, the implementation of step 2.0 of the PTRS Algorithm has a strictly less than condition, so it’s $k <0$, whereas in the original paper (and hence my code), the condition is $k\leq 0$. Perhaps this possible error is insignificant, as the procedure is for large-valued Poisson random variables, so the $k=0$ scenario rarely happens.

Code

I only present the code in Python here, but you can go here to see the code implemented in MATLAB. I also wrote a script that tests whether the generated variates (or simulated values) adhere to a Poisson distribution. This test compares the mean and variance (whose ratio should be close to one) and performs a chi-squared test.
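A sketch of what such a test could look like in Python, using scipy.stats.poisson and scipy.stats.chisquare from SciPy. The function name poisson_checks and the binning are my own choices, not necessarily what the script mentioned above does: values within three standard deviations of the mean get their own bins, and each tail is lumped into a single bin.

```python
import numpy as np
from scipy import stats


def poisson_checks(samples, mu):
    """Return the variance-to-mean ratio (close to one for Poisson data) and a
    chi-squared goodness-of-fit p-value against the Poisson(mu) distribution."""
    samples = np.asarray(samples)
    ratio = samples.var(ddof=1) / samples.mean()

    sd = np.sqrt(mu)
    lo = max(0, int(np.floor(mu - 3 * sd)))
    hi = int(np.ceil(mu + 3 * sd))
    ks = np.arange(lo, hi + 1)
    observed = np.concatenate((
        [np.sum(samples < lo)],               # lower tail bin
        [np.sum(samples == k) for k in ks],   # one bin per value
        [np.sum(samples > hi)],               # upper tail bin
    ))
    probs = np.concatenate((
        [stats.poisson.cdf(lo - 1, mu)],
        stats.poisson.pmf(ks, mu),
        [stats.poisson.sf(hi, mu)],
    ))
    expected = samples.size * probs / probs.sum()
    mask = expected > 0                       # drop empty bins (e.g. when lo = 0)
    _, p_value = stats.chisquare(observed[mask], expected[mask])
    return ratio, p_value
```

For example, poisson_checks(np.random.default_rng(7).poisson(50, 20000), 50) should give a ratio near one, while feeding it rounded normal values with a much larger variance gives a ratio far from one and a tiny p-value.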

Warning: My Python and MATLAB code is only for illustration purposes. In Python (with SciPy), MATLAB, or similar, you would use the pre-existing functions for simulating Poisson random variables.

Algorithm PA by Atkinson (1979)

# This code generates Poisson variates (or simulates Poisson variables)
# using a method designed for large (>30) Poisson parameter values.
#
# The generation method is Algorithm PA, a type of rejection method, from 
# the paper:
#
# 1979 - Atkinson - "The Computer Generation of Poisson Random Variables"
#
# In practice, you should *always* use the built-in NumPy function
# random.poisson, which (for large Poisson parameter) uses Algorithm PTRS in the
# paper:
#
# 1993 - Hörmann - "The transformed rejection method for generating Poisson
# random variables"
#
# That method is also suggested by Knuth in Volume 2 of his classic
# series "The Art of Computer Programming".
#
# INPUT:
# mu is a single Poisson parameter (or mean), assumed to be large (>30).
# OUTPUT:
# A single Poisson variate (that is, an instance of a Poisson random
# variable), which is a non-negative integer.
#
# Author: H. Paul Keeler, 2019.
# Website: hpaulkeeler.com
# Repository: github.com/hpaulkeeler/posts

import numpy as np  # NumPy package for arrays, random number generation, etc.
from scipy.special import gammaln  # log-gamma function: gammaln(n+1) = log(n!)
# from getLogFac import getLogFac  # optional: the author's own log-factorial helper

def funPoissonLargePA(mu):
    # precalculate some Poisson-parameter-dependent numbers
    c = 0.767 - 3.36/mu
    beta = np.pi/np.sqrt(3.0*mu)
    alpha = beta*mu
    k = np.log(c) - mu - np.log(beta)
    log_mu = np.log(mu)

    while True:
        U = np.random.uniform()  # generate first uniform variable
        if U == 0.0:
            continue  # avoid dividing by zero below
        # sample from the logistic envelope distribution (inverse method)
        x = (alpha - np.log((1.0 - U)/U))/beta
        n = np.floor(x + 0.5)  # round to the nearest integer
        if n < 0:
            continue  # equivalent to x < -0.5; reject and start again

        V = np.random.uniform()  # generate second uniform variable
        y = alpha - beta*x
        # logfac_n = getLogFac(n)
        # above can be replaced with SciPy's function:
        logfac_n = gammaln(n + 1)

        # two sides of an inequality condition
        lhs = y + np.log(V/(1.0 + np.exp(y))**2)
        rhs = k + n*log_mu - logfac_n  # NOTE: uses log factorial n

        if lhs <= rhs:
            return int(n)  # accept n as the Poisson variate
        # otherwise reject and repeat
    # end while-loop

# end function

Algorithm PTRS by Hörmann (1993)

# This code generates Poisson variates (or simulates Poisson variables)
# using a method designed for large (>10) Poisson parameter values.
#
# The generation method is Algorithm PTRS, a type of rejection method, from
# the paper:
#
# 1993 - Hörmann - "The transformed rejection method for generating Poisson
# random variables"
#
# WARNING: This code is for illustration purposes only.
#
# In practice, you should *always* use the built-in NumPy function
# random.poisson, which (for large Poisson parameter) uses Algorithm PTRS in the
# above paper.
#
# INPUT:
# mu is a single Poisson parameter (or mean), assumed to be large (>10).
# OUTPUT:
# A single Poisson variate (that is, an instance of a Poisson random
# variable), which is a non-negative integer.
#
# Author: H. Paul Keeler, 2019.
# Website: hpaulkeeler.com
# Repository: github.com/hpaulkeeler/posts

import numpy as np  # NumPy package for arrays, random number generation, etc.
from scipy.special import gammaln  # log-gamma function: gammaln(n+1) = log(n!)
# from getLogFac import getLogFac  # optional: the author's own log-factorial helper

def funPoissonLargePTRS(mu):
    # precalculate some Poisson-parameter-dependent numbers
    b = 0.931 + 2.53*np.sqrt(mu)
    a = -0.059 + 0.02483*b
    vr = 0.9277 - 3.6224/(b - 2)
    one_over_alpha = 1.1239 + 1.1328/(b - 3.4)
    log_mu = np.log(mu)

    # Steps 1 to 3.1 in Algorithm PTRS
    while True:
        # generate two uniform variables
        U = np.random.uniform() - 0.5
        V = np.random.uniform()
        us = 0.5 - abs(U)
        if us == 0.0:
            continue  # avoid dividing by zero below

        n = np.floor((2*a/us + b)*U + mu + 0.43)

        # fast acceptance for most values
        if us >= 0.07 and V <= vr:
            return int(n)

        # step 2.0 of Algorithm PTRS: reject these values and start again
        if n <= 0 or (us < 0.013 and V > us):
            continue

        # logfac_n = getLogFac(n)
        # above can be replaced with SciPy's function:
        logfac_n = gammaln(n + 1)

        # two sides of an inequality condition
        lhs = np.log(V*one_over_alpha/(a/us/us + b))
        rhs = -mu + n*log_mu - logfac_n  # NOTE: uses log factorial n

        if lhs <= rhs:
            return int(n)  # accept n as the Poisson variate
        # otherwise reject and repeat
    # end while-loop

# end function

Simulating Poisson random variables – Survey of methods

In this post I'll cover the different ways of generating Poisson random variates. (I often just say simulating Poisson variables.) I'll then list which methods different language libraries use, ranging from open source projects NumPy and R to industry-level libraries MKL (by Intel) and cuRAND (CUDA) by Nvidia.

Direct method doesn’t scale well

In the previous post, I discussed how to sample or generate Poisson random variables or, more correctly, variates. I detailed a direct method that uses the fact that a Poisson stochastic process, which is directly related to a Poisson point process, has inter-arrival times that form independent and identically distributed exponential variables.

The direct method is an easy and intuitive sampling method, explaining why it is often used. I implemented the method in MATLAB, Python, C and C#, which can be found here. (Later, in another post, I implemented the same Poisson sampling method in Fortran, which is located here.)

As elegant and exact as this simulation method is, it unfortunately becomes slower as the Poisson parameter $\lambda$ increases. In a tutorial published in 1983, Brian D. Ripley, a major figure in spatial statistics, says this about the direct method:

This is simple, but has expected time proportional to $\lambda$. Some of its competitors use rejection methods with the envelope distribution that of the integer part of a continuous random variable, such as logistic, Laplace and normal mixed with exponential distributions.

We recall that acceptance-rejection (or rejection) methods involve simulating a random object, such as a random variable, by first simulating another random object of the same type that is easier to simulate. The simulation method then accepts or rejects these random objects based on a certain ratio. The distribution of the simpler random object that is first simulated is called the envelope distribution. Such rejection methods are one way to simulate Poisson variables.

In short, when simulating Poisson variables, the appropriate simulation algorithm should be chosen based on the Poisson parameter. Consequently, the code of most computer functions for generating Poisson variables will have an if-statement, using the direct method for small parameter values and another method for large parameter values. In addition to that, the method for large Poisson parameter values should be both fast and simple to implement.

We now consider the other methods.

Different methods

Over the years there have been different methods proposed for producing Poisson random variates. In the book Non-uniform random variate generation, Luc Devroye groups (in Section X.3 on page 502) the different methods into five categories, along with his views. These methods are:

Direct methods based on the homogeneous Poisson stochastic process having exponential inter-arrival times. These methods are simple, but the expected time is proportional to the Poisson parameter $\lambda$.
Inversion methods that search through a table of cumulative Poisson probabilities. Examples include the papers by Fishman (1976) and Atkinson (1979)*.
Methods that use the recursive properties of the Poisson distribution. The paper by Ahrens and Dieter (1974) uses this approach, and its expected time of completion is proportional to $\log\lambda$.
Acceptance-rejection (or rejection) methods that give relatively fast but simple algorithms. Such methods are proposed in the papers by Atkinson (1979)*, Ahrens and Dieter (1980) and Devroye (1981) or the technical report by Schmeiser and Kachitvichyanukul (1981).
Acceptance-complement methods that use a normal distribution as the starting distribution, such as the paper by Ahrens and Dieter (1982). This method is fast, but the code is rather long.

*Atkinson had (at least) two papers on generating Poisson variates published in 1979, but I believe Devroye is referring to the first paper, because in the second paper Atkinson compares methods proposed by others.

For the paper titles, see the Further reading section below.
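To make the inversion category concrete, here is a minimal Python sketch of inversion by sequential search, with a function name of my own. Rather than searching a stored table, it builds the cumulative probabilities on the fly with the recursion $p(k+1)=p(k)\lambda/(k+1)$; a table-based version would precompute them once for a given $\lambda$. It is only meant for small $\lambda$, as the expected search length grows with $\lambda$ and $e^{-\lambda}$ underflows for very large $\lambda$.

```python
import math
import random


def poisson_inversion(u, lam):
    """Inversion by sequential search: smallest k with F(k) >= u, where F is
    the Poisson(lam) cumulative distribution function. Probabilities are built
    recursively, p(k+1) = p(k) * lam / (k + 1), instead of from a stored table."""
    k = 0
    p = math.exp(-lam)  # p(0) = P(N = 0)
    cdf = p
    while cdf < u and p > 0.0:  # p > 0 guards against an endless loop
        k += 1
        p *= lam / k
        cdf += p
    return k


rng = random.Random(1)
samples = [poisson_inversion(rng.random(), 4.0) for _ in range(20000)]
```

For $\lambda=1$, the cumulative probabilities start $0.368, 0.736, 0.920$, so $u=0.5$ maps to $1$ and $u=0.9$ maps to $2$.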

Code

In a separate post, I present implementations in Python and MATLAB of algorithms found respectively in the papers by Atkinson (1979) and Hörmann (1993). But these are only for illustration purposes, as Python (with SciPy) and MATLAB already have good functions for generating Poisson variables.

Methods implemented in popular libraries

I'll now state which methods are used in various programming languages and numerical libraries. I won't go into the details of how the methods work, just citing the papers instead.

MATLAB

For small $\lambda$ values, the MATLAB function poissrnd uses the direct method (based on inter-arrival times) with a while-loop.

For $\lambda$ values greater than fifteen, I believe that the MATLAB function poissrnd uses Algorithm PG from the 1974 paper by Ahrens and Dieter, which uses the generation of gamma and binomial random variates.

But to come to this conclusion, I had to do some investigating. You can skip to the next section if you’re not interested, but now I’ll explain my reasoning.

The MATLAB documentation says it uses a method proposed by Ahrens and Dieter, but these two researchers developed a number of methods for generating Poisson variables. The MATLAB code cites Volume 2 of the classic series by Knuth, who says the method is due to Ahrens and Dieter, but he doesn't give an exact citation in that section of the book. Confusingly, Knuth cites in his book a couple of papers by Ahrens and Dieter for generating different random variates. (Knuth later cites a seemingly relevant 1980 paper by Ahrens and Dieter, but that details another method.)

Both the MATLAB code and Knuth cite the book by Devroye. In his book (Exercise 3.5.2), Devroye discusses one method, among others, from a 1974 paper by Ahrens and Dieter. Another hint is given by examining the code of the MATLAB function poissrnd, which reveals that it uses the function randg to generate gamma variables. In the Ahrens and Dieter 1974 paper, their Algorithm PG (for producing Poisson variates) uses gamma random variables, and it’s suggested to use a parameter value of $7/8$. This is the same parameter used in the MATLAB code and mentioned by Knuth, confirming that this is the right paper by Ahrens and Dieter.

In summary, for large $\lambda$ the MATLAB function uses Algorithm PG from the 1974 paper by Ahrens and Dieter, whereas for small values it uses the direct method, which they refer to as the multiplication method.
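Here is a minimal Python sketch of the gamma-binomial reduction that, as I understand it, underlies Algorithm PG; it is not a copy of the MATLAB code, and the function name and defaults are my own. It treats the Poisson variate as the number of points of a unit-rate Poisson process in $[0,\mu]$. The arrival time $X$ of the $m$-th point, with $m=\lfloor 7\mu/8\rfloor$, is a gamma variable of order $m$. If $X<\mu$, those $m$ points are counted and, by memorylessness, the problem shrinks to a Poisson variable with mean $\mu-X$; otherwise the first $m-1$ points are uniform on $[0,X]$, so the count in $[0,\mu]$ is binomial with parameters $m-1$ and $\mu/X$. The threshold of 15 mirrors the MATLAB cut-off discussed above, and the remaining small mean is handled by the direct method.

```python
import numpy as np


def poisson_gamma_binomial(mu, rng, threshold=15.0, frac=7.0 / 8.0):
    """Poisson(mu) variate via gamma-binomial reduction for large mu, then the
    direct (multiplication) method. View the variate as the number of points of
    a unit-rate Poisson process on [0, mu]."""
    count = 0
    while mu > threshold:
        m = int(frac * mu)   # jump straight to the m-th point
        x = rng.gamma(m)     # arrival time of the m-th point ~ Gamma(m, 1)
        if x < mu:
            # m points lie in [0, x]; by memorylessness, what remains of
            # [x, mu] is again a unit-rate Poisson process
            count += m
            mu -= x
        else:
            # given x, the first m - 1 points are uniform on [0, x]
            return count + int(rng.binomial(m - 1, mu / x))
    # direct method for the remaining (small) mean
    limit = np.exp(-mu)
    n = 0
    product = rng.uniform()
    while product > limit:
        product *= rng.uniform()
        n += 1
    return count + n
```

For example, with rng = np.random.default_rng(), calling poisson_gamma_binomial(100.0, rng) typically needs a single gamma variate before the remaining mean drops below the threshold.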

R

In R, the function rpois uses an algorithm outlined in the 1982 paper by Ahrens and Dieter. You can view the R source code here. The two cases for $\lambda$ (or $\mu$ in the paper) depend on whether $\lambda$ is greater than ten or not. For small $\lambda$, the R function rpois does not use the method based on inter-arrival times, but rather an inversion method based on a table of (cumulative) probabilities given by the Poisson probability distribution.

Python (NumPy)

In NumPy, the function numpy.random.poisson generates Poisson variates. The source code for the NumPy library is here, but for the Poisson function the underlying code is actually written in C; see the distributions.c file located here. For small Poisson parameter $\lambda$, the code uses the direct method; see the function random_poisson_mult in the code.

For Poisson parameter $\lambda \geq 10$, the comments in the code reveal that it uses a method from a 1993 paper by Hörmann; see Algorithm PTRS on page 43 of the paper. This is a transformation method, which for NumPy is implemented in the C code as the function random_poisson_ptrs. The method, which Hörmann calls the transformed rejection with squeeze, combines inversion and rejection methods.

Octave

Octave is intended to be a GNU clone of MATLAB, so you would suspect it uses the same methods as MATLAB for generating Poisson random variates. But the Octave function poissrnd uses different methods. The code reveals it generates the Poisson variates with a function called prand. It considers different cases depending on the value of the Poisson parameter $\lambda$ as well as whether a single variate (that is, a scalar) or a vector or matrix of Poisson variates is being generated.

In total, the Octave function prand uses five different methods. For two of the methods, the documentation cites methods from the classic book Numerical Recipes in C (the 1992 edition); see next section. To generate a single Poisson variate with Poisson parameter $\lambda \leq 12$, the Octave function prand uses the direct method based on inter-arrival times.

Numerical Recipes (Fortran, C and C++)

The book Numerical Recipes by Press, Teukolsky, Vetterling and Flannery is a classic on numerical methods. The book comes in different editions reflecting different publication years and computer languages. (In the first two editions of the book, the authors implemented the algorithms respectively in Fortran and C.)

For generating Poisson variates, the book's content seems not to have changed over the editions that I looked at, which covered the programming languages Fortran (77 and 90), C, and C++. The authors cover Poisson generation in Section 7.3 in the Fortran and C editions. In the third edition of Numerical Recipes, they implement their methods in C++ in Section 7.3.12.

For small values of the Poisson parameter $\lambda$, Numerical Recipes uses the direct method. For values $\lambda > 12$, an acceptance-rejection method is used, which relies upon finding a continuous version of the discrete Poisson probability distribution.

GSL Library (C)

In the GSL library, one can use the function gsl_ran_poisson, which uses the direct method of exponential times (in its product-of-uniforms form) for small parameter values. The code, which can be viewed here, simply cites Knuth (presumably the second volume). But for larger parameter values it seems to use the aforementioned Algorithm PG (for producing Poisson variates) from the 1974 paper by Ahrens and Dieter; see the section above on MATLAB.

NAG Library (C)

Although I didn't see the code, it appears that the function nag_rand_poisson (g05tjc) in the NAG library also uses the direct method, based on the material in the second volume of the series by Knuth. But in a 1979 paper, Atkinson says that the NAG library uses a method from the 1974 paper by Ahrens and Dieter.

Boost library Random (C++)

The Boost library Random uses the PTRD algorithm proposed in the 1993 paper by Hörmann to generate Poisson variates; see Algorithm PTRD on page 42 of the paper. In the same paper appears the PTRS method, which is used by Python (NumPy) (though implemented in C), as mentioned above.

MKL library (C)

In the MKL C library written by Intel, there seem to be three methods in use for generating Poisson variates.

The first function is called VSL_RNG_METHOD_POISSON_PTPE, which does the following for a Poisson distribution with parameter $\Lambda$:

If Λ ≥ 27, random numbers are generated by PTPE method. Otherwise, a combination of inverse transformation and table lookup methods is used. The PTPE method is a variation of the acceptance/rejection method that uses linear (on the fraction close to the distribution mode) and exponential (at the distribution tails) functions as majorizing functions. To avoid time-consuming acceptance/rejection checks, areas with zero probability of rejection are introduced and a squeezing technique is applied.

This function uses the so-called PTPE method, which is outlined in a 1981 technical report by Schmeiser and Kachitvichyanukul.

The second function is called VSL_RNG_METHOD_POISSON_POISNORM, which does the following:

If Λ < 1, the random numbers are generated by combination of inverse transformation and table lookup methods. Otherwise, they are produced through transformation of the normally distributed random numbers.

The third function is called VSL_RNG_METHOD_POISSONV_POISNORM, which does the following:

If Λ < 0.0625, the random numbers are generated by inverse transformation method. Otherwise, they are produced through transformation of normally distributed random numbers.

cuRAND (C)

Finally, there is the CUDA Random Number Generation library (cuRAND) developed by Nvidia for their (now ubiquitous) graphics processing units (GPUs). This C/C++ library has a function for generating Poisson variates. Copies of its C code can be found in various GitHub repositories, such as this one. The cuRAND function curand_poisson uses the direct method for Poisson parameter values less than 64. For parameter values greater than 4000, it uses a normal approximation (rounded to the nearest integer).

For other values, the function curand_poisson uses a rejection method based on an approximation of the incomplete gamma function; see the function curand_poisson_gammainc. The book by Fishman is cited; see Section 8.16.

Simulating Poisson random variables – Direct method

If you were to write from scratch a program that simulates a homogeneous Poisson point process, the trickiest part would be the random number of points, which requires simulating a Poisson random variable. In previous posts on simulating this point process, such as this one and this one, I’ve simply used the inbuilt functions for simulating (or generating) Poisson random variables (or variates).¹

But how would one create such a Poisson function using just a standard uniform random variate generator? In this post I present my own Poisson simulation code in MATLAB, Python, C and C#, which can be found here.

The method being used depends on the value of the Poisson parameter, denoted here by $\lambda$, which is the mean (as well as the variance) of a random variable with a Poisson distribution. If this parameter value is small, then a direct simulation method can be used to generate Poisson random variates. In practice, a small Poisson parameter is one less than some threshold between 10 and 30.

For large $\lambda$ values, other methods are generally used, such as rejection or (highly accurate) approximation methods. In the book Non-uniform random variate generation, the author Luc Devroye groups the methods into five categories (Section X.3.2), which I briefly describe in the next post. The first of those categories covers the method that I mentioned above. I will cover that method in this post, presenting some Poisson sampling code in C and C#. (I will also present some code in MATLAB, but you would never use it instead of the inbuilt function poissrnd.)

In the next post, I’ll describe other types of Poisson simulation methods, and I’ll detail which simulation methods various programming libraries use.

Direct method

An elegant and natural method for simulating Poisson variates is to use a result based on the homogeneous Poisson stochastic process. The points in time when a homogeneous Poisson stochastic process increases form a Poisson point process on the real line.²

Using exponential random variables

Here’s the algorithm for sampling Poisson variables with exponential random variables, which I’ll explain.

Sample Poisson random variable $N$ with parameter (mean) $\lambda$ using exponential random variables

1. Set count variable $N=0$ and initial sum variable $S=0$;
2. While $S<1$:
   a. Sample uniform random variable $U\sim U(0,1)$;
   b. Calculate $E= -\log(U)/\lambda $;
   c. Update count and sum variables by setting $N\rightarrow N+1$ and $S\rightarrow S+E$;
3. Return $N-1$ (the last exponential variable pushes the sum past one, so it is not counted).

The points in time when the Poisson stochastic process increases are called arrival times or occurrence times. In classic random models, they represent the arrivals or occurrences of something, such as phone calls over time. The differences between consecutive times are called inter-arrival times or inter-occurrence times. The inter-arrival times of a homogeneous Poisson process form independent exponential random variables, a result known as the Interval Theorem.

Using this connection to the Poisson stochastic process, we can generate exponential variables $E_1, E_2, \dots$ and add them up. The number of exponential variables needed for the sum to exceed one, minus one, gives a Poisson random variable. That is, if we define $N$ to be the smallest $n$ such that
$$ \sum_{k=1}^{n+1} E_k > 1, $$
then $N$ is a random variable distributed according to a Poisson distribution.

Generating exponential variates is easily done by using the inverse method. For a uniform random variable $U$ on the unit interval $(0,1)$, the transformation $E= -\log(U)/\lambda $ gives an exponential random variable with mean $1/\lambda$.
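Substituting $E_k=-\log(U_k)/\lambda$ into the stopping condition shows how logarithmic identities turn the sum of exponential variables into a product of uniform variables:

$$\begin{aligned}\sum_{k=1}^{n+1} E_k > 1 &\iff -\frac{1}{\lambda}\sum_{k=1}^{n+1}\log U_k > 1\\ &\iff \log\prod_{k=1}^{n+1} U_k < -\lambda\\ &\iff \prod_{k=1}^{n+1} U_k < e^{-\lambda}\,.\end{aligned}$$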

But we can skip generating exponential random variates.

Using uniform random variables

Here’s the algorithm for sampling Poisson variables with uniform random variables.

Sample Poisson random variable $N$ with parameter (mean) $\lambda$ using uniform random variables

1. Set count variable $N=0$ and initial product variable $P=1$;
2. While $P>e^{-\lambda}$:
   a. Sample uniform random variable $U\sim U(0,1)$;
   b. Update count and product variables by setting $N\rightarrow N+1$ and $P\rightarrow P\times U$;
3. Return $N-1$ (the last uniform variable pushes the product below $e^{-\lambda}$, so it is not counted).

To reduce computations, the direct method using exponential random variables is often reformulated as products of uniform random variables. We can do this, due to logarithmic identities, and work with products of uniform variables instead of sums of exponential random variables.

Then, by using standard uniform random variables $U_1, U_2,\dots$, we define $N$ to be the smallest $n$ such that
$$ \prod_{k=1}^{n+1} U_k < e^{-\lambda}. $$
These two different formulations of the same method are captured by Lemma 3.2 and Lemma 3.3 in Chapter 10 of Devroye's book.
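A minimal Python sketch of the product formulation, using only the standard library, follows. The function name poisson_direct is my own, and the rng argument is assumed to be a random.Random instance. As with the MATLAB example, in practice you would use a library function (such as NumPy's) instead.

```python
import math
import random


def poisson_direct(lam, rng):
    """Direct method, product form: N is the smallest n such that
    U_1 * ... * U_(n+1) < exp(-lam), for uniform U_k drawn from rng."""
    threshold = math.exp(-lam)
    n = 0
    product = rng.random()       # U_1
    while product >= threshold:  # keep going until the product drops below
        product *= rng.random()  # multiply in the next uniform variable
        n += 1
    return n


rng = random.Random(42)
samples = [poisson_direct(3.5, rng) for _ in range(50000)]
```

Working with the product avoids computing a logarithm for every uniform variable; only one exponential, $e^{-\lambda}$, is needed per variate.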

Example in MATLAB

Warning: My online webpage editor tends to mangle symbols like < and >, so it's best not to copy my code straight from this website unless you check and edit it; instead, download my code directly from here.

In MATLAB, we can implement this method with the first formulation in a function with a simple while-loop:

function N=funPoissonLoop(lambda)
T=0; %initialize sum of exponential variables as zero
n=-1;%initialize counting variable as negative one

while (T <1)
E=-(1/lambda)*log(rand(1));%generate exponential random variable
T=T+E; %update sum of exponential variables
n=n+1; %update number of exponential variables
end
N=n;
end

But, as I said before, don’t use this code instead of the inbuilt function poissrnd.

If you want to be a bit more tricky, you could achieve the same result by using recursion:

function N=funPoissonRecursive(lambda)
T=0; %initialize sum of exponential variables as zero
n=-1; %initialize counting variable as negative one

%run the (recursive) exponential step function
[~,N]=funStepExp(lambda,T,n);

    function [T,N]=funStepExp(nu,S,m)
        if (S < 1)
            %run if sum of exponential variables is not high enough

            %generate exponential random variable
            E=(-log(rand(1)))/nu;
            S=S+E; %update sum of exponential variables
            m=m+1; %update number of exponential variables

            %recursively call function again
            [T,N]=funStepExp(nu,S,m);
        else
            T=S;
            N=m;
        end
    end
end

Note how the code recursively calls the function funStepExp, which generates an exponential variable each time.

In the Code section below I describe my code in C and C#, using the second formulation.

Origins

Some people attribute the direct method, based on inter-arrival times, to (or, at least, cite) Donald Knuth, who details it in the second volume of his classic series of books, but I doubt that the great Knuth was the first to have this idea. For example, a quick search on Google Scholar found a paper by K. D. Tocher on computers and random sampling, where Tocher proposes the direct method in 1954, some years before Knuth started publishing his classic series.

The direct method for Poisson sampling relies upon the Interval theorem. The Poisson point process expert Günter Last studied the origins of this fundamental result. He presented its history in a recent book authored by him and Matthew Penrose; see Chapter 7 and its corresponding historical footnotes in Section C of the appendix. (A free version of the book can be found here.) People connected to the result include Robert Ellis and William Feller, who lived in the 19th and 20th centuries, respectively.

Other methods

The direct method perfectly generates Poisson random variables (or I should say Poisson random variates). But it can be too slow for large values of the Poisson parameter (that is, the mean) $\lambda$. This has motivated researchers to develop other methods, which I will mention in the next post.

Code

I wrote some code that simulates Poisson random variables by employing the direct method based on exponential inter-arrival times. As always, all my code is online, with the code from this post located here.

I have implemented the second formulation (using just uniform variables) in the C and C# languages. In the code, I have used a while-loop to implement the method. But I could have also used a recursion method, as I did in the MATLAB example above, which I have also done in Python (with NumPy).

For an empirical test, the code also calculates the mean and variance of a collection of Poisson variables. For a large enough number of variables, the sample mean and the variance will closely agree with each other, converging to the same value.
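As a rough Python sketch of this empirical test (not the author's C or C# code), the hypothetical function below mirrors the do-while loop of the C and C# functions, and the sample mean and sample variance are then compared with $\lambda$. The parameter value and sample size are arbitrary choices.

```python
import math
import random
import statistics

def poisson_do_while(lam, rng):
    """Mirror of the do-while loop in the C and C# functions (hypothetical name)."""
    exp_lambda = math.exp(-lam)  # constant for terminating the loop
    rand_poisson = -1
    prod_uni = 1.0
    while True:
        prod_uni *= rng.random()         # update product of uniforms
        rand_poisson += 1                # increase Poisson variable
        if not prod_uni > exp_lambda:    # loop condition of the do-while
            return rand_poisson

rng = random.Random(2024)
lam = 10.0
samples = [poisson_do_while(lam, rng) for _ in range(50000)]
sample_mean = statistics.fmean(samples)
sample_var = statistics.variance(samples)
print(f"mean {sample_mean:.3f}, variance {sample_var:.3f}, lambda {lam}")
```

For Poisson variables both statistics estimate the same quantity, so a large gap between them would point to a bug or a poor uniform generator.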

C

Warning: My C code uses rand(), the standard pseudo-random number function in C, which is known for failing certain tests of randomness. The function is adequate for regular simulation work, but it gives poor results for a large number of simulations. For such work, replace this function with a better pseudo-random number generator.

The code for generating a single Poisson variate is fairly straightforward in C. Here’s a sample of the code with just the Poisson function:

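#include <math.h>   //for exp
#include <stdlib.h> //for rand and RAND_MAX

//Uniform function -- simple stand-in returning a uniform variable on [0,1);
//the full code defines its own version of this function
double funUniformSingle(void)
{
    return (double)rand() / ((double)RAND_MAX + 1.0);
}

//Poisson function -- returns a single Poisson random variable
int funPoissonSingle(double lambda)
{
    double exp_lambda = exp(-lambda); //constant for terminating loop
    double randUni; //uniform variable
    double prodUni; //product of uniform variables
    int randPoisson; //Poisson variable

    //initialize variables
    randPoisson = -1;
    prodUni = 1;
    do
    {
        randUni = funUniformSingle(); //generate uniform variable
        prodUni = prodUni * randUni; //update product
        randPoisson++; //increase Poisson variable
    } while (prodUni > exp_lambda);
    return randPoisson;
}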

For generating multiple variates, the code becomes more complicated, as one needs to use pointers and manage memory explicitly in C. Again, the function uses the pseudo-random number generator in C.

C#

The code for generating a single Poisson variate is also straightforward in C#. Here’s the function in C#:

//Poisson function -- returns a single Poisson random variable
//(funUniformSingle is defined elsewhere in the full code and returns a uniform variable)
public int funPoissonSingle (double lambda) {
    double exp_lambda = Math.Exp (-lambda); //constant for terminating loop
    double randUni; //uniform variable
    double prodUni; //product of uniform variables
    int randPoisson; //Poisson variable

    //initialize variables
    randPoisson = -1;
    prodUni = 1;
    do {
        randUni = funUniformSingle (); //generate uniform variable
        prodUni = prodUni * randUni; //update product
        randPoisson++; // increase Poisson variable
    } while (prodUni > exp_lambda);

    return randPoisson;
}

Generalizing the code so that it generates multiple variates requires only a small change, much less than in C, as C# is a much more modern language.

Fortran

After writing this post, I wrote another post about implementing the same Poisson algorithm in Fortran. My Fortran code is very similar to the code that I wrote in C and C#. You should be able to run it on this website or similar ones that can compile Fortran (95) code.

Simulating an inhomogeneous Poisson point process

In previous posts I described how to simulate homogeneous Poisson point processes on a rectangle, disk and triangle. But here I will simulate an inhomogeneous or nonhomogeneous Poisson point process. (Both of these terms are used, where the latter is probably more popular, but I prefer the former.) For such a point process, the points are not uniformly located on the underlying mathematical space on which the Poisson process is defined. This means that certain regions will, on average, tend to have more (or fewer) points than other regions of the underlying space.

Basics

Any Poisson point process is defined with a non-negative measure called the intensity or mean measure. I make the standard assumption that the intensity measure $\Lambda$ has a derivative $\lambda(x,y)$. (I usually write a single $x$ to denote a point on the plane, that is $x\in \mathbb{R}^2$, but in this post I will write the $x$ and $y$ coordinates separately.) The function $\lambda(x,y)$ is often called the intensity function or just intensity, which I further assume is bounded, so $\lambda(x,y)<\infty$ for all points in a simulation window $W$. Finally, I assume that the simulation window $W$ is a rectangle, but later I describe how to lift that assumption.

Number of points

To simulate a point process, the number of points and the point locations in the simulation window $W$ are needed. For any Poisson point process, the number of points is a Poisson random variable with a parameter (that is, a mean) $\Lambda(W)$, which under our previous assumptions is given by the integral

$$\Lambda(W)=\int_W \lambda(x,y)dxdy. $$

Assuming we can evaluate such an integral analytically or numerically, the number of points is then simple to simulate.
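As an illustration, take the example intensity used in the Code section below, $\lambda(x,y)=100\exp(-(x^2+y^2)/s^2)$ with $s=0.5$ on $W=[-1,1]^2$. Because this intensity separates into an $x$ part and a $y$ part, the integral has the closed form $100\,(s\sqrt{\pi}\,\mathrm{erf}(1/s))^2$. The sketch below compares that with a plain midpoint rule; the midpoint rule is my choice for illustration, not necessarily the method used in the author's code.

```python
import math

def intensity(x, y, s=0.5):
    """Example intensity lambda(x, y) = 100 exp(-(x^2 + y^2)/s^2)."""
    return 100.0 * math.exp(-(x * x + y * y) / s**2)

def midpoint_integral(f, x_min, x_max, y_min, y_max, n=200):
    """Approximate the double integral of f over a rectangle with an n-by-n midpoint rule."""
    hx = (x_max - x_min) / n
    hy = (y_max - y_min) / n
    total = 0.0
    for i in range(n):
        x = x_min + (i + 0.5) * hx
        for j in range(n):
            y = y_min + (j + 0.5) * hy
            total += f(x, y)
    return total * hx * hy

s = 0.5
# the Gaussian intensity separates, so the exact value uses the error function
exact = 100.0 * (s * math.sqrt(math.pi) * math.erf(1.0 / s)) ** 2
numeric = midpoint_integral(intensity, -1.0, 1.0, -1.0, 1.0)
print(f"exact = {exact:.4f}, midpoint = {numeric:.4f}")
```

Both values should be close to the figure $\Lambda(W)\approx 77.8068$ quoted in the Code section.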

Locations of points

The difficulty lies in randomly positioning the points. But a defining property of the Poisson point process is its independence, which allows us to treat each point completely independently. Positioning each point then comes down to suitably simulating two (or more) random variables for Poisson point processes in two (or higher) dimensions. Consequently, the standard methods used for simulating continuous random variables can be applied to simulating random point locations of a Poisson point process.

In theory, you can rescale the intensity function by the total measure of the simulation window, giving

$$f(x,y):=\frac{\lambda(x,y)}{\Lambda(W)}. $$

We can then interpret this rescaled intensity function $f(x,y)$ as the joint probability density of two random variables $X$ and $Y$, because it integrates to one,

$$\int_W f(x,y)dxdy=1.$$

Clearly the method for simulating an inhomogeneous Poisson point process depends on the nature of the intensity function. For the inhomogeneous case, the random variables $X$ and $Y$ are, in general, not independent.

Transformation

To simulate an inhomogeneous Poisson point process, one method is to first simulate a homogeneous one, and then suitably transform the points according to a deterministic function. For simple random variables, this transformation method is quick and easy to implement, if we can invert the probability distribution. For example, a uniform random variable $U$ defined on the interval $(0,1)$ can be used to give an exponential random variable by applying the transformation $h(u)= -(1/\lambda)\log(u)$, where $\lambda>0$, meaning $h(U)$ is an exponential random variable with parameter $\lambda>0$ (or mean $1/\lambda$).
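The exponential example can be sketched in a few lines of Python. The function name is hypothetical, and `1.0 - rng.random()` is used so that $U$ lies in $(0,1]$ and the logarithm is always finite.

```python
import math
import random

def exponential_by_inversion(lam, rng):
    """Return h(U) = -(1/lam) log(U) for U uniform on (0, 1]."""
    u = 1.0 - rng.random()  # shift [0, 1) to (0, 1] so that log(u) is finite
    return -math.log(u) / lam

rng = random.Random(7)
lam = 2.0
draws = [exponential_by_inversion(lam, rng) for _ in range(100000)]
print(sum(draws) / len(draws))  # sample mean, which should be near 1/lam = 0.5
```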

Similarly for Poisson point processes, the transformation approach is fairly straightforward in a one-dimensional setting, but generally doesn't work easily in two (or higher) dimensions. This is because we often cannot simulate the random variables $X$ and $Y$ independently, which means, in practice, we first need to simulate one random variable, then the other.

It is a bit easier if we can re-write the rescaled intensity function or joint probability density $f(x,y)$ as a product of single-variable functions $f_X(x)$ and $f_Y(y)$, meaning the random variables $X$ and $Y$ are independent. We can then simulate independently the random variables $X$ and $Y$, corresponding to the $x$ and $y$ coordinates of the points. But this would still require integrating and inverting the functions.
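The example intensity $\lambda(x,y)=100\exp(-(x^2+y^2)/s^2)$ from the Code section happens to be separable: each factor $\exp(-x^2/s^2)$ is proportional to a normal density with standard deviation $s/\sqrt{2}$. Under that assumption, the sketch below uses inverse transform sampling of a truncated normal via Python's `statistics.NormalDist` to place the points, and draws the Poisson number of points by counting unit-rate exponential arrivals, since the standard `random` module has no Poisson sampler. This is an illustration of the separable case only, not the thinning method used later in the post; all function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

def poisson_by_arrivals(mean, rng):
    """Poisson variate: number of unit-rate exponential arrivals in [0, mean]."""
    count, t = 0, rng.expovariate(1.0)
    while t <= mean:
        count += 1
        t += rng.expovariate(1.0)
    return count

def truncated_normal(dist, lo, hi, rng):
    """Inverse-transform sample from the normal dist restricted to [lo, hi]."""
    c_lo, c_hi = dist.cdf(lo), dist.cdf(hi)
    return dist.inv_cdf(c_lo + (c_hi - c_lo) * rng.random())

def simulate_separable(rng, s=0.5, x_min=-1.0, x_max=1.0, y_min=-1.0, y_max=1.0):
    """Poisson process with intensity 100 exp(-(x^2 + y^2)/s^2) on a rectangle."""
    # exp(-x^2/s^2) is proportional to a normal density with standard deviation s/sqrt(2)
    dist = NormalDist(0.0, s / math.sqrt(2.0))
    half = s * math.sqrt(math.pi) / 2.0
    mass_x = half * (math.erf(x_max / s) - math.erf(x_min / s))  # integral over x
    mass_y = half * (math.erf(y_max / s) - math.erf(y_min / s))  # integral over y
    total_measure = 100.0 * mass_x * mass_y                      # Lambda(W)
    n = poisson_by_arrivals(total_measure, rng)
    xs = [truncated_normal(dist, x_min, x_max, rng) for _ in range(n)]
    ys = [truncated_normal(dist, y_min, y_max, rng) for _ in range(n)]
    return xs, ys, total_measure

rng = random.Random(3)
xs, ys, total_measure = simulate_separable(rng)
print(len(xs), "points; Lambda(W) =", round(total_measure, 4))
```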

Markov chain Monte Carlo

A now standard way to simulate jointly distributed random variables is to use Markov chain Monte Carlo (MCMC), which we can also use to simulate the $X$ and $Y$ random variables. Applying MCMC methods is simply applying random point process operations repeatedly to all the points. But this is a bit too tricky and involved. Instead I'll use a general yet simpler method based on thinning.

Thinning

The thinning method is arguably the simplest and most general way to simulate an inhomogeneous Poisson point process. If you're unfamiliar with thinning, I recommend my previous post on thinning and the material I cite.

This simulation method is simply a type of acceptance-rejection method for simulating random variables. More specifically, it is the acceptance-rejection or rejection method, attributed to the great John von Neumann, for simulating a continuous random variable, say $X$, with some known probability density $f(x)$. The method accepts/retains or rejects/thins the outcome of each random variable/point depending on the outcome of a uniform random variable associated with each random variable/point.
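For a single random variable, von Neumann's method can be sketched as follows, assuming a density $f$ on $(0,1)$ bounded by a known constant $M$. The target $f(x)=6x(1-x)$ (a Beta(2,2) density with $M=1.5$) is chosen only for illustration, and the function names are hypothetical. A uniform candidate $x$ is accepted when an independent uniform $u$ satisfies $uM\leq f(x)$, which mirrors how thinning retains a point with probability $\lambda(x,y)/\lambda^*$.

```python
import random

def density(x):
    """Example target density f(x) = 6x(1-x) on (0, 1), a Beta(2, 2) density."""
    return 6.0 * x * (1.0 - x)

def rejection_sample(f, f_max, rng):
    """Von Neumann acceptance-rejection with a uniform proposal on (0, 1)."""
    while True:
        x = rng.random()           # candidate from the uniform proposal
        u = rng.random()           # uniform variable for the accept/reject decision
        if u * f_max <= f(x):      # accept with probability f(x) / f_max
            return x

rng = random.Random(11)
draws = [rejection_sample(density, 1.5, rng) for _ in range(50000)]
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / len(draws)
print(mean, var)  # Beta(2, 2) has mean 0.5 and variance 0.05
```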

The thinning or acceptance-rejection method is also appealing because it is an example of a perfect simulation method, which means the distribution of the simulated random variables or points will not be an approximation. This can be contrasted with typical MCMC methods, which, in theory, reach the desired distribution of the random variables in infinite time, which is clearly not possible in practice.

Simulating the homogeneous Poisson point process

First simulate a homogeneous Poisson point process with intensity value $\lambda^*$, which is an upper bound of the intensity function $\lambda(x,y)$. The simulation step is the easy part, but what value is $\lambda^*$?

I will use the maximum value that the intensity function $\lambda(x,y)$ takes, which I denote by

$$ \lambda_{\max}:=\max_{(x,y)\in W}\lambda(x,y),$$

so I set $\lambda^*=\lambda_{\max}$. Of course with $\lambda^*$ being an upper bound, you can use any larger $\lambda$-value, so $\lambda^*\geq \lambda_{\max}$, but that just means more points will need to be thinned.

Scientific programming languages provide algorithms that find or estimate minima of mathematical functions, so such an algorithm just needs to find the $(x,y)$ point that gives the minimum value of $-\lambda(x,y)$, which corresponds to the maximum value of $\lambda(x,y)$. What is very important is that the minimization procedure can handle constraints on the $x$ and $y$ values, which in our case of a rectangular simulation window $W$ are sometimes called box constraints.
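Where no optimizer is at hand, a crude alternative (not the approach in the author's code, which uses `fmincon` and `minimize`) is a grid search over the rectangle, which respects the box constraints by construction. This minimal sketch assumes the example intensity from the Code section; the function names are hypothetical.

```python
import math

def intensity(x, y, s=0.5):
    """Example intensity lambda(x, y) = 100 exp(-(x^2 + y^2)/s^2)."""
    return 100.0 * math.exp(-(x * x + y * y) / s**2)

def max_on_grid(f, x_min, x_max, y_min, y_max, n=101):
    """Largest value of f over an n-by-n grid covering the rectangle."""
    best = -math.inf
    for i in range(n):
        x = x_min + i * (x_max - x_min) / (n - 1)
        for j in range(n):
            y = y_min + j * (y_max - y_min) / (n - 1)
            best = max(best, f(x, y))
    return best

lambda_max = max_on_grid(intensity, -1.0, 1.0, -1.0, 1.0)
print(lambda_max)
```

A grid can miss a peak that falls between grid points. Since thinning only needs $\lambda^*\geq\lambda_{\max}$, inflating the grid result by a small safety factor is a reasonable precaution; it just means more points get thinned.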

Thinning the Poisson point process

All we need to do now is to thin the homogeneous Poisson point process, where each point at $(x,y)$ is independently thinned with probability $p(x,y)$ and retained with probability

$$1-p(x,y)=\frac{\lambda(x,y)}{\lambda^*}.$$

This will randomly remove points so that the remaining points form an inhomogeneous Poisson point process with intensity function
$$ (1-p(x,y))\lambda^* =\lambda(x,y).$$
As a result, we can see that provided $\lambda^*\geq \lambda_{\max}>0$, this procedure will give the right intensity function $\lambda(x,y)$. I’ll skip the details on the point process still being Poisson after thinning, as I have already covered this in the thinning post.

Empirical check

You can run an empirical check by simulating the point process a large number of times (say $10^3$ or $10^4$) and collecting statistics on the number of points. As the number of simulations increases, the average number of points should converge to the intensity measure $\Lambda(W)$, which is given by (perhaps numerically) evaluating the integral

$$\Lambda(W)=\int_W \lambda(x,y)dxdy.$$

This is a test for the intensity measure, a type of first moment, which will work for the intensity measure of any point process. But for Poisson point processes, the variance of the number of points will also converge to the intensity measure $\Lambda(W)$, giving a second empirical test based on second moments.
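A minimal Python sketch of the whole thinning procedure together with both empirical checks, assuming the example intensity from the Code section on $W=[-1,1]^2$ and $\lambda^*=100$. The Poisson number of points is drawn by counting unit-rate exponential arrivals, since the standard `random` module has no Poisson sampler; all names are hypothetical.

```python
import math
import random
import statistics

def intensity(x, y, s=0.5):
    """Example intensity lambda(x, y) = 100 exp(-(x^2 + y^2)/s^2)."""
    return 100.0 * math.exp(-(x * x + y * y) / s**2)

def poisson_by_arrivals(mean, rng):
    """Poisson variate: number of unit-rate exponential arrivals in [0, mean]."""
    count, t = 0, rng.expovariate(1.0)
    while t <= mean:
        count += 1
        t += rng.expovariate(1.0)
    return count

def simulate_thinned(lambda_star, rng, x_min=-1.0, x_max=1.0, y_min=-1.0, y_max=1.0):
    """Thin a homogeneous Poisson process (intensity lambda_star) on a rectangle."""
    x_delta, y_delta = x_max - x_min, y_max - y_min
    n = poisson_by_arrivals(lambda_star * x_delta * y_delta, rng)
    points = []
    for _ in range(n):
        x = x_min + x_delta * rng.random()
        y = y_min + y_delta * rng.random()
        if rng.random() < intensity(x, y) / lambda_star:  # retain with prob lambda/lambda*
            points.append((x, y))
    return points

rng = random.Random(5)
counts = [len(simulate_thinned(100.0, rng)) for _ in range(500)]
count_mean = statistics.fmean(counts)
count_var = statistics.variance(counts)
print("mean:", count_mean, "variance:", count_var)
# for a Poisson process, both should be close to Lambda(W), about 77.8068 here
```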

An even more thorough test would be estimating an empirical distribution (that is, performing and normalizing a histogram) on the number of points. These checks will validate the number of points, but not the positioning of the points. In my next post I’ll cover how to perform these tests.

Results

Realizations of the inhomogeneous Poisson point process with intensity function $\lambda(x,y)=100\exp(-(x^2+y^2)/s^2)$, where $s=0.5$. The results look similar to those in the thinning post, where the thinned points (that is, red circles) are generated from the same Poisson point process as the one that I have presented here.

[Figures: results from the MATLAB and Python code]

Method extensions

We can extend the thinning method for simulating inhomogeneous Poisson point processes in a couple of different ways.

Using an inhomogeneous Poisson point process

The thinning method does not need to be applied to a homogeneous Poisson point process with intensity $\lambda^*$. In theory, we could have simulated a suitably inhomogeneous Poisson point process with intensity function $\lambda^*(x,y)$, which has the condition

$$ \lambda^*(x,y)\geq \lambda(x,y), \quad \forall (x,y)\in W .$$

This Poisson point process is then thinned. But we would still need to simulate the underlying inhomogeneous Poisson point process, which would often be just as difficult.

Partitioning the simulation window

Perhaps the intensity of the Poisson point process only takes two values, $\lambda_1$ and $\lambda_2$, and the simulation window $W$ can be nicely divided or partitioned into two disjoint sets $B_1$ and $B_2$ (that is, $B_1\cap B_2=\emptyset$ and $B_1\cup B_2=W$), corresponding to the subregions of the two different intensity values. The Poisson independence property allows us to simulate two independent Poisson point processes on the two subregions.

This approach only works for a piecewise constant intensity function. But if the intensity function $\lambda(x,y)$ varies wildly, the simulation window can be partitioned into subregions $B_1,\dots,B_m$ for different ranges of the intensity function $\lambda(x,y)$. This allows us to simulate independent homogeneous Poisson point processes with different intensities $\lambda^*_1,\dots, \lambda^*_m$, where for each subregion $B_i$ we set

$$ \lambda^*_i:=\max_{(x,y)\in B_i}\lambda(x,y).$$

The resulting Poisson point processes are then suitably thinned, resulting in a more efficient simulation method. (Although I imagine the gain would often be small.)
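A minimal sketch of the partitioned version, assuming a $k\times k$ grid of square cells on $W=[-1,1]^2$ and the example intensity from the Code section. The local bound $\lambda^*_i$ here exploits the fact that this particular intensity decreases with distance from the origin, so its maximum over a cell sits at the cell point closest to the origin; a general intensity would need an optimizer per cell. All names are hypothetical.

```python
import math
import random

def intensity(x, y, s=0.5):
    """Example intensity lambda(x, y) = 100 exp(-(x^2 + y^2)/s^2)."""
    return 100.0 * math.exp(-(x * x + y * y) / s**2)

def poisson_by_arrivals(mean, rng):
    """Poisson variate: number of unit-rate exponential arrivals in [0, mean]."""
    count, t = 0, rng.expovariate(1.0)
    while t <= mean:
        count += 1
        t += rng.expovariate(1.0)
    return count

def cell_bound(x0, x1, y0, y1):
    """Maximum of this radially decreasing intensity over [x0, x1] x [y0, y1]."""
    cx = min(max(0.0, x0), x1)  # cell point closest to the origin
    cy = min(max(0.0, y0), y1)
    return intensity(cx, cy)

def simulate_partitioned(rng, k=4, x_min=-1.0, x_max=1.0, y_min=-1.0, y_max=1.0):
    """Thin independent homogeneous processes on a k-by-k grid of cells B_i."""
    dx, dy = (x_max - x_min) / k, (y_max - y_min) / k
    points, proposed = [], 0
    for i in range(k):
        for j in range(k):
            x0, y0 = x_min + i * dx, y_min + j * dy
            lam_i = cell_bound(x0, x0 + dx, y0, y0 + dy)  # local bound lambda*_i
            n = poisson_by_arrivals(lam_i * dx * dy, rng)
            proposed += n
            for _ in range(n):
                x = x0 + dx * rng.random()
                y = y0 + dy * rng.random()
                if rng.random() < intensity(x, y) / lam_i:
                    points.append((x, y))
    return points, proposed

rng = random.Random(12)
runs = [simulate_partitioned(rng) for _ in range(500)]
mean_kept = sum(len(pts) for pts, _ in runs) / len(runs)
mean_proposed = sum(n for _, n in runs) / len(runs)
print(f"retained {mean_kept:.2f} on average from {mean_proposed:.1f} proposed points")
```

With a single bound $\lambda^*=100$ the whole window would propose about 400 points per simulation; with this $4\times 4$ grid the expected number of proposals drops to roughly 187, while the retained points keep the same distribution.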

Non-rectangular simulation windows

If you want to simulate on a non-rectangular region that is not a disk or triangle, then the easiest way is to simulate a Poisson point process on a rectangle $R$ that completely covers the simulation window, so $W \subset R\subset \mathbb{R}^2$, and then set the intensity function $\lambda $ to zero for the region outside the simulation window $W$, that is $\lambda (x,y)=0$ when $(x,y)\in R\setminus W$.
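A minimal sketch of this idea, assuming (only for illustration) that $W$ is the unit disk covered by the square $R=[-1,1]^2$, with the example intensity from the Code section and $\lambda^*=100$. Integrating in polar coordinates gives $\Lambda(W)=100\pi s^2(1-e^{-1/s^2})$, which serves as a check. Function names are hypothetical.

```python
import math
import random

def poisson_by_arrivals(mean, rng):
    """Poisson variate: number of unit-rate exponential arrivals in [0, mean]."""
    count, t = 0, rng.expovariate(1.0)
    while t <= mean:
        count += 1
        t += rng.expovariate(1.0)
    return count

def intensity_window(x, y, radius=1.0, s=0.5):
    """lambda(x, y) on the disk W, and zero on the rest of the covering square R."""
    if x * x + y * y <= radius * radius:
        return 100.0 * math.exp(-(x * x + y * y) / s**2)
    return 0.0

def simulate_on_window(lambda_star, rng, half_width=1.0):
    """Thin a homogeneous process on R = [-half_width, half_width]^2."""
    side = 2.0 * half_width
    n = poisson_by_arrivals(lambda_star * side * side, rng)
    points = []
    for _ in range(n):
        x = -half_width + side * rng.random()
        y = -half_width + side * rng.random()
        if rng.random() < intensity_window(x, y) / lambda_star:  # never true outside W
            points.append((x, y))
    return points

rng = random.Random(9)
s = 0.5
expected = 100.0 * math.pi * s**2 * (1.0 - math.exp(-1.0 / s**2))  # Lambda(W), unit disk
counts = [len(simulate_on_window(100.0, rng)) for _ in range(500)]
print("mean count:", sum(counts) / len(counts), "expected:", round(expected, 4))
```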

Code

All code from my posts, as always, can be found on my GitHub repository. The code for this post is located here. You can see that the code is very similar to the thinning code, which served as the foundation for this code. (Note how we now keep the points, so in the code the Boolean variable marks the retained points, namely those whose uniform variables are less than $\lambda(x,y)/\lambda^*$.)

I have implemented the code in MATLAB and Python with an intensity function $\lambda(x,y)=100\exp(-(x^2+y^2)/s^2)$, where $s>0$ is a scale parameter. Note that in the minimization step, the box constraints are expressed differently in MATLAB and Python: MATLAB first takes the minimum values then the maximum values, whereas Python first takes the $x$-values then the $y$-values.

The code presented here does not have the empirical check, which I described above, but it is implemented in the code located here. For the parameters used in the code, the total measure is $\Lambda(W)\approx 77.8068$, meaning each simulation will generate on average almost seventy-eight points.

I have stopped writing code in R for a couple of reasons, but mostly because anything I could think of simulating in R can already be done in the spatial statistics library spatstat. I recommend the book Spatial Point Patterns, co-authored by spatstat's main contributor, Adrian Baddeley.

MATLAB

I have used the fmincon function to find the point that gives the minimum of $-\lambda(x,y)$.

%Simulation window parameters
xMin=-1;xMax=1;
yMin=-1;yMax=1;
xDelta=xMax-xMin;yDelta=yMax-yMin; %rectangle dimensions
areaTotal=xDelta*yDelta; %area of rectangle

s=0.5; %scale parameter

%Point process parameters
fun_lambda=@(x,y)(100*exp(-((x).^2+(y).^2)/s^2));%intensity function

%%%START -- find maximum lambda -- START %%%
%For an intensity function lambda, given by function fun_lambda,
%finds the maximum of lambda in a rectangular region given by
%[xMin,xMax,yMin,yMax].
funNeg=@(x)(-fun_lambda(x(1),x(2))); %negative of lambda
%initial value(ie centre)
xy0=[(xMin+xMax)/2,(yMin+yMax)/2];%initial value(ie centre)
%Set up optimization step
options=optimoptions('fmincon','Display','off');
%Find largest lambda value
[~,lambdaNegMin]=fmincon(funNeg,xy0,[],[],[],[],...
[xMin,yMin],[xMax,yMax],'',options);
lambdaMax=-lambdaNegMin;
%%%END -- find maximum lambda -- END%%%

%define thinning probability function
fun_p=@(x,y)(fun_lambda(x,y)/lambdaMax);

%Simulate Poisson point process
numbPoints=poissrnd(areaTotal*lambdaMax);%Poisson number of points
xx=xDelta*(rand(numbPoints,1))+xMin;%x coordinates of Poisson points
yy=yDelta*(rand(numbPoints,1))+yMin;%y coordinates of Poisson points

%calculate spatially-dependent thinning probabilities
p=fun_p(xx,yy);

%Generate Bernoulli variables (ie coin flips) for thinning
booleRetained=rand(numbPoints,1)<p; %points to be retained

%x/y locations of retained points
xxRetained=xx(booleRetained); yyRetained=yy(booleRetained);

%Plotting
plot(xxRetained,yyRetained,'bo'); %plot retained points
xlabel('x');ylabel('y');

The box constraints for the optimization step were expressed as:

[xMin,yMin],[xMax,yMax]

Python

I have used the minimize function in SciPy.

import numpy as np; #NumPy package for arrays, random number generation, etc
import matplotlib.pyplot as plt #For plotting
from scipy.optimize import minimize #For optimizing
from scipy import integrate

#Simulation window parameters
xMin=-1;xMax=1;
yMin=-1;yMax=1;
xDelta=xMax-xMin;yDelta=yMax-yMin; #rectangle dimensions
areaTotal=xDelta*yDelta;

s=0.5; #scale parameter

#Point process parameters
def fun_lambda(x,y):
    return 100*np.exp(-(x**2+y**2)/s**2); #intensity function

###START -- find maximum lambda -- START ###
#For an intensity function lambda, given by function fun_lambda,
#finds the maximum of lambda in a rectangular region given by
#[xMin,xMax,yMin,yMax].
def fun_Neg(x):
    return -fun_lambda(x[0],x[1]); #negative of lambda

xy0=[(xMin+xMax)/2,(yMin+yMax)/2];#initial value(ie centre)
#Find largest lambda value
resultsOpt=minimize(fun_Neg,xy0,bounds=((xMin, xMax), (yMin, yMax)));
lambdaNegMin=resultsOpt.fun; #retrieve minimum value found by minimize
lambdaMax=-lambdaNegMin;
###END -- find maximum lambda -- END ###

#define thinning probability function
def fun_p(x,y):
    return fun_lambda(x,y)/lambdaMax;

#Simulate a Poisson point process
numbPoints = np.random.poisson(lambdaMax*areaTotal);#Poisson number of points
xx = np.random.uniform(0,xDelta,((numbPoints,1)))+xMin;#x coordinates of Poisson points
yy = np.random.uniform(0,yDelta,((numbPoints,1)))+yMin;#y coordinates of Poisson points

#calculate spatially-dependent thinning probabilities
p=fun_p(xx,yy);

#Generate Bernoulli variables (ie coin flips) for thinning
booleRetained=np.random.uniform(0,1,((numbPoints,1)))<p; #points to be retained

#x/y locations of retained points
xxRetained=xx[booleRetained]; yyRetained=yy[booleRetained];

#Plotting
plt.scatter(xxRetained,yyRetained, edgecolor='b', facecolor='none', alpha=0.5 );
plt.xlabel("x"); plt.ylabel("y");
plt.show();

The box constraints were expressed as:

(xMin, xMax), (yMin, yMax)

Julia

After writing this post, I later wrote the code in Julia. The code is here and my thoughts about Julia are here.

Thinning point processes

One way to create new point processes is to apply thinning to a point process. As I mentioned in a previous post on point process operations, thinning is a random operation applied to the points of an underlying point process, where the points are thinned (or removed) or retained (or kept) according to some probabilistic rule. Both the thinned and retained points form two separate point processes, but one usually focuses on the retained points. Given an underlying point process, the nature of the thinning rule will result in different types of point processes.

As I detail in the Applications section below, thinning can be used to simulate an inhomogeneous Poisson point process, which I covered in another post.

Thinning types

Thinning can be statistically independent or dependent, meaning that the probability of thinning any point is either independent of or dependent on the thinning of other points. The more tractable case is statistically independent thinning, which is the thinning type covered here. We can further group this thinning into two types based on whether the thinning rule depends on the locations of the points. (I use the word location, instead of point, to refer to where a point of a point process is located on the underlying mathematical space on which the point process is defined.)

Spatially independent thinning

The simplest thinning operation is one that does not depend on point locations. This thinning is sometimes referred to as $p$-thinning, where the constant $p$ has the condition $0\leq p \leq 1$ because it is the probability of thinning a single point. Simply put, the probability of a point being thinned does not depend on the point locations.

Example

We can liken the thinning action to flipping a biased coin with probability of $p$ for heads (or tails) for each point of the underlying point process, and then removing the point if a head (or tails) occurs. If there were a constant number $n$ of points of the underlying point process, then the number of thinned (or retained) points will form a binomial random variable with parameters $n$ and $p$ (or $1-p$).

Simulation

Simulating this thinning operation is rather straightforward. Given a realization of a point process, for each point, simply generate or simulate a uniform random variable on the interval $(0,1)$, and if this random variable is less than $p$, remove the point. (This is simply sampling a Bernoulli distribution, which is covered in this post.)
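A minimal Python sketch of $p$-thinning applied to a fixed set of points, assuming $p=0.25$ and $10^4$ uniformly placed points (both arbitrary choices); the function name is hypothetical. With a fixed number $n$ of points, the thinned count is binomial with parameters $n$ and $p$, as described above.

```python
import random

def p_thin(points, p, rng):
    """Independently remove each point with the same probability p (p-thinning)."""
    retained, thinned = [], []
    for point in points:
        if rng.random() < p:  # uniform variable below p: remove (thin) the point
            thinned.append(point)
        else:
            retained.append(point)
    return retained, thinned

rng = random.Random(8)
points = [(rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)) for _ in range(10000)]
retained, thinned = p_thin(points, 0.25, rng)
# the thinned count is binomial with mean n * p = 2500
print(len(retained), len(thinned))
```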

In the code section below, I have shown how this thinning operation is implemented.

Spatially dependent thinning

To generalize the idea of $p$-thinning, we can simply require that the thinning probability of any point depends on its location $x$, which gives us the concept of $p(x)$-thinning. (I write a single $x$ to denote a point on the plane, that is $x\in \mathbb{R}^2$, instead of writing, for example, the $x$ and $y$ coordinates separately.) More precisely, the probability of thinning a point is given by a function $p(x)$ such that $0 \leq p(x)\leq 1$, but all point thinnings occur independently of each other. In other words, this is a spatially dependent thinning that is statistically independent.

Example

I’ll illustrate the concept of (statistically independent) spatially dependent thinning with a somewhat contrived example. We assume that the living locations of all the people in the world form a point process on a (slightly squashed) sphere. Let’s say that Earth has become overpopulated, particularly in the Northern Hemisphere, so we decide to randomly choose people and send them off to another galaxy, but we do it based on how far they live from the North Pole. The thinning rule could be, for example, $p(x)= \exp(- |x|^2/s^2)$, where $|x|$ is the distance to the North Pole and $s>0$ is some constant for distance scaling.

Put another way, a person at location $x$ flips a biased coin with the probability of heads being equal to $p(x)=\exp(- |x|^2/s^2)$. If a head comes up, then that person is removed from the planet. With the maximum of $p(x)$ at the North Pole, we can see that the lucky (or unlucky?) people in countries like Australia, New Zealand (or Aotearoa), South Africa, Argentina and Chile are more likely not to be sent off (that is, thinned) into the great unknown.

For people who live comparable distances from the North Pole, the removal probabilities are similar in value, yet the events of being removed remain independent. For example, the probabilities of removing any two people from the small nation Lesotho are similar in value, but these two random events are still completely independent of each other.

Simulation

Simulating a spatially dependent thinning is just slightly more involved than the spatially independent case. Given a realization of a point process, for each point at, say, $x$, simply generate or simulate a uniform random variable on the interval $(0,1)$, and if this random variable is less than $p(x)$, remove the point.
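A minimal Python sketch on the square $[-1,1]^2$, assuming the thinning function $p(x)=\exp(-|x|^2/s^2)$ with $s=0.5$ and $2\times 10^4$ uniformly placed points; the function names are hypothetical. Comparing the retained fraction near the origin with the fraction far from it shows that most points are thinned where $p(x)$ is largest.

```python
import math
import random

def thinning_prob(x, y, s=0.5):
    """p(x) = exp(-|x|^2 / s^2): points near the origin are the most likely to be thinned."""
    return math.exp(-(x * x + y * y) / s**2)

def location_thin(points, p_func, rng):
    """Independently remove each point (x, y) with probability p_func(x, y)."""
    retained, thinned = [], []
    for x, y in points:
        if rng.random() < p_func(x, y):  # uniform variable below p(x): remove the point
            thinned.append((x, y))
        else:
            retained.append((x, y))
    return retained, thinned

def retained_fraction(retained, thinned, r_lo, r_hi):
    """Fraction of points at distance [r_lo, r_hi) from the origin that were retained."""
    def in_ring(pts):
        return sum(1 for x, y in pts if r_lo <= math.hypot(x, y) < r_hi)
    kept, removed = in_ring(retained), in_ring(thinned)
    return kept / (kept + removed)

rng = random.Random(17)
points = [(rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)) for _ in range(20000)]
retained, thinned = location_thin(points, thinning_prob, rng)
near = retained_fraction(retained, thinned, 0.0, 0.25)
far = retained_fraction(retained, thinned, 0.75, 2.0)
print("near the origin:", near, "away from the origin:", far)
```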

In the code section, I have shown how this thinning operation is implemented with an example like the above one, but on a rectangular region of Cartesian space. In this setting, the maximum of $p(x)$ is at the origin, resulting in more points being thinned in this region.

Thinning a Poisson point process

Perhaps not surprisingly, under the thinning operation the Poisson point process exhibits a closure property, meaning that a Poisson point process thinned in a certain way gives another Poisson point process. More precisely, if the thinning operation is statistically independent, then the resulting point process formed from the retained points is also a Poisson point process, regardless of whether the thinning is spatially independent or dependent. The resulting intensity (interpreted as the average density of points) of this new Poisson point process has a simple expression.

Homogeneous case

For a spatially independent $p$-thinning, if the original (or underlying) Poisson point process is homogeneous with intensity $\lambda$, then the point process formed from the retained points is a homogeneous Poisson point process with intensity $(1-p)\lambda$. (There are different ways to prove this, but you can gain some intuition behind the proof by conditioning on the Poisson number of points and then applying the law of total probability. Using generating functions helps.)
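A quick empirical check of this result, as a sketch under assumed values $\lambda=100$, a window of area 4 and $p=0.25$: the retained count should have mean and variance both close to $(1-p)\lambda|W|=300$, as expected for a Poisson variable. The Poisson number of original points is drawn by counting unit-rate exponential arrivals; names are hypothetical.

```python
import random
import statistics

def poisson_by_arrivals(mean, rng):
    """Poisson variate: number of unit-rate exponential arrivals in [0, mean]."""
    count, t = 0, rng.expovariate(1.0)
    while t <= mean:
        count += 1
        t += rng.expovariate(1.0)
    return count

rng = random.Random(13)
lam, area, p = 100.0, 4.0, 0.25  # intensity, window area, thinning probability
retained_counts = []
for _ in range(1000):
    n = poisson_by_arrivals(lam * area, rng)              # points of the original process
    kept = sum(1 for _ in range(n) if rng.random() >= p)  # survivors of p-thinning
    retained_counts.append(kept)
kept_mean = statistics.fmean(retained_counts)
kept_var = statistics.variance(retained_counts)
print(kept_mean, kept_var)  # both should be near (1 - p) * lam * area = 300
```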

Inhomogeneous case

More generally, if we apply a spatially dependent $p(x)$-thinning to a Poisson point process with intensity $\lambda$, then the retained points form an inhomogeneous or nonhomogeneous Poisson point process with intensity $(1-p(x))\lambda$, due to the spatial dependence in the thinning function $p(x)$. This gives a way to simulate such Poisson point processes, which I'll cover in another post.

Splitting

We can see by symmetry that if we look at the thinned points, then the resulting point process is also a Poisson point process, but with intensity $p(x)\lambda$. The retained and thinned points both form Poisson point processes, but what is really interesting is that these two point processes are independent of each other. This means that any random configuration that occurs among the retained points is independent of any configurations among the thinned points.
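A small sketch makes the independence visible through the counts, assuming $p=0.25$ and a mean of 400 original points (arbitrary choices; names hypothetical). When the number of original points is Poisson, the retained and thinned counts are uncorrelated. For contrast, with a fixed number of 400 points the two counts always add up to 400, so their correlation is exactly $-1$.

```python
import math
import random

def poisson_by_arrivals(mean, rng):
    """Poisson variate: number of unit-rate exponential arrivals in [0, mean]."""
    count, t = 0, rng.expovariate(1.0)
    while t <= mean:
        count += 1
        t += rng.expovariate(1.0)
    return count

def split_counts(n, p, rng):
    """Thin n points with probability p; return (retained count, thinned count)."""
    thinned = sum(1 for _ in range(n) if rng.random() < p)
    return n - thinned, thinned

def correlation(a, b):
    """Sample (Pearson) correlation of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

rng = random.Random(21)
p = 0.25
poisson_pairs = [split_counts(poisson_by_arrivals(400.0, rng), p, rng) for _ in range(1000)]
fixed_pairs = [split_counts(400, p, rng) for _ in range(1000)]
corr_poisson = correlation(*zip(*poisson_pairs))
corr_fixed = correlation(*zip(*fixed_pairs))
print(f"Poisson number of points: correlation {corr_poisson:.3f}")
print(f"fixed number of points: correlation {corr_fixed:.3f}")
```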

This ability to split a Poisson point process into independent ones is sometimes called the splitting property.

Applications

Thinning point processes has the immediate application of creating new point processes. It can also be used to randomly generate two point processes from one. In network applications, a simple example is using the thinning procedure to model random sleep schemes in wireless networks, where random subsets of the network have been powered down.

Perhaps the most useful application of thinning is creating point processes with spatially-dependent intensities, such as that of an inhomogeneous Poisson point process. In another post I give details on how to simulate this point process. In this setting, the thinning operation essentially is acceptance(-rejection) sampling, which I will cover in a future post.

Code

All code from my posts, as always, can be found on my GitHub repository. The code for this post is located here.

Spatially independent thinning

I have implemented in code the simple $p$-thinning operation applied to a Poisson point process on a rectangle, but in theory any point process can be used for the underlying point process that is thinned.

MATLAB

%Simulation window parameters
xMin=-1;xMax=1;
yMin=-1;yMax=1;
xDelta=xMax-xMin;yDelta=yMax-yMin; %rectangle dimensions
areaTotal=xDelta*yDelta; %area of rectangle

%Point process parameters
lambda=100; %intensity (ie mean density) of the Poisson process

%Thinning probability parameters
sigma=1;
p=0.25; %thinning probability

%Simulate Poisson point process
numbPoints=poissrnd(areaTotal*lambda);%Poisson number of points
xx=xDelta*(rand(numbPoints,1))+xMin;%x coordinates of Poisson points
yy=yDelta*(rand(numbPoints,1))+yMin;%y coordinates of Poisson points

%Generate Bernoulli variables (ie coin flips) for thinning
booleThinned=rand(numbPoints,1)<p; %points to be thinned
booleRetained=~booleThinned; %points to be retained

%x/y locations of thinned points
xxThinned=xx(booleThinned); yyThinned=yy(booleThinned);
%x/y locations of retained points
xxRetained=xx(booleRetained); yyRetained=yy(booleRetained);

%Plotting
plot(xxRetained,yyRetained,'bo'); %plot retained points
hold on; plot(xxThinned,yyThinned,'ro'); %plot thinned points
xlabel('x');ylabel('y');

R

#Simulation window parameters
xMin=-1;xMax=1;
yMin=-1;yMax=1;
xDelta=xMax-xMin;yDelta=yMax-yMin; #rectangle dimensions
areaTotal=xDelta*yDelta;

#Point process parameters
lambda=100; #intensity (ie mean density) of the Poisson process

#Thinning probability
p=0.25; 

#Simulate a Poisson point process
numbPoints=rpois(1,areaTotal*lambda);#Poisson number of points
xx=xDelta*runif(numbPoints)+xMin;#x coordinates of Poisson points
yy=yDelta*runif(numbPoints)+yMin;#y coordinates of Poisson points

#Generate Bernoulli variables (ie coin flips) for thinning
booleThinned=runif(numbPoints)<p; #points to be thinned
booleRetained=!booleThinned; #points to be retained

#x/y locations of thinned points
xxThinned=xx[booleThinned]; yyThinned=yy[booleThinned];
#x/y locations of retained points
xxRetained=xx[booleRetained]; yyRetained=yy[booleRetained];

#Plotting
par(pty="s")
plot(xxRetained,yyRetained,'p',xlab='x',ylab='y',col='blue'); #plot retained points
points(xxThinned,yyThinned,col='red'); #plot thinned points

Of course, as I have mentioned before, simulating spatial point processes in R is even easier with the powerful spatial statistics library spatstat. With this library, thinning can be done in quite a general way by using the function rthin.

Python

import numpy as np; #NumPy package for arrays, random number generation, etc
import matplotlib.pyplot as plt

#Simulation window parameters
xMin=-1;xMax=1;
yMin=-1;yMax=1;
xDelta=xMax-xMin;yDelta=yMax-yMin; #rectangle dimensions
areaTotal=xDelta*yDelta;

#Point process parameters
lambda0=100; #intensity (ie mean density) of the Poisson process

#Thinning probability
p=0.25; 

#Simulate a Poisson point process
numbPoints = np.random.poisson(lambda0*areaTotal);#Poisson number of points
xx = np.random.uniform(0,xDelta,((numbPoints,1)))+xMin;#x coordinates of Poisson points
yy = np.random.uniform(0,yDelta,((numbPoints,1)))+yMin;#y coordinates of Poisson points

#Generate Bernoulli variables (ie coin flips) for thinning
booleThinned=np.random.uniform(0,1,((numbPoints,1)))<p; #points to be thinned
booleRetained=~booleThinned; #points to be retained

#x/y locations of thinned points
xxThinned=xx[booleThinned]; yyThinned=yy[booleThinned];
#x/y locations of retained points
xxRetained=xx[booleRetained]; yyRetained=yy[booleRetained];

#Plotting
plt.scatter(xxRetained,yyRetained, edgecolor='b', facecolor='none', alpha=0.5 );
plt.scatter(xxThinned,yyThinned, edgecolor='r', facecolor='none', alpha=0.5 );
plt.xlabel("x"); plt.ylabel("y");
plt.show();

Spatially dependent thinning

I have implemented in code a $p(x)$-thinning operation with the function $p(x)=\exp(-|x|^2/s^2)$, where $|x|$ is the Euclidean distance from $x$ to the origin and $s>0$ is a scale parameter (called sigma in the code). This small change means that the code now uses a vector or array of $p$ values, one for each point, instead of a single $p$ value in the line where the uniform random variables are generated and compared with the $p$ values. (This is the line defining booleThinned in the MATLAB, R and Python code presented below.)

Again, I have applied thinning to a Poisson point process on a rectangle, but in theory any point process can be used for the underlying point process.
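To make the single-$p$ versus vector-of-$p$ distinction concrete, here is a minimal Python sketch, assuming NumPy's `Generator` interface (`np.random.default_rng`) rather than the legacy `np.random` functions used in the listings. The helper `thin_points` is a hypothetical name of my own: it accepts either one thinning probability or a function returning one probability per point, and in both cases compares one uniform variate per point with the $p$ values, thinning the point when the variate is smaller.

```python
import numpy as np


def thin_points(x, y, p, rng):
    """Hypothetical helper: split points into retained and thinned sets.

    p is the thinning probability, either one number in [0, 1] or a
    function p(x, y) that returns one probability per point.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # A callable gives a vector of p values; a scalar is repeated for every point
    pValues = p(x, y) if callable(p) else np.full(x.shape, float(p))
    # One Bernoulli trial (coin flip) per point: thinned with probability pValues
    booleThinned = rng.uniform(size=x.shape) < pValues
    booleRetained = ~booleThinned
    return (x[booleRetained], y[booleRetained]), (x[booleThinned], y[booleThinned])


sigma = 0.5  # scale parameter s of the thinning function


def fun_p(x, y):
    """Spatially dependent thinning probability exp(-|x|^2 / s^2)."""
    return np.exp(-(x**2 + y**2) / sigma**2)


rng = np.random.default_rng(1)  # arbitrary seed
numbPoints = rng.poisson(100 * 4.0)  # lambda = 100 on the square [-1,1]^2
xx = rng.uniform(-1, 1, numbPoints)
yy = rng.uniform(-1, 1, numbPoints)

# Spatially independent thinning with a single p
(xxRetained, yyRetained), (xxThinned, yyThinned) = thin_points(xx, yy, 0.25, rng)
# Spatially dependent thinning with one p value per point
(xxRetainedDep, yyRetainedDep), (xxThinnedDep, yyThinnedDep) = thin_points(xx, yy, fun_p, rng)
print(len(xxRetained), len(xxThinned), len(xxRetainedDep), len(xxThinnedDep))
```

For comparison, spatstat's rthin takes retention probabilities, so its argument corresponds to $1-p$ in the convention used here.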

MATLAB

%Simulation window parameters
xMin=-1;xMax=1;
yMin=-1;yMax=1;
xDelta=xMax-xMin;yDelta=yMax-yMin; %rectangle dimensions
areaTotal=xDelta*yDelta; %area of rectangle
 
%Point process parameters
lambda=100; %intensity (ie mean density) of the Poisson process

%Thinning probability parameters
sigma=0.5; %scale parameter for thinning probability function
%define thinning probability function
fun_p=@(s,x,y)(exp(-(x.^2+y.^2)/s^2)); 

%Simulate Poisson point process
numbPoints=poissrnd(areaTotal*lambda);%Poisson number of points
xx=xDelta*(rand(numbPoints,1))+xMin;%x coordinates of Poisson points
yy=yDelta*(rand(numbPoints,1))+yMin;%y coordinates of Poisson points

%calculate spatially-dependent thinning probabilities
p=fun_p(sigma,xx,yy); 

%Generate Bernoulli variables (ie coin flips) for thinning
booleThinned=rand(numbPoints,1)<p; %points to be thinned
booleRetained=~booleThinned; %points to be retained

%x/y locations of thinned points
xxThinned=xx(booleThinned); yyThinned=yy(booleThinned);
%x/y locations of retained points
xxRetained=xx(booleRetained); yyRetained=yy(booleRetained);

%Plotting
plot(xxRetained,yyRetained,'bo'); %plot retained points
hold on; plot(xxThinned,yyThinned,'ro'); %plot thinned points
xlabel('x');ylabel('y');

R

#Simulation window parameters
xMin=-1;xMax=1;
yMin=-1;yMax=1;
xDelta=xMax-xMin;yDelta=yMax-yMin; #rectangle dimensions
areaTotal=xDelta*yDelta;

#Point process parameters
lambda=100; #intensity (ie mean density) of the Poisson process

#Thinning probability parameters
sigma=0.5; #scale parameter for thinning probability function
#define thinning probability function
fun_p <- function(s,x,y) {
  exp(-(x^2 + y^2)/s^2);
}

#Simulate a Poisson point process
numbPoints=rpois(1,areaTotal*lambda);#Poisson number of points
xx=xDelta*runif(numbPoints)+xMin;#x coordinates of Poisson points
yy=yDelta*runif(numbPoints)+yMin;#y coordinates of Poisson points

#calculate spatially-dependent thinning probabilities
p=fun_p(sigma,xx,yy); 

#Generate Bernoulli variables (ie coin flips) for thinning
booleThinned=runif(numbPoints)<p; #points to be thinned
booleRetained=!booleThinned; #points to be retained

#x/y locations of thinned points
xxThinned=xx[booleThinned]; yyThinned=yy[booleThinned];
#x/y locations of retained points
xxRetained=xx[booleRetained]; yyRetained=yy[booleRetained];

#Plotting
par(pty="s")
plot(xxRetained,yyRetained,'p',xlab='x',ylab='y',col='blue'); #plot retained points
points(xxThinned,yyThinned,col='red'); #plot thinned points

Again, use the spatial statistics library spatstat, which has the function rthin.

Python

import numpy as np; #NumPy package for arrays, random number generation, etc
import matplotlib.pyplot as plt

#Simulation window parameters
xMin=-1;xMax=1;
yMin=-1;yMax=1;
xDelta=xMax-xMin;yDelta=yMax-yMin; #rectangle dimensions
areaTotal=xDelta*yDelta;

#Point process parameters
lambda0=100; #intensity (ie mean density) of the Poisson process

#Thinning probability parameters
sigma=0.5; #scale parameter for thinning probability function
#define thinning probability function
def fun_p(s, x, y):
    return np.exp(-(x**2+y**2)/s**2);    

#Simulate a Poisson point process
numbPoints = np.random.poisson(lambda0*areaTotal);#Poisson number of points
xx = np.random.uniform(0,xDelta,((numbPoints,1)))+xMin;#x coordinates of Poisson points
yy = np.random.uniform(0,yDelta,((numbPoints,1)))+yMin;#y coordinates of Poisson points

#calculate spatially-dependent thinning probabilities
p=fun_p(sigma,xx,yy); 

#Generate Bernoulli variables (ie coin flips) for thinning
booleThinned=np.random.uniform(0,1,((numbPoints,1)))<p; #points to be thinned
booleRetained=~booleThinned; #points to be retained

#x/y locations of thinned points
xxThinned=xx[booleThinned]; yyThinned=yy[booleThinned];
#x/y locations of retained points
xxRetained=xx[booleRetained]; yyRetained=yy[booleRetained];

#Plotting
plt.scatter(xxRetained,yyRetained, edgecolor='b', facecolor='none', alpha=0.5 );
plt.scatter(xxThinned,yyThinned, edgecolor='r', facecolor='none', alpha=0.5 );
plt.xlabel("x"); plt.ylabel("y");
plt.show();

Results

In the plotted results, the blue and red circles represent respectively the retained and thinned points.

Spatially independent thinning

For these results, I used a thinning probability $p=0.25$, which means that roughly a quarter of the points will be thinned, so on average the ratio of blue to red circles is three to one.
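A quick numerical check of the three-to-one claim: the Python sketch below (NumPy, with an arbitrarily chosen seed) repeats the spatially independent simulation many times and keeps only the counts, since the point locations do not affect them. With $\lambda=100$ on an area of 4, the retained and thinned counts should have means $(1-p)\lambda A=300$ and $p\lambda A=100$. For independent thinning of a Poisson point process, the retained count is itself Poisson distributed, so its variance should also be close to 300.

```python
import numpy as np

lambda0 = 100          # intensity (mean density) of the Poisson process
areaTotal = 2.0 * 2.0  # area of the square [-1,1] x [-1,1]
p = 0.25               # thinning probability

rng = np.random.default_rng(42)  # arbitrary seed
numbSim = 10_000
numbRetained = np.empty(numbSim, dtype=int)
numbThinned = np.empty(numbSim, dtype=int)
for i in range(numbSim):
    numbPoints = rng.poisson(lambda0 * areaTotal)    # Poisson number of points
    booleThinned = rng.uniform(size=numbPoints) < p  # one coin flip per point
    numbThinned[i] = booleThinned.sum()
    numbRetained[i] = numbPoints - numbThinned[i]

# Theory: means (1 - p) * lambda0 * areaTotal = 300 and p * lambda0 * areaTotal = 100
print("mean retained:", numbRetained.mean())
print("mean thinned:", numbThinned.mean())
print("ratio of means:", numbRetained.mean() / numbThinned.mean())
# A Poisson count has variance equal to its mean
print("variance of retained:", numbRetained.var())
```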

[Figure: MATLAB and Python plots of retained (blue) and thinned (red) points]

Spatially dependent thinning

Observe how there are more thinned points (that is, red circles) near the origin, which is of course where the thinning function $p(x)=\exp(-|x|^2/s^2)$ attains its maximum.
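The concentration of red circles near the origin can also be quantified. On the square $[-1,1]^2$, the expected number of thinned points is $\lambda\int p(x)\,dx=\lambda\pi s^2\,\mathrm{erf}(1/s)^2$, because the integral factorises into two one-dimensional Gaussian integrals, each equal to $s\sqrt{\pi}\,\mathrm{erf}(1/s)$. For $\lambda=100$ and $s=0.5$ this gives roughly 78 thinned points out of an expected 400. The Python sketch below (NumPy plus the standard math module, arbitrary seed) compares this value with simulation, and also reports the thinned fraction inside the disk $|x|<s$, which should be close to $1-e^{-1}\approx 0.63$, against the corners of the square $|x|>1$, where $p(x)<e^{-4}\approx 0.018$.

```python
import math

import numpy as np

lambda0 = 100  # intensity (mean density) of the Poisson process
sigma = 0.5    # scale parameter s in p(x) = exp(-|x|^2 / s^2)
xMin, xMax, yMin, yMax = -1.0, 1.0, -1.0, 1.0
areaTotal = (xMax - xMin) * (yMax - yMin)

# Closed form for the mean number of thinned points on [-1,1]^2
expectedThinned = lambda0 * math.pi * sigma**2 * math.erf(1 / sigma) ** 2
expectedRetained = lambda0 * areaTotal - expectedThinned

rng = np.random.default_rng(7)  # arbitrary seed
numbSim = 5_000
thinnedCounts = np.empty(numbSim)
innerTotal = innerThinned = 0  # points with |x| < s
outerTotal = outerThinned = 0  # points with |x| > 1 (corners of the square)
for i in range(numbSim):
    numbPoints = rng.poisson(lambda0 * areaTotal)
    xx = rng.uniform(xMin, xMax, numbPoints)
    yy = rng.uniform(yMin, yMax, numbPoints)
    r2 = xx**2 + yy**2
    p = np.exp(-r2 / sigma**2)  # spatially dependent thinning probabilities
    booleThinned = rng.uniform(size=numbPoints) < p
    thinnedCounts[i] = booleThinned.sum()
    inner = r2 < sigma**2
    outer = r2 > 1.0
    innerTotal += inner.sum()
    innerThinned += booleThinned[inner].sum()
    outerTotal += outer.sum()
    outerThinned += booleThinned[outer].sum()

print(f"expected thinned {expectedThinned:.2f}, simulated {thinnedCounts.mean():.2f}")
print(f"expected retained {expectedRetained:.2f}")
print(f"thinned fraction near origin {innerThinned / innerTotal:.3f}")
print(f"thinned fraction in corners {outerThinned / outerTotal:.4f}")
```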

[Figure: MATLAB and Python plots of retained (blue) and thinned (red) points]
