GPU Archives – H. Paul Keeler

The return of the Box-Muller method due to GPUs

Sooner or later in life, you will need to generate a couple normal (or Gaussian) variables. Maybe you’re simulating some thermal noise or the stock market. Perhaps you’re sampling a point process with Gaussian coordinates. Or you’re running a generative artificial intelligence (Gen AI) model, such as a variational autoencoder, which needs to select a Gaussian point on an abstract mathematical space.

Whatever the reason, you can get your independent normal variables by applying the Box-Muller transform to independent uniform random variables. All you need to do is change from Cartesian to polar coordinates. Easy peasy, lemon squeezy.

Box-Muller was not the favourite

I previously wrote about the Box-Muller method, ending on the note that the method, though very cool, isn’t used so much as it relies upon mathematical functions that have been computationally expensive for processors, at least historically.

Box-Muller the favourite on GPUs

But that conventional wisdom has changed in recent years as processors can now (on a hardware level) readily evaluate such functions. More significantly, the fast rise of graphically processing units (GPUs) has also changed the rules of the game.

In a comment on my original Box-Muller post, it was pointed out that the Box-Muller method is the preferred choice for GPUs, giving a link to the Nvidia website. The reason for using the Box-Muller method is GPUs do not handle well algorithms with loops and branches, so you should use methods that avoid these algorithmic steps. And the Box-Muller method is one that does just that.

The NVDIA website says:

Because GPUs are so sensitive to looping and branching, it turns out that the best choice for the Gaussian transform is actually the venerable Box-Muller transform…

Indeed.

The Metropolis-Hastings algorithm in C

There’s an old saying, which I just made up. If you really want to understand something, code it up in C.¹

I wrote some C code that basically does what my code in this previous post does without the pretty pictures. The code performs a Metropolis-Hastings algorithm, simulating random variables (or, more correctly, generating random variates) for a joint probability density describing two random variables.

In previous posts, I have covered the topic of Markov chain Monte Carlo (MCMC) methods, particularly the central workhorse, the Metropolis(-Rosenbluth-Rosenbluth-Teller-Teller)-Hastings algorithm. These methods are frequently used in Bayesian statistics, high-dimensional integration, and optimization. For more details on how they work, I have written a couple posts, starting with this one and ending with this one, where I detail the mechanics of MCMC methods.

Let’s call this code MCMC C code².

Code considerations

The C programming language was not written for playing with random numbers. The standard uniform random number generator in C, simply called rand, is not good for research level random simulations. But it does the job for illustration purposes.

In lieu of this generator, the Mersenne Twister is a popular algorithm for producing such numbers, which is widely recommended and used. There are implementations of this algorithm in the CUDA and MKL libraries for Nvidia GPUs and Intel CPUs; see this CUDA page and this MKL page for details. Check out this PDF file for further details on the CUDA version.

In addition to that, my C code needs to simulate normal (or Gaussian) random variables. For that I wrote my own simple code using the Box-Muller transform, which I covered in a previous post, so the code would be self-contained and less opaque. But in reality, you should always use functions from a pre-written library for generating variates according to a normal or whichever distribution.

Finally, C was never intended as a scientific language, despite its wide use as one. (Historically, that was Fortran’s job, which is still the workhorse for many serious number-crunching institutes, hence why there’s Fortran -ready version of CUDA.) So when handling sets of numbers, such as vectors and matrices, one has to use pointers and malloc more often that not, which can be a tricky. This is the case here, though the use of pointers in this code is relatively simple.

Where are the pretty pictures?

Unfortunately, when number crunching in C, you don’t immediately have access to plotting libraries that are available in scientific programming languages such as Python and Julia.

But you can simply create .csv (or text) files, and then plot them using whichever library you prefer. And if you have gnuplot installed, you can perform simple one-dimensional histograms using one-line commands such as:

gnuplot -e “plot ‘MCMCData_1D.csv’ using 1 bins=20;” -persist

In the above command, the file MCMCData_1D.csv has the random variates (the simulated random variables) stored in a single column.

Code

I only present code for the more complicated two-dimensional case. The code can be found here.

Note that there’s an option in the code of MCMC_1D.c to plot the results using gnuplot, if it’s installed on your machine. I didn’t include code for plotting the results of the 2-D case as gnuplot doesn’t do a 2-D histogram.

Warning: The code uses the standard pseudo-random number generator in C, which is known for being bad. I only used built-in the C generator to keep my code self-contained. For that reason, I also wrote my code for generating Gaussian (or normal) random variables, by using the Box-Muller transform, but in reality one would never do that for research or industry purposes.

/***********************************************************************
 * Runs a simple Metropolis-Hastings (ie MCMC) algorithm to simulate two
 * jointly distributed random variables with probability density
 * p(x,y)=exp(-(x^4+x*y+y^2)/s^2)/consNorm, where s>0 and consNorm is a
 * normalization constant. The probability density function is defined in
 * the function pdf_single.
 *
 * NOTE: In practice, the value of the normalization constant is not needed, as it cancels out in the algorithm.
 *
 * NOTE: This code will *create* a local file (see variable strFilename) to store results. It will *overwrite* that file if it already exists.
 *
 * WARNING: This code uses the default C random number generator, which is known for failing various tests of randomness.
 * Strongly recommended to use another generator for purposes beyond simple illustration.
 *
 * Author: H. Paul Keeler, 2024.
 * Website: hpaulkeeler.com
 * Repository: github.com/hpaulkeeler/posts
 *
 ***********************************************************/

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <math.h>
#include <stdbool.h>
#include <string.h>

const long double pi = 3.14159265358979323846; // constant pi for generating polar coordinates

// helper function declarations; see below for definitions
static double *unirand(double *randValues, unsigned numbRand);                           // generate  uniform random variables on (0,1)
static double *normrand(double *randValues, unsigned numbRand, double mu, double sigma); // generate normal random variables
static double pdf_single(double x, double y, double s);                      // define probability density to be simulated
static double mean_var(double *set_sample, unsigned numbSim, double *varX);              // calculate meana and variance

int main(int argc, char *argv[])
{

    if (argc > 1)
    {
        fprintf(stderr, "This program takes no arguments...\n");
        exit(1);
    }
    else
    {

        char strFilename[] = "MCMCData_2D.csv"; // filename for storing simulated random variates

        // intializes (pseudo)-random number generator
        time_t timeCPU; // use CPU time for seed
        srand((unsigned)time(&timeCPU));
        // srand(42); //to reproduce results

        bool booleWriteData = true; // write data to file
        bool booleStats = true;     // perform simple mean/std stats

        // parameters
        unsigned numbSim = 1e4;   // number of random variables simulated
        unsigned numbSteps = 200; // number of steps for the Markov process
        double sigma = 2;         // standard deviation for normal random steps

        // probability density parameters
        double s = .5; // scale parameter for distribution to be simulated

        // Metropolis-Hastings variables
        // proposal for a new position in the random walk
        double zxRandProposal;      
        double zyRandProposal;      
        double pdfProposal; // density for proposed position
        double pdfCurrent;  // density of current position
        double ratioAccept; // ratio of densities (ie acceptance probability)
        double uRand;       // uniform variable for Bernoulli trial (ie a coin flip)
        // random step (normally distributed)
        double *p_numbNormX = (double *)malloc(1 * sizeof(double));
        double *p_numbNormY = (double *)malloc(1 * sizeof(double));
//positions of the random walk (ie the simualted random variables after numbSteps)
        double *p_xRand = (double *)malloc(numbSim * sizeof(double));
        double *p_yRand = (double *)malloc(numbSim * sizeof(double));

        (void)unirand(p_xRand, numbSim); // random initial values
        (void)unirand(p_yRand, numbSim); // random initial values

        unsigned i, j; // loop varibales
        for (i = 0; i < numbSim; i++)
        {
            // loop through each random walk instance (or random variable to be simulated)

            pdfCurrent = pdf_single(*(p_xRand + i), *(p_yRand + i), s); // current probability density

            for (j = 0; j < numbSteps; j++)
            {
                // loop through each step of the random walk
                (void)normrand(p_numbNormX, 1, 0, sigma);
                (void)normrand(p_numbNormY, 1, 0, sigma);
                // take a(normally distributed) random step in x and y
                zxRandProposal = (*(p_xRand + i)) + (*p_numbNormX);
                zyRandProposal = (*(p_yRand + i)) + (*p_numbNormY);

                pdfProposal = pdf_single(zxRandProposal, zyRandProposal, s); // proposed probability density

                // acceptance rejection step
                (void)unirand(&uRand, 1);
                ratioAccept = pdfProposal / pdfCurrent;
                if (uRand < ratioAccept)
                {
                    // update state of random walk / Markov chain
                    *(p_xRand + i) = zxRandProposal;
                    *(p_yRand + i) = zyRandProposal;
                    pdfCurrent = pdfProposal;
                }
            }
        }

        free(p_numbNormX);
        free(p_numbNormY);

        if (booleStats)
        {

            // initialize statistics variables (for testing results)
            char strVariable[] = "XY";
            double *p_AllRand[] = {p_xRand, p_yRand};
            double meanTemp = 0;
            double varTemp = 0;
            double stdTemp = 0;
            char strTemp='X';
            for (i = 0; i < 2; i++)
            {
                meanTemp = mean_var(p_AllRand[i], numbSim, &varTemp);
                stdTemp = sqrt(varTemp);
                strTemp=strVariable[i];
                printf("The average of the %c random variables is %lf.\n", strTemp, meanTemp);
                printf("The standard deviation of the %c random  variables is %lf.\n", strTemp, stdTemp);
            }
        }

        if (booleWriteData)
        {
            // print to file
            FILE *outputFile;
            outputFile = fopen(strFilename, "w");
            for (i = 0; i < numbSim; i++)
            {
                fprintf(outputFile, "%lf,%lf\n", *(p_xRand + i), *(p_yRand + i)); // output to file
            }
            fclose(outputFile);
            printf("Data printed to file.\n");
        }
        free(p_xRand);
        free(p_yRand);

        return (0);
    }
}

static double pdf_single(double x, double y, double s)
{
    // returns the probability density of a single point (x,y) inside a simulation window defined below
    double pdf_output;

    // non-zero density window parameters
    double xMin = -1;
    double xMax = 1;
    double yMin = -1;
    double yMax = 1;

    if¹
    {
        pdf_output = exp(-((pow(x, 4) + x * y + pow(y, 2)) / (s * s)));
    }
    else
    {
        pdf_output = 0;
    }
    return pdf_output;
}

static double *normrand(double *randValues, unsigned numbRand, double mu, double sigma)
{
    // simulate pairs of iid normal variables using Box-Muller transform
    // https://en.wikipedia.org/wiki/Box%E2%80%93Muller_transform

    double U1, U2, thetaTemp, rhoTemp, Z1, Z2;
    int i = 0;
    while (i < numbRand)
    {
        // simulate variables in polar coordinates (theta, rho)
        (void)unirand(&U1, 1);
        thetaTemp = 2 * pi * U1; // create uniform theta values
        (void)unirand(&U2, 1);
        rhoTemp = sqrt(-2 * log(U2)); // create Rayleigh rho values

        // change to Cartesian coordinates
        Z1 = rhoTemp * cos(thetaTemp);
        Z1 = sigma * Z1 + mu;
        randValues[i] = Z1; // assign first of random variable pair
        i++;
        if (i < numbRand)
        {
            // if more variables are needed, generate second value of random pair
            Z2 = rhoTemp * sin(thetaTemp);
            Z2 = sigma * Z2 + mu;
            randValues[i] = Z2; // assign second of random variable pair
            i++;
        }
        else
        {
            break;
        }
    }
    return randValues;
}

static double *unirand(double *randValues, unsigned numbRand)
{ // simulate numbRand uniform random variables on the unit interval
  // storing them in randValues which must be allocated by the caller
  // with enough space for numbRand doubles

    for (int i = 0; i < numbRand; i++)
    {
        randValues[i] = (double)rand() / RAND_MAX;
    }
    return randValues;
}

static double mean_var(double *set_sample, unsigned numbSim, double *varX)
{
    // mean and variance of set_sample
    int i;
    // initialize statistics variables (for testing results)
    double meanX = 0;
    double meanXSquared = 0;
    double tempX;
    for (i = 0; i < numbSim; i++)
    {
        tempX = *(set_sample + i);
        meanX += tempX / ((double)numbSim);
        meanXSquared += tempX * tempX / ((double)numbSim);
    }

    *varX = meanXSquared - meanX * meanX;
    return meanX;
}

Acknowledg(e)ments

A hat tip to C and CUDA guru Alex Stivala who pointed out a couple of minor issues in my original C code.

x >= xMin) && (x <= xMax) && (y >= yMin) && (y <= yMax [↩]

The Metropolis-Hastings algorithm in C with multi-variable densities

Here’s a C implementation of the Metropolis(-Rosenbluth-Rosenbluth-Teller-Teller)-Hastings algorithm that can handle joint multi-variable probability densities, thus simulating a finite number of random variables. The Metropolis-Hastings algorithm is the central piece in Markov chain Monte Carlo (MCMC) methods. They have become essential in Bayesian statistics, where they are used to tackle high-dimensional integration. I have written a couple posts about these methods, starting with this one and ending with one.

My previous C code only works with densities of a couple variables. That code is covered in this post, which in turn is based on a previous post, in which I presented the algorithm implemented in the scientific programming languages Python, Julia, MATLAB and R. Those examples were for mostly illustration purposes, as there are already good pre-written libraries in those languages.

Code considerations

I’ll describe some considerations for implementing this algorithm in C. Some of this overlaps with what I wrote on the previous post. In writing this more general version of the code, I arguably cleaned up the old code, which is often the case when you generalize things in coding and mathematics.

Storing multi-dimensional values in C

In C, when handling sets of numbers, such as vectors and matrices, one has to use pointers and the malloc function more often that not, which can create headaches. For this algorithm, probably the most important data object is the one for storing the (current) positions of the multi-dimensional random walks across all the simulations run. This conceptually results in a rectangular grid or table of numbers. In mathematics, you would just use a $d \times n$ matrix, where is $d$ is the number of dimensions and $n$ is the number of simulation runs.

In most programming languages, this table of numbers typically corresponds to a $d \times n$ array. More specifically in C, you could do this with a multi-dimensional array or an array of pointers. But in this case, you don’t need to use the inherent structure of a matrix. Besides, any multi-dimensional array in C will be stored in memory as a one-dimensional array, so you can store the numbers in a simple one-dimensional array or vector with $d \cdot n$ elements in total.¹ The code does this using a single pointer.

Then using this vector, you just need to index (or map) between the vector and the matrix appropriately, which the code does with the (integer) variable indexSimDim. There are simple one-to-one mappings between elements in the vector and the elements in $d \times n$ matrix. For example, matrix row $i$ and column $k$ corresponds to the element number $m=i\cdot d+k$ in the vector; see the code below.

unsigned indexSimDim; // index for keeping track of two-dimensional data as a one-dimensional array // indexSimDim = i * numbDim + k, where i is simulation number and k is dimension number (minus one)

This is a standard trick. In fact, you’ll see this type of line of code as necessary boilerplate in CUDA-based (and similar) code in kernels (routines) due to the inherent grid hierarchy of graphical processing units (GPUs). But that discussion deserves a post on its own.

Randomness in C

In previous posts, I’ve remarked that the standard uniform random number generator in C, called rand, is not good enough for research level randomness. That said, it works fine for regular simulations. The Mersenne Twister is a widely recommended and used algorithm for producing such numbers.

To simulate normal (or Gaussian) random variables, I wrote my own simple function using the Box-Muller transform, which I covered in a previous post, so the code would be self-contained. But in practice, you should always use pre-written and tested functions.

Code

The code can be found here, whereas other MCMC code can be found here and here. The joint probability density is defined in the function pdf_single.

/***********************************************************************
 * Runs a simple Metropolis-Hastings (ie MCMC) algorithm to simulate n
 * jointly distributed random variables with probability density p(x,y).
 * For example:
 * p(x,y)=exp(-(x^4+x*y+y^2+y*z+z^4)/s^2)/consNorm, where s>0 and consNorm is a
 * normalization constant. The probability density function is defined in
 * the function pdf_single.
 *
 * NOTE: In practice, the value of the normalization constant is not needed, as it cancels out in the algorithm.
 *
 * NOTE: This code will *create* a local file (see variable strFilename) to store results. It will *overwrite* that file if it already exists.
 *
 * WARNING: This code uses the default C random number generator, which is known for failing various tests of randomness.
 * Strongly recommended to use another generator for purposes beyond simple illustration.
 *
 * Author: H. Paul Keeler, 2024.
 * Website: hpaulkeeler.com
 * Repository: github.com/hpaulkeeler/posts
 *
 ***********************************************************/

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <math.h>
#include <stdbool.h>
#include <string.h>

const long double pi = 3.14159265358979323846; // constant pi for generating polar coordinates

// helper function declarations; see below for definitions
static double *unirand(double *randValues, unsigned numbRand);                           // generate  uniform random variables on (0,1)
static double *normrand(double *randValues, unsigned numbRand, double mu, double sigma); // generate normal random variables
static double pdf_single(double *x_input, unsigned numbDim, double *parameters);           // define probability density to be simulated
static double mean_var(double *set_sample, unsigned numbSim, double *varX);              // calculate meana and variance

int main(int argc, char *argv[])
{

    if (argc > 1)
    {
        fprintf(stderr, "This program takes no arguments...\n");
        exit(1);
    }
    else
    {
        char strFilename[] = "MCMCData_ND.csv"; // filename for storing simulated random variates

        // intializes (pseudo)-random number generator
        time_t timeCPU; // use CPU time for seed
        srand((unsigned)time(&timeCPU));
        // srand(42); //to reproduce results

        bool booleWriteData = true; // write data to file
        bool booleStats = true;     // perform simple mean/std stats
        unsigned numbDimMax = 3; //upper bound on number of dimensions for which stats are calculated and printed out

        // simulation parameters
        unsigned numbSim = 1e4;   // number of random variables simulated
        unsigned numbSteps = 200; // number of steps for the Markov process
        double sigma = 2;         // standard deviation for normal random steps

        // probability density parameters
        double s = .5; // scale parameter for distribution to be simulated
        unsigned numbDim = 3;

        // Metropolis-Hastings variables
        // proposal for a new position in the random walk
        double *tRandProposal = (double *)malloc(numbDim * sizeof(double));
        double pdfProposal;      // density for proposed position
        double pdfCurrent;       // density of current position
        double ratioAccept;      // ratio of densities (ie acceptance probability)
        double uRand;            // uniform variable for Bernoulli trial (ie a coin flip)
        // random step (normally distributed)
        double *p_numbNormT = (double *)malloc(1 * sizeof(double));
        // positions of the random walk (ie the simualted random variables after numbSteps)
        double *p_tRand = (double *)malloc(numbDim * numbSim * sizeof(double));

        unsigned i, j, k;     // loop varibales
        unsigned indexSimDim; // index for keeping track of two-dimensional data as a one-dimensional array
        // Typically indexSimDim = i * numbDim + k, where i is simulation number and k is dimension number (minus one)

        double *p_tRandCurrent = (double *)malloc(numbDim * sizeof(double));
        (void)unirand(p_tRand, numbDim * numbSim); // random initial values

        for (i = 0; i < numbSim; i++)
        {
            // loop through each random walk instance (or random variable to be simulated)
            for (k = 0; k < numbDim; k++)
            {
                // loop through dimensions
                indexSimDim = i * numbDim + k;
                // update state of random walk / Markov chain
                *(p_tRandCurrent + k) = *(p_tRand + indexSimDim);
            }

            pdfCurrent = pdf_single(p_tRandCurrent, numbDim, & s); // current probability density

            for (j = 0; j < numbSteps; j++)
            {
                // loop through each step of the random walk
                for (k = 0; k < numbDim; k++)
                {
                    // loop through dimensions
                    indexSimDim = i * numbDim + k;
                    (void)normrand(p_numbNormT, 1, 0, sigma);
                    // take a(normally distributed) random step in x, y and y
                    *(tRandProposal+k) = *(p_tRand + indexSimDim) + *(p_numbNormT);
                }
                pdfProposal = pdf_single(tRandProposal, numbDim, & s); // proposed probability density

                // acceptance rejection step
                (void)unirand(&uRand, 1);
                ratioAccept = pdfProposal / pdfCurrent;
                if (uRand < ratioAccept)
                {
                    for (k = 0; k < numbDim; k++)
                    {
                        // loop through dimensions
                        indexSimDim = i * numbDim + k;
                        // update state of random walk / Markov chain
                        *(p_tRand + indexSimDim) = tRandProposal[k];
                    }
                    pdfCurrent = pdfProposal;
                }
            }
        }

        free(p_numbNormT);

        if (booleStats)
        {
            // initialize statistics variables (for testing results)
            double *p_AllRand = (double *)malloc(numbSim * sizeof(double));
            double meanTemp = 0;
            double varTemp = 0;
            double stdTemp = 0;
            unsigned numbDimStats = fmin(numbDimMax, numbDim); //number of dimensions for which stats are calculated and printed out
            for (k = 0; k < numbDimStats; k++)
            {
                // loop through all the dimensions
                for (i = 0; i < numbSim; i++)
                {
                    // collect variables for dimension k+1
                    indexSimDim = i * numbDim + k;
                    *(p_AllRand + i) = *(p_tRand + indexSimDim);
                }
                meanTemp = mean_var(p_AllRand, numbSim, &varTemp);
                stdTemp = sqrt(varTemp);
                printf("The average of dimension %d random variables is %lf.\n", k + 1, meanTemp);
                printf("The standard deviation of dimension %d random  variables is %lf.\n", k + 1, stdTemp);
            }
        }

        if (booleWriteData)
        {
            // print to file
            FILE *outputFile;
            outputFile = fopen(strFilename, "w");

            // create string of spacers (ie commas and newlines)
            char *strSpacer = (char *)malloc((numbDim + 1) * sizeof(char));
            for (k = 0; k < numbDim - 1; k++)
            {
                *(strSpacer + k) = ',';
            }
            strSpacer[numbDim - 1] = '\n';
            strSpacer[numbDim] = '\0';
            for (i = 0; i < numbSim; i++)
            {
                for (k = 0; k < numbDim; k++)
                {
                    indexSimDim = i * numbDim + k;
                    fprintf(outputFile, "%lf%c", *(p_tRand + indexSimDim), strSpacer[k]);
                }
            }

            fclose(outputFile);
            printf("Data printed to file.\n");
        }
        free(p_tRand);

        return (0);
    }
}

static double pdf_single(double *x_input, unsigned numbDim, double *parameters)
{
    // returns the probability density of a single point inside a simulation window defined below
    
    double pdf_output = 0; //probability density at a single point
    
    // non-zero density square window parameters
    double xMin = -1;
    double xMax = 1;

    // retrieve variables
    double x = *(x_input + 0);
    double y = *(x_input + 1);
    double z = *(x_input + 2);
    // retrieve scale parameter
    double s = *(parameters + 0);

    int i;
    //check point is inside simulation window
    bool booleInsideWindow = true;
    for (i = 0; i < numbDim; i++){
        booleInsideWindow = booleInsideWindow &¹;
    }

    // define probability density
    if (booleInsideWindow)
    {
        // evaluate probability density
        pdf_output = exp(-((pow(x, 4) + x * y + pow(y, 2) + y * z + pow(z, 4)) / (s * s)));
    }
    else
    {
        pdf_output = 0;
    }
    return pdf_output;
}

static double *normrand(double *randValues, unsigned numbRand, double mu, double sigma)
{
    // simulate pairs of iid normal variables using Box-Muller transform
    // https://en.wikipedia.org/wiki/Box%E2%80%93Muller_transform

    double U1, U2, thetaTemp, rhoTemp, Z1, Z2;
    int i = 0;
    while (i < numbRand)
    {
        // simulate variables in polar coordinates (theta, rho)
        (void)unirand(&U1, 1);
        thetaTemp = 2 * pi * U1; // create uniform theta values
        (void)unirand(&U2, 1);
        rhoTemp = sqrt(-2 * log(U2)); // create Rayleigh rho values

        // change to Cartesian coordinates (Z1, Z2)
        Z1 = rhoTemp * cos(thetaTemp);
        Z1 = sigma * Z1 + mu;
        randValues[i] = Z1; // assign first of random variable pair
        i++;
        if (i < numbRand)
        {
            // if more variables are needed, generate second value of random pair
            Z2 = rhoTemp * sin(thetaTemp);
            Z2 = sigma * Z2 + mu;
            randValues[i] = Z2; // assign second of random variable pair
            i++;
        }
        else
        {
            break;
        }
    }
    return randValues;
}

static double *unirand(double *randValues, unsigned numbRand)
{ // simulate numbRand uniform random variables on the unit interval
  // storing them in randValues which must be allocated by the caller
  // with enough space for numbRand doubles

    for (int i = 0; i < numbRand; i++)
    {
        randValues[i] = (double)rand() / RAND_MAX;
    }
    return randValues;
}

static double mean_var(double *set_sample, unsigned numbSim, double *varX)
{
    // mean and variance of set_sample
    int i;
    // initialize statistics variables (for testing results)
    double meanX = 0;
    double meanXSquared = 0;
    double tempX;
    for (i = 0; i < numbSim; i++)
    {
        tempX = *(set_sample + i);
        meanX += tempX / ((double)numbSim);
        meanXSquared += tempX * tempX / ((double)numbSim);
    }

    *varX = meanXSquared - meanX * meanX;
    return meanX;
}

x_input[i] >= xMin) & (x_input[i] <= xMax [↩]

The Box-Muller method for simulating normal variables

In the previous post, I covered a simple but much used method for simulating random variables or, rather, generating random variates. To simulate a random variable, the method requires writing down, in a tractable manner, the inverse of its cumulative distribution function.

But in the case of the normal (or Gaussian) distribution, there is no closed-form expression for its cumulative distribution function nor its inverse. This means you cannot, in an elegant and fast way at least, generate with the inverse method a single normal random variable using a single uniform random variable.

Interestingly, however, you can generate two (independent) normal variables with two (independent) uniform variables using the Box-Muller method, originally proposed by George Box and Mervin E. Muller. This approach uses the inverse method, but in practice it’s not used much (see below). I detail this method because I find it neat and it highlights the connection between the normal distribution and rotational symmetry, which has been the subject of some recent 3Blue1Brown videos on YouTube.

(This method was also used to simulate the Thomas point process, which I covered in a previous post.)

Incidentally, this connection is also mentioned in a previous post on simulating a Poisson point process on the surface of a sphere. In that method post, Method 2 uses an observation by the Muller that normal random variables can be used to position points uniformly on spheres.

I imagine this method was first observed by transforming two normal variables, instead of guessing various distribution pairs that would work. Then I’ll sketch the proof in the opposite direction, though it works in both directions.

Proof outline

The joint probability density of two independent variables is simply the product of the two individual probabilities densities. Then the joint density of two standard normal variables is

$$\begin{align}f_{X,Y}(x,y)&=\left[\frac{1}{\sqrt{2\pi}}e^{-x^2/2}\right]\left[\frac{1}{\sqrt{2\pi}}e^{-y^2/2}\right]\\&=\frac{1}{{2\pi}}e^{-(x^2+y^2)/2}\,.\end{align}$$

Now it requires a change of coordinates in two dimensions (from Cartesian to polar) using a Jacobian determinant, which in this case is $|J(\theta,r)=r|$.¹ giving a new joint probability density

$$f_{\Theta,R}(\theta,r)=\left[\frac{1}{\sqrt{2\pi}}\right]\left[ r\,e^{-r^2/2}\right]\,.$$

Now we just identify the two probability densities. The first probability density corresponds to a uniform variable on $[0, 2\pi]$, whereas the second is that of a Rayleigh variable with parameter $\sigma=1$. Of course the proof works in the opposite direction because the transformation (between Cartesian and polar coordinates) is a one-to-one function.

Algorithm

Here’s the Box-Muller method for simulating two (independent) standard normal variables with two (independent) uniform random variables.

Two (independent) standard normal random variable $Z_1$ and $Z_2$

Generate two (independent) uniform random variables $U_1\sim U(0,1)$ and $U_2\sim U(0,1)$.

Return $Z_1=\sqrt{-2\ln U_1}\cos(2\pi U_2)$ and $Z_2=\sqrt{-2\ln U_1}\sin(2\pi U_2)$.

The method effectively samples a uniform angular variable $\Theta=2\pi U_2$ on the interval $[0,2\pi]$ and a radial variable $R=\sqrt{-2\ln U_1}$ with a Rayleigh distribution.

The algorithm produces two independent standard normal variables. Of course, as many of us learn in high school, if $Z$ is a standard normal variable, then the random variable $X=\sigma Z +\mu$ is a normal variable with mean $\mu$ and standard deviation $\sigma>0$ .

The fall of the Box-Muller method

Sadly this method was typically not used, as historically computer processors were slow at doing calculations involving the necessary mathematical functions. To avoid these functions researchers developed and employed other methods such as the ziggurat algorithm.

Also, although processors can now do such calculations much faster, many languages, not just scientific ones, come with functions for generating normal variables. Consequently, there had not been much need in implementing this method.

Update: The return of the Box-Muller method

The above conventional wisdom has changed in recent years as processors can now (on a hardware level) readily evaluate such functions. (I had been waiting to see if certain libraries would be re-written by using the Box-Muller method, but why bother if the old ones work so well?) When I used the term “processors”, I had central processor units (CPUs) in mind, but in recent years graphically processor units (GPUs) have become widely popular.

In a comment on this post, it pointed out that the Box-Muller method is the preferred choice for GPUs, as evidenced by its implementation in Nvidia’s CUDA library. The reason is GPUs do not handle well loops and branches in algorithms, so you should use methods that avoid these algorithmic steps. And the Box-Muller method is one that does just that.

The NVDIA website says:

Because GPUs are so sensitive to looping and branching, it turns out that the best choice for the Gaussian transform is actually the venerable Box-Muller transform

The Newman-Ziff algorithm for simulating percolation models

Imagine you’re trying to estimate some statistics about a certain percolation system. For example, estimating the probability of a given bond being open, which means connected, when a giant component forms. The system or model is too complicated for analytic results, which is true for all percolation models with the exception of a handful.

So you code up the percolation model in your favourite programming language, run a large number of stochastic simulations, collect the statistics, and then look at the results. Percolation systems are, by definition, large, but with fast computers, the simulation method works. You get your statistics.

Easy.

But that’s just for one parameter. What if you want to see what happens with a range of parameter values? Well, you do the same thing, but now just run the large number of simulations for a range of different parameter values, right?

No.

You could do that, and it would work. But it would be slow. How to make it faster?

Newman and Ziff proposed and implemented a fast percolation algorithm based on the simple ideas of keeping old simulations and using conditional probabilities.

Algorithm

The algorithm was presented in this paper:

2000 – Newman, Ziff – Efficient Monte Carlo Algorithm and High-Precision Results for Percolation

There exists a preprint with a slightly different title.

2001 – Newman, Ziff – A fast Monte Carlo algorithm for site or bond percolation.

Overview

The simulation method consists of three components.

Components of the Newman-Ziff algorithm

Re-use parts of previous simulation outcomes by shuffling.

Use fast union-find algorithms.

Find a smooth curve of results by using conditional probabilities.

None of them are entirely revolutionary, but put together they form a fast (and popular) simulation method for studying (monotonic) percolation models.

1. Randomly sampling sequences

The main insight was to sample in an incremental fashion, adding an open bond to the percolation model, such as a lattice, during each simulation run, without throwing away the previous simulation. More specifically, let’s say they sample a bond model with $k$ open bonds during one simulation run. Then in the next simulation, use the same configuration, and simply add one bond.

This is a nice trick, but there’s no free lunch here. On one hand, they don’t waste results. But on the other hand, they do throw away independence (and ergodicity) by doing this. But perhaps the numerical results don’t care.

To do this step, Newman and Ziff use a random shuffling algorithm called the Fisher-Yates shuffle. But it’s not clear if Newmand and Ziff thought of this shuffling algorithm independently or not. They never call it by its name nor do they cite any work on this well-known shuffling method. Both MATLAB and Python (SciPy) have functions for doing random shuffles.

2. Using union-find

Imagine you have a collection of things. Some of these things are connected to other things in your collection. You want to understand how these clusters of connected things behave.

To find the clusters, looking for ultimately the biggest one, Newman and Ziff use a union-find method. There are different types of these alogrithms, often hinging upon the use of recursion. They were developed mostly in the 1980s, particularly in work by Robert Tarjan.

These methods are very efficient at finding clusters. The algorithm speed or, rather, complexity is given in relation to the inverse of an Ackermann function, which is a famously fast growing function. These methods rely upon the monotonicity of growing unions.

(If you have a non-monotonic percolating system, then unfortunately you can’t use these algorithms, which is the case for percolation based on signal-to-interference-plus-noise ratio (SINR). )

3. Filling in the parameter gaps

For any random sample of a percolation model, the number of open bonds or sites is a natural number. But the parameter of the percolation model, such as the probability of a site or bond being open, is a real number.

Newman and Ziff use a simple conditioning argument to fill the gaps.

For any statistic $S$ of of a given percolation model, the Newman-Ziff algorithm finds the expected statistic conditioned on $n$ number of open bonds (or equivalent objects in other models). In other words, the algorithm finds the expected conditional statistics $E[S_n|N=n]$. Then we just need the probability of $N$ being $n$ as a function of the parameter we’re trying to vary, such as the bond probability $p$. With this probability $P(N=n)$, we immediately arrive at the expression for the statistic $S$ with introductory probability

$$ E(S_n)=E_N [E(S_n|N=n)].$$

For a nice discrete model with $m$ total, we get the expression

$$ E(S_n)=\sum _{n=1}^{m}[P(N=n)S_n].$$

But we know the probability (distribution) of $N$, as it’s simply the binomial variable. Assuming there are $m$ total bonds (or equivalent objects), then we arrive at

$$P(N=n)={m\choose n} p^n(1-p)^{m-n} .$$

Then the final statistic or result is simply

$$E(S_n)=\sum _{n=1}^{m} {m\choose n} p^n(1-p)^{m-n} S_n.$$

Code

I wrote the algorithm in MATLAB and Python. The code can be found here. I’ve included a script for plotting the results (for small square lattices).

MATLAB

It turns out that MATLAB doesn’t like recursion too much. Well, at least that’s my explanation for the slow results when I use the union-find method with recursion. So I used the union-find method without recursion.

Python

My Python code is mostly for illustration purposes. Check out this Python library. That website warns of the traps that multithreading in NumPy can pose.

C

In the other original paper, Newman and Ziff included C code of their algorithm. You can find it on Newman’s website, but I have also a copy of it (with added comments from me) located here.

Future work

I plan to implement this algorithm using the NVIDIA CUDA library. This is exactly the type of algorithm that we can implement and run using graphical processing units (GPUs), which is the entire point of the CUDA library.

At first glance, the spatial dependence of the problem suggests using texture memory to speed up the memory accessing on the GPUs. But I will explain why that’s probably not a good idea for this specific algorithm

News: Accelerated PyTorch learning on Mac

The PyTorch website reveals that I can now do accelerated model training with PyTorch on my Mac (which has one of the new Silicon M series chips):

In collaboration with the Metal engineering team at Apple, we are excited to announce support for GPU-accelerated PyTorch training on Mac. Until now, PyTorch training on Mac only leveraged the CPU, but with the upcoming PyTorch v1.12 release, developers and researchers can take advantage of Apple silicon GPUs for significantly faster model training. This unlocks the ability to perform machine learning workflows like prototyping and fine-tuning locally, right on Mac.

As the website says, this new PyTorch acceleration capability is due to Metal, which is Apple’s graphics processing (or GPU) technology.

This means I no longer need to use remote machines when training models. I can do it all on my trusty home computer. Let’s check.

The easiest way to check is, if you haven’t already, first install PyTorch. I would use Anaconda:

conda install pytorch torchvision -c pytorch

With PyTorch installed, run Python and then run the commands:

import torch
print(torch.backends.mps.is_available()) 
print(torch.backends.mps.is_built())

If the last commands works without a hitch, you have PyTorch installed. The last two commands should return True, meaning that PyTorch can use your graphics card for (accelerated) calculations.

New link: Dataflowr – Deep Learning DIY

Want to learn some deep learning? I recommend the Dataflowr website:

https://dataflowr.github.io/website/

It’s a good resource for learning the basics of neural networks using the PyTorch library in Python. The focus is on writing and running code. You can even play around with GPUs (graphical processing units) by running them on Google’s Colab, though that’s usually not needed.

It’s mostly run by a researcher who is a former colleague of mine and, while I was at Inria, was indirectly the reason I started using PyTorch for my machine learning work.

Sparse Spiking Gradient Descent

A preprint caught my eye:

2022 – Perez-Nieves and Goodman – Sparse Spiking Gradient Descent.

I love myself a short title. But you wouldn’t guess from the title that the paper is about artificial neural networks. But not just any type of neural network. Here’s the abstract:

There is an increasing interest in emulating Spiking Neural Networks (SNNs) on neuromorphic computing devices due to their low energy consumption. Recent advances have allowed training SNNs to a point where they start to compete with traditional Artificial Neural Networks (ANNs) in terms of accuracy, while at the same time being energy efficient when run on neuromorphic hardware. However, the process of training SNNs is still based on dense tensor operations originally developed for ANNs which do not leverage the spatiotemporally sparse nature of SNNs. We present here the first sparse SNN backpropagation algorithm which achieves the same or better accuracy as current state of the art methods while being significantly faster and more memory efficient. We show the effectiveness of our method on real datasets of varying complexity (Fashion-MNIST, Neuromophic-MNIST and Spiking Heidelberg Digits) achieving a speedup in the backward pass of up to 150x, and 85% more memory efficient, without losing accuracy.

OK. I’ll bite. So what’s a spiking neural network? And why should we care?

Improving on artificial neural networks

Artificial neural networks get all the glory. They are now everywhere. You can’t open up a newspaper or your laptop without seeing a reference to or being pestered by some agent of artificial intelligence (AI), which usually implies an artificial neural network is working in the background. Despite this, they are far from ideal.

In in some sense, mainstream artificial networks networks are rather brain-lite, as they only draw inspiration from how brains actually function. These statistical models are mostly linear and continuous, which makes them well-behaved mathematically (or algorithmically) speaking.

But in terms of energy and time required for training and using computational tools, the carbon-based grey matter is winning. Kids don’t need to read every book, newspaper and sonnet ever written to master a language. While understanding these words, our brains aren’t boiling volumes of water in excess heat output.

To make such statistical models more brain-heavy, researchers have proposed neural networks that run on spikes, drawing inspiration from the spiky electrical jolts that run through brain neurons. The proposal is something called a spiking neural network, which is also artificial neural networks, but they run on spikes with the aim of being more efficient at learning and doing tasks.

Spiking neural networks are not smooth

The problem is that these spiky models are hard to train (or fit), as their nice behaving property of smoothness vanishes when there’s no continuity. You cannot simply run the forward pass and then backward pass to find the gradients of your neural network model, as you do with regular neural networks when doing backpropagation.

Surrogate gradients

Despite this obstacle, there have been some proposals for training these statistical models. The training proposals often come down to proposing a continuous function to approximate the actual function being used in the spiking neural networks. The function’s gradients are found using standard methods. We can then use these surrogate gradients to infer in which direction we should move to to better train (or fit) the model.

I know. It sounds like cheating, using one function to guess something about another function. But there has been some success with this approach.

Sparse Spiking Gradients

The training method proposed by Perez-Nieves and Goodman is a type of surrogate method. Using the leaky integrate-and-fire (LIF) model for neuron firing, they develop an approximation for the gradients of their spiking neural network. A key feature of their approach is that, much like our brains, their model is sparse in the sense only a small fraction of neurons are every firing.

Provided the spiking neural network attains a certain degree of sparsity, their sparse spiking gradient descent gives faster, less memory-hungry results.

Perez-Nieves and Goodman support their claims by giving numerical results, which they obtained by running their method on graphical processing units (GPUs). These ferociously fast video game chips have become the standard hardware for doing big number-crunching tasks that are routinely required of working with models in machine learning and artificial intelligence.

Simulating Poisson random variables – Survey of methods

In this post I’ll cover the different ways for generating Poisson random variates. (I often just say simulating Poisson variables.) I’ll then list which methods different language libraries use, ranging from open source projects NumPy and R to industry-level libraries MLK (by Intel) and cuRand (CUDA) by Nvidia.

Direct method doesn’t scale well

In the previous post, I discussed how to sample or generate Poisson random variables or, more correctly, variates. I detailed a direct method that uses the fact that a Poisson stochastic process, which is directly related to a Poisson point process, has inter-arrival times that form independent and identically distributed exponential variables.

The direct method is an easy and intuitive sampling method, explaining why it is often used. I implemented the method in MATLAB, Python, C and C#, which can be found here. Later, in another post, I implemented the same Poisson sampling method in Fortran, which is located here.)

As elegant and exact as this simulation method is, it unfortunately decreases in speed as the Poisson parameter $\lambda$ increases. In a tutorial published in 1983, Brian D. Ripely, a major figure in spatial statistics, says this about the direct method:

This is simple, but has expected time proportional to $\lambda$. Some of its competitors use rejection methods with the envelope distribution that of the integer part of a continuous random variable, such as logistic, Laplace and normal mixed with exponential distributions.

We recall that acceptance-rejection or rejections methods involve simulating a random object, such as a random variable, by first simulating another random object of the same type that is easier to simulate. The simulation method then accepts or rejects these random objects based on a certain ratio. The distribution of the simpler random object that is first simulated is called the envelope distribution. Such rejection methods are one way to simulate Poisson variables.

In short, when simulating Poisson variables, the appropriate simulation algorithm should be chosen based on the Poisson parameter. Consequently, the code of most computer functions for generating Poisson variables will have an if-statement, using the direct method for small parameter values and another method for large parameter values. In addition to that, the method for large Poisson parameter values should be both fast but simple to implement.

We now consider the other methods.

Different methods

Over the years there have been different methods proposed for producing Poisson random variates. In the book Non-uniform random variate generation, Luc Devroye groups (in Section X.3 on page 502) the different methods into five categories coupled with his views. These methods are:

Direct methods based on the homogeneous Poisson stochastic process having exponential inter-arrival times. These methods are simple, but the expected time is proportional to the Poisson parameter $\lambda$.
Inversion methods that search through a table of cumulative Poisson probabilities. Examples include the papers by Fishman (1976) and Atkinson (1979)*.
Methods that use the recursive properties of the Poisson distribution. The paper by Ahrens and Dieter (1974) uses this approach, and its expected time of completion is proportional to $\log\lambda$.
Acceptance-rejection (or rejection) methods that give relatively fast but simple algorithms. Such methods are proposed in the papers by Atkinson (1979)*, Ahrens and Dieter (1980) and Devroye (1981) or the technical report by Schmeiser and Kachitvichyanukul (1981).
Acceptance-complement methods that uses a normal distribution as the starting distribution, such as the paper by Ahrens and Dieter (1982). This method is fast, but the code is rather long.

*Atkinson had (at least) two papers on generating Poisson variates published in 1979, but I believe Devroye is referring to the first paper, because in the second paper Atkinson compares methods proposed by others.

For the paper titles, see the Further reading section below.

Code

In a separate post, I present implementations in Python and MATLAB of algorithms found respectively in the papers by Atkinson (1979) and Hörmann (1991). But these are only for illustration purposes, as Python (with SciPy) and MATLAB only have good functions for generating Poisson variables.

Methods implemented in popular libraries

I’ll now state which methods are used in various programming languages and numerical methods. I won’t go into the details how the methods work, just citing the papers instead.

MATLAB

For small $\lambda$ values, the MATLAB function poissrnd uses the direct method (based on inter-arrival times) with a while-loop.

For $\lambda$ values greater than fifteen, I believe that the MATLAB function poissrnd uses Algorithm PG from the 1974 paper by Ahrens and Dieter, which uses the the generation of gamma and binomial random variates.

But to come to this conclusion, I had to do some investigating. You can skip to the next section if you’re not interested, but now I’ll explain my reasoning.

The MATLAB documentation says it uses a method proposed by Ahrens and Dieter, but these two researchers developed a number of methods for generating Poisson variables. The MATLAB code cites Volume 2 of the classic series by Knuth, who says the method is due to Ahrens and Dieter, but he doesn’t give an exact citation in that section of the book. Confusingly, Knuth cites in his book a couple papers by Ahrens and Dieter for generating different random variates. (Knuth later cites a seemingly relevant 1980 paper by Ahrens and Dieter, but that details another method.)

Both the MATLAB code and Knuth cite the book by Devroye. In his book (Exercise 3.5.2), Devroye discusses one method, among others, from a 1974 paper by Ahrens and Dieter. Another hint is given by examining the code of the MATLAB function poissrnd, which reveals that it uses the function randg to generate gamma variables. In the Ahrens and Dieter 1974 paper, their Algorithm PG (for producing Poisson variates) uses gamma random variables, and it’s suggested to use a parameter value of $7/8$. This is the same parameter used in the MATLAB code and mentioned by Knuth, confirming that this is the right paper by Ahrens and Dieter.

In summary, for large $\lambda$ the function MATLAB uses Algorithm PG from the 1974 paper by Ahrens and Dieter, whereas for small values it uses the direct method, which they refer to as the multiplication method.

R

In R, the function rpois use an algorithm outlined in the 1982 paper by Ahrens and Dieter. You can view the R source code here. The two cases for $\lambda$ (or $\mu$ in the paper) depend on whether $\lambda$ is greater than ten or not. For small $\lambda$, the R function rpois does not use the method based on inter-arrival times, but rather an inversion method based on a table of (cumulative) probabilities given by the Poisson probability distribution.

Python (NumPy)

In NumPy, the function numpy.random.poisson generates Poisson variates. The source code for the NumPy library is here, but for the Poisson function the underlying code is actually written in C; see the distributions.c file located here. For small Poisson parameter $\lambda$, the code uses the direct method; see the function random_poisson_mult in the code.

For Poisson parameter $\lambda \geq 10$, the comments in the code reveal that it uses a method from a 1993 paper by Hörmann; see Algorithm PTRS on page 43 of the paper. This is a transformation method, which for NumPy is implemented in the C code as the function random_poisson_ptrs. The method, which Hörmann calls the transformed rejection with squeeze, combines inversion and rejection methods.

Octave

Octave is intended to be a GNU clone of MATLAB, so you would suspect it uses the same methods as MATLAB for generating Poisson random variates. But the Octave function poissrnd uses different methods. The code reveals it generates the Poisson variates with a function called prand. It considers different cases depending on the value of the Poisson parameter $\lambda$ as well as whether a single variable (that is, a scalar) or vector or matrix of Poisson variates are being generated.

In total, the Octave function prand uses five different methods. For two of the methods, the documentation cites methods from the classic book Numerical Recipes in C (the 1992 edition); see next section. To generate a single Poisson variate with Poisson parameter $\lambda \leq 12$, the Octave function prand uses the direct method based on inter-arrival times.

Numerical Recipes (Fortran, C and C++)

The book Numerical Recipes is a classic by Press, Teukolsky, Vetterling and Flannery on numerical methods. The books comes in different editions reflecting different publication years and computer languages. (In the first two editions of the book, the authors implemented the algorithms respectively in Fortran and C.)

For generating Poisson variates, the book contents seems to have not changed over the editions that I looked at, which covered the programming languages Fortran (77 and 90), C, and C++. The authors cover Poisson generation in Section 7.3 in the Fortran and C editions. In the third edition of Numerical Recipes, they implement their methods in C++ in Section 7.3.12.

For small values of Poisson parameter $\lambda$, Numerical Recipes uses the direct method. For $\lambda >12$ values, an acceptance-rejection method is used, which relies upon finding a continuous version of the discrete Poisson probability distribution.

GSL Library (C)

In the GSL library, one can use the function gsl_ran_poisson, which uses the the direct method of exponential times. The code, which can be viewed here, cites simply Knuth (presumably the second volume). But it seems to use the aforementioend Algorithm PG (for producing Poisson variates) from the 1974 paper by Ahrens and Dieter 1974; see the section above on MATLAB.

NAG Library (C)

Although I didn’t see the code, it appears that the function nag_rand_poisson (g05tjc ) in the NAG library also uses the direct method, based on the material in the second volume of series by Knuth. But in a 1979 paper Atkinson says that the NAG library uses a method from the 1974 paper by Ahrens and Dieter.

Boost library Random (C++)

The Boost library Random uses the PTRD algorithm proposed in the 1993 paper by Hörmann to generate Poisson variates; see Algorithm PTRD on page 42 of the paper. In the same paper appears the PTRS method, which is used by Python (NumPy) (though implemented in C), as mentioned above.

MKL library (C)

In the MKL C library written by Intel, there seems to be three methods in use for generating Poisson variates.

The first function is called VSL_RNG_METHOD_POISSON_PTPE, which does the following for a Poisson distribution with parameter $\Lambda$:

If Λ ≥ 27, random numbers are generated by PTPE method. Otherwise, a combination of inverse transformation and table lookup methods is used. The PTPE method is a variation of the acceptance/rejection method that uses linear (on the fraction close to the distribution mode) and exponential (at the distribution tails) functions as majorizing functions. To avoid time-consuming acceptance/rejection checks, areas with zero probability of rejection are introduced and a squeezing technique is applied.

This function uses the so-called PTPE method, which is outlined in a 1981 technical report by Schmeiser and Kachitvichyanukul.

The second function is called VSL_RNG_METHOD_POISSON_POISNORM, which does the following :

If Λ < 1, the random numbers are generated by combination of inverse transformation and table lookup methods. Otherwise, they are produced through transformation of the normally distributed random numbers.

The third function is called VSL_RNG_METHOD_POISSONV_POISNORM, which does the following:

If Λ < 0.0625, the random numbers are generated by inverse transformation method. Otherwise, they are produced through transformation of normally distributed random numbers.

cuRAND (C)

Finally, there is the CUDA Random Number Generation library (cuRAND) developed by Nvidia for their (now ubiquitous) graphical processing units (GPUs). This C/C++ library has a function for generating Poisson variates. To see the C code, copies of it can be found in various GitHub repositories, such as this one. The cuRAND function curand_poisson uses the direct function for Poisson parameter values less than 64. For parameters values greater than 4000, it uses a normal approximation (rounded to the nearest integer).

For other values, the function curand_poisson uses a rejection method based on an approximation of the incomplete gamma function; see the function curand_poisson_gammainc. The book by Fishman is cited; see Section 8.16.

Box-Muller was not the favourite

Box-Muller the favourite on GPUs

Code considerations

Where are the pretty pictures?

Code

Further reading

Acknowledg(e)ments

Code considerations

Storing multi-dimensional values in C

Randomness in C

Code

Proof outline

Algorithm

The fall of the Box-Muller method

Update: The return of the Box-Muller method

Further reading

Websites

Papers

Books

Algorithm

Overview

1. Randomly sampling sequences

2. Using union-find

3. Filling in the parameter gaps

Code

MATLAB

Python

C

Future work

Further reading

Improving on artificial neural networks

Spiking neural networks are not smooth

Surrogate gradients

Sparse Spiking Gradients

Direct method doesn’t scale well

Different methods

Code

Methods implemented in popular libraries

MATLAB

R

Python (NumPy)

Octave

Numerical Recipes (Fortran, C and C++)

GSL Library (C)

NAG Library (C)

Boost library Random (C++)

MKL library (C)

cuRAND (C)

Further reading

Books

Articles