The Metropolis-Hastings algorithm in C with multi-variable densities

Here’s a C implementation of the Metropolis(-Rosenbluth-Rosenbluth-Teller-Teller)-Hastings algorithm that can handle joint multi-variable probability densities, thus simulating a finite number of jointly distributed random variables. The Metropolis-Hastings algorithm is the central piece of Markov chain Monte Carlo (MCMC) methods, which have become essential in Bayesian statistics, where they are used to tackle high-dimensional integration. I have written a couple of posts about these methods, starting with this one and ending with this one.

My previous C code only works with densities of a couple of variables. That code is covered in this post, which in turn is based on a previous post, in which I presented the algorithm implemented in the scientific programming languages Python, Julia, MATLAB and R. Those examples were mostly for illustration purposes, as there are already good pre-written libraries in those languages.

Code considerations

I’ll describe some considerations for implementing this algorithm in C. Some of this overlaps with what I wrote in the previous post. In writing this more general version of the code, I arguably cleaned up the old code, which is often the case when you generalize things in coding and mathematics.

Storing multi-dimensional values in C

In C, when handling sets of numbers, such as vectors and matrices, one has to use pointers and the malloc function more often than not, which can create headaches. For this algorithm, probably the most important data object is the one for storing the (current) positions of the multi-dimensional random walks across all the simulation runs. This conceptually results in a rectangular grid or table of numbers. In mathematics, you would just use a \(d \times n\) matrix, where \(d\) is the number of dimensions and \(n\) is the number of simulation runs.

In most programming languages, this table of numbers typically corresponds to a \(d \times n\) array. More specifically in C, you could do this with a multi-dimensional array or an array of pointers. But in this case, you don’t need the inherent structure of a matrix. Besides, any multi-dimensional array in C is stored in memory as a one-dimensional array, so you can store the numbers in a simple one-dimensional array or vector with \(d \cdot n\) elements in total. (I’ve used the \(\cdot\) notation here to highlight the multiplication.) The code does this using a single pointer.

Then using this vector, you just need to index (or map) between the vector and the matrix appropriately, which the code does with the (integer) variable indexSimDim. There are simple one-to-one mappings between the elements in the vector and the elements in the \(d \times n\) matrix. For example, the element in row \(k\) and column \(i\) (indexing from zero) corresponds to the element number \(m=i\cdot d+k\) in the vector, so with \(d=3\) dimensions, the entry for dimension \(k=2\) of simulation \(i=4\) sits at position \(m=4\cdot 3+2=14\); see the code below.

unsigned indexSimDim; // index for keeping track of two-dimensional data as a one-dimensional array
// indexSimDim = i * numbDim + k, where i is simulation number and k is dimension number (minus one)
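As a minimal self-contained sketch of this mapping (the variable names mirror those in the full code below, but the snippet itself is purely for illustration):

// assumes <stdlib.h> has been included for malloc and free
unsigned i, k, indexSimDim;
unsigned numbDim = 3; // d, the number of dimensions
unsigned numbSim = 4; // n, the number of simulation runs
// one flat block of d * n doubles standing in for a d x n matrix
double *p_table = (double *)malloc(numbDim * numbSim * sizeof(double));
for (i = 0; i < numbSim; i++)
{
    for (k = 0; k < numbDim; k++)
    {
        indexSimDim = i * numbDim + k; // flat index of (dimension k, simulation i)
        p_table[indexSimDim] = 0.0;    // read or write as if indexing a matrix
    }
}
free(p_table);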

This is a standard trick. In fact, you’ll see this type of code as necessary boilerplate in the kernels (routines) of CUDA-based (and similar) code due to the inherent grid hierarchy of graphics processing units (GPUs). But that discussion deserves a post of its own.

Randomness in C

In previous posts, I’ve remarked that the standard uniform random number generator in C, called rand, is not good enough for research-level randomness. That said, it works fine for regular simulations. The Mersenne Twister is a widely recommended and used algorithm for producing better random numbers.
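To show what swapping in another generator involves, here’s a hedged sketch using Marsaglia’s xorshift64 behind the same sort of interface as the unirand function in the code below. To be clear, this is not the Mersenne Twister, and it’s not research-grade either; it just illustrates the plumbing.

#include <stdint.h>

static uint64_t stateRng = 88172645463325252ULL; // any nonzero seed works

static uint64_t xorshift64(void)
{
    // Marsaglia's xorshift64: three shift-and-xor steps scramble the state
    uint64_t x = stateRng;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    stateRng = x;
    return x;
}

static double unirand_xorshift(void)
{
    // keep the top 53 bits, giving a uniform double on [0,1)
    return (double)(xorshift64() >> 11) / (double)(1ULL << 53);
}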

To simulate normal (or Gaussian) random variables, I wrote my own simple function using the Box-Muller transform, which I covered in a previous post, so the code would be self-contained. But in practice, you should always use pre-written and tested functions.

Code

The code can be found here, whereas other MCMC code can be found here and here. The joint probability density is defined in the function pdf_single.

/***********************************************************************
 * Runs a simple Metropolis-Hastings (ie MCMC) algorithm to simulate n
 * jointly distributed random variables with probability density p(x,y,z).
 * For example:
 * p(x,y,z)=exp(-(x^4+x*y+y^2+y*z+z^4)/s^2)/consNorm, where s>0 and consNorm is a
 * normalization constant. The probability density function is defined in
 * the function pdf_single.
 *
 * NOTE: In practice, the value of the normalization constant is not needed, as it cancels out in the algorithm.
 *
 * NOTE: This code will *create* a local file (see variable strFilename) to store results. It will *overwrite* that file if it already exists.
 *
 * WARNING: This code uses the default C random number generator, which is known for failing various tests of randomness.
 * Strongly recommended to use another generator for purposes beyond simple illustration.
 *
 * Author: H. Paul Keeler, 2024.
 * Website: hpaulkeeler.com
 * Repository: github.com/hpaulkeeler/posts
 *
 ***********************************************************************/

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <math.h>
#include <stdbool.h>
#include <string.h>

const long double pi = 3.14159265358979323846; // constant pi for generating polar coordinates

// helper function declarations; see below for definitions
static double *unirand(double *randValues, unsigned numbRand);                           // generate  uniform random variables on (0,1)
static double *normrand(double *randValues, unsigned numbRand, double mu, double sigma); // generate normal random variables
static double pdf_single(double *x_input, unsigned numbDim, double *parameters);           // define probability density to be simulated
static double mean_var(double *set_sample, unsigned numbSim, double *varX);              // calculate mean and variance

int main(int argc, char *argv[])
{

    if (argc > 1)
    {
        fprintf(stderr, "This program takes no arguments...\n");
        exit(1);
    }
    else
    {
        char strFilename[] = "MCMCData_ND.csv"; // filename for storing simulated random variates

        // initializes (pseudo)-random number generator
        time_t timeCPU; // use CPU time for seed
        srand((unsigned)time(&timeCPU));
        // srand(42); //to reproduce results

        bool booleWriteData = true; // write data to file
        bool booleStats = true;     // perform simple mean/std stats
        unsigned numbDimMax = 3; //upper bound on number of dimensions for which stats are calculated and printed out

        // simulation parameters
        unsigned numbSim = 1e4;   // number of random variables simulated
        unsigned numbSteps = 200; // number of steps for the Markov process
        double sigma = 2;         // standard deviation for normal random steps

        // probability density parameters
        double s = .5; // scale parameter for distribution to be simulated
        unsigned numbDim = 3;

        // Metropolis-Hastings variables
        // proposal for a new position in the random walk
        double *tRandProposal = (double *)malloc(numbDim * sizeof(double));
        double pdfProposal;      // density for proposed position
        double pdfCurrent;       // density of current position
        double ratioAccept;      // ratio of densities (ie acceptance probability)
        double uRand;            // uniform variable for Bernoulli trial (ie a coin flip)
        // random step (normally distributed)
        double *p_numbNormT = (double *)malloc(1 * sizeof(double));
        // positions of the random walk (ie the simulated random variables after numbSteps)
        double *p_tRand = (double *)malloc(numbDim * numbSim * sizeof(double));

        unsigned i, j, k;     // loop variables
        unsigned indexSimDim; // index for keeping track of two-dimensional data as a one-dimensional array
        // Typically indexSimDim = i * numbDim + k, where i is simulation number and k is dimension number (minus one)

        double *p_tRandCurrent = (double *)malloc(numbDim * sizeof(double));
        (void)unirand(p_tRand, numbDim * numbSim); // random initial values

        for (i = 0; i < numbSim; i++)
        {
            // loop through each random walk instance (or random variable to be simulated)
            for (k = 0; k < numbDim; k++)
            {
                // loop through dimensions
                indexSimDim = i * numbDim + k;
                // update state of random walk / Markov chain
                *(p_tRandCurrent + k) = *(p_tRand + indexSimDim);
            }

            pdfCurrent = pdf_single(p_tRandCurrent, numbDim, &s); // current probability density

            for (j = 0; j < numbSteps; j++)
            {
                // loop through each step of the random walk
                for (k = 0; k < numbDim; k++)
                {
                    // loop through dimensions
                    indexSimDim = i * numbDim + k;
                    (void)normrand(p_numbNormT, 1, 0, sigma);
                    // take a (normally distributed) random step in each dimension
                    *(tRandProposal+k) = *(p_tRand + indexSimDim) + *(p_numbNormT);
                }
                pdfProposal = pdf_single(tRandProposal, numbDim, &s); // proposed probability density

                // acceptance rejection step
                (void)unirand(&uRand, 1);
                ratioAccept = pdfProposal / pdfCurrent;
                if (uRand < ratioAccept)
                {
                    for (k = 0; k < numbDim; k++)
                    {
                        // loop through dimensions
                        indexSimDim = i * numbDim + k;
                        // update state of random walk / Markov chain
                        *(p_tRand + indexSimDim) = tRandProposal[k];
                    }
                    pdfCurrent = pdfProposal;
                }
            }
        }

        free(p_numbNormT);

        if (booleStats)
        {
            // initialize statistics variables (for testing results)
            double *p_AllRand = (double *)malloc(numbSim * sizeof(double));
            double meanTemp = 0;
            double varTemp = 0;
            double stdTemp = 0;
            unsigned numbDimStats = fmin(numbDimMax, numbDim); //number of dimensions for which stats are calculated and printed out
            for (k = 0; k < numbDimStats; k++)
            {
                // loop through all the dimensions
                for (i = 0; i < numbSim; i++)
                {
                    // collect variables for dimension k+1
                    indexSimDim = i * numbDim + k;
                    *(p_AllRand + i) = *(p_tRand + indexSimDim);
                }
                meanTemp = mean_var(p_AllRand, numbSim, &varTemp);
                stdTemp = sqrt(varTemp);
                printf("The average of dimension %d random variables is %lf.\n", k + 1, meanTemp);
                printf("The standard deviation of dimension %d random  variables is %lf.\n", k + 1, stdTemp);
            }
        }

        if (booleWriteData)
        {
            // print to file
            FILE *outputFile;
            outputFile = fopen(strFilename, "w");

            // create string of spacers (ie commas and newlines)
            char *strSpacer = (char *)malloc((numbDim + 1) * sizeof(char));
            for (k = 0; k < numbDim - 1; k++)
            {
                *(strSpacer + k) = ',';
            }
            strSpacer[numbDim - 1] = '\n';
            strSpacer[numbDim] = '\0';
            for (i = 0; i < numbSim; i++)
            {
                for (k = 0; k < numbDim; k++)
                {
                    indexSimDim = i * numbDim + k;
                    fprintf(outputFile, "%lf%c", *(p_tRand + indexSimDim), strSpacer[k]);
                }
            }

            fclose(outputFile);
            printf("Data printed to file.\n");
        }
        free(p_tRand);
        free(p_tRandCurrent);
        free(tRandProposal);

        return (0);
    }
}

static double pdf_single(double *x_input, unsigned numbDim, double *parameters)
{
    // returns the probability density of a single point inside a simulation window defined below
    
    double pdf_output = 0; //probability density at a single point
    
    // non-zero density square window parameters
    double xMin = -1;
    double xMax = 1;

    // retrieve variables (this example density assumes numbDim = 3)
    double x = *(x_input + 0);
    double y = *(x_input + 1);
    double z = *(x_input + 2);
    // retrieve scale parameter
    double s = *(parameters + 0);

    int i;
    // check that the point is inside the simulation window
    bool booleInsideWindow = true;
    for (i = 0; i < (int)numbDim; i++){
        booleInsideWindow = booleInsideWindow && (x_input[i] >= xMin) && (x_input[i] <= xMax);
    }

    // define probability density
    if (booleInsideWindow)
    {
        // evaluate probability density
        pdf_output = exp(-((pow(x, 4) + x * y + pow(y, 2) + y * z + pow(z, 4)) / (s * s)));
    }
    else
    {
        pdf_output = 0;
    }
    return pdf_output;
}

static double *normrand(double *randValues, unsigned numbRand, double mu, double sigma)
{
    // simulate pairs of iid normal variables using Box-Muller transform
    // https://en.wikipedia.org/wiki/Box%E2%80%93Muller_transform

    double U1, U2, thetaTemp, rhoTemp, Z1, Z2;
    int i = 0;
    while (i < numbRand)
    {
        // simulate variables in polar coordinates (theta, rho)
        (void)unirand(&U1, 1);
        thetaTemp = 2 * pi * U1; // create uniform theta values
        (void)unirand(&U2, 1);
        rhoTemp = sqrt(-2 * log(U2)); // create Rayleigh rho values

        // change to Cartesian coordinates (Z1, Z2)
        Z1 = rhoTemp * cos(thetaTemp);
        Z1 = sigma * Z1 + mu;
        randValues[i] = Z1; // assign first of random variable pair
        i++;
        if (i < numbRand)
        {
            // if more variables are needed, generate second value of random pair
            Z2 = rhoTemp * sin(thetaTemp);
            Z2 = sigma * Z2 + mu;
            randValues[i] = Z2; // assign second of random variable pair
            i++;
        }
    }
    return randValues;
}

static double *unirand(double *randValues, unsigned numbRand)
{ // simulate numbRand uniform random variables on the unit interval
  // storing them in randValues which must be allocated by the caller
  // with enough space for numbRand doubles

    for (int i = 0; i < numbRand; i++)
    {
        randValues[i] = (double)rand() / RAND_MAX;
    }
    return randValues;
}

static double mean_var(double *set_sample, unsigned numbSim, double *varX)
{
    // mean and variance of set_sample
    int i;
    // initialize statistics variables (for testing results)
    double meanX = 0;
    double meanXSquared = 0;
    double tempX;
    for (i = 0; i < numbSim; i++)
    {
        tempX = *(set_sample + i);
        meanX += tempX / ((double)numbSim);
        meanXSquared += tempX * tempX / ((double)numbSim);
    }

    *varX = meanXSquared - meanX * meanX;
    return meanX;
}

Markov goes to Monte Carlo

Numerical and scientific fields such as statistics, computational physics, and optimization methods heavily rely upon Markov chain Monte Carlo methods. These simulation techniques use the power of Markov chains to sample general probability distributions, which in turn give Monte Carlo methods for estimating integrals and optimal solutions.

This is the third part of a multi-part series of posts. The first part covers Markov chains. The second part covers the basics of Monte Carlo methods. This post combines the ideas from the first two parts. Overall, the posts sketch the mechanics of Markov chain Monte Carlo (MCMC) methods.

We will only examine Markov chains with countable state spaces. In this setting, we only need standard probability and matrix knowledge. But the results extend to general state spaces, such as Euclidean space.

By the way, I doubt that Andrey A. Markov ever went to Monte Carlo in the small country of Monaco. (But Herbie did.)

Monte Carlo conditions

Sum

In the first post we covered Markov chains with countable state spaces, because the mathematics is more transparent. Consequently, we can use such Markov chains to estimate the sum of a function \(f\) over a countable (possibly infinite) set \(\mathbb{S}\). We can write the sum as

$$\begin{align}S(f)&=\sum_{x\in \mathbb{S}} f(x) \\&= \sum_{x\in\mathbb{S}} \frac{f(x)q(x)}{q(x)}\\& = \mathbb{E}_{q}[ \frac{f(Y)}{q(Y)}]\\& = \mathbb{E}_{q}[ g(Y)]\,,\end{align}$$

where \(g=f/q\) and \(Y\) is a suitable random variable that has a probability mass function \(q\) with support \(\mathbb{S}\).

Sampling the random variable (or point) \(Y\) gives the samples \(y_1,\dots,y_n\), which yield an unbiased Monte Carlo estimate of the sum, namely

$$S_n (f)=\frac{1}{n}\sum_{i=1}^n \frac{f(y_i)}{q(y_i)}.$$

The closer the function \(q\) is to \(|f|\), the better the estimate.
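As a concrete, hedged sketch of this estimator in C (the target sum and sampling distribution are my own toy choices): we estimate \(S=\sum_{x\geq 1} 1/x^2 = \pi^2/6 \approx 1.645\) by sampling \(Y\) from the probability mass function \(q(y)=\frac{1}{y(y+1)}\) on \(\{1,2,\dots\}\), whose distribution function \(F(y)=\frac{y}{y+1}\) inverts easily. Note that \(q\), like \(f\), has a polynomial tail, in line with the remark above.

#include <math.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    unsigned i, n = 1000000; // number of samples
    double sumEst = 0;
    srand(42);
    for (i = 0; i < n; i++)
    {
        // uniform variable strictly inside (0,1)
        double U = ((double)rand() + 1.0) / ((double)RAND_MAX + 2.0);
        // inverse transform method: smallest y with y/(y+1) >= U
        unsigned y = (unsigned)ceil(U / (1.0 - U));
        double f = 1.0 / ((double)y * (double)y);         // f(y) = 1/y^2
        double q = 1.0 / ((double)y * ((double)y + 1.0)); // q(y) = 1/(y(y+1))
        sumEst += (f / q) / (double)n;
    }
    printf("Estimate of the sum: %lf (exact: pi^2/6 = 1.644934)\n", sumEst);
    return 0;
}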

Integral

We can write an integral over some region \(\mathbb{S}\) as

$$\begin{align}\int_{\mathbb{S}} f(x) dx &= \int_{\mathbb{S}} \frac{f(x)}{p(x)}p(x)dx\\ & =\mathbb{E}_p[\frac{f(Y)}{p(Y)}]\,,\end{align}$$

where \(p\) is the probability density of a random variable (or point) \(Y\) with support \(\mathbb{S}\). Then by sampling \(Y\), the resulting samples \(y_1, \dots, y_n\in \mathbb{S}\) give the unbiased Monte Carlo estimate

$$I_n (f)=\frac{1}{n}\sum_{i=1}^n \frac{f(y_i)}{p(y_i)}\,.$$
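Again a hedged C sketch with my own toy choices: estimating \(\int_0^1 e^x dx = e - 1 \approx 1.718\) with \(Y\) uniform on \((0,1)\), so that \(p(y)=1\).

#include <math.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    unsigned i, n = 1000000; // number of samples
    double intEst = 0;
    srand(42);
    for (i = 0; i < n; i++)
    {
        double y = (double)rand() / RAND_MAX; // sample from p, the uniform density on (0,1)
        intEst += exp(y) / (double)n;         // f(y)/p(y) with p(y) = 1
    }
    printf("Estimate of the integral: %lf (exact: e - 1 = 1.718282)\n", intEst);
    return 0;
}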

Ergodic sequence

To estimate the sum or integral, we’ll need a sequence of ergodic random variables \(\{Y_i\}_{i\in\mathbb{N}}\), which we can achieve with an ergodic Markov chain with a state space \(\mathbb{S}\).

Markov chain conditions

In the context of Markov chains and, more generally, stochastic processes, we can think of ergodicity as convergence and a slightly more general version of the law of large numbers. A Markov chain with a countable state space needs some conditions to ensure ergodicity.

Regularity conditions for a countable Markov chain

  1. A stationary distribution \(\pi\)
  2. Aperiodicity
  3. Irreducibility
  4. Positive recurrence

In general, most of the needed structure for the Markov chain comes from irreducibility and the existence of a stationary distribution. Sometimes the above conditions are redundant. For example, an aperiodic, irreducible Markov chain with a finite state space is always positive recurrent. We’ll look at ways to satisfy these conditions in the Monte Carlo setting when the underlying state space is countable.

Stationary distribution \(\pi\)

A stationary distribution \(\pi\) satisfies the equation

$$ \pi=\pi P.$$

We know exactly what our stationary distribution \(\pi\) needs to be. It is the one used in the Monte Carlo method, namely the distribution of \(Y\). The challenge is constructing a Markov chain with such a stationary distribution. This will of course depend on the Markov (transition) kernel \(P\) of the Markov chain.
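For a standard worked example, take the two-state chain on \(\mathbb{X}=\{0,1\}\) with Markov kernel

$$P=\begin{pmatrix}1-a & a\\ b & 1-b\end{pmatrix}\,,$$

where \(0<a,b<1\). Then \(\pi = \left(\frac{b}{a+b}, \frac{a}{a+b}\right)\) is stationary, which you can check by multiplying out: the first component of \(\pi P\) is \(\frac{b}{a+b}(1-a)+\frac{a}{a+b}b = \frac{b}{a+b}\), and similarly for the second.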

We’ll look at the other conditions on the Markov kernel \(P\).

Aperiodicity

We recall that the period \(d_x\) of a state \(x\in \mathbb{X}\) is the greatest common divisor of all \(n\) values such that \(P^n(x,x)>0\). We need every state of the countable Markov chain to be aperiodic, meaning \(d_x=1\) for all \(x\in\mathbb{X}\). For example, a chain that deterministically alternates between two states has period \(2\) at both states, so it is not aperiodic.

One sufficient way to achieve this requirement is to first have the countable Markov chain be irreducible, which we’ll cover in the next section. Then if there exists at least one state \(x\in \mathbb{X}\) such that \(P(x,x)>0\), the Markov chain is aperiodic. (See Lemma 1.8.2 in Markov Chains by Norris or Remark 6.2.5 in Probability and Stochastic Processes by Brémaud.)

Clearly the last condition is not difficult to satisfy, putting the focus now on the irreducibility requirement.

Irreducibility

The irreducibility property says that it is possible for the Markov chain to get from any point to any other point in a finite number of steps. More precisely, for an irreducible Markov chain with a countable state space, there must exist for all \(x,y\in\mathbb{X}\) a natural number \(s\) (possibly depending on \(x\) and \(y\)) such that \(P^s(x,y)>0\).

We can achieve this requirement by introducing a stronger requirement, which is easier to verify. A countable Markov chain with a transition kernel \(P\) will be irreducible if for all \(x,y\in\mathbb{X}\) the condition \(P(x,y)>0\) holds. We have simply required that the above number \(s=1\) for all \(x,y\in\mathbb{X}\).

Furthermore, if \(P(x,x)>0\) for all \(x\in\mathbb{X}\), then our irreducible countable Markov chain is also aperiodic.
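For a toy example of both properties at once, take \(\mathbb{X}=\{1,2,3\}\) and the kernel

$$P=\begin{pmatrix}1/3 & 1/3 & 1/3\\ 1/3 & 1/3 & 1/3\\ 1/3 & 1/3 & 1/3\end{pmatrix}\,,$$

which has \(P(x,y)=1/3>0\) for all \(x,y\in\mathbb{X}\), so the chain is irreducible with \(s=1\), and \(P(x,x)>0\) makes it aperiodic.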

Positive recurrence

For a point or state \(x\in\mathbb{X}\), we recall that its first return time is defined as

$$ T_x^+=\min\{ t\geq 1: X_t=x\} \,.$$

A state \(x\) is called positive recurrent if the expected value of its first return time is finite, meaning \(\mathbb{E}_x(T_x^+)<\infty\). For a countable Markov chain, if all the states in the state space are positive recurrent, then we say the Markov chain is positive recurrent.

A countable irreducible Markov chain is positive recurrent if (and only if) it has a stationary distribution \(\pi\). Furthermore, that stationary distribution will be unique and nonzero, meaning \(\pi(x)>0\) for all \(x\in\mathbb{X}\). (See Theorem 1.7.7 in Markov Chains by Norris or Theorem 6.3.14 in Probability and Stochastic Processes by Brémaud.)

So for positive recurrence, we just need to show that the Markov chain has a stationary distribution. (For example, a simple symmetric random walk on \(\mathbb{Z}\) is irreducible but has no stationary distribution, so it is not positive recurrent.)

Ergodicity

A countable Markov chain that is aperiodic, irreducible, and positive recurrent is ergodic. In other words, an ergodic Markov process \(X\) with stationary distribution \(\pi\) is a random sequence \(X_0, X_1,\dots\) such that the following holds almost surely

$$ \frac{1}{t}\sum_{i=0}^{t-1} g(X_i) \rightarrow \sum_{x\in\mathbb{X}}g(x)\pi(x)$$

as \(t\rightarrow\infty\) for all bounded, nonnegative functions \(g\). Conveniently, we can use such an ergodic Markov chain to make statistical estimates.

Sampling with a Markov chain

For the Monte Carlo sample \(y_1,\dots,y_n\), we simply use the Markov chain samples \(x_0,x_1,\dots,x_{n-1}\), giving the Monte Carlo estimate. In the discrete case, this is simply

$$S_n (f)=\frac{1}{n}\sum_{i=1}^n g(x_{i-1})\,,$$

which, due to ergodicity, converges to \(S(f)=\mathbb{E}_{q}[g(Y)]\).
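To make this concrete, here’s a hedged C sketch, a toy example of my own that reuses the two-state chain from earlier: run the chain with \(a=0.3\) and \(b=0.2\), take \(g(x)=x\), and compare the time average against \(\mathbb{E}_\pi[g]=\pi(1)=\frac{a}{a+b}=0.6\).

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    double a = 0.3, b = 0.2;         // transition probabilities 0 -> 1 and 1 -> 0
    unsigned t, numbSteps = 1000000; // number of Markov chain steps
    int x = 0;                       // initial state
    double sumG = 0;                 // running sum of g(X_t) with g(x) = x
    srand(42);
    for (t = 0; t < numbSteps; t++)
    {
        sumG += (double)x;
        double U = (double)rand() / RAND_MAX;
        if (x == 0)
            x = (U < a) ? 1 : 0; // leave state 0 with probability a
        else
            x = (U < b) ? 0 : 1; // leave state 1 with probability b
    }
    printf("Time average of g: %lf (exact: 0.6)\n", sumG / (double)numbSteps);
    return 0;
}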

Summary

In summary, a sufficient way to have a countable ergodic Markov chain is for the transition kernel to be such that \(P(x,y)>0\) for all \(x,y\in\mathbb{X}\) and for a (stationary) distribution \(\pi=\pi P\) to exist. For the first requirement, one imagines using a positive function such as \(e^{-(x-y)^2}/C\), where \(C>0\) is a suitable normalization constant.
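Sampling from such a kernel is nothing exotic in C. A hedged sketch, using a Box-Muller step as in the code in my other post, and a normal density with unit variance rather than the exact constant \(C\) above:

#include <math.h>
#include <stdlib.h>

/* draw the next state y from P(x, .), a density proportional to e^{-(y-x)^2/2},
   which is positive for every pair (x, y), as irreducibility requires */
static double kernel_step(double x)
{
    // two uniform variables strictly inside (0,1)
    double U1 = ((double)rand() + 1.0) / ((double)RAND_MAX + 2.0);
    double U2 = ((double)rand() + 1.0) / ((double)RAND_MAX + 2.0);
    // Box-Muller transform: one standard normal variable
    double Z = sqrt(-2.0 * log(U1)) * cos(2.0 * 3.14159265358979323846 * U2);
    return x + Z; // a normally distributed step away from the current state x
}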

The requirements seem possible to satisfy, but bringing them all together is still a great challenge for the uninitiated. Fortunately, some great minds almost seven decades ago proposed the first Markov chain that meets all the requirements for a specific stationary distribution, which was used for a Monte Carlo estimate.

The first such Markov chain Monte Carlo method appeared in a paper written by five scientists, including two husband-wife pairs. It became known as the Metropolis or Metropolis-Hastings algorithm, and it will be the subject of a future post.

Further reading

Books

Markov chains

Any stochastic process book will include a couple of chapters on Markov chains. For more details, there are many books dedicated entirely to the subject of Markov chains. For example, introductory books include:

  • Brémaud – Markov Chains, Gibbs Fields, Monte Carlo Simulation and Queues;
  • Levin, Peres, and Wilmer – Markov Chains and Mixing Times;
  • Norris – Markov Chains.

Those books cover Markov chains with countable state spaces. If you want to read about discrete-time Markov chains with general state spaces, try the book:

  • Meyn and Tweedie – Markov Chains and Stochastic Stability.

All the above books have a section detailing a Markov chain Monte Carlo method, such as the Metropolis-Hastings algorithm and the Gibbs sampler.

Monte Carlo methods

There are a number of books dedicated to Monte Carlo approaches. For a statistics context, there’s the good book:

  • Casella and Robert – Monte Carlo Statistical Methods.