I wrote some C code that basically does what my code in this previous post does without the pretty pictures. The code performs a Metropolis-Hastings algorithm, simulating random variables (or, more correctly, generating random variates) for a joint probability density describing two random variables.
I have covered the topic of Markov chain Monte Carlo (MCMC) methods, particularly the central workhorse the Metropolis(-Rosenbluth-Rosenbluth-Teller-Teller)-Hastings algorithm. For more details on these methods, I have written about these methods in a couple posts, starting with this one and ending with this particularly relevant one.
Let’s call this code MCMC C code1C:/DOS C:/DOS/RUN RUN/DOS/RUN.
Challenges in C
The C programming language was not written for playing with random nubmers. The standard uniform random number generator in C is not good, on which I will elaborate later in this post.
In addition to that, I needed to simulate normal (or Gaussian) random variables. I wrote my own simple code for, which uses the Box-Muller transform, so the code would be self-contained. But in reality, you should use a function from a pre-written library for generating normally-distributed variates.
C was never intended as a scientific language, despite its wide use as one. (Historically, that was Fortran’s job.) So when handling sets of numbers, one has to use pointers and malloc more often that not, which can be a tricky. This is the case here, though the use of pointers is relatively simple.
Where are the pretty pictures?
Unfortunately, when number crunching in C, you don’t have immediately access to plotting libraries that are available in scientific programming languages such as Python and Julia.
But you can simply create a .csv (or text) file, and then plot that using whichever library you prefer. And if you have gnuplot installed, you can perform simple one-dimensional histograms using one-line commands such as:
gnuplot -e “plot ‘MCMCData_1D.csv’ using 1 bins=20;” -persist
In the above command, the file MCMCData_1D.csv has the random variates (the simulated random variables) stored in a single column. There’s an option in the code of MCMC_1D.c to plot that data using gnuplot.
I didn’t include code for plotting the results of the 2-D case as gnuplot doesn’t (easily) do a 2-D histogram.
Code
I only present code for the more complicated two-dimensional case. The code can be found here.
Warning: The code uses the standard pseudo-random number generator in C, which is known for being bad. The Mersenne Twister is a popular algorithm for producing such numbers. I only used the C generator to keep the code self-contained. For that reason, I also wrote my code for generating Gaussian (or normal) random variables, by using the Box-Muller transform, but in reality one would never do that for research or industry purposes.
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <math.h>
#include <stdbool.h>
#include <string.h>
const long double pi = 3.14159265358979323846; // constant pi for generating polar coordinates
static double *unirand(double *randValues, unsigned numbRand); // generate uniform random variables on (0,1)
static double *normrand(double *randValues, unsigned numbRand, double mu, double sigma);
static double pdf_single(double x_input, double y_input, double s);
static double mean_var(double *set_sample, unsigned numbSim, double *varX);
int main(int argc, char *argv[])
{
if (argc > 1)
{
fprintf(stderr, "This program takes no arguments...\n");
exit(1);
}
else
{
char strFilename[] = "MCMCData_2D.csv"; // filename for storing simulated random variates
// intializes (pseudo)-random number generator
time_t timeCPU; // use CPU time for seed
srand((unsigned)time(&timeCPU));
// srand(42); //to reproduce results
bool booleWriteData = true; // write data to file
bool booleStats = true; //perform simple mean/std stats
// parameters
unsigned numbSim = 1e4; // number of random variables simulated
unsigned numbSteps = 200; // number of steps for the Markov process
double sigma = 2; // standard deviation for normal random steps
// probability density parameters
double s = .5; // scale parameter for distribution to be simulated
// Metropolis-Hastings variables
double zxRand; // random step
double zyRand; // random step
double pdfProposal; // density for proposed position
double pdfCurrent; // density of current position
double ratioAccept; // ratio of densities (ie acceptance probability)
double uRand; // uniform variable for Bernoulli trial (ie a coin flip)
double *p_numbNormX = (double *)malloc(1 * sizeof(double));
double *p_numbNormY = (double *)malloc(1 * sizeof(double));
double *p_xRand = (double *)malloc(numbSim * sizeof(double));
double *p_yRand = (double *)malloc(numbSim * sizeof(double));
(void)unirand(p_xRand, numbSim); // random initial values
(void)unirand(p_yRand, numbSim); // random initial values
unsigned i, j; // loop varibales
for (i = 0; i < numbSim; i++)
{
// loop through each random walk instance (or random variable to be simulated)
pdfCurrent = pdf_single(*(p_xRand + i), *(p_yRand + i), s); // current probability density
for (j = 0; j < numbSteps; j++)
{
// loop through each step of the random walk
(void)normrand(p_numbNormX, 1, 0, sigma);
(void)normrand(p_numbNormY, 1, 0, sigma);
// take a(normally distributed) random step in x and y
zxRand = (*(p_xRand + i)) + (*p_numbNormX);
zyRand = (*(p_yRand + i)) + (*p_numbNormY);
pdfProposal = pdf_single(zxRand, zyRand, s); // proposed probability density
// acceptance rejection step
(void)unirand(&uRand, 1);
ratioAccept = pdfProposal / pdfCurrent;
if (uRand < ratioAccept)
{
// update state of random walk / Markov chain
*(p_xRand + i) = zxRand;
*(p_yRand + i) = zyRand;
pdfCurrent = pdfProposal;
}
}
}
free(p_numbNormX);
free(p_numbNormY);
if (booleStats){
// initialize statistics variables (for testing results)
double meanX = 0;
double meanY = 0;
double varX = 0;
double varY = 0;
meanX = mean_var(p_xRand, numbSim, &varX);
meanY = mean_var(p_yRand, numbSim, &varY);
double stdX = sqrt(varX);
double stdY = sqrt(varY);
printf("The average of the X random variables is %lf.\n", meanX);
printf("The standard deviation of the X random variables is %lf.\n", stdX);
printf("The average of the Y random variables is %lf.\n", meanY);
printf("The standard deviation of the Y random variables is %lf.\n", stdY);
}
if (booleWriteData)
{
// print to file
FILE *outputFile;
outputFile = fopen(strFilename, "w+");
// fprintf(outputFile, "valueSim\n");
for (i = 0; i < numbSim; i++)
{
fprintf(outputFile, "%lf,%lf\n", *(p_xRand + i), *(p_yRand + i)); // output to file
}
printf("Data printed to file.\n");
}
free(p_xRand);
free(p_yRand);
return (0);
}
}
static double pdf_single(double x_input, double y_input, double s)
{
// returns the probability density of a single point (x,y) inside a simulation window defined below
double pdf_output;
// non-zero density window parameters
double xMin = -1;
double xMax = 1;
double yMin = -1;
double yMax = 1;
if1
{
pdf_output = exp(-((pow(x_input, 4) + x_input * y_input + pow(y_input, 2)) / (s * s)));
}
else
{
pdf_output = 0;
}
return pdf_output;
}
static double *normrand(double *randValues, unsigned numbRand, double mu, double sigma)
{
// simulate pairs of iid normal variables using Box-Muller transform
// https://en.wikipedia.org/wiki/Box%E2%80%93Muller_transform
double U1, U2, thetaTemp, rhoTemp, Z1, Z2;
int i = 0;
while (i < numbRand)
{
// simulate variables in polar coordinates (theta, rho)
(void)unirand(&U1, 1);
thetaTemp = 2 * pi * U1; // create uniform theta values
(void)unirand(&U2, 1);
rhoTemp = sqrt(-2 * log(U2)); // create Rayleigh rho values
// change to Cartesian coordinates
Z1 = rhoTemp * cos(thetaTemp);
Z1 = sigma * Z1 + mu;
randValues[i] = Z1; // assign first of random variable pair
i++;
if (i < numbRand)
{
// if more variables are needed, generate second value of random pair
Z2 = rhoTemp * sin(thetaTemp);
Z2 = sigma * Z2 + mu;
randValues[i] = Z2; // assign second of random variable pair
i++;
}
else
{
break;
}
}
return randValues;
}
static double *unirand(double *randValues, unsigned numbRand)
{ // simulate numbRand uniform random variables on the unit interval
// storing them in randValues which must be allocated by the caller
// with enough space for numbRand doubles
for (int i = 0; i < numbRand; i++)
{
randValues[i] = (double)rand() / RAND_MAX;
}
return randValues;
}
static double mean_var(double *set_sample, unsigned numbSim, double *varX)
{
//mean and variance of set_sample
int i;
// initialize statistics variables (for testing results)
double meanX = 0;
double meanXSquared = 0;
double tempX;
for (i = 0; i < numbSim; i++)
{
tempX = *(set_sample + i);
meanX += tempX / ((double)numbSim);
meanXSquared += tempX * tempX / ((double)numbSim);
}
*varX = meanXSquared - meanX * meanX;
return meanX;
}
Acknowledg(e)ment: A hat tip to Alex Stivala who pointed out a couple of minor issues in my original C code.
- x_input >= xMin) && (x_input <= xMax) && (y_input >= yMin) && (y_input <= yMax [↩]