The Metropolis-Hastings algorithm in C

There’s an old saying, which I just made up. If you really want to understand something, code it up in C.1Taking it to the next level, I recall an undergraduate subject on programming Motorola chips using assembly. But I wouldn’t recommend any of that.

I wrote some C code that basically does what my code in this previous post does without the pretty pictures. The code performs a Metropolis-Hastings algorithm, simulating random variables (or, more correctly, generating random variates) for a joint probability density describing two random variables.

In previous posts, I have covered the topic of Markov chain Monte Carlo (MCMC) methods, particularly the central workhorse, the Metropolis(-Rosenbluth-Rosenbluth-Teller-Teller)-Hastings algorithm.  These methods are frequently used in Bayesian statistics, high-dimensional integration, and optimization. For more details on how they work, I have written a couple posts, starting with this one and ending with this one, where I detail the mechanics of MCMC methods.

Let’s call this code MCMC C code2 Wise man once wrote: C:/DOS C:/DOS/RUN RUN/DOS/RUN.

Code considerations

The C programming language was not written for playing with random numbers. The standard uniform random number generator in C, simply called rand,  is not good for research level random simulations. But it does the job for illustration purposes.

In lieu of this generator, the Mersenne Twister is a popular algorithm for producing such numbers, which is widely recommended and used. There are implementations of this algorithm in the CUDA and MKL libraries for Nvidia GPUs and Intel CPUs; see this CUDA page and this MKL page for details. Check out this PDF file for further details on the CUDA version.

In addition to that, my C code needs to simulate normal (or Gaussian) random variables. For that I wrote my own simple code using the Box-Muller transform, which I covered in a previous post, so the code would be self-contained and less opaque. But in reality, you should always use functions from a pre-written library for generating variates according to a normal or whichever distribution.

Finally, C was never intended as a scientific language, despite its wide use as one. (Historically, that was Fortran’s job, which is still the workhorse for many serious number-crunching institutes, hence why there’s Fortran -ready version of CUDA.) So when handling sets of numbers, such as vectors and matrices, one has to use pointers and malloc more often that not, which can be a tricky. This is the case here, though the use of pointers in this code is relatively simple.

Where are the pretty pictures?

Unfortunately, when number crunching in C, you don’t immediately have access to plotting libraries that are available in scientific programming languages such as Python and Julia.

But you can simply create .csv (or text) files, and then plot them using whichever library you prefer. And if you have gnuplot installed, you can perform simple one-dimensional histograms using one-line commands such as:

gnuplot -e “plot ‘MCMCData_1D.csv’ using 1 bins=20;” -persist

In the above command, the file MCMCData_1D.csv has the random variates (the simulated random variables) stored in a single column.

Code

I only present code for the more complicated two-dimensional case. The code can be found here.

Note that there’s an option in the code of MCMC_1D.c to plot the results using gnuplot, if it’s installed on your machine. I didn’t include code for plotting the results of the 2-D case as gnuplot doesn’t do a 2-D histogram.

Warning: The code uses the standard pseudo-random number generator in C, which is known for being bad. I only used built-in the C generator to keep my code self-contained. For that reason, I also wrote my code for generating Gaussian (or normal) random variables, by using the Box-Muller transform, but in reality one would never do that for research or industry purposes.

/***********************************************************************
 * Runs a simple Metropolis-Hastings (ie MCMC) algorithm to simulate two
 * jointly distributed random variables with probability density
 * p(x,y)=exp(-(x^4+x*y+y^2)/s^2)/consNorm, where s>0 and consNorm is a
 * normalization constant. The probability density function is defined in
 * the function pdf_single.
 *
 * NOTE: In practice, the value of the normalization constant is not needed, as it cancels out in the algorithm.
 *
 * NOTE: This code will *create* a local file (see variable strFilename) to store results. It will *overwrite* that file if it already exists.
 *
 * WARNING: This code uses the default C random number generator, which is known for failing various tests of randomness.
 * Strongly recommended to use another generator for purposes beyond simple illustration.
 *
 * Author: H. Paul Keeler, 2024.
 * Website: hpaulkeeler.com
 * Repository: github.com/hpaulkeeler/posts
 *
 ***********************************************************/

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <math.h>
#include <stdbool.h>
#include <string.h>

const long double pi = 3.14159265358979323846; // constant pi for generating polar coordinates

// helper function declarations; see below for definitions
static double *unirand(double *randValues, unsigned numbRand);                           // generate  uniform random variables on (0,1)
static double *normrand(double *randValues, unsigned numbRand, double mu, double sigma); // generate normal random variables
static double pdf_single(double x, double y, double s);                      // define probability density to be simulated
static double mean_var(double *set_sample, unsigned numbSim, double *varX);              // calculate meana and variance

int main(int argc, char *argv[])
{

    if (argc > 1)
    {
        fprintf(stderr, "This program takes no arguments...\n");
        exit(1);
    }
    else
    {

        char strFilename[] = "MCMCData_2D.csv"; // filename for storing simulated random variates

        // intializes (pseudo)-random number generator
        time_t timeCPU; // use CPU time for seed
        srand((unsigned)time(&timeCPU));
        // srand(42); //to reproduce results

        bool booleWriteData = true; // write data to file
        bool booleStats = true;     // perform simple mean/std stats

        // parameters
        unsigned numbSim = 1e4;   // number of random variables simulated
        unsigned numbSteps = 200; // number of steps for the Markov process
        double sigma = 2;         // standard deviation for normal random steps

        // probability density parameters
        double s = .5; // scale parameter for distribution to be simulated

        // Metropolis-Hastings variables
        // proposal for a new position in the random walk
        double zxRandProposal;      
        double zyRandProposal;      
        double pdfProposal; // density for proposed position
        double pdfCurrent;  // density of current position
        double ratioAccept; // ratio of densities (ie acceptance probability)
        double uRand;       // uniform variable for Bernoulli trial (ie a coin flip)
        // random step (normally distributed)
        double *p_numbNormX = (double *)malloc(1 * sizeof(double));
        double *p_numbNormY = (double *)malloc(1 * sizeof(double));
//positions of the random walk (ie the simualted random variables after numbSteps)
        double *p_xRand = (double *)malloc(numbSim * sizeof(double));
        double *p_yRand = (double *)malloc(numbSim * sizeof(double));

        (void)unirand(p_xRand, numbSim); // random initial values
        (void)unirand(p_yRand, numbSim); // random initial values

        unsigned i, j; // loop varibales
        for (i = 0; i < numbSim; i++)
        {
            // loop through each random walk instance (or random variable to be simulated)

            pdfCurrent = pdf_single(*(p_xRand + i), *(p_yRand + i), s); // current probability density

            for (j = 0; j < numbSteps; j++)
            {
                // loop through each step of the random walk
                (void)normrand(p_numbNormX, 1, 0, sigma);
                (void)normrand(p_numbNormY, 1, 0, sigma);
                // take a(normally distributed) random step in x and y
                zxRandProposal = (*(p_xRand + i)) + (*p_numbNormX);
                zyRandProposal = (*(p_yRand + i)) + (*p_numbNormY);

                pdfProposal = pdf_single(zxRandProposal, zyRandProposal, s); // proposed probability density

                // acceptance rejection step
                (void)unirand(&uRand, 1);
                ratioAccept = pdfProposal / pdfCurrent;
                if (uRand < ratioAccept)
                {
                    // update state of random walk / Markov chain
                    *(p_xRand + i) = zxRandProposal;
                    *(p_yRand + i) = zyRandProposal;
                    pdfCurrent = pdfProposal;
                }
            }
        }

        free(p_numbNormX);
        free(p_numbNormY);

        if (booleStats)
        {

            // initialize statistics variables (for testing results)
            char strVariable[] = "XY";
            double *p_AllRand[] = {p_xRand, p_yRand};
            double meanTemp = 0;
            double varTemp = 0;
            double stdTemp = 0;
            char strTemp='X';
            for (i = 0; i < 2; i++)
            {
                meanTemp = mean_var(p_AllRand[i], numbSim, &varTemp);
                stdTemp = sqrt(varTemp);
                strTemp=strVariable[i];
                printf("The average of the %c random variables is %lf.\n", strTemp, meanTemp);
                printf("The standard deviation of the %c random  variables is %lf.\n", strTemp, stdTemp);
            }
        }

        if (booleWriteData)
        {
            // print to file
            FILE *outputFile;
            outputFile = fopen(strFilename, "w");
            for (i = 0; i < numbSim; i++)
            {
                fprintf(outputFile, "%lf,%lf\n", *(p_xRand + i), *(p_yRand + i)); // output to file
            }
            fclose(outputFile);
            printf("Data printed to file.\n");
        }
        free(p_xRand);
        free(p_yRand);

        return (0);
    }
}

static double pdf_single(double x, double y, double s)
{
    // returns the probability density of a single point (x,y) inside a simulation window defined below
    double pdf_output;

    // non-zero density window parameters
    double xMin = -1;
    double xMax = 1;
    double yMin = -1;
    double yMax = 1;

    if1
    {
        pdf_output = exp(-((pow(x, 4) + x * y + pow(y, 2)) / (s * s)));
    }
    else
    {
        pdf_output = 0;
    }
    return pdf_output;
}

static double *normrand(double *randValues, unsigned numbRand, double mu, double sigma)
{
    // simulate pairs of iid normal variables using Box-Muller transform
    // https://en.wikipedia.org/wiki/Box%E2%80%93Muller_transform

    double U1, U2, thetaTemp, rhoTemp, Z1, Z2;
    int i = 0;
    while (i < numbRand)
    {
        // simulate variables in polar coordinates (theta, rho)
        (void)unirand(&U1, 1);
        thetaTemp = 2 * pi * U1; // create uniform theta values
        (void)unirand(&U2, 1);
        rhoTemp = sqrt(-2 * log(U2)); // create Rayleigh rho values

        // change to Cartesian coordinates
        Z1 = rhoTemp * cos(thetaTemp);
        Z1 = sigma * Z1 + mu;
        randValues[i] = Z1; // assign first of random variable pair
        i++;
        if (i < numbRand)
        {
            // if more variables are needed, generate second value of random pair
            Z2 = rhoTemp * sin(thetaTemp);
            Z2 = sigma * Z2 + mu;
            randValues[i] = Z2; // assign second of random variable pair
            i++;
        }
        else
        {
            break;
        }
    }
    return randValues;
}

static double *unirand(double *randValues, unsigned numbRand)
{ // simulate numbRand uniform random variables on the unit interval
  // storing them in randValues which must be allocated by the caller
  // with enough space for numbRand doubles

    for (int i = 0; i < numbRand; i++)
    {
        randValues[i] = (double)rand() / RAND_MAX;
    }
    return randValues;
}

static double mean_var(double *set_sample, unsigned numbSim, double *varX)
{
    // mean and variance of set_sample
    int i;
    // initialize statistics variables (for testing results)
    double meanX = 0;
    double meanXSquared = 0;
    double tempX;
    for (i = 0; i < numbSim; i++)
    {
        tempX = *(set_sample + i);
        meanX += tempX / ((double)numbSim);
        meanXSquared += tempX * tempX / ((double)numbSim);
    }

    *varX = meanXSquared - meanX * meanX;
    return meanX;
}

Further reading

Here’s some information on the rand function and random number generation:

  • https://codeforces.com/blog/entry/61587
  • https://c-faq.com/lib/randrange.html
  • https://www.codeproject.com/KB/recipes/pitfalls_random_number.aspx

Acknowledg(e)ments

A hat tip to C and CUDA guru Alex Stivala who pointed out a couple of minor issues in my original C code.

  1. x >= xMin) && (x <= xMax) && (y >= yMin) && (y <= yMax []

The Metropolis-Hastings algorithm in Python, Julia, R, and MATLAB

I wrote this code a little while ago, and I thought now would be a good time to present it, as I have covered the topic of Markov chain Monte Carlo (MCMC) methods, particularly the central workhorse the Metropolis(-Rosenbluth-Rosenbluth-Teller-Teller)-Hastings algorithm.  For more details on these methods, I have written about these methods in a couple posts, starting with this one and ending with this particularly relevant one.

Update: I re-wrote this code in C, located here, where I used my own code for generating Gaussian (or normal) variables, to keep the code entirely self-contained, but in practice you should never do that.

What the code does

The code basically does the same thing in four different (scientific) programming languages, namely Python, Julia, C, and MATLAB. It performs a Metropolis-Hastings algorithm, simulating random variables (or more correctly, generating random variates) for two respective probabilities densities in one dimension and two dimensions.

Two examples

The one-dimensional case is particularly simple, as the code only simulates an exponential variable. But for this random variable in practice, you would never ever do with any MCMC method, because simply simulate exponential random variables directly. I discussed in a previous post how this direct approach is used to simulate Poisson variables.

The two-dimensional case is slightly more complicated than the the classic joint Gaussian (or normal) probability density, which you would use other methods to simulate. But the idea can be extended to \(n\) dimensions, which is often the case when dealing with the probability distributions and their corresponding integrals that arise in Bayesian statistics.

Implementation considerations

Variance of the walk

To create the random walk, the code uses normal (or Gaussian) random variables, where the mean is simply the current position of the random walk. This is a standard approach due to the convenient properties of normal distribution.

There’s also the standard deviation \(\sigma>0\) of the normal variables.  In machine learning circles, this is what they call a hyperparameter. The random Metropolis-Hastings algorithm will, in theory, work regardless of the value, but some values result in faster results than others.  This issue is covered briefly in this question on Stats Exchange. And it raises an important question when using MCMC methods in general.

Convergence tests

How many simulations steps are enough to ensure that the random variables being simulated behave closely enough to the desired random variables? In other words, how long does it take for the algorithm to target distribution?

The simulation time elapsed before the algorithm has reached a certain level of sufficient called the burn-in period. Due to its vital importance, it is a central topic in the development and implementation of MCMC methods.

There are tests for assessing the degree of convergence during the simulation run such as the Gelman-Rubin test. This particular test involves taking simple empirical means and variances over the last \(m\) samples, across all simulation runs, and between simulation runs. Then a ratio is calculated of variances is calculated, which should be close to one if the algorithm is sufficiently converged. I haven’t implemented any such tests here, but it’s something that one should do in practice.

Perhaps I’ll cover that topic in a future post.

Testing the results

To test the results, you can empirical calculate the first two moments (that is, the mean and variance) of the simulated random variables. That acts as a very good sanity check.

But if you need more convincing, you can perform a histogram on the results, which effectively a way to empirically estimate the probability density of the simulated random variables. Fortunately, all four programming languages have built in histogram (counting) functions for one and two dimensions.

I already covered this topic in a previous posts on checking Poisson point process simulations.

Code

I’ll only include the code for Python and Julia, and refefr the reader to the rest of the code found here.

Python

I used the Matplotlib library to plot the probability density and its estimate.

For the histogram section, I used the histogram and histogram2d functions respectively to estimate the distribution of the number of points and the intensity function. I used the pdf option.

import numpy as np;  # NumPy package for arrays, random number generation, etc
import matplotlib.pyplot as plt  # for plotting
from matplotlib import cm  # for heatmap plotting
from mpl_toolkits import mplot3d  # for 3-D plots
from scipy import integrate  # for integrating

plt.close("all");  # close all previous plots

# Simulation window parameters
xMin = -1;
xMax = 1;
yMin = -1;
yMax = 1;

numbSim = 10 ** 4;  # number of random variables simulated
numbSteps = 200;  # number of steps for the Markov process
numbBins = 50;  # number of bins for histogram
sigma = 2;  # standard deviation for normal random steps

# probability density parameters
s = .5;  # scale parameter for distribution to be simulated

def fun_lambda(x, y):
    return np.exp(-(x ** 4 + x*y + y ** 2) / s ** 2);

# normalization constant
consNorm = integrate.dblquad(fun_lambda, xMin, xMax, lambda x: yMin, lambda y: yMax)[0];
#un-normalized joint density of variables to be simulated
def fun_p(x, y):
    return (fun_lambda(x, y) ) * (x >= xMin) * (y >= yMin) * (x <= xMax) * (y <= yMax);

xRand = np.random.uniform(xMin, xMax, numbSim);  # random initial values
yRand = np.random.uniform(yMin, yMax, numbSim);  # random initial values

pdfCurrent = fun_p(xRand, yRand);  # current transition (probability) densities

for jj in range(numbSteps):
    zxRand = xRand + sigma * np.random.normal(0, 1, numbSim);  # take a (normally distributed) random step
    zyRand = yRand + sigma * np.random.normal(0, 1, numbSim);  # take a (normally distributed) random step
    # Conditional random step needs to be symmetric in x and y
    # For example: Z|x ~ N(x,1) (or Y=x+N(0,1)) with probability density
    # p(z|x)=e(-(z-x)^2/2)/sqrt(2*pi)
    pdfProposal = fun_p(zxRand, zyRand);  # proposed probability

    # acceptance rejection step
    booleAccept = np.random.uniform(0, 1, numbSim) < pdfProposal / pdfCurrent;
    # update state of random walk/Markov chain
    xRand[booleAccept] = zxRand[booleAccept];
    yRand[booleAccept] = zyRand[booleAccept];
    # update transition (probability) densities
    pdfCurrent[booleAccept] = pdfProposal[booleAccept];

# for histogram, need to reshape as vectors
xRand = np.reshape(xRand, numbSim);
yRand = np.reshape(yRand, numbSim);

p_Estimate, xxEdges, yyEdges = np.histogram2d(xRand, yRand, bins=numbSteps, density=True);
xValues = (xxEdges[1:] + xxEdges[0:xxEdges.size - 1]) / 2;  # mid-points of bins
yValues = (yyEdges[1:] + yyEdges[0:yyEdges.size - 1]) / 2;  # mid-points of bins
X, Y = np.meshgrid(xValues, yValues);  # create x/y matrices for plotting

# analytic solution of (normalized) joint probability density
p_Exact = fun_p(X, Y) / consNorm;

# Plotting
# Plot empirical estimate
fig1 = plt.figure();
ax = plt.axes(projection="3d");
#plt.rc("text", usetex=True);
#plt.rc("font", family="serif");
surf = ax.plot_surface(X, Y, p_Estimate, cmap=plt.cm.plasma);
plt.xlabel("x");
plt.ylabel("y");
plt.title("p(x,y) Estimate");

# Plot exact expression
fig2 = plt.figure();
#plt.rc("text", usetex=True);
#plt.rc("font", family="serif")
ax = plt.axes(projection="3d");
surf = ax.plot_surface(X, Y, p_Exact, cmap=plt.cm.plasma);
plt.xlabel("x");
plt.ylabel("y");
plt.title("p(x,y) Exact Expression");

Julia

using Distributions #for random simulations
using PyPlot #for plotting
using StatsBase #for histograms etc
using Random
using LinearAlgebra
using HCubature #for numerical integration
#using LaTeXStrings #for LateX in labels/titles etc
PyPlot.close("all");  # close all PyPlot figures

#set random seed for reproducibility
#Random.seed!(1234)

# Simulation window parameters
xMin = -1;
xMax = 1;
yMin = -1;
yMax = 1;

numbSim = 10 ^ 5;  # number of random variables simulated
numbSteps = 25;  # number of steps for the Markov process
numbBins = 50;  # number of bins for histogram
sigma = 2;  # standard deviation for normal random steps

# probability density parameters
s = .5;  # scale parameter for distribution to be simulated

function fun_lambda(x,y)
    return (exp.(-(x.^4+x.*y+y.^2)./s^2));
end

#normalization constant -- UNDER CONSTRUCTION
consNorm,errorCub=hcubature(x -> fun_lambda(x[1],x[2]), [xMin, yMin], [xMax, yMax]);
#un-normalized joint density of variables to be simulated
function fun_p(x,y)
    return((fun_lambda(x,y)).*(x.>=xMin).*(y.>=yMin).*(x.<=xMax).*(y.<=yMax));
end
xRand=(xMax-xMin).*rand(numbSim).+xMin; #random initial values
yRand=(yMax-yMin).*rand(numbSim).+yMin; #random initial values

pdfCurrent=fun_p(xRand,yRand); #current transition (probability) densities
for jj=1:numbSteps
    zxRand= xRand.+sigma.*rand(Normal(),numbSim);#take a (normally distributed) random step
    zyRand= yRand.+sigma.*rand(Normal(),numbSim);#take a (normally distributed) random step

    # Conditional random step needs to be symmetric in x and y
    # For example: Z|x ~ N(x,1) (or Y=x+N(0,1)) with probability density
    # p(z|x)=e(-(z-x)^2/2)/sqrt(2*pi)
    pdfProposal = fun_p(zxRand, zyRand);  # proposed probability

    # acceptance rejection step
    booleAccept=rand(numbSim) .< pdfProposal./pdfCurrent;
    # update state of random walk/Markov chain
    xRand[booleAccept] = zxRand[booleAccept];
    yRand[booleAccept] = zyRand[booleAccept];
    # update transition (probability) densities
    pdfCurrent[booleAccept] = pdfProposal[booleAccept];
end

#histogram section: empirical probability density
histXY=fit(Histogram, (xRand,yRand),nbins=numbBins); #find histogram data
histXY=normalize(histXY,mode=:pdf); #normalize histogram
binEdges=histXY.edges; #retrieve bin edges
xValues=(binEdges[1][2:end]+binEdges[1][1:end-1])./2; #mid-points of bins
yValues=(binEdges[2][2:end]+binEdges[2][1:end-1])./2; #mid-points of bins
p_Estimate=(histXY.weights)
#create a meshgrid
X=[xValues[ii] for ii=1:length(xValues), jj=1:length(yValues)];
Y=[yValues[jj] for ii=1:length(xValues), jj=1:length(yValues)];

#analytic solution of (normalized) joint probability density
p_Exact = fun_p(X, Y)./consNorm;

# Plotting
# Plot empirical estimate
fig1 = PyPlot.figure();
PyPlot.rc("text", usetex=true);
PyPlot.rc("font", family="serif");
surf(X, Y, p_Estimate, cmap=PyPlot.cm.plasma);
PyPlot.xlabel("x");
PyPlot.ylabel("y");
PyPlot.title("p(x,y) Estimate");

# Plot exact expression
fig2 = PyPlot.figure();
PyPlot.rc("text", usetex=true);
PyPlot.rc("font", family="serif")
surf(X, Y, p_Exact, cmap=PyPlot.cm.plasma);
PyPlot.xlabel("x");
PyPlot.ylabel("y");
PyPlot.title("p(x,y) Exact Expression");

 

Wiener or Brownian (motion) process

One of the most important stochastic processes is the Wiener process or Brownian (motion) process. In a previous post I gave the definition of a stochastic process (also called a random process) with some examples of this important random object, including random walks. The Wiener process can be considered a continuous version of the simple random walk. This continuous-time stochastic process is a highly studied and used object. It plays a key role different probability fields, particularly those focused on stochastic processes such as stochastic calculus and the theories of Markov processes, martingales, Gaussian processes, and Levy processes.

The Wiener process is named after Norbert Wiener, but it is called the Brownian motion process or often just Brownian motion due to its historical connection as a model for Brownian movement in liquids, a physical phenomenon observed by Robert Brown. But the physical process is not true a Wiener process, which can be treated as an idealized model. I will use the terms Wiener process or Brownian (motion) process to differentiate the stochastic process from the physical phenomenon known as Brownian movement or Brownian process.

The Wiener process is arguably the most important stochastic process. The other important stochastic process is the Poisson (stochastic) process, which I cover in another post. I have written that and the current post with the same structure and style, reflecting and emphasizing the similarities between these two fundamental stochastic process.

In this post I will give a definition of the standard Wiener process. I will also describe some of its key properties and importance. In future posts I will cover the history and generalizations of this stochastic process.

Definition

In the stochastic processes literature there are different definitions of the Wiener process. These depend on the settings such as the level of mathematical rigour. I give a mathematical definition which captures the main characteristics of this stochastic process.

Definition: Standard Wiener or Brownian (motion) process

A real-valued stochastic process \(\{W_t:t\geq 0 \}\) defined on a probability space \((\Omega,\mathcal{A},\mathbb{P})\) is a standard Wiener (or Brownian motion) process if it has the following properties:

  1. The initial value of the stochastic process \(\{W_t:t\geq 0 \}\) is zero with probability one, meaning \(P(W_0=0)=1\).
  2. The increment \(W_t-W_s\) is independent of the past, that is, \(W_u\), where \(0\leq u\leq s\).
  3. The increment \(W_t-W_s\) is a normal variable with mean \(o\) and variance \(t-s\).

In some literature, the initial value of the stochastic process may not be given. Alternatively, it is simply stated as \(W_0=0\) instead of the more precise (probabilistic) statement given above.

Also, some definitions of this stochastic process include an extra property or two. For example, from the above definition, we can infer that increments of the standard Wiener process are stationary due to the properties of the normal distribution. But a definition may include something like the following property, which explicitly states that this stochastic process is stationary.

  1. For \(0\leq u\leq s\), the increment \(W_t-W_s\) is equal in distribution to \(W_{t-s}\).

The definitions may also describe the continuity of the realizations of the stochastic process, known as sample paths, which we will cover in the next section.

It’s interesting to compare these defining properties with the corresponding ones of the homogeneous Poisson stochastic process. Both stochastic processes build upon divisible probability distributions. Using this property, Lévy processes generalize these two stochastic processes.

Properties

The definition of the Wiener process means that it has stationary and independent increments. These are arguably the most important properties as they lead to the great tractability of this stochastic process. The increments are normal random variables, implying they can have both positive and negative (real) values.

The Wiener process exhibits closure properties, meaning you apply certain operations, you get another Wiener process. For example, if \(W= \{W_t:t\geq 0 \}\) is a Wiener process, then for a scaling constant \(c>0\), the resulting stochastic process \(\{W_{ct}/\sqrt{c}:t \geq 0 \}\)is also a Wiener process. Such properties are useful for proving mathematical results.

Two realizations of a Wiener (or Brownian motion) process. The sample paths are continuous (but non-differentiable) almost everywhere.

Properties such as independence and stationarity of the increments are so-called distributional properties. But the sample paths of this stochastic process are also interesting. A sample path of a Wiener process is continuous almost everywhere. (The term almost everywhere comes from measure theory, but it simply means that the only region where the property does not hold is mathematically negligible.) Despite the continuity of the sample paths, they are nowhere differentiable. (Historically, it was a challenge to find such a function, but a classic example is the Weierstrass function.)

The standard Wiener process has the Markov property, making it an example of a Markov process. The standard Wiener process \(W=\{ W_t\}_{t\geq 0}\) is a martingale. Interestingly, the stochastic process \(W=\{ W_t^2-t\}_{t\geq 0}\) is also a martingale. The Wiener process is a fundamental object in martingale theory.

There are many other properties of the Brownian motion process; see the Further reading section for, well, further reading.

Importance

Playing a main role in the theory of probability, the Wiener process is considered the most important and studied stochastic process. It has connections to other stochastic processes and is central in stochastic calculus and martingales. Its discovery led to the development to a family of Markov processes known as diffusion processes.

The Wiener process also arises as the mathematical limit of other stochastic processes such as random walks, which is the subject of Donsker’s theorem or invariance principle, also known as the functional central limit theorem.

The Wiener process is a member of some important families of stochastic processes, including Markov processes, Lévy processes, and Gaussian processes. This stochastic process also has many applications. For example, it plays a central role in quantitative finance. It is also used in the physical sciences as well as some branches of social sciences, as a mathematical model for various random phenomena.

Generalizations and modifications

For the Brownian motion process, the index set and state space are respectively the non-negative numbers and real numbers, that is \(T=[0,\infty)\) and \(S=[0,\infty)\), so it has both continuous index set and state space. Consequently, changing the state space, index set, or both offers an ways for generalizing or modifying the Wiener (stochastic) process.

A single realization of a two-dimensional Wiener (or Brownian motion) process. Each vector component is an independent standard Wiener process.

Simulating

The defining properties of the Wiener process, namely independence and stationarity of increments, results in it being easy to simulate. The Wiener can be simulated provided random variables can be simulated or sampled according to a normal distribution. The main challenge is that the Wiener process is a continuous-time stochastic process, but computer simulations run in a discrete universe.

I will leave the details of sampling this stochastic process for another post.

Further reading

A very quick history of Wiener process and the Poisson (point) process is covered in this talk by me.

There are books almost entirely dedicated to the subject of the Wiener or Brownian (motion) process, including:

Of course the stochastic process is also covered in any book on stochastic calculus:

More advanced readers can read about the Wiener process, its descrete-valued cousin, the Poisson (stochastic) process, as well as other Lévy processes:

On this topic, I recommend the introductory article:

  • 2004, Applebaum, Lévy Processes – From Probability to Finance and Quantum Groups.

The Wiener process is of course also covered in general books on stochastic process such as:

Simulating a Poisson point process on a n-dimensional sphere

In the previous post I outlined how to simulate or sample a homogeneous Poisson point process on the surface of a sphere. Now I will consider a homogeneous Poisson point process on the \((n-1)-\) sphere, which is the surface of the Euclidean ball in \(n\) dimensions.

This is a short post because it immediately builds off the previous post. For positioning the points uniformly, I will use Method 2 from that post, which uses normal random variables, as it immediately gives a fast method in \(n\) dimensions.

I wrote this post and the code more for curiosity than any immediate application. But simulating a Poisson point process in this setting requires placing points uniformly on a sphere. And there are applications in that, such as Monte Carlo integration methods, as mentioned in this post, which nicely details different sampling methods.

Steps

As is the case for other shapes, simulating a Poisson point process requires two steps.

Number of points

The number of points of a Poisson point process on the surface of a sphere of radius \(r>0\) is a Poisson random variable. The mean of this random variable is \(\lambda S_{n-1}\), where \(S_{n-1}\) is the surface area of the sphere.  For a ball embedded in \(n\) dimension, the area of the corresponding sphere is given by

$$S_{n-1} = \frac{2 \pi ^{n/2}  }{\Gamma(n/2)} r^{n-1},$$

where \(\Gamma\) is the gamma function, which is a natural generalization of the factorial. In MATLAB, we can simply use the function gamma.  In Python, we need to use the SciPy function scipy.special. gamma.

Locations of points

For each point on the sphere, we generate \(n\) standard normal or Gaussian random variables, say, \(W_1, \dots, W_n\), which are independent of each other. These random variables are the Cartesian components of the random point. We rescale the components by the Euclidean norm, then multiply by the radius \(r\).

For \(i=1,\dots, n\), we obtain

$$X_i=\frac{rW_i}{(W_1^2+\cdots+W_n^2)^{1/2}}.$$

These are the Cartesian coordinates of a point uniformly scattered on a  sphere with radius \(r\) and a centre at the origin.

How does it work?

In the post on the circle setting, I gave a more detailed outline of the proof, where I said the method is like the Box-Muller transform in reverse. The joint density of the normal random variables is from a multivariate normal distribution with zero correlation. This joint density a function of the Cartesian equation for a sphere. This means the density is constant on the sphere, implying that the angle of the point \((W_1,\dots, W_n)\) will be uniformly distributed.

The vector formed from the normal variables \((W_1,\dots,W_n)\) is a random variable with a chi distribution.  But the final vector, which stretches from the origin to the point \((X_1,\dots,X_n)\), has length one, because we rescaled it with the Euclidean norm.

Code

The code for all my posts is located online here. For this post, the code in MATLAB and Python is here.

Further reading

I recommend this blog post, which discusses different methods for randomly placing points on spheres and inside spheres (or, rather, balls) in a uniform manner.  (Embedded in two dimensions, a sphere is a circle and a ball is a disk.)

Our Method 2 for positioning points uniformly, which uses normal variables, comes from the paper:

  • 1959, Muller, A note on a method for generating points uniformly on n-dimensional spheres.

Two recent works on this approach are the papers:

  • 2010, Harman and Lacko, On decompositional algorithms for uniform sampling from -spheres and -balls;
  • 2017, Voelker, Gosman, Stewart, Efficiently sampling vectors and coordinates.

Simulating a Poisson point process on a sphere

In this post I’ll describe how to simulate or sample a homogeneous Poisson point process on the surface of a sphere. I have already simulated this point process on a rectangle, triangle disk, and circle.

Of course, by sphere, I mean the everyday object that is the surface of a three-dimensional ball, where this two-dimensional object is often denoted by \(S^2\).  Mathematically, this is a generalization from a Poisson point process on a circle, which is slightly simpler than randomly positioning points on a disk.  I recommend reading those two posts first, as a lot of the material presented here builds off them.

I have not needed such a simulation in my own work, but I imagine there are many reasons why you would want to simulate a Poisson point process on a sphere. As some motivation, we can imagine these points on a sphere representing, say, meteorites or lightning hitting the Earth.

The generating the number of points is not difficult. The trick is positioning the points on the sphere in a uniform way.  As is often the case, there are various ways to do this, and I recommend this post, which lists the main ones.  I will use two methods that I consider the most natural and intuitive ones, namely using spherical coordinates and normal random variables, which is what I did in the post on the circle.

Incidentally, a simple modification allows you to scatter the points uniformly inside the sphere, but you would typically say ball in mathematics, giving a Poisson point process inside a ball; see below for details.

Steps

As always, simulating a Poisson point process requires two steps.

Number of points

The number of points of a Poisson point process on the surface of a sphere of radius \(r>0\) is a Poisson random variable with mean \(\lambda S_2\), where \(S_2=4\pi r^2\) is the surface area of the sphere. (In this post I give some details for simulating or sampling Poisson random variables or, more accurately, variates.)

Locations of points

For any homogeneous Poisson point process, we need to position the points uniformly on the underlying space, which is in this case is the sphere. I will outline two different methods for positioning the points randomly and uniformly on a sphere.

Method 1: Spherical coordinates

The first method is based on spherical coordinates \((\rho, \theta,\phi)\), where the radial coordinate \(\rho\geq 0\), and the angular coordinates \(0 \leq \theta\leq 2\pi\) and \(0\leq \phi \leq \pi\). The change of coordinates gives \(x=\rho\sin(\theta)\cos(\phi)\), \(y=\rho\sin(\theta)\sin(\phi)\), and \(z=\rho\cos(\phi)\).

Now we use Proposition 1.1 in this paper. For each point, we generate two uniform variables \(V\) and \(\Theta\) on the respective intervals \((-1,1)\) and \((0,2\pi)\). Then we place the point with the Cartesian coordinates

$$X =  r  \sqrt{1-V^2} \cos\Theta, $$

$$Y =  r  \sqrt{1-V^2}\sin\Theta, $$

$$ Z=r V. $$

This method places a uniform point on a sphere with a radius \(r\).

How does it work?

I’ll skip the precise details, but give some interpretation of this method. The random variable \(\Phi := \arccos V\) is the \(\phi\)-coordinate of the uniform point, which implies \(\sin \Phi=\sqrt{1-V^2}\), due to basic trigonometric identities.  The area element in polar coordinates is \(dA = \rho^2 \sin\phi d\phi d\theta \), which is constant with respect to \(\theta\). After integrating with respect to \(\phi\),  we see that the random variable \(V=\cos\Phi\) needs to be uniform (instead of \(\Phi\)) to ensure the point is uniformly located on the surface.

Method 2: Normal random variables

For each point, we generate three standard normal or Gaussian random variables, say, \(W_x\), \(W_y\), and \(W_z\), which are independent of each other. (The term standard here means the normal random variables have mean \(\mu =0\) and standard deviation \(\sigma=1\).)  The three random variables are the Cartesian components of the random point. We rescale the components by the Euclidean norm, then multiply by the radius \(r\), giving

$$X=\frac{rW_x}{(W_x^2+W_y^2+W_z^2)^{1/2}},$$

$$Y=\frac{rW_y}{(W_x^2+W_y^2+W_z^2)^{1/2}},$$

$$Z=\frac{rW_z}{(W_x^2+W_y^2+W_z^2)^{1/2}}.$$

These are the Cartesian coordinates of a point uniformly scattered on a  sphere with radius \(r\) and a centre at the origin.

How does it work?

The procedure is somewhat like the Box-Muller transform in reverse. In the post on the circle setting,  I gave an outline of the proof, which I recommend reading. The joint density of the normal random variables is from a multivariate normal distribution with zero correlation. This joint density is constant on the sphere, implying that the angle of the point \((W_x, W_y, W_z)\) will be uniformly distributed.

The vector formed from the normal variables \((W_x, W_y,W_z)\) is a random variable with a chi distribution.  We can see that the vector from the origin to the point \((X,Y,Z)\) has length one, because we rescaled it with the Euclidean norm.

Plotting

Depending on your plotting software, the points may more resemble points on an ellipsoid than a sphere due to the different scaling of the x, y and z axes. To fix this in MATLAB, run the command: axis square. In Python, it’s not straightforward to do this, as it seems to lack an automatic function, so I have skipped it.

Results

I have presented some results produced by code written in MATLAB and Python. The blue points are the Poisson points on the sphere. I have used a surface plot (with clear faces) to illustrate the underling sphere in black.

MATLAB

Python

Note: The aspect ratio in 3-D Python plots tends to squash the sphere slightly, but it is a sphere.

Code

The code for all my posts is located online here. For this post, the code in MATLAB and Python is here.  In Python I used the library mpl_toolkits for doing 3-D plots.

Poisson point process inside the sphere

Perhaps you want to simulate a Poisson point process inside the ball.  There are different ways we can do this, but I will describe just one way, which builds off Method 1 for positioning the points uniformly. (In a later post, I will modify Method 2, giving a way to uniformly position points inside the ball.)

For this simulation method, you need to make two simple modifications to the simulation procedure.

Number of points

The number of points of a Poisson point process inside a sphere of radius \(r>0\) is a Poisson random variable with mean \(\lambda V_3\), where \(V_3=4\pi r^3\) is the volume of the sphere.

Locations of points

We will modify Method 1 as outlined above. To sample the points uniformly in the sphere, you need to generate uniform variables on the unit interval \((0,1)\), take their cubic roots, and then, multiply them by the radius \(r\). (This is akin to the step of taking the square root in the disk setting.) The random variables for the angular coordinates are generated as before.

Further reading

I recommend this blog post, which discusses different methods for randomly placing points on spheres and inside spheres (or, rather, balls) in a uniform manner.  (Embedded in two dimensions, a sphere is a circle and a ball is a disk.)

Our Method 2 for positioning points uniformly, which uses normal variables, comes from the paper:

  • 1959, Muller, A note on a method for generating points uniformly on n-dimensional spheres.

This sampling method relies upon old observations that normal variables are connected to spheres and circles. I also found this post on a similar topic. Perhaps not surprisingly, the above paper is written by the same Muller behind the Box-Muller method for sampling normal random variables.

Update: The connection between the normal distribution and rotational symmetry has been the subject of some recent 3Blue1Brown videos on YouTube.

Here is some sample Python code for creating a 3-D scatter plot.

Simulating a Poisson point process on a circle

In this post, I’ll take a break from the more theoretical posts. Instead I’ll describe how to simulate or sample a homogeneous Poisson point process on a circle.  I have already simulated this point process on a rectangle, triangle and disk. In some sense, I should have done this simulation method before the disk one, as it’s easier to simulate. I recommend reading that post first, as the material presented here builds off it.

Sampling a homogeneous Poisson point process on a circle is rather straightforward.  It just requires using a fixed radius and uniformly choose angles from interval \((0, 2\pi)\). But the circle setting gives an opportunity to employ a different method for positioning points uniformly on circles and, more generally, spheres. This approach uses Gaussian random variables, and it becomes much more efficient when the points are placed on high dimensional spheres.

Steps

Simulating a Poisson point process requires two steps: simulating the random number of points and then randomly positioning each point.

Number of points

The number of points of a Poisson point process on circle of radius \(r>0\) is a Poisson random variable with mean \(\lambda C\), where \(C=2\pi r\) is the circumference of the circle.  You just need to be able to need to produce (pseudo-)random numbers according to a Poisson distribution.

To generate Poisson variables in MATLAB,  use the poissrnd function with the argument \(\lambda C\).  In Python, use either the scipy.stats.poisson or numpy.random.poisson function from the SciPy or NumPy libraries. (If you’re curious how Poisson simulation works, I suggest seeing this post for details on sampling Poisson random variables or, more accurately, variates.)

Locations of points

For a homogeneous Poisson point process, we need to uniformly position points on the underlying space, which is this case is a circle. We will look at two different ways to position points uniformly on a circle. The first is arguably the most natural approach.

Method 1: Polar coordinates

We use polar coordinates due to the nature of the problem. To position all the points uniformly on a circle, we simple generate uniform numbers on the unit interval \((0,1)\). We then multiply these random numbers by \(2\pi\).

We have generated polar coordinates for points uniformly located on the circle. To plot the points, we have to convert the coordinates back to Cartesian form by using the change of coordinates:  \(x=\rho\cos(\theta)\) and \(y=\rho\sin(\theta)\).

Method 2: Normal random variables

For each point, we generate two standard normal or Gaussian random variables, say, \(W_x\) and \(W_y\), which are independent of each other. (The term standard here means the normal random variables have mean \(\mu =0\) and standard deviation \(\sigma=1\).) These two random variables are the Cartesian components of a random point.  We then rescale the two values by the Euclidean norm, giving

$$X=\frac{W_x}{(W_x^2+W_y^2)^{1/2}},$$

$$Y=\frac{W_y}{(W_x^2+W_y^2)^{1/2}}.$$

These are the Cartesian coordinates of points uniformly scattered around a unit circle with centre at the origin. We multiply the two random values \(X\) and \(Y\) by the \(r>0\)  for a circle with radius \(r\).

How does it work?

The procedure is somewhat like the Box-Muller transform in reverse. I’ll give an outline of the proof. The joint density of the random variables \(W_x\) and \(W_y\) is that of the bivariate normal distribution with zero correlation, meaning it has the joint density

$$ f(x,y)=\frac{1}{2\pi}e^{[-(x^2+y^2)/2]}.$$

We see that the function \(f\) is a constant when we trace around any line for which \((x^2+y^2)\) is a constant, which is simply the Cartesian equation for a circle (where the radius is the square root of the aforementioned constant). This means that the angle of the point \((W_x, W_y)\) will be uniformly distributed.

Now we just need to look at the distance of the random point. The vector formed from the pair of normal variables \((W_x, W_y)\) is a Rayleigh random variable.  We can see that the vector from the origin to the point \((X,Y)\) has length one, because we rescaled it with the Euclidean norm.

Results

I have presented some results produced by code written in MATLAB and Python. The blue points are the Poisson points on the sphere. I have used a surface plot (with clear faces) in black to illustrate the underling sphere.

MATLAB

Python

Code

The code for all my posts is located online here. For this post, the code in MATLAB and Python is here.

Further reading

I recommend this blog post, which discusses different methods for randomly placing points on spheres and inside spheres (or, rather, balls) in a uniform manner.  (Embedded in two dimensions, a sphere is a circle and a ball is a disk.) A key paper on using normal variables is the following:

  • 1959, Muller, A note on a method for generating points uniformly on n-dimensional spheres.

As I mentioned in the post on the disk, the third edition of the classic book Stochastic Geometry and its Applications by Chiu, Stoyan, Kendall and Mecke details on page 54 how to uniformly place points on a disk.  It just requires a small modification for the circle.