August 2023 – H. Paul Keeler

The study of Markov chains is generally the study of their long term behaviour, which, under certain conditions, is captured by them having a unique stationary distribution. Stationarity is an important property. It is, in a sense, a local property of a Markov chain.

For a Markov chain, a more global property is something called reversibility. Markov chains with this property must possess a stationary distribution, which, we see below, is an immediate consequence of reversibility. Reversible Markov chains (or processes) with discrete time¹ are the cornerstone of Markov chain Monte Carlo (MCMC) methods.

In this post we look at how a reversible Markov chain is constructed from a non-reversible (but irreducible) Markov chain by introducing an acceptance-rejection step. This post complements another post I wrote on the Metropolis-Hastings algorithm.

Reversibility

A Markov process on state space $\mathbb{X}$ with kernel $K$ is (time) reversible with respect to the distribution $\mu$ if the following holds

$$ \mu(x)K(x,y) = \mu (y) K(y,x)\quad x,y\in\mathbb{X}\,.$$

This reversibility condition is also called the detailed balance equation. If this condition is met, then the Markov process will have a stationary distribution $\mu$. By summing over $x$, we can verify this because we obtain

$$ \sum_{x\in\mathbb{X}}\mu(x)K(x,y) =\mu(y)\sum_{x\in\mathbb{X}} K(y,x)=\mu(y)\,.$$

This is just the balance equation, often written as $\mu=K\mu$, which says that the transition kernel $K$ has a stationary distribution $\mu $.

First Markov chain

We consider a Markov chain with kernel $J$ defined on a finite state space $\mathbb{X}$. If the Markov chain is at state $x\in \mathbb{X}$, it visits another state $y\in \mathbb{X}$ with the probability $J(x,y)$. This is a simple time-homogeneous finite Markov chain.

Irreducibility

For our Markov chain, we assume that every state $x$ in $\mathbb{X}$ where $\pi(x)>0$ is reachable with positive probability in a single step. This implies the easy-to-achieve condition $J(x,y)>0$ where $\pi(x)>0$ for all points $x,y \in \mathbb{X}$. This requirement is a stronger form of irreducibility.

Creating a new Markov chain with acceptance

We create a new Markov chain by introducing an acceptance step. For the Markov chain with kernel $J$, we assume that each time step, after choosing the jump direction but before jumping, a biased coin is flipped . The success probability $\alpha(x,y)$ depends on the current position $x\in \mathbb{X}$ and the (potential) next position $y\in \mathbb{X}$.

Transition kernel

For our new Markov chain, we can quickly reason the transition kernel $M$. We first look at the off-diagonal elements of the kernel (matrix) $M$. To go from state $x$ and to another state $y\neq x$, the probability is simply

$$ M(x,y) = \alpha(x,y) J(x,y), \quad x\neq y\,.$$

The transition matrix $M$ needs to be stochastic, so the rows sum to one, so $\sum_{y\in\mathbb{X}}M(x,y)=1$. That gives us the diagonal elements of $M$, although their exact form is not needed to show reversibility.

$\alpha(x,y)$ needs a symmetric function $s(x,y)$

For reversibility, we just need to swap rows and columns. Clearly we only need to look at the off-diagonal entries, which implies the requirement

$$ \pi(x)M(x,y) = \pi (y) M(y,x)\quad x\neq y\,.$$

Both sides are symmetric in $x$ and $y$, meaning they are equal to some non-negative symmetric $s(x,y)=s(y,x)$. Looking at the right-hand side, we get

$$\begin{aligned}\pi(y) M(y,x)&=\pi(y)J(y,x) \alpha(y,x)\\ &= s(y,x)\,.\end{aligned}$$

This implies that the function $\alpha$ is a non-negative function such that $\alpha\leq 1$, to ensure it’s a probability, with the form

$$ \alpha(x,y)=\frac{s(x,y)}{\pi(x)J(x,y)}\,.$$

The only task remaining now is to choose a reasonable symmetric function $s$ such that $\alpha\leq 1$, ensuring $\alpha$ is a probability. Of course, our choice for the symmetric function $s$ should also be a function of the stationary distribution $\pi$ and the underlying kernel $J$.

Examples

I’ll give two principal examples of the symmetric function $s(x,y)$. Working in reverse chronological order, I’ll give the simpler of the two examples first.

Barker

A somewhat natural example is

$$s(x,y) = \frac{\pi(x)J(x,y)\pi(y)J(y,x)}{\pi(x)J(x,y)+\pi(y)J(y,x)}\,.$$

This is clearly a symmetric function, which only has the terms $\pi$ and $J$. The acceptance probability becomes

$$\alpha(x,y) = \frac{\pi(y)J(y,x)}{\pi(x)J(x,y)+\pi(y)J(y,x)}\,.$$

A.A. Barker proposed this function in a 1965 paper as part of his PhD work in mathematical physics at the University of Adelaide. Barker had been inspired by a previous 1953 paper, which brings us to the next example.

Metropolis(-Rosenbluth-Rosenbluth-Teller-Teller)-Hastings

The now most important example is

$$s(x,y) = \min[\pi(x)J(x,y),\pi(y)J(y,x)]\,.$$

We can see that this is a symmetric function. The acceptance probability becomes

$$\alpha(x,y) = \min[1,\frac{\pi(y)J(y,x)}{\pi(x)J(x,y)}]\,.$$

This example is very famous in the world of Markov chain Monte Carlo methods. It is the main part of the so-called Metropolis-Hastings algorithm, which comes from a 1953 paper by Nicholas Metropolis, Arianna W. Rosenbluth, Marshall Rosenbluth, Augusta H. Teller, and Edward Teller (two husband-wife pairs), who looked at a special case, and a 1970 paper by W.K. Hastings, who generalized the method.

Month: August 2023

Creating a reversible Markov chain using acceptance(-rejection)

Reversibility

First Markov chain

Irreducibility

Creating a new Markov chain with acceptance

Transition kernel

\(\alpha(x,y)\) needs a symmetric function \(s(x,y)\)

Examples

Barker

Metropolis(-Rosenbluth-Rosenbluth-Teller-Teller)-Hastings

Further reading

Articles

Historical

Introductory

History

Books

Websites