The Box-Muller method for simulating normal variables

In the previous post, I covered a simple but widely used method for simulating random variables or, rather, generating random variates. To simulate a random variable, the method requires writing down, in a tractable manner, the inverse of its cumulative distribution function.

But in the case of the normal (or Gaussian) distribution, there is no closed-form expression for its cumulative distribution function or its inverse. This means you cannot, at least in an elegant and fast way, use the inverse method to generate a single normal random variable from a single uniform random variable.

Interestingly, however, you can generate two (independent) normal variables from two (independent) uniform variables using the Box-Muller method, originally proposed by George Box and Mervin E. Muller. This approach uses the inverse method, though in practice the Box-Muller method itself is not used much (see below). I detail it here because I find it neat and because it highlights the connection between the normal distribution and rotational symmetry, which has been the subject of some recent 3Blue1Brown videos on YouTube.

(This method was also used to simulate the Thomas point process, which I covered in a previous post.)

Incidentally, this connection is also mentioned in a previous post on simulating a Poisson point process on the surface of a sphere.  In that post, Method 2 uses an observation by Muller that normal random variables can be used to position points uniformly on spheres.

I imagine this method was first discovered by transforming two normal variables, rather than by guessing various distribution pairs that would work.  I'll sketch the proof in the opposite direction, though it works in both directions.

Proof outline

The joint probability density of two independent variables is simply the product of the two individual probability densities. So the joint density of two independent standard normal variables is

$$\begin{align}f_{X,Y}(x,y)&=\left[\frac{1}{\sqrt{2\pi}}e^{-x^2/2}\right]\left[\frac{1}{\sqrt{2\pi}}e^{-y^2/2}\right]\\&=\frac{1}{{2\pi}}e^{-(x^2+y^2)/2}\,.\end{align}$$

Now we perform a change of coordinates in two dimensions (from Cartesian to polar) using a Jacobian determinant, which in this case is \(|J(\theta,r)|=r\). (Alternatively, you can simply recall the so-called area element \(dA=r\,dr\,d\theta\).) This gives the new joint probability density

$$f_{\Theta,R}(\theta,r)=\left[\frac{1}{2\pi}\right]\left[ r\,e^{-r^2/2}\right]\,.$$

Now we just identify the two probability densities in this product. The first corresponds to a uniform variable on \([0, 2\pi]\), whereas the second is that of a Rayleigh variable with parameter \(\sigma=1\). Of course the proof works in the opposite direction, because the transformation (between Cartesian and polar coordinates) is a one-to-one function.

Algorithm

Here’s the Box-Muller method for simulating two (independent) standard normal variables with two (independent) uniform random variables.

Two (independent) standard normal random variables \(Z_1\) and \(Z_2\)

  1. Generate two (independent) uniform random variables \(U_1\sim U(0,1)\) and \(U_2\sim U(0,1)\).
  2. Return \(Z_1=\sqrt{-2\ln U_1}\cos(2\pi U_2)\) and \(Z_2=\sqrt{-2\ln U_1}\sin(2\pi U_2)\).

The method effectively samples a uniform angular variable \(\Theta=2\pi U_2\) on the interval \([0,2\pi]\) and a radial variable \(R=\sqrt{-2\ln U_1}\) with a Rayleigh distribution.
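
For the record, here's a minimal sketch of the algorithm in Python with NumPy; the function name box_muller and the sample size are just for illustration.

    import numpy as np

    rng = np.random.default_rng()

    def box_muller(n):
        # Two independent uniform samples; 1 - random() lies in (0, 1], avoiding log(0)
        u1 = 1 - rng.random(n)
        u2 = rng.random(n)
        # Radial component (Rayleigh distributed) and angular component (uniform)
        r = np.sqrt(-2 * np.log(u1))
        theta = 2 * np.pi * u2
        # Two independent standard normal samples
        return r * np.cos(theta), r * np.sin(theta)

    z1, z2 = box_muller(10_000)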

The algorithm produces two independent standard normal variables. Of course, as many of us learn in high school, if \(Z\) is a standard normal variable, then the random variable \(X=\sigma Z +\mu\) is a normal variable with mean \(\mu\) and standard deviation \(\sigma>0\).

The Box-Muller method is rarely used

Sadly this method isn't typically used, as historically computer processors were slow at evaluating the required functions (logarithm, square root, sine and cosine), so other methods were employed instead, such as the ziggurat algorithm. And although processors can now do such calculations much faster, many languages, not just scientific ones, come with built-in functions for generating normal variables. Consequently, there's not much need to implement this method.

Further reading

Websites

Many websites detail this method.

Papers

The original paper (which is freely available here) is:

  • 1958 – Box and Muller, A Note on the Generation of Random Normal Deviates.

Another paper by Muller connects normal variables and the (surface of a) sphere:

  • 1959 – Muller, A note on a method for generating points uniformly on n-dimensional spheres.

Books

Many books on stochastic simulation cover the Box-Muller method. The classic book by Devroye with the descriptive title Non-Uniform Random Variate Generation covers this method; see Section 4.1. There's also the Handbook of Monte Carlo Methods by Kroese, Taimre and Botev; see Section 3.1.2.7. Ripley also covers the method (and he remarks with some snark that many people incorrectly spell it the Box-Müller method); see Section 3.1. The book Stochastic Simulation: Algorithms and Analysis by Asmussen and Glynn also mentions the method and a variation by Marsaglia; see Examples 2.11 and 2.12.

Wiener or Brownian (motion) process

One of the most important stochastic processes is the Wiener process or Brownian (motion) process. In a previous post I gave the definition of a stochastic process (also called a random process) with some examples of this important random object, including random walks. The Wiener process can be considered a continuous version of the simple random walk. This continuous-time stochastic process is a highly studied and used object. It plays a key role in different fields of probability, particularly those focused on stochastic processes such as stochastic calculus and the theories of Markov processes, martingales, Gaussian processes, and Lévy processes.

The Wiener process is named after Norbert Wiener, but it is also called the Brownian motion process or often just Brownian motion due to its historical connection as a model for Brownian movement in liquids, a physical phenomenon observed by Robert Brown. But the physical process is not truly a Wiener process, which can be treated as an idealized model of it. I will use the terms Wiener process or Brownian (motion) process to differentiate the stochastic process from the physical phenomenon known as Brownian movement or Brownian process.

The Wiener process is arguably the most important stochastic process. The other contender for that title is the Poisson (stochastic) process, which I cover in another post. I have written that post and the current one with the same structure and style, reflecting and emphasizing the similarities between these two fundamental stochastic processes.

In this post I will give a definition of the standard Wiener process. I will also describe some of its key properties and importance. In future posts I will cover the history and generalizations of this stochastic process.

Definition

In the stochastic processes literature there are different definitions of the Wiener process. These depend on the setting, such as the level of mathematical rigour. I give a mathematical definition which captures the main characteristics of this stochastic process.

Definition: Standard Wiener or Brownian (motion) process

A real-valued stochastic process \(\{W_t:t\geq 0 \}\) defined on a probability space \((\Omega,\mathcal{A},\mathbb{P})\) is a standard Wiener (or Brownian motion) process if it has the following properties:

  1. The initial value of the stochastic process \(\{W_t:t\geq 0 \}\) is zero with probability one, meaning \(\mathbb{P}(W_0=0)=1\).
  2. For \(0\leq s\leq t\), the increment \(W_t-W_s\) is independent of the past, that is, of the values \(W_u\), where \(0\leq u\leq s\).
  3. For \(0\leq s\leq t\), the increment \(W_t-W_s\) is a normal variable with mean \(0\) and variance \(t-s\).

In some literature, the initial value of the stochastic process may not be given. Alternatively, it is simply stated as \(W_0=0\) instead of the more precise (probabilistic) statement given above.

Also, some definitions of this stochastic process include an extra property or two. For example, from the above definition, we can infer that increments of the standard Wiener process are stationary due to the properties of the normal distribution. But a definition may include something like the following property, which explicitly states that the increments are stationary.

  4. For \(0\leq s\leq t\), the increment \(W_t-W_s\) is equal in distribution to \(W_{t-s}\).

The definitions may also describe the continuity of the realizations of the stochastic process, known as sample paths, which we will cover in the next section.

It’s interesting to compare these defining properties with the corresponding ones of the homogeneous Poisson stochastic process. Both stochastic processes build upon infinitely divisible probability distributions. Using this property, Lévy processes generalize these two stochastic processes.

Properties

The definition of the Wiener process means that it has stationary and independent increments. These are arguably the most important properties as they lead to the great tractability of this stochastic process. The increments are normal random variables, implying they can have both positive and negative (real) values.

The Wiener process exhibits closure properties, meaning that if you apply certain operations, you get another Wiener process. For example, if \(W= \{W_t:t\geq 0 \}\) is a Wiener process, then for a scaling constant \(c>0\), the resulting stochastic process \(\{W_{ct}/\sqrt{c}:t \geq 0 \}\) is also a Wiener process. (A quick check: the increment \(W_{ct}-W_{cs}\) has variance \(c(t-s)\), so dividing by \(\sqrt{c}\) restores the variance \(t-s\).) Such properties are useful for proving mathematical results.

Two realizations of a Wiener (or Brownian motion) process. The sample paths are continuous but nowhere differentiable.

Properties such as independence and stationarity of the increments are so-called distributional properties. But the sample paths of this stochastic process are also interesting. A sample path of a Wiener process is almost surely continuous. (The term almost surely comes from measure theory, and it simply means with probability one.) Despite the continuity of the sample paths, they are almost surely nowhere differentiable. (Historically, it was a challenge to find such a continuous yet nowhere differentiable function, but a classic example is the Weierstrass function.)

The standard Wiener process has the Markov property, making it an example of a Markov process. The standard Wiener process \(W=\{ W_t\}_{t\geq 0}\) is a martingale. Interestingly, the stochastic process \(W=\{ W_t^2-t\}_{t\geq 0}\) is also a martingale. The Wiener process is a fundamental object in martingale theory.

There are many other properties of the Brownian motion process; see the Further reading section for, well, further reading.

Importance

Playing a main role in the theory of probability, the Wiener process is considered the most important and studied stochastic process. It has connections to other stochastic processes and is central in stochastic calculus and martingales. Its discovery led to the development of a family of Markov processes known as diffusion processes.

The Wiener process also arises as the mathematical limit of other stochastic processes such as random walks, which is the subject of Donsker’s theorem or invariance principle, also known as the functional central limit theorem.

The Wiener process is a member of some important families of stochastic processes, including Markov processes, Lévy processes, and Gaussian processes. This stochastic process also has many applications. For example, it plays a central role in quantitative finance. It is also used in the physical sciences as well as some branches of social sciences, as a mathematical model for various random phenomena.

Generalizations and modifications

For the Brownian motion process, the index set and state space are respectively the non-negative numbers and the real numbers, that is, \(T=[0,\infty)\) and \(S=(-\infty,\infty)\), so it has both a continuous index set and a continuous state space. Consequently, changing the state space, index set, or both offers ways to generalize or modify the Wiener (stochastic) process.

A single realization of a two-dimensional Wiener (or Brownian motion) process. Each vector component is an independent standard Wiener process.

Simulating

The defining properties of the Wiener process, namely independence and stationarity of increments, result in it being easy to simulate. The Wiener process can be simulated provided random variables can be simulated or sampled according to a normal distribution. The main challenge is that the Wiener process is a continuous-time stochastic process, but computer simulations run in a discrete universe.
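
As a preview, here's a minimal sketch in Python with NumPy, which approximates the process on a discrete grid of times; the time horizon and number of steps are illustrative choices.

    import numpy as np

    rng = np.random.default_rng()

    def wiener_path(t_end=1.0, num_steps=1000):
        # Partition [0, t_end] into equal steps of length dt
        dt = t_end / num_steps
        # Independent, stationary increments: normal with mean 0 and variance dt
        increments = rng.normal(0, np.sqrt(dt), num_steps)
        # W_0 = 0, and the path is the running sum of the increments
        w = np.concatenate(([0.0], np.cumsum(increments)))
        t = np.linspace(0, t_end, num_steps + 1)
        return t, w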

I will leave the details of sampling this stochastic process for another post.

Further reading

A very quick history of the Wiener process and the Poisson (point) process is covered in this talk by me.

There are books almost entirely dedicated to the subject of the Wiener or Brownian (motion) process.

Of course the stochastic process is also covered in any book on stochastic calculus.

More advanced readers can read about the Wiener process, its discrete-valued cousin, the Poisson (stochastic) process, as well as other Lévy processes.

On this topic, I recommend the introductory article:

  • 2004, Applebaum, Lévy Processes – From Probability to Finance and Quantum Groups.

The Wiener process is of course also covered in general books on stochastic processes.

Simulating a Poisson point process on a sphere

In this post I’ll describe how to simulate or sample a homogeneous Poisson point process on the surface of a sphere. I have already simulated this point process on a rectangle, triangle, disk, and circle.

Of course, by sphere, I mean the everyday object that is the surface of a three-dimensional ball, where this two-dimensional object is often denoted by \(S^2\).  Mathematically, this is a generalization of a Poisson point process on a circle, which is slightly simpler than randomly positioning points on a disk.  I recommend reading those two posts first, as a lot of the material presented here builds off them.

I have not needed such a simulation in my own work, but I imagine there are many reasons why you would want to simulate a Poisson point process on a sphere. As some motivation, we can imagine these points on a sphere representing, say, meteorites or lightning hitting the Earth.

Generating the number of points is not difficult. The trick is positioning the points on the sphere in a uniform way.  As is often the case, there are various ways to do this, and I recommend this post, which lists the main ones.  I will use two methods that I consider the most natural and intuitive ones, namely using spherical coordinates and normal random variables, which is what I did in the post on the circle.

Incidentally, a simple modification allows you to scatter the points uniformly inside the sphere (though in mathematics you would typically say ball), giving a Poisson point process inside a ball; see below for details.

Steps

As always, simulating a Poisson point process requires two steps.

Number of points

The number of points of a Poisson point process on the surface of a sphere of radius \(r>0\) is a Poisson random variable with mean \(\lambda S_2\), where \(S_2=4\pi r^2\) is the surface area of the sphere. (In this post I give some details for simulating or sampling Poisson random variables or, more accurately, variates.)
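
For example, in Python with NumPy (the intensity \(\lambda\) and radius \(r\) below are illustrative values):

    import numpy as np

    rng = np.random.default_rng()

    lambda0 = 1  # intensity (points per unit area); illustrative value
    r = 1        # radius of the sphere; illustrative value

    # Poisson number of points with mean lambda * (4 * pi * r^2)
    num_points = rng.poisson(lambda0 * 4 * np.pi * r**2)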

Locations of points

For any homogeneous Poisson point process, we need to position the points uniformly on the underlying space, which in this case is the sphere. I will outline two different methods for positioning the points randomly and uniformly on a sphere.

Method 1: Spherical coordinates

The first method is based on spherical coordinates \((\rho, \theta,\phi)\), where the radial coordinate \(\rho\geq 0\), and the angular coordinates \(0 \leq \theta\leq 2\pi\) and \(0\leq \phi \leq \pi\). The change of coordinates gives \(x=\rho\sin(\phi)\cos(\theta)\), \(y=\rho\sin(\phi)\sin(\theta)\), and \(z=\rho\cos(\phi)\).

Now we use Proposition 1.1 in this paper. For each point, we generate two uniform variables \(V\) and \(\Theta\) on the respective intervals \((-1,1)\) and \((0,2\pi)\). Then we place the point with the Cartesian coordinates

$$X =  r  \sqrt{1-V^2} \cos\Theta, $$

$$Y =  r  \sqrt{1-V^2}\sin\Theta, $$

$$ Z=r V. $$

This method places a uniform point on a sphere with a radius \(r\).
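
Here's a minimal sketch of this method in Python with NumPy; the radius and number of points are illustrative, with the latter in practice being the Poisson count generated in the previous step.

    import numpy as np

    rng = np.random.default_rng()

    r = 1             # radius of the sphere; illustrative value
    num_points = 100  # in practice, the Poisson count from the previous step

    # Uniform variables V on (-1, 1) and Theta on (0, 2*pi)
    v = 2 * rng.random(num_points) - 1
    theta = 2 * np.pi * rng.random(num_points)

    # Cartesian coordinates of points uniformly placed on the sphere
    x = r * np.sqrt(1 - v**2) * np.cos(theta)
    y = r * np.sqrt(1 - v**2) * np.sin(theta)
    z = r * v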

How does it work?

I’ll skip the precise details, but give some interpretation of this method. The random variable \(\Phi := \arccos V\) is the \(\phi\)-coordinate of the uniform point, which implies \(\sin \Phi=\sqrt{1-V^2}\), due to basic trigonometric identities.  The area element in spherical coordinates is \(dA = \rho^2 \sin\phi \,d\phi \,d\theta \), which is constant with respect to \(\theta\). After integrating with respect to \(\phi\), and noting that \(\sin\phi \,d\phi = -d(\cos\phi)\), we see that the random variable \(V=\cos\Phi\) needs to be uniform (instead of \(\Phi\)) to ensure the point is uniformly located on the surface.

Method 2: Normal random variables

For each point, we generate three standard normal or Gaussian random variables, say, \(W_x\), \(W_y\), and \(W_z\), which are independent of each other. (The term standard here means the normal random variables have mean \(\mu =0\) and standard deviation \(\sigma=1\).)  The three random variables are the Cartesian components of the random point. We rescale the components by the Euclidean norm, then multiply by the radius \(r\), giving

$$X=\frac{rW_x}{(W_x^2+W_y^2+W_z^2)^{1/2}},$$

$$Y=\frac{rW_y}{(W_x^2+W_y^2+W_z^2)^{1/2}},$$

$$Z=\frac{rW_z}{(W_x^2+W_y^2+W_z^2)^{1/2}}.$$

These are the Cartesian coordinates of a point uniformly scattered on a  sphere with radius \(r\) and a centre at the origin.
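
Again, a minimal NumPy sketch with illustrative values:

    import numpy as np

    rng = np.random.default_rng()

    r = 1             # radius of the sphere; illustrative value
    num_points = 100  # in practice, the Poisson count generated earlier

    # Three independent standard normal components per point
    w = rng.standard_normal((num_points, 3))
    # Rescale by the Euclidean norm, then multiply by the radius r
    points = r * w / np.linalg.norm(w, axis=1, keepdims=True)
    x, y, z = points[:, 0], points[:, 1], points[:, 2]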

How does it work?

The procedure is somewhat like the Box-Muller transform in reverse. In the post on the circle setting,  I gave an outline of the proof, which I recommend reading. The joint density of the normal random variables comes from a multivariate normal distribution with zero correlation. This joint density is constant on the sphere, implying that the direction of the point \((W_x, W_y, W_z)\) is uniformly distributed.

The Euclidean norm of the vector formed from the normal variables \((W_x, W_y,W_z)\) is a random variable with a chi distribution.  After rescaling by this norm and multiplying by the radius \(r\), the vector from the origin to the point \((X,Y,Z)\) has length \(r\), placing the point on the sphere.

Plotting

Depending on your plotting software, the points may more resemble points on an ellipsoid than a sphere due to the different scaling of the x, y and z axes. To fix this in MATLAB, run the command: axis square. In Python, it’s not straightforward to do this, as it seems to lack an automatic function, so I have skipped it.

Results

I have presented some results produced by code written in MATLAB and Python. The blue points are the Poisson points on the sphere. I have used a surface plot (with clear faces) to illustrate the underlying sphere in black.

MATLAB

Python

Note: The aspect ratio in 3-D Python plots tends to squash the sphere slightly, but it is a sphere.

Code

The code for all my posts is located online here. For this post, the code in MATLAB and Python is here.  In Python I used the library mpl_toolkits for doing 3-D plots.

Poisson point process inside the sphere

Perhaps you want to simulate a Poisson point process inside the ball.  There are different ways we can do this, but I will describe just one way, which builds off Method 1 for positioning the points uniformly. (In a later post, I will modify Method 2, giving a way to uniformly position points inside the ball.)

For this simulation method, you need to make two simple modifications to the simulation procedure.

Number of points

The number of points of a Poisson point process inside a ball of radius \(r>0\) is a Poisson random variable with mean \(\lambda V_3\), where \(V_3=\frac{4}{3}\pi r^3\) is the volume of the ball.

Locations of points

We will modify Method 1 as outlined above. To sample the points uniformly in the ball, you need to generate uniform variables on the unit interval \((0,1)\), take their cube roots, and then multiply them by the radius \(r\). (This is akin to the step of taking the square root in the disk setting.) The random variables for the angular coordinates are generated as before.
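
Putting the two modifications together, here's a minimal NumPy sketch (again with illustrative intensity and radius):

    import numpy as np

    rng = np.random.default_rng()

    lambda0 = 1  # intensity (points per unit volume); illustrative value
    r = 1        # radius of the ball; illustrative value

    # Poisson number of points with mean lambda * ((4/3) * pi * r^3)
    num_points = rng.poisson(lambda0 * (4 / 3) * np.pi * r**3)

    # Angular coordinates generated as in Method 1
    v = 2 * rng.random(num_points) - 1
    theta = 2 * np.pi * rng.random(num_points)
    # Radial coordinate: cube root of a uniform variable, scaled by the radius
    rho = r * rng.random(num_points) ** (1 / 3)

    x = rho * np.sqrt(1 - v**2) * np.cos(theta)
    y = rho * np.sqrt(1 - v**2) * np.sin(theta)
    z = rho * v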

Further reading

I recommend this blog post, which discusses different methods for randomly placing points on spheres and inside spheres (or, rather, balls) in a uniform manner.  (Embedded in two dimensions, a sphere is a circle and a ball is a disk.)

Our Method 2 for positioning points uniformly, which uses normal variables, comes from the paper:

  • 1959, Muller, A note on a method for generating points uniformly on n-dimensional spheres.

This sampling method relies upon old observations that normal variables are connected to spheres and circles. I also found this post on a similar topic. Perhaps not surprisingly, the above paper is written by the same Muller behind the Box-Muller method for sampling normal random variables.

Update: The connection between the normal distribution and rotational symmetry has been the subject of some recent 3Blue1Brown videos on YouTube.

Here is some sample Python code for creating a 3-D scatter plot.
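(A minimal sketch; the scattered points here are illustrative, placed on the unit sphere via Method 2.)

    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng()

    # Illustrative points on the unit sphere via Method 2 (normal variables)
    w = rng.standard_normal((100, 3))
    points = w / np.linalg.norm(w, axis=1, keepdims=True)

    fig = plt.figure()
    ax = fig.add_subplot(projection="3d")  # 3-D axes, provided via mpl_toolkits
    ax.scatter(points[:, 0], points[:, 1], points[:, 2], color="b", marker=".")
    plt.show()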