shuffling Archives – H. Paul Keeler

Imagine you’re trying to estimate some statistics about a certain percolation system. For example, estimating the probability of a given bond being open, which means connected, when a giant component forms. The system or model is too complicated for analytic results, which is true for all percolation models with the exception of a handful.

So you code up the percolation model in your favourite programming language, run a large number of stochastic simulations, collect the statistics, and then look at the results. Percolation systems are, by definition, large, but with fast computers, the simulation method works. You get your statistics.

Easy.

But that’s just for one parameter. What if you want to see what happens with a range of parameter values? Well, you do the same thing, but now just run the large number of simulations for a range of different parameter values, right?

No.

You could do that, and it would work. But it would be slow. How to make it faster?

Newman and Ziff proposed and implemented a fast percolation algorithm based on the simple ideas of keeping old simulations and using conditional probabilities.

Algorithm

The algorithm was presented in this paper:

2000 – Newman, Ziff – Efficient Monte Carlo Algorithm and High-Precision Results for Percolation

There exists a preprint with a slightly different title.

2001 – Newman, Ziff – A fast Monte Carlo algorithm for site or bond percolation.

Overview

The simulation method consists of three components.

Components of the Newman-Ziff algorithm

Re-use parts of previous simulation outcomes by shuffling.

Use fast union-find algorithms.

Find a smooth curve of results by using conditional probabilities.

None of them are entirely revolutionary, but put together they form a fast (and popular) simulation method for studying (monotonic) percolation models.

1. Randomly sampling sequences

The main insight was to sample in an incremental fashion, adding an open bond to the percolation model, such as a lattice, during each simulation run, without throwing away the previous simulation. More specifically, let’s say they sample a bond model with $k$ open bonds during one simulation run. Then in the next simulation, use the same configuration, and simply add one bond.

This is a nice trick, but there’s no free lunch here. On one hand, they don’t waste results. But on the other hand, they do throw away independence (and ergodicity) by doing this. But perhaps the numerical results don’t care.

To do this step, Newman and Ziff use a random shuffling algorithm called the Fisher-Yates shuffle. But it’s not clear if Newmand and Ziff thought of this shuffling algorithm independently or not. They never call it by its name nor do they cite any work on this well-known shuffling method. Both MATLAB and Python (SciPy) have functions for doing random shuffles.

2. Using union-find

Imagine you have a collection of things. Some of these things are connected to other things in your collection. You want to understand how these clusters of connected things behave.

To find the clusters, looking for ultimately the biggest one, Newman and Ziff use a union-find method. There are different types of these alogrithms, often hinging upon the use of recursion. They were developed mostly in the 1980s, particularly in work by Robert Tarjan.

These methods are very efficient at finding clusters. The algorithm speed or, rather, complexity is given in relation to the inverse of an Ackermann function, which is a famously fast growing function. These methods rely upon the monotonicity of growing unions.

(If you have a non-monotonic percolating system, then unfortunately you can’t use these algorithms, which is the case for percolation based on signal-to-interference-plus-noise ratio (SINR). )

3. Filling in the parameter gaps

For any random sample of a percolation model, the number of open bonds or sites is a natural number. But the parameter of the percolation model, such as the probability of a site or bond being open, is a real number.

Newman and Ziff use a simple conditioning argument to fill the gaps.

For any statistic $S$ of of a given percolation model, the Newman-Ziff algorithm finds the expected statistic conditioned on $n$ number of open bonds (or equivalent objects in other models). In other words, the algorithm finds the expected conditional statistics $E[S_n|N=n]$. Then we just need the probability of $N$ being $n$ as a function of the parameter we’re trying to vary, such as the bond probability $p$. With this probability $P(N=n)$, we immediately arrive at the expression for the statistic $S$ with introductory probability

$$ E(S_n)=E_N [E(S_n|N=n)].$$

For a nice discrete model with $m$ total, we get the expression

$$ E(S_n)=\sum _{n=1}^{m}[P(N=n)S_n].$$

But we know the probability (distribution) of $N$, as it’s simply the binomial variable. Assuming there are $m$ total bonds (or equivalent objects), then we arrive at

$$P(N=n)={m\choose n} p^n(1-p)^{m-n} .$$

Then the final statistic or result is simply

$$E(S_n)=\sum _{n=1}^{m} {m\choose n} p^n(1-p)^{m-n} S_n.$$

Code

I wrote the algorithm in MATLAB and Python. The code can be found here. I’ve included a script for plotting the results (for small square lattices).

MATLAB

It turns out that MATLAB doesn’t like recursion too much. Well, at least that’s my explanation for the slow results when I use the union-find method with recursion. So I used the union-find method without recursion.

Python

My Python code is mostly for illustration purposes. Check out this Python library. That website warns of the traps that multithreading in NumPy can pose.

C

In the other original paper, Newman and Ziff included C code of their algorithm. You can find it on Newman’s website, but I have also a copy of it (with added comments from me) located here.

Future work

I plan to implement this algorithm using the NVIDIA CUDA library. This is exactly the type of algorithm that we can implement and run using graphical processing units (GPUs), which is the entire point of the CUDA library.

At first glance, the spatial dependence of the problem suggests using texture memory to speed up the memory accessing on the GPUs. But I will explain why that’s probably not a good idea for this specific algorithm

Tag: shuffling

The Newman-Ziff algorithm for simulating percolation models