H. Paul Keeler

Who was S.R. Broadbent?

I found myself recently wondering who the first author was of a seminal paper that created a mathematical field. I did some online digging and I noted my findings here for others who may ask the same question.

Percolation theory

The 1957 paper by S.R. Broadbent and J.M. Hammersley gave birth to percolation theory. I noticed the author affiliations:

S.R. Broadbent, United Glass Bottle Manufacturers
J.M. Hammersley, United Kingdom Atomic Energy Research Establishment Harwell

I knew John Hammersley’s name already, as he had co-written a classic book on Monte Carlo methods. Indeed, if you look at the end of the paper, you see the paper’s idea was born at an early conference on Monte Carlo methods.

But who was S.R. Broadbent? And what was he doing at a bottle factory?

Probability work

This Broadbent character was a bit mysterious. At first, it seemed as though the writer always published as S.R. Broadbent — so much so, I thought he might be a she. Such things have happened before, as the story behind the influential 1960 paper by A.H. Land and A.G. Doig demonstrated.¹

From the start I could see S.R. Broadbent was strong in probability. In the 1950s Broadbent wrote probability papers, including a 1953 paper co-written by the British father of stochastic geometry David G. Kendall. ² The Broadbent and Kendall paper has the improbable title The Random Walk of Trichostrongylus retortaeformis. (It’s a type of larva.)

And then I was thrown by the title of a paper co-written by Broadbent in the 1960s: A computer assessment of media schedules.

Simon says ley lines are just spatial coincidences

Then I lost track of S.R. Broadbent in terms of academic papers. I had almost given up, assuming Broadbent had left academia. But then I saw Broadbent’s name mentioned in the acknowledgements in a famous spatial statistics paper written by David Kendall in 1989. In the paper Kendalls cites a 1980 paper written by S.R. Broadbent. The subject was the statistics of ley lines — it’s a thing — showing they can be explained by mere chance. And in the acknowledgements Kendall thanks a certain Simon Broadbent — presumably he’s S.R. Broadbent whose paper Kendall cited.

Had I found the S.R. Broadbent? I was a bit skeptical, but then he’s mentioned as Simon Broadbent in this obituary of Hammersley:

Despite the title of the paper, ‘Poor man’s Monte Carlo’, the lasting contributions are the clear statement of the problem of counting self-avoiding walks, the use of subadditivity to prove the existence of the connective constant, and the discussion of random media that culminated in Simon Broadbent’s formulation of the percolation model.

That reassured me.

And then I found two 1950s papers (1954 and 1958) by a Simon Broadbent, not S.R. Broadbent, published in the Journal of the Royal Statistical Society Series C: Applied Statistics. I had actually stumbled upon the 1958 paper earlier in my searches. Initially I didn’t think it was S.R. Broadbent because the journal referred to the author as Dr Simon Broadbent, and I thought only medical doctors do that to each other. But it’s a statistics journal. So it’s probably S.R. Broadbent. Also, for this 1954 paper in the same journal they call him Mr Simon Broadbent.

And then I looked at the paper by Broadbent that David Kendall cited, and in that paper Broadbent acknowledges Kendall. And then I noticed that this paper was actually written by Simon Broadbent, but Kendall had cited the paper and written its author as S.R. Broadbent. The final link. S.R. Broadbent was Simon Broadbent — whoever that was.

I thought I had come to the end. But I hadn’t.

Advertising guru

A non-scientific article had floated up in my search results talking about the death of some big name in advertising. Seeing the title and reading the start, I thought it couldn’t be the same Simon Broadbent.

Simon asked simple questions: How does advertising work? How much advertising is enough? How can we tell whether our ad campaign has succeeded? Is adspend profitable?
When he entered advertising in the 60s, none of these questions had satisfactory answers. Many doubted they could ever be answered. Simon did more than anyone to show that advertising can contribute to profit, and that it is not a cost that can be cut with no effect on sales.

But after reading more, I realized this had to be the very same Broadbent.

He answered advertising questions with the rigour of a mathematician and the clarity of a poet. He read engineering at Cambridge, took an applauded first in mathematics and a diploma in statistics at Magdalen, Oxford, and completed his doctorate in statistics at Imperial, London. His poetry was published by Blackwell’s while he was still an undergraduate.

His lifelong interest was applying statistics to problems that nobody had thought amenable to statistical analysis. His paper to the Royal Statistical Society, In Search of the Ley Hunter, debunked claims for the existence of megalithic ley lines, pointing out that there were fewer alleged lines than would be expected from a random distribution of points between which lines could be drawn. Preparing the paper also allowed him to crawl about in Greenwich Park, testing the likely accuracy of stone-age surveying devices using instruments he had built himself.

Simon R. Broadbent made quite the impression in advertising, even being called a global legend in media research. Some of his contributions to this field are mentioned in this published tribute. It seems this field remembered him better than his original field. I couldn’t find any tributes focusing on his mathematical contributions.

Mystery solved, except for what the R stands for.

Back to percolation

Now knowing the S stood for Simon, I noticed that Simon Broadbent is mentioned in the Wikipedia article on percolation theory. The article cites a tutorial article by John Hammersley and Dominic Welsh, which gives the original motivation of percolation theory:

The study of percolation originated from a question posed to one of us by S. R. Broadbent (see Hammersley and Morton 1954). Broadbent was at that time working at the British Coal Utilization Research Association on the design of gas masks for use in coal mines. These masks contained porous carbon granules (the medium) into which gas (the fluid) could penetrate. The pores in a granule constituted a random network of tiny interconnecting tunnels, into which the gas could move by surface adsorption; and the question was what properties of such a random network would assist or inhibit the uptake of gas. If the pores were large enough and richly enough connected, the gas could permeate the interior of a granule; but if the pores were too small or inadequately connected, the gas could not get beyond the outer surface of a granule. Thus there was a critical point, above which the mask worked well and below which it was ineffective. Critical phenomena of this sort are typical of percolation processes, and do not, normally arise in ordinary diffusion.

The original Broadbent and Hammersley paper did mention that Broadbent had previously worked at British Coal Utilisation Research Association (BCURA).

(Another aside, the same Wikipedia article says Rosalind Franklin, whose X-ray work helped reveal the structure of DNA, also worked on porous material at the British Coal Utilisation Research Association, before leaving in 1946, which I guess was before Broadbent started working there. You can read about her career, for example, here.)

Summary

Simon R. Broadbent was a trained engineer and mathematician who co-founded the thriving mathematical discipline of percolation theory, before leaving that world to go revolutionize the field of advertising by using quantitative methods and writing influential books.

Book manuscript: Random Measures, Point Processes, and Stochastic Geometry

Interested in the theory of point processes? There is the freely available online book manuscript:

Baccelli, Błaszczyszyn, and Karray – Random Measures, Point Processes, and Stochastic Geometry.

The authors are established names in the field having written many papers on using point processes to model wireless networks.

(Disclaimer: I have worked and published with them. I have even co-authored a book with one of them.)

Of course there are already books on point process, including the two-volume classic by the Daryl Daley and David Vere-Jones. Although that work still serves as a good reference, since its publication researchers have produced many new results. This explains the publication of two more recent books on point processes and their generalization random measures:

Last and Penrose – Lectures on the Poisson process, Cambridge University Press.
Kallenberg – Random Measures, Theory and Applications, Springer.

There’s a free online version of the book by Last and Penrose.

(Disclaimer: Günter Last graciously mentioned my name in the acknowledgements for my very minor contribution to the historical section.)

The manuscript by Baccelli, Błaszczyszyn, and Karray covers newer topics and results by using mathematically rigorous proofs. For example, there are interesting results on determinantal point processes, also called fermion point processes, which started off as a model for subatomic particles in seminal work by Odile Macchi, but they have found applications in wireless network models and machine learning. (I mentioned these later applications in this and this post.)

Despite applications being a strong motivation, the book is not for the faint of hearted in terms of mathematics. In the tradition of French probability, there is plenty of rigour, which means mathematical abstraction, analysis and measure theory.

Update: Mohamed Karray uploaded some lecture slides based on these lectures. Karray is a researcher at Orange Labs in Paris who has research interests in telecommunications (naturally), as well as machine (or statistical) learning.

New link – almostsuremath.com

This probability blog came up in my news feed:

https://almostsuremath.com/

It seems to focus on stochastic processes such as Brownian motion and friends.

A Fields Medal goes to another percolation researcher

The Fields Medal is a prize in mathematics awarded every four years to two to four outstanding researchers (forty years old or younger) working in mathematics. One of the medals this year was awarded to French mathematician Hugo Duminil-Copin who has solved problems and obtained new results in the percolation theory which lies in the intersection of probability and statistical physics. Here’s a good Quanta article on Duminil-Copin and some of his work.

(The other winners are June Huh, James Maynard, and Maryna Viazovska.)

The Fields Medal people has been kind to probability researchers in recent years. Previous winners working in probability have included Wendelin Werner (2006), Stanislav Smirov (2010), and Martin Hairer (2014), while other winners in recent years have also made contributions to probability.

All in all, that’s not too shabby for a discipline that for a long, long time wasn’t considered part of mathematics. (That story deserves a post on its own.)

I work nowhere near Duminil-Copin, but I have used some percolation theory in my work. I will write a couple of posts on percolation theory. Eventually, I may even mention some recent work that my collaborators and I have been working on.

News: Accelerated PyTorch learning on Mac

The PyTorch website reveals that I can now do accelerated model training with PyTorch on my Mac (which has one of the new Silicon M series chips):

In collaboration with the Metal engineering team at Apple, we are excited to announce support for GPU-accelerated PyTorch training on Mac. Until now, PyTorch training on Mac only leveraged the CPU, but with the upcoming PyTorch v1.12 release, developers and researchers can take advantage of Apple silicon GPUs for significantly faster model training. This unlocks the ability to perform machine learning workflows like prototyping and fine-tuning locally, right on Mac.

As the website says, this new PyTorch acceleration capability is due to Metal, which is Apple’s graphics processing (or GPU) technology.

This means I no longer need to use remote machines when training models. I can do it all on my trusty home computer. Let’s check.

The easiest way to check is, if you haven’t already, first install PyTorch. I would use Anaconda:

conda install pytorch torchvision -c pytorch

With PyTorch installed, run Python and then run the commands:

import torch
print(torch.backends.mps.is_available()) 
print(torch.backends.mps.is_built())

If the last commands works without a hitch, you have PyTorch installed. The last two commands should return True, meaning that PyTorch can use your graphics card for (accelerated) calculations.

New link: xianblog.wordpress.com

I have come across posts on this blog at least three or four times:

https://xianblog.wordpress.com/

It happens, I later discovered, to be maintained by Christian P. Robert, a senior research figure in Markov chain Monte Carlo methods and Bayesian statistics.

New link: Dataflowr – Deep Learning DIY

Want to learn some deep learning? I recommend the Dataflowr website:

https://dataflowr.github.io/website/

It’s a good resource for learning the basics of neural networks using the PyTorch library in Python. The focus is on writing and running code. You can even play around with GPUs (graphical processing units) by running them on Google’s Colab, though that’s usually not needed.

It’s mostly run by a researcher who is a former colleague of mine and, while I was at Inria, was indirectly the reason I started using PyTorch for my machine learning work.

New link: extremelearning.com.au

When researching topics for my work (and for posts), I sometimes stumble upon the same blog more than once for different reasons. One of these is this one:

http://extremelearning.com.au/

It’s run by a Tasmanian physicist turned data scientist. Topics include quasi-random sequences, the Fisher-Yates sampling algorithm, and sampling points uniformly on a triangle.

Update: A post on the multi-arm bandit problem, which is a prototypical problem in reinforcement learning.

Sparse Spiking Gradient Descent

A preprint caught my eye:

2022 – Perez-Nieves and Goodman – Sparse Spiking Gradient Descent.

I love myself a short title. But you wouldn’t guess from the title that the paper is about artificial neural networks. But not just any type of neural network. Here’s the abstract:

There is an increasing interest in emulating Spiking Neural Networks (SNNs) on neuromorphic computing devices due to their low energy consumption. Recent advances have allowed training SNNs to a point where they start to compete with traditional Artificial Neural Networks (ANNs) in terms of accuracy, while at the same time being energy efficient when run on neuromorphic hardware. However, the process of training SNNs is still based on dense tensor operations originally developed for ANNs which do not leverage the spatiotemporally sparse nature of SNNs. We present here the first sparse SNN backpropagation algorithm which achieves the same or better accuracy as current state of the art methods while being significantly faster and more memory efficient. We show the effectiveness of our method on real datasets of varying complexity (Fashion-MNIST, Neuromophic-MNIST and Spiking Heidelberg Digits) achieving a speedup in the backward pass of up to 150x, and 85% more memory efficient, without losing accuracy.

OK. I’ll bite. So what’s a spiking neural network? And why should we care?

Improving on artificial neural networks

Artificial neural networks get all the glory. They are now everywhere. You can’t open up a newspaper or your laptop without seeing a reference to or being pestered by some agent of artificial intelligence (AI), which usually implies an artificial neural network is working in the background. Despite this, they are far from ideal.

In in some sense, mainstream artificial networks networks are rather brain-lite, as they only draw inspiration from how brains actually function. These statistical models are mostly linear and continuous, which makes them well-behaved mathematically (or algorithmically) speaking.

But in terms of energy and time required for training and using computational tools, the carbon-based grey matter is winning. Kids don’t need to read every book, newspaper and sonnet ever written to master a language. While understanding these words, our brains aren’t boiling volumes of water in excess heat output.

To make such statistical models more brain-heavy, researchers have proposed neural networks that run on spikes, drawing inspiration from the spiky electrical jolts that run through brain neurons. The proposal is something called a spiking neural network, which is also artificial neural networks, but they run on spikes with the aim of being more efficient at learning and doing tasks.

Spiking neural networks are not smooth

The problem is that these spiky models are hard to train (or fit), as their nice behaving property of smoothness vanishes when there’s no continuity. You cannot simply run the forward pass and then backward pass to find the gradients of your neural network model, as you do with regular neural networks when doing backpropagation.

Surrogate gradients

Despite this obstacle, there have been some proposals for training these statistical models. The training proposals often come down to proposing a continuous function to approximate the actual function being used in the spiking neural networks. The function’s gradients are found using standard methods. We can then use these surrogate gradients to infer in which direction we should move to to better train (or fit) the model.

I know. It sounds like cheating, using one function to guess something about another function. But there has been some success with this approach.

Sparse Spiking Gradients

The training method proposed by Perez-Nieves and Goodman is a type of surrogate method. Using the leaky integrate-and-fire (LIF) model for neuron firing, they develop an approximation for the gradients of their spiking neural network. A key feature of their approach is that, much like our brains, their model is sparse in the sense only a small fraction of neurons are every firing.

Provided the spiking neural network attains a certain degree of sparsity, their sparse spiking gradient descent gives faster, less memory-hungry results.

Perez-Nieves and Goodman support their claims by giving numerical results, which they obtained by running their method on graphical processing units (GPUs). These ferociously fast video game chips have become the standard hardware for doing big number-crunching tasks that are routinely required of working with models in machine learning and artificial intelligence.

Poisson (stochastic) process

One of the most important stochastic processes is Poisson stochastic process, often called simply the Poisson process. In a previous post I gave the definition of a stochastic process (also called a random process) alongside some examples of this important random object, including counting processes. The Poisson (stochastic) process is a counting process. This continuous-time stochastic process is a highly studied and used object. It plays a key role in different probability fields, particularly those focused on stochastic processes such as stochastic calculus (with jumps) and the theories of Markov processes, queueing, point processes (on the real line), and Levy processes.

The points in time when a Poisson stochastic process increases form a Poisson point process on the real line. In this setting the stochastic process and the point process can be considered two interpretations of the same random object. The Poisson point process is often just called the Poisson process, but a Poisson point process can be defined on more generals spaces. In some literature, such as the theory of Lévy processes, a Poisson point process is called a Poisson random measure, differentiating the Poisson point process from the Poisson stochastic process. Due to the connection with the Poisson distribution, the two mathematical objects are named after Simeon Poisson, but he never studied these random objects.

The other important stochastic process is the Wiener process or Brownian (motion process), which I cover in another post. The Wiener process is arguably the most important stochastic process. I have written that post and the current one with the same structure and style, reflecting and emphasizing the similarities between these two fundamental stochastic process.

In this post I will give a definition of the homogenous Poisson process. I will also describe some of its key properties and importance. In future posts I will cover the history and generalizations of this stochastic process.

Definition

In the stochastic processes literature there are different definitions of the Poisson process. These depend on the settings such as the level of mathematical rigour. I give a mathematical definition which captures the main characteristics of this stochastic process.

Definition: Homogeneous Poisson (stochastic) process

An integer-valued stochastic process \(\{N_t:t\geq 0 \}\) defined on a probability space \((\Omega,\mathcal{A},\mathbb{P})\) is a homogeneous Poisson (stochastic) process if it has the following properties:

The initial value of the stochastic process \(\{N_t:t\geq 0 \}\) is zero with probability one, meaning \(P(N_0=0)=1\).

The increment \(N_t-N_s\) is independent of the past, that is, \(N_u\), where \(0\leq u\leq s\).

The increment \(N_t-N_s\) is a Poisson variable with mean \(\lambda (t-s)\).

In some literature, the initial value of the stochastic process may not be given. Alternatively, it is simply stated as \(N_0=0\) instead of the more precise (probabilistic) statement given above.

Also, some definitions of this stochastic process include an extra property or two. For example, from the above definition, we can infer that increments of the homogeneous Poisson process are stationary due to the properties of the Poisson distribution. But a definition may include something like the following property, which explicitly states that this stochastic process is stationary.

For \(0\leq u\leq s\), the increment \(N_t-N_s\) is equal in distribution to \(N_{t-s}\).

The definitions may also describe the continuity of the realizations of the stochastic process, known as sample paths, which we will cover in the next section.

It’s interesting to compare these defining properties with the corresponding ones of the standard Wiener stochastic process. Both stochastic processes build upon divisible probability distributions. Using this property, Lévy processes generalize these two stochastic processes.

Properties

The definition of the Poisson (stochastic) process means that it has stationary and independent increments. These are arguably the most important properties as they lead to the great tractability of this stochastic process. The increments are Poisson random variables, implying they can have only positive (integer) values.

The Poisson (stochastic) process exhibits closure properties, meaning you apply certain operations, you get another Poisson (stochastic) process. For example, if we sum two independent Poisson processes \(X= \{X_t:t\geq 0 \}\) and \(Y= \{Y_t:t\geq 0 \}\), then the resulting stochastic process \(Z=Z+Y = \{N_t:t\geq 0 \}\) is also a Poisson (stochastic) process. Such properties are useful for proving mathematical results.

A single realization of a (homogeneous) Poisson stochastic process, where the blue marks show where the process jumps to the next value. In any finite time interval, there are a finite number of jumps.

Properties such as independence and stationarity of the increments are so-called distributional properties. But the sample paths of this stochastic process are also interesting. A sample path of a Poisson stochastic process is almost surely non-decreasing, being constant except for jumps of size one. (The term almost surely comes from measure theory, but it means with probability one.) There are only finitely number of jumps in each finite time interval.

The homogeneous Poisson (stochastic) process has the Markov property, making it an example of a Markov process. The homogenous Poisson process \(N=\{ N_t\}_{t\geq 0}\)s not a martingale. But interestingly, the stochastic process is \(\{ W_t – \lambda t\}_{t\geq 0}\) is a martingale. (Such relations have been used to study such stochastic processes with tools from martingale theory.)

Stochastic or point process?

The Poisson (stochastic) process is a discrete-valued stochastic process in continuous time. The relation these types of stochastic processes and point process is a subtle one. For example, David Cox and Valerie Isham write on page 3 of their monograph:

The borderline between point processes and a number of other kinds of stochastic process is not sharply defined. In particular, any stochastic process in continuous time in which the sample paths are step functions, and therefore any any process with a discrete state space, is associated with a point process, where a point is a time of transition or, more generally, a time of entry into a pre-assigned state or set of states. Whether it is useful to look at a particular process in this way depends on the purpose of the analysis.

For the Poisson case, this association is presented in the diagram below. We can see the Poisson point process (in red) associated with the Poisson (stochastic) process (in blue) by simply looking at the time points where jumps occur.

A single realization of a (homogeneous) Poisson stochastic process (in blue). The jumps of the process form a (homogeneous) Poisson point process (in red) on the real line representing time.

Importance

Playing a prominent role in the theory of probability, the Poisson (stochastic) process is a highly important and studied stochastic process. It has connections to other stochastic processes and is central in queueing theory and random measures.

The Poisson process is a building block for more complex continuous-time Markov processes with discrete state spaces, which are used as mathematical models. It is also essential in the study of jump processes and subordinators.

The Poisson (stochastic) process is a member of some important families of stochastic processes, including Markov processes, Lévy processes, and birth-death processes. This stochastic process also has many applications. For example, it plays a central role in quantitative finance. It is also used in the physical sciences as well as some branches of social sciences, as a mathematical model for various random phenomena.

Generalizations and modifications

For the Poisson (stochastic) process, the index set and state space are respectively the non-negative numbers and counting numbers, that is \(T=[0,\infty)\) and \(S=0, 1, \dots\), so it has a continuous index set but a discrete state space. Consequently, changing the state space, index set, or both offers an ways for generalizing and modifying the Poisson (stochastic) process.

Simulation

The defining properties of the Poisson stochastic process, namely independence and stationarity of increments, results in it being easy to simulate. The Poisson stochastic process can be simulated provided random variables can be simulated or sampled according to a Poisson distributions, which I have covered in this and this post.

Simulating a Poisson stochastic process is similar to simulating a Poisson point process. (Basically, it is the same method in a one-dimensional setting.) But I will leave the details of sampling this stochastic process for another post.