The return of the Box-Muller method due to GPUs

Sooner or later in life, you will need to generate a couple of normal (or Gaussian) variables. Maybe you’re simulating some thermal noise or the stock market. Perhaps you’re sampling a point process with Gaussian coordinates. Or you’re running a generative artificial intelligence (Gen AI) model, such as a variational autoencoder, which needs to sample a Gaussian point in an abstract mathematical space.

Whatever the reason, you can get your independent normal variables by applying the Box-Muller transform to independent uniform random variables. All you need to do is change from polar to Cartesian coordinates. Easy peasy, lemon squeezy.
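To make that concrete, here is a minimal sketch in Python (using NumPy; the function name and details are mine, not from the original post). Two independent uniform variables give a Rayleigh-distributed radius and a uniform angle; converting from polar to Cartesian coordinates yields two independent standard normal variables.

```python
import numpy as np

def box_muller(n, rng=None):
    """Generate n pairs of independent standard normal variables
    from independent uniform variables via the Box-Muller transform."""
    rng = np.random.default_rng() if rng is None else rng
    u1 = 1.0 - rng.random(n)  # uniform on (0, 1]; avoids log(0)
    u2 = rng.random(n)        # uniform on [0, 1)
    radius = np.sqrt(-2.0 * np.log(u1))  # Rayleigh-distributed radius
    angle = 2.0 * np.pi * u2             # uniform angle
    # change from polar to Cartesian coordinates
    return radius * np.cos(angle), radius * np.sin(angle)
```

Note that the transform needs a logarithm, a square root, a sine, and a cosine, which is precisely why it historically fell out of favour.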

Box-Muller was not the favourite

I previously wrote about the Box-Muller method, ending on the note that the method, though very cool, isn’t used much because it relies upon mathematical functions (logarithm, square root, sine, and cosine) that have, at least historically, been computationally expensive for processors.

Box-Muller the favourite on GPUs

But that conventional wisdom has changed in recent years, as processors can now readily evaluate such functions at the hardware level. More significantly, the rapid rise of graphics processing units (GPUs) has also changed the rules of the game.

In a comment on my original Box-Muller post, it was pointed out that the Box-Muller method is the preferred choice on GPUs, with a link to the NVIDIA website. The reason is that GPUs do not handle algorithms with loops and branches well, so you should use methods that avoid such algorithmic steps. And the Box-Muller method is one that does just that.

The NVIDIA website says:

Because GPUs are so sensitive to looping and branching, it turns out that the best choice for the Gaussian transform is actually the venerable Box-Muller transform…

Indeed.

Determinantal learning for selecting subsets in wireless networks

We can’t all talk at once and still be heard. That’s true for people and it’s true for wireless networks.

Networks need to schedule when their network nodes can transmit and receive data. In other words, a collection or subset of network nodes is chosen to transmit, while another subset is chosen to receive. (In telecommunication language, this is part of what is called the medium access control or MAC layer.)

The classic (spatial) Aloha scheduler involves each node flipping a biased coin with probability \(p\) of heads. If heads comes up, the node transmits. If not, it stands ready to receive data. Simple.
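As a quick illustration, here is what that coin-flipping scheduler looks like in Python (a sketch with NumPy; the function name is mine):

```python
import numpy as np

def aloha_schedule(num_nodes, p, rng=None):
    """Spatial Aloha: each node independently flips a biased coin
    with probability p of heads; heads means transmit."""
    rng = np.random.default_rng() if rng is None else rng
    heads = rng.random(num_nodes) < p
    transmitters = np.flatnonzero(heads)   # nodes that transmit
    receivers = np.flatnonzero(~heads)     # nodes that stand ready to receive
    return transmitters, receivers

# Example: 10 nodes, each transmitting with probability 0.3
tx, rx = aloha_schedule(10, 0.3)
```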

But can we do better?

Determinantal scheduling

A few years ago, a couple of collaborators and I became interested in this problem. To develop an intelligent (or, at least, not stupid) scheduler, we adapted a machine (or statistical) learning framework and published the results in a conference paper:

  • 2020 – Błaszczyszyn, Brochard, and Keeler – Coverage probability in wireless networks with determinantal scheduling.

The hint is in the title. Using a determinantal scheduler, we derived mathematical expressions for the coverage probability. I discussed that paper in an earlier post. I produced the numerical results with MATLAB and Python code (linked further below).

From fermions to generative models

We used a learning framework based on determinantal point processes, which were originally used to model subatomic particles called fermions, such as electrons. A key property is that determinantal points, like fermions, repel each other, making this a random model for diversity, anti-clustering, or negative correlations.

Defined with a kernel matrix and determinants, these random models have convenient mathematical properties. For example, you can both simulate them exactly and train (or fit) them with maximum likelihood methods. Using this fermion model, Kulesza and Taskar proposed and pioneered a machine learning framework.
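For the curious, here is a sketch of the exact (spectral) sampling algorithm for a finite determinantal point process, following Algorithm 1 in the Kulesza and Taskar book cited below; the code and variable names are my own, written with NumPy, and are not taken from our papers.

```python
import numpy as np

def sample_dpp(L, rng=None):
    """Exact sampling of a finite determinantal point process defined
    by an L-ensemble kernel L (symmetric positive semi-definite)."""
    rng = np.random.default_rng() if rng is None else rng
    eig_vals, eig_vecs = np.linalg.eigh(L)
    # Phase 1: keep each eigenvector with probability lambda / (1 + lambda)
    keep = rng.random(eig_vals.size) < eig_vals / (1.0 + eig_vals)
    V = eig_vecs[:, keep]
    sample = []
    # Phase 2: pick one point per kept eigenvector
    while V.shape[1] > 0:
        # the probability of point i is its squared row norm (normalized)
        probs = np.sum(V**2, axis=1)
        i = rng.choice(probs.size, p=probs / probs.sum())
        sample.append(i)
        # project the columns onto the subspace orthogonal to e_i...
        j = np.argmax(np.abs(V[i, :]))
        V = V - np.outer(V[:, j], V[i, :]) / V[i, j]
        V = np.delete(V, j, axis=1)
        # ...and re-orthonormalize what is left
        if V.shape[1] > 0:
            V, _ = np.linalg.qr(V)
    return sorted(sample)

# Example: a Gaussian similarity kernel over 20 points on a line;
# nearby points repel, so the sampled subset tends to be spread out
x = np.linspace(0, 1, 20)
L = np.exp(-((x[:, None] - x[None, :]) ** 2) / 0.01)
print(sample_dpp(L))
```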

When properly trained (or fitted), determinantal models can act as generative artificial intelligence (or Gen AI) models. In some sense, these point processes are special types of Gibbsian point processes, which are defined with a general energy function called a Hamiltonian. In the field of AI and computer science more broadly, there are many energy-based models, some of which are used as generative models, such as Boltzmann machines.
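To make the Gibbsian connection concrete, here is the standard correspondence in the discrete (L-ensemble) setting: a determinantal point process assigns each subset \(Y\) a probability proportional to a determinant, which can always be written in Boltzmann (or Gibbs) form,

\[
P(Y) \propto \det(L_Y) = e^{-H(Y)}, \qquad H(Y) = -\log \det(L_Y),
\]

where \(L_Y\) is the submatrix of the kernel \(L\) indexed by the points of \(Y\), and \(H\) plays the role of the Hamiltonian.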

Recent work

A few days ago I was curious about what was happening in this research area and, by chance, I noticed this recently uploaded preprint:

  • 2025 – Tu, Saha, and Dhillon – Determinantal Learning for Subset Selection in Wireless Networks.

I feel more work remains to be done in this area.

Further reading

Wireless networks

Our work was inspired by an earlier paper of ours:

  • 2019 – Błaszczyszyn and Keeler – Determinantal thinning of point processes with network learning applications.

In that paper, which I mentioned in an earlier post, we discussed the potential of using determinantal point processes for scheduling network transmission (or selecting subsets). We pointed out that such a model could be trained on optimized network configurations.

The new work by Tu, Saha, and Dhillon builds on previous work, such as this ambitiously titled paper:

  • 2019 – Saha and Dhillon – Machine Learning meets Stochastic Geometry: Determinantal Subset Selection for Wireless Networks.

This work by Saha and Dhillon was independent of our work on scheduling, but came out around the same time.

Machine learning

I should stress that none of the above work would have been possible without the pioneering efforts of Kulesza and Taskar in a series of papers, much of which appears in their book:

  • 2012 – Kulesza and Taskar – Determinantal point processes for machine learning, Now Publishers.

You can find a version of the book here.

Coverage probability in wireless networks with determinantal scheduling

My collaborators and I uploaded a manuscript:

  • 2020 – Błaszczyszyn, Brochard, and Keeler – Coverage probability in wireless networks with determinantal scheduling.

https://arxiv.org/abs/2006.05038

Details

The paper builds on some previous work of ours that uses a (relatively) new model in machine learning:

  • 2019 – Błaszczyszyn and Keeler – Determinantal thinning of point processes with network learning applications.

https://arxiv.org/abs/1810.08672

The new machine learning model is based on a special type of point process called a determinantal point process, originally known as a fermion point process. These point processes are useful because they exhibit closure properties under certain operations, such as independent thinning.
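As an example of such a closure property (a standard fact, stated here for illustration): independently thinning a determinantal point process with correlation kernel \(K\), keeping each point with probability \(p\), multiplies the \(n\)-th joint intensity by \(p^n\), so

\[
\rho^{(n)}(x_1, \dots, x_n) = \det[K(x_i, x_j)] \quad \longrightarrow \quad p^n \det[K(x_i, x_j)] = \det[p K(x_i, x_j)],
\]

meaning the thinned process is again determinantal, with kernel \(pK\).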

Determinantal point processes are random objects that, when properly trained or fitted, can serve as generative artificial intelligence (or Gen AI) models. In fact, you can interpret these point processes as special Gibbsian point processes, which have at their core a Hamiltonian function. A Hamiltonian is a generalized type of energy function, which is also the basis for other generative (or stochastic) models such as Boltzmann machines.

Kulesza and Taskar introduced and developed the framework for using determinantal point processes for machine learning models.

Code

The MATLAB code for producing the results in the paper can be found here:

https://github.com/hpaulkeeler/detcov_matlab

I also rewrote the MATLAB code in Python:

https://github.com/hpaulkeeler/detcov_python

Further reading

Our work started with this related paper:

  • 2019 – Błaszczyszyn and Keeler – Determinantal thinning of point processes with network learning applications.

I wrote about this work briefly in an earlier post.

Parallel (or orthogonal, perhaps) to our work is this paper:

  • 2019 – Saha and Dhillon – Machine Learning meets Stochastic Geometry: Determinantal Subset Selection for Wireless Networks.

https://arxiv.org/abs/1905.00504

Update: A manuscript extending the above line of research by Saha and Dhillon has been uploaded:

  • 2025 – Tu, Saha, and Dhillon – Determinantal Learning for Subset Selection in Wireless Networks.

https://arxiv.org/abs/2503.03151