empirical distribution

In previous posts I described how to simulate homogeneous Poisson point processes on a rectangle, disk and triangle. Then I covered how to randomly thin a point process in a spatially dependent manner. Building off these posts, I wrote in my last post how to simulate an inhomogeneous or nonhomogeneous Poisson point process. Now I’ll describe how to verify that the simulation code correctly simulates the desired Poisson point process.

Although this post is focused on the Poisson point process, I stress that parts of the material hold for all point processes. Also, not surprisingly, some of the material and the code overlaps with that presented in the post on the inhomogeneous point process.

Basics

Any Poisson point process is defined with a measure called the intensity measure or mean measure, which I’ll denote by $\Lambda$. For practical purposes, I assume that the intensity measure $\Lambda$ has a derivative $\lambda(x,y)$, where $x$ and $y$ denote the Cartesian coordinates. The function $\lambda(x,y)$ is often called the intensity function or just intensity. I assume this function is bounded, so $\lambda(x,y)<\infty$ for all points in a simulation window $W$. Finally, I assume that the simulation window $W$ is a rectangle.

Several times before I have mentioned that simulating a Poisson point process requires simulating two random components: the number of points and the locations of points. Working backwards, to check a Poisson simulation, we must run the Poisson simulation a large number of times (say $10^3$ or $10^4$), and collect the statistics on these two properties. We’ll start by examining the easiest of the two random components.

Number of points

For any Poisson point process, the number of points is a Poisson random variable with a parameter (that is, a mean) $\Lambda(W)$. Under our previous assumptions, this is given by the surface integral

$$\Lambda(W)=\int_W \lambda(x,y)dxdy.$$

Presumably we can evaluate such an integral analytically or numerically in order to simulate the Poisson point process. To check that we correctly simulate the random the number of points, we just need to simulate the point process a large number of times, and compare the statistics to those given by the analytic expressions.

Moments

The definition of the intensity measure of a point process is a measure that gives the average or expected number of points in some region. As the number of simulations increases, the (sample) average number of points will converge to the intensity measure $\Lambda(W)$. I should stress that this is a test for the intensity measure, a type of first moment, which will work for the intensity measure of any point process.

For Poisson point processes, there is another moment test that can be done. It can be shown mathematically that the variance of the number of points will also converge to the intensity measure $\Lambda(W)$, giving a second empirical test based on second moments. There is no point process theory here, as this moment result is simply due to the number of points being distributed according to a Poisson distribution. The second moment is very good for checking Poissonness, forming the basis for statistical tests. If this and the first moment test hold, then there’s a very strong chance the number of points is a Poisson variable.

Beyond the first two moments, an even more thorough test is to estimate an empirical distribution for the number of points. That is, we perform a histogram on the number of points, and then we normalize it, so the total mass sums to one, producing an empirical probability distribution for the number of points.

The results should closely match the probability mass function of a Poisson distribution with parameter $\Lambda(W)$. This is simply

$$ P(N=n)= \frac{[\Lambda(W)]^n}{n!} e^{-\Lambda(W)}, $$

where $N$ is the random number of points in the window $W$, which has the area $|W|$. If the empirical distribution is close to results given by the above expression, then we can confidently say that the number of points is a Poisson random variable with the right parameter or mean.

Locations of points

To check the positioning of points, we can empirically estimate the intensity function $\lambda(x,y)$. A simple way to do this is to perform a two-dimensional histogram on the point locations. This is very similar to the one-dimensional histogram, which I suggested to do for testing the number of points. It just involves counting the number of points that fall into two-dimensional non-overlapping subsets called bins.

To estimate the intensity function, each bin count needs to be rescaled by the area of the bin, giving a density value for each bin. This empirical estimate of the intensity function should resemble the true intensity function $\lambda(x,y)$. For a visual comparison, we can use a surface plot to illustrate the two sets of results.

This procedure will work for estimating the intensity function of any point process, not just a Poisson one.

Advanced tests

In spatial statistics there are more advanced statistical tests for testing how Poisson a point pattern is. But these tests are arguably too complicated for checking a simple point process that is rather easy to simulate. Furthermore, researchers usually apply these tests to a small number of point patterns. In this setting, it is not possible to accurately obtain empirical distributions without further assumptions. But with simulations, we can generate many simulations and obtain good empirical distributions. In short, I would not use such tests for just checking that I have properly coded a Poisson simulation.

Results

I produced the results with ten thousand simulations, which gave good results and took only a few seconds to complete on a standard desktop computer. Clearly increasing the number of simulations increases the accuracy of the statistics, but it also increases the computation time.

For the results, I used the intensity function

$$\lambda(x,y)=\lambda_1(x,y)+\lambda_2(x,y),$$

where

$$\lambda_1(x,y)=80e^{-((x+0.5)^2+(y+0.5)^2)/s^2},$$

$$\lambda_2(x,y)=100e^{-((x-0.5)^2+(y-0.5)^2)/s^2},$$

and $s>0$ is a scale parameter. We can see that this function has two maxima or peaks at $(-0.5,-0.5)$ and $(0.5,0.5)$.

MATLAB

Python

Code

All code from my posts, as always, can be found on the my GitHub repository. The code for this post is located here.

The estimating the statistical moments is standard. Performing the histograms is also routine, but when normalizing, you have to choose the option that returns the empirical estimate of the probability density function (pdf).

Fortunately, scientific programming languages usually have functions for performing two-dimensional histograms. What is a bit tricky is how to normalize or rescale the bin counts. The histogram functions can, for example, divide by the number of simulations, the area of each bin, or both. In the end, I chose the pdf option in both MATLAB and Python to give an empirical estimate of the probability density function, and then multiplied it by the average number of points, which was calculated in the previous check. (Although, I could have done this in a single step in MATLAB, but not in Python, so I chose to do it in a couple of steps in both languages so the code matches more closely.)

MATLAB

I used the surf function to plot the intensity function and its estimate; see below for details on the histograms.

close all;

%Simulation window parameters
xMin=-1;xMax=1;
yMin=-1;yMax=1;
xDelta=xMax-xMin;yDelta=yMax-yMin; %rectangle dimensions
areaTotal=xDelta*yDelta; %area of rectangle

numbSim=10^4; %number of simulations

s=0.5; %scale parameter

%Point process parameters
fun_lambda=@(x,y)(100*exp(-((x).^2+(y).^2)/s^2));%intensity function

%%%START -- find maximum lambda -- START %%%
%For an intensity function lambda, given by function fun_lambda,
%finds the maximum of lambda in a rectangular region given by
%[xMin,xMax,yMin,yMax].
funNeg=@(x)(-fun_lambda(x(1),x(2))); %negative of lambda
%initial value(ie centre)
xy0=[(xMin+xMax)/2,(yMin+yMax)/2];%initial value(ie centre)
%Set up optimization step
options=optimoptions('fmincon','Display','off');
%Find largest lambda value
[~,lambdaNegMin]=fmincon(funNeg,xy0,[],[],[],[],...
[xMin,yMin],[xMax,yMax],'',options);
lambdaMax=-lambdaNegMin;
%%%END -- find maximum lambda -- END%%%

%define thinning probability function
fun_p=@(x,y)(fun_lambda(x,y)/lambdaMax);

%for collecting statistics -- set numbSim=1 for one simulation
numbPointsRetained=zeros(numbSim,1); %vector to record number of points
for ii=1:numbSim
%Simulate Poisson point process
numbPoints=poissrnd(areaTotal*lambdaMax);%Poisson number of points
xx=xDelta*(rand(numbPoints,1))+xMin;%x coordinates of Poisson points
yy=xDelta*(rand(numbPoints,1))+yMin;%y coordinates of Poisson points

%calculate spatially-dependent thinning probabilities
p=fun_p(xx,yy);

%Generate Bernoulli variables (ie coin flips) for thinning
booleRetained=rand(numbPoints,1)<p; %points to be retained %x/y locations of retained points xxRetained=xx(booleRetained); yyRetained=yy(booleRetained); %collect number of points simulated numbPointsRetained(ii)=length(xxRetained); end %Plotting plot(xxRetained,yyRetained,'bo'); %plot retained points xlabel('x');ylabel('y'); %run empirical test on number of points generated if numbSim&gt;=10
%total mean measure (average number of points)
LambdaNumerical=integral2(fun_lambda,xMin,xMax,yMin,yMax)
%Test: as numbSim increases, numbPointsMean converges to LambdaNumerical
numbPointsMean=mean(numbPointsRetained)
%Test: as numbSim increases, numbPointsVar converges to LambdaNumerical
numbPointsVar=var(numbPointsRetained)

end

For the histogram section, I used the histcounts and histcounts2 functions respectively to estimate the distribution of the number of points and the intensity function. I used the pdf option.

Number of points

histcounts(numbPointsRetained,binEdges,'Normalization','pdf');

Locations of points

histcounts2(xxVectorAll,yyVectorAll,numbBins,'Normalization','pdf');

Python

I used the Matplotlib library to plot the intensity function and its estimate; see below for details on the histograms.

import numpy as np; #NumPy package for arrays, random number generation, etc
import matplotlib.pyplot as plt #for plotting
from matplotlib import cm #for heatmap plotting
from mpl_toolkits import mplot3d #for 3-D plots
from scipy.optimize import minimize #for optimizing
from scipy import integrate #for integrating
from scipy.stats import poisson #for the Poisson probability mass function

plt.close("all"); #close all plots

#Simulation window parameters
xMin=-1;xMax=1;
yMin=-1;yMax=1;
xDelta=xMax-xMin;yDelta=yMax-yMin; #rectangle dimensions
areaTotal=xDelta*yDelta;

numbSim=10**4; #number of simulations
numbBins=30; #number of bins for histogram

#Point process parameters
s=0.5; #scale parameter

def fun_lambda(x,y):
#intensity function
lambdaValue=80*np.exp(-((x+0.5)**2+(y+0.5)**2)/s**2)+100*np.exp(-((x-0.5)**2+(y-0.5)**2)/s**2);
return lambdaValue;

###START -- find maximum lambda -- START ###
#For an intensity function lambda, given by function fun_lambda,
#finds the maximum of lambda in a rectangular region given by
#[xMin,xMax,yMin,yMax].
def fun_Neg(x):
return -fun_lambda(x[0],x[1]); #negative of lambda

xy0=[(xMin+xMax)/2,(yMin+yMax)/2];#initial value(ie centre)
#Find largest lambda value
resultsOpt=minimize(fun_Neg,xy0,bounds=((xMin, xMax), (yMin, yMax)));
lambdaNegMin=resultsOpt.fun; #retrieve minimum value found by minimize
lambdaMax=-lambdaNegMin;
###END -- find maximum lambda -- END ###

#define thinning probability function
def fun_p(x,y):
return fun_lambda(x,y)/lambdaMax;

#for collecting statistics -- set numbSim=1 for one simulation
numbPointsRetained=np.zeros(numbSim); #vector to record number of points
xxAll=[]; yyAll=[];

### START -- Simulation section -- START ###
for ii in range(numbSim):
#Simulate a Poisson point process
numbPoints = np.random.poisson(lambdaMax*areaTotal);#Poisson number of points
xx = xDelta*np.random.uniform(0,1,numbPoints)+xMin;#x coordinates of Poisson points
yy = yDelta*np.random.uniform(0,1,numbPoints)+yMin;#y coordinates of Poisson points

#calculate spatially-dependent thinning probabilities
p=fun_p(xx,yy);

#Generate Bernoulli variables (ie coin flips) for thinning
booleRetained=np.random.uniform(0,1,numbPoints)<p; #points to be thinned

#x/y locations of retained points
xxRetained=xx[booleRetained]; yyRetained=yy[booleRetained];
numbPointsRetained[ii]=xxRetained.size;
xxAll.extend(xxRetained); yyAll.extend(yyRetained);
### END -- Simulation section -- END ###

#Plotting a simulation
fig1 = plt.figure();
plt.scatter(xxRetained,yyRetained, edgecolor='b', facecolor='none');
plt.xlabel("x"); plt.ylabel("y");
plt.title('A single realization of a Poisson point process');
plt.show();

#run empirical test on number of points generated
###START -- Checking number of points -- START###
#total mean measure (average number of points)
LambdaNumerical=integrate.dblquad(fun_lambda,xMin,xMax,lambda x: yMin,lambda y: yMax)[0];
#Test: as numbSim increases, numbPointsMean converges to LambdaNumerical
numbPointsMean=np.mean(numbPointsRetained);
#Test: as numbSim increases, numbPointsVar converges to LambdaNumerical
numbPointsVar=np.var(numbPointsRetained);
binEdges=np.arange(numbPointsRetained.min(),(numbPointsRetained.max()+1))-0.5;
pdfEmp, binEdges=np.histogram(numbPointsRetained, bins=binEdges,density=True);

nValues=np.arange(numbPointsRetained.min(),numbPointsRetained.max());
#analytic solution of probability density
pdfExact=(poisson.pmf(nValues,LambdaNumerical));

#Plotting
fig2 = plt.figure();
plt.scatter(nValues,pdfExact, color='b', marker='s',facecolor='none',label='Exact');
plt.scatter(nValues,pdfEmp, color='r', marker='+',label='Empirical');
plt.xlabel("n"); plt.ylabel("P(N=n)");
plt.title('Distribution of the number of points');
plt.legend();
plt.show();
###END -- Checking number of points -- END###

###START -- Checking locations -- START###
#2-D Histogram section
p_Estimate, xxEdges, yyEdges = np.histogram2d(xxAll, yyAll,bins=numbBins,density=True);
lambda_Estimate=p_Estimate*numbPointsMean;

xxValues=(xxEdges[1:]+xxEdges[0:xxEdges.size-1])/2;
yyValues=(yyEdges[1:]+yyEdges[0:yyEdges.size-1])/2;
X, Y = np.meshgrid(xxValues,yyValues) #create x/y matrices for plotting

#analytic solution of probability density
lambda_Exact=fun_lambda(X,Y);

#Plot empirical estimate
fig3 = plt.figure();
plt.rc('text', usetex=True);
plt.rc('font', family='serif');
ax=plt.subplot(211,projection='3d');
surf = ax.plot_surface(X, Y,lambda_Estimate,cmap=plt.cm.plasma);
plt.xlabel("x"); plt.ylabel("y");
plt.title('Estimate of $\lambda(x)$');
plt.locator_params(axis='x', nbins=5);
plt.locator_params(axis='y', nbins=5);
plt.locator_params(axis='z', nbins=3);
#Plot exact expression
ax=plt.subplot(212,projection='3d');
surf = ax.plot_surface(X, Y,lambda_Exact,cmap=plt.cm.plasma);
plt.xlabel("x"); plt.ylabel("y");
plt.title('True $\lambda(x)$');
plt.locator_params(axis='x', nbins=5);
plt.locator_params(axis='y', nbins=5);
plt.locator_params(axis='z', nbins=3);
###END -- Checking locations -- END###

For the histogram section, I used the histogram and histogram2d functions respectively to estimate the distribution of the number of points and the intensity function. I used the pdf option. (The SciPy website recommends not using the the normed option.)

Number of points

np.histogram(numbPointsRetained, bins=binEdges,density=True);

Locations of points

np.histogram2d(xxArrayAll, yyArrayAll,bins=numbBins,density=True);

Tag: empirical distribution

Checking Poisson point process simulations

Basics

Number of points

Moments

Empirical distribution

Locations of points

Advanced tests

Results

MATLAB

Python

Further reading

Code

MATLAB

Python