Simulating a Poisson line process

Instead of points, we can consider other objects randomly scattered on some underlying mathematical space. If we take a Poisson point process, then we can use (infinitely long) straight lines instead of points, giving a Poisson line process. Researchers have studied and used this random object to model physical phenomena. In this post I’ll cover how to simulate a homogeneous Poisson line process in MATLAB, R and Python. The code can be downloaded from here.

Overview

For simulating a Poisson line process, the key question is how to randomly position the lines. This is related to a classic problem in probability theory called the Bertrand paradox. I discussed this illustrative problem in probability in a previous post, where I also included code for simulating it. I highly recommend reading that post first.

The Bertrand paradox involves three methods for positioning random lines. We use Method 2 to achieve a uniform positioning of lines, meaning the number and orientation of the lines are statistically uniform. It then makes sense to use this method for simulating a homogeneous (or uniform) Poisson line process.

We can interpret a line process as a point process. A line process on the plane \(\textbf{R}^2\) can be described by a point process on \((0,\infty)\times (0,2\pi)\), which is an infinitely long cylinder. In other words, the Poisson line process can be described as a Poisson point process.

For simulating a Poisson line process, it turns out the disk is the most natural setting. (Again, this goes back to the Bertrand paradox.) More specifically, we consider how the (infinitely long) lines intersect a disk of fixed radius \(r>0\). The problem of simulating a Poisson line process then reduces to randomly placing chords in a disk. For other simulation windows in the plane, we can always bound any non-circular region with a sufficiently large disk.

Steps

Number of lines

Of course, with all things Poisson, the number of lines will be a Poisson random variable, but what will its parameter be? This homogeneous (or uniform) Poisson line process forms a one-dimensional homogeneous (or uniform) Poisson point process around the edge of the disk, which has circumference \(2 \pi r \). The number of lines is then simply a Poisson variable with parameter \(\lambda 2 \pi r \).

Locations of points

To position a single line uniformly in a disk, we need to generate two uniform random variables. One random variable gives the angle describing the orientation of the line, so it’s a uniform random variable \(\Theta\) on the interval \((0,2\pi)\).

The other random variable gives the perpendicular distance from the origin to the line, so it’s a uniform random variable \(P\) on the interval \((0,r)\), where \(r\) is the radius of the disk. The random radius and its perpendicular chord create a right-angled triangle. The distance from the point \((\Theta, P)\) to the disk edge (that is, the circle) along the chord is:

$$Q=\sqrt{r^2-P^2}.$$

The endpoints of the chord (that is, the points on the disk edge) are then:

Point 1: \(X_1=P \cos \Theta+ Q\sin \Theta\), \(Y_1= P \sin \Theta- Q\cos \Theta\),

Point 2: \(X_2=P \cos \Theta- Q\sin \Theta\), \(Y_2= P \sin \Theta+Q \cos \Theta\).
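Putting these steps together, here is a minimal sketch in Python of the simulation procedure (the names lambda0 and r are just illustrative; the complete MATLAB, R and Python code is in the repository mentioned below).

import numpy as np

# Minimal sketch: simulate a Poisson line process on a disk of radius r
# centred at the origin; lambda0 plays the role of the intensity lambda.
rng = np.random.default_rng(42)
r = 1.0        # radius of the simulation disk
lambda0 = 3.0  # intensity of the line process

numbLines = rng.poisson(lambda0 * 2 * np.pi * r)  # Poisson number of lines
theta = 2 * np.pi * rng.uniform(size=numbLines)   # angles Theta on (0, 2*pi)
p = r * rng.uniform(size=numbLines)               # distances P on (0, r)
q = np.sqrt(r ** 2 - p ** 2)                      # Q = sqrt(r^2 - P^2)

# chord endpoints on the disk edge (where each line meets the circle)
x1 = p * np.cos(theta) + q * np.sin(theta)
y1 = p * np.sin(theta) - q * np.cos(theta)
x2 = p * np.cos(theta) - q * np.sin(theta)
y2 = p * np.sin(theta) + q * np.cos(theta)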

Code

I have implemented the simulation procedure in MATLAB, R and Python, which, as usual, are all very similar. I haven’t put my code here, because the software behind my website keeps mangling it.  As always, I have uploaded my code to a repository; for this post, it’s in this directory.

I have written the code in R, but I wouldn’t use it in general. That’s because if you’re using R, then, as I have said before, I strongly recommend using the powerful spatial statistics library spatstat. For simulating a Poisson line process, there’s the function rpoisline.

The chief author of spatstat, Adrian Baddeley, has written various lectures and books on the related topics of point processes, spatial statistics, and geometric probability. In this post, he answered why the angular coordinates have to be chosen uniformly. 

Results

MATLAB

R

Python

Further reading

To read about the Poisson line process, it’s best to start with the Bertrand problem, which is covered in many works on geometric probability and related fields. A good and recent introduction is given by Calka in Section 1.3 of the lectures titled Stochastic Geometry: Modern Research Frontiers, which were edited by Coupier and published by Springer. Also see, for example, Problem 1.2 in Geometrical Probability by Kendall and Moran or page 44 in Geometric Probability by Solomon.

For the Poisson line process, I highly recommend Section 7.2 in the monograph Poisson Processes by Kingman. Also see Example 8.2 in the standard textbook Stochastic Geometry and its Applications by Chiu, Stoyan, Kendall and Mecke. The Poisson line process also appears in Exercise 5.2 in the book Stochastic Simulation – Algorithms and Analysis by Asmussen and Glynn.

For online resources, this set of lectures by Christian Lantuéjoul covers the Poisson line process. Wilfrid Kendall covers the Poisson line process in this talk in relation to so-called Poisson cities.

Creating a simple feedforward neural network (without using libraries)

Deep learning refers to several classes of statistical models, most often neural networks, which are deep in the sense that they have many layers between the input and output. We can think of these layers as computational stages.

(More precisely, you can think of them as very big matrices.)

These statistical models have now become interchangeable with the much loved term artificial intelligence (AI). But how do neural networks work?

Neural networks are collections of large — often extraordinarily large — matrices with values that have been carefully chosen (or found) so that running inputs through these matrices generates accurate outputs for certain problems like classifying photos or writing readable sentences. There are many types of neural networks, but to understand them, it’s best to start with a basic one, which is an approach I covered in a previous post.

Without using any of the neural network libraries in MATLAB or Python (of which there are several), I wrote a simple four-layer feedforward (neural) network, which is also called a multilayer perceptron. This is an unadorned neural network without any of the fancy bells and whistles that they now possess.

And that’s all you need to understand the general gist of deep learning.

Binary classification problem

I applied the simple neural network to the classic problem of binary classification. More specifically, I created a simple problem by dividing a two-dimensional (Cartesian) square into two non-overlapping regions. My neural network needs to identify which region a point in the square lies in. A simple problem, but it is both fundamental and challenging.

I created three such problems with increasing difficulty. Here they are:

  1. Simple: Ten points of two types located in a square, where the classification border between the two types is a curved line.
  2. Voronoi: A Voronoi tessellation is (deterministically) created with a small number of seeds (or Voronoi cells). A point can either be in a given Voronoi cell or not, hence there are two types of points.
  3. Rose: A simple diagram of a four-petal flower or rose is created using polar coordinates. Located entirely inside a square, points can either be inside the rose or not, giving two types of points.

Components of the neural network model

Optimization method

I fitted or trained my four-layer neural network model by using a simple stochastic gradient descent method. The field of optimization (or operations research) abounds with methods based on gradients (or derivatives), which usually fall under the category of convex optimization.

In the code I wrote, a training point is chosen at random, and then the fitting (or optimization) method takes a step in the direction of steepest descent, where the size of the step is dictated by a fixed constant called the learning rate. This procedure is repeated for some pre-fixed number of steps. The random choice of training point is done with replacement.

This is basically the simplest training method you can use. In practice, you would never use such a simple method, relying instead upon more sophisticated gradient-based methods such as Adam.
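As a rough sketch in Python (with a hypothetical gradient_of_cost standing in for the hard-coded backward pass in the full code below), one step of this procedure looks something like this:

import numpy as np

# Rough sketch of a single stochastic gradient descent step; gradient_of_cost
# is a hypothetical placeholder for the backward pass in the full code below.
rng = np.random.default_rng(42)

def sgd_step(params, x_all, y_all, gradient_of_cost, rateLearn=0.05):
    indexTrain = rng.integers(x_all.shape[1])  # pick a training point (with replacement)
    grad = gradient_of_cost(params, x_all[:, indexTrain], y_all[:, indexTrain])
    return params - rateLearn * grad           # step in the direction of steepest descent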

Cost function

For training, the so-called cost function or loss function is a root-mean-square function (or \(L^2\) norm), which goes against the flow today, as most neural networks now employ cost functions based on maximum likelihood and cross-entropy. There is a good reason why these are used instead of the root-mean-square: they work much better.
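For a concrete comparison (this snippet is not part of the original code), the two cost types for a single binary label y and network output p look like this:

import numpy as np

# Squared-error (L2-style) cost, as used in the code below, versus the
# cross-entropy (negative log-likelihood) cost most networks use today.
def cost_squared_error(p, y):
    return (y - p) ** 2

def cost_cross_entropy(p, y):
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))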

Activation function

For the activation function, the code uses the sigmoid function, but you can easily change it to use the rectified linear unit (ReLU), which is what most neural networks use today. For decades, it was commonly accepted to use the sigmoid function, partly because it’s smooth. But it seems to only work well for small networks. The rectified linear unit reigns supreme now, as it is considerably better than the sigmoid function and others in large neural networks.
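The two activation functions look like this (mirroring the activate function in the code below, where ReLU appears only as a commented-out line):

import numpy as np

# sigmoid (logistic) activation, as used in the code below
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# rectified linear unit (ReLU) activation, the modern default
def relu(z):
    return np.maximum(0, z)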

Code

I stress that this code is only for educational purposes. It’s designed to gain intuition into how (simple) neural networks do what they do.

If you have a problem that requires neural networks (or deep learning), then use one of the libraries such as PyTorch or TensorFlow. Do not use my code for such problems.
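To give a rough sense of what that looks like, here is a minimal sketch (for comparison only, and certainly not part of my code) of the same 2-4-4-2 architecture in PyTorch, keeping the sigmoid activations and squared-error cost used below; a real model would use ReLU activations and a cross-entropy loss.

import torch
import torch.nn as nn

# Minimal sketch of the same 2-4-4-2 multilayer perceptron in PyTorch.
model = nn.Sequential(
    nn.Linear(2, 4), nn.Sigmoid(),
    nn.Linear(4, 4), nn.Sigmoid(),
    nn.Linear(4, 2), nn.Sigmoid(),
)
lossFunction = nn.MSELoss()  # squared-error cost, as in the code below
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)

# one training step on a single point xTrain with (binary) label yTrain
xTrain = torch.tensor([[0.2, -0.4]])
yTrain = torch.tensor([[1.0, 0.0]])
optimizer.zero_grad()
loss = lossFunction(model(xTrain), yTrain)
loss.backward()
optimizer.step()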

I generally found that the MATLAB code ran much faster and smoother than the Python code. Python isn’t great for speed. (Serious libraries are not written in Python but in C or, like much of the matrix libraries in Python, Fortran.)

Here’s my code.

MATLAB

% This code runs a simple feedforward neural network.
%
% The neural network is a multi-layer perceptron designed for a simple
% binary classification. The classification is determining whether a
% two-dimensional (Cartesian) point is located inside some given region or
% not.
%
% For the simple binary examples used here, the code seems to work for
% sufficiently large number of optimization steps, typically in the 10s and
% 100s of thousands.
%
% This neural network only has 4 layers (so 2 hidden layers). The
% training procedure, meaning the forward and backward passes, is
% hard-coded. This means changing the number of layers requires additional
% information on the neural network architecture.
%
% It uses the sigmoid function; see the activate function.
%
% NOTE: This code is for illustration and educational purposes only. Use an
% industry-level library for real problems.
%
% Author: H. Paul Keeler, 2019.
% Website: hpaulkeeler.com
% Repository: github.com/hpaulkeeler/posts

clearvars; clc; close all;

choiceData=3; %choose 1, 2 or 3
numbSteps = 2e5; %number of steps for the (gradient) optimization method
rateLearn = 0.05; %hyperparameter
numbTest = 1000; %number of points for testing the fitted model
boolePlot=true;

rng(42); %set seed for reproducibility of results

%generating or retrieving the training data
switch choiceData
    case 1
        %%% simple example
        %recommended minimum number of optimization steps: numbSteps = 1e5
        x1 = 2*[0.1,0.3,0.1,0.6,0.4,0.6,0.5,0.9,0.4,0.7]-1;
        x2 = 2*[0.1,0.4,0.5,0.9,0.2,0.3,0.6,0.2,0.4,0.6]-1;
        numbTrainAll = 10; %number of training data
        %output (binary) data
        y = [ones(1,5) zeros(1,5); zeros(1,5) ones(1,5)];
    case 2
        %%% voronoi example
        %recommended minimum number of optimization steps: numbSteps = 1e5
        numbTrainAll = 1000; %number of training data
        x1 = 2*rand(1,numbTrainAll)-1;
        x2=2*rand(1,numbTrainAll)-1;
        booleInsideVoroni=checkVoronoiInside(x1,x2);
        % output (binary) data
        y=[booleInsideVoroni;~booleInsideVoroni];
    case 3
        %%% rose/flower example
        %recommended minimum number of optimization steps: numbSteps = 2e5
        numbTrainAll=1000; %number of training data
        x1=2*rand(1,numbTrainAll)-1;
        x2=2*rand(1,numbTrainAll)-1;
        booleInsideFlower=checkFlowerInside(x1,x2);
        %output (binary) data
        y=[booleInsideFlower;~booleInsideFlower];

end
x=[x1;x2];
booleTrain = logical(y);

%dimensions/widths of neural network
numbWidth1 = 2; %must be 2 for 2-D input data
numbWidth2 = 4;
numbWidth3 = 4;
numbWidth4 = 2; %must be 2 for binary output
numbWidthAll=[numbWidth1,numbWidth2,numbWidth3,numbWidth4];
numbLayer=length(numbWidthAll);

%initialize weights and biases
scaleModel = .5;
W2 = scaleModel * randn(numbWidth2,numbWidth1);
b2 = scaleModel * rand(numbWidth2,1);
W3 = scaleModel * randn(numbWidth3,numbWidth2);
b3 = scaleModel * rand(numbWidth3,1);
W4 = scaleModel * randn(numbWidth4,numbWidth3);
b4 = scaleModel * rand(numbWidth4,1);

%create arrays of matrices
%w matrices
arrayW{1}=W2;
arrayW{2}=W3;
arrayW{3}=W4;
%b vectors
array_b{1}=b2;
array_b{2}=b3;
array_b{3}=b4;
%activation matrices
arrayA=mat2cell(zeros(sum(numbWidthAll),1),numbWidthAll);
%delta (gradient) vectors
array_delta=mat2cell(zeros(sum(numbWidthAll(2:end)),1),numbWidthAll(2:end));

costHistory = zeros(numbSteps,1); %record the cost for each step
for i = 1:numbSteps
    %forward and back propagation
    indexTrain=randi(numbTrainAll);
    xTrain = x(:,indexTrain);

    %run forward pass
    A2 = activate(xTrain,W2,b2);
    A3 = activate(A2,W3,b3);
    A4 = activate(A3,W4,b4);

    %new (replaces above three lines)
    arrayA{1}=xTrain;
    for j=2:numbLayer
        arrayA{j}=activate(arrayA{j-1},arrayW{j-1},array_b{j-1});
    end

    %run backward pass (to calculate the gradient) using Hadamard products
    delta4 = A4.*(1-A4).*(A4-y(:,indexTrain));
    delta3 = A3.*(1-A3).*(transpose(W4)*delta4);
    delta2 = A2.*(1-A2).*(transpose(W3)*delta3);

    %new (replaces above three lines)
    A_temp=arrayA{numbLayer};
    array_delta{numbLayer-1}=A_temp.*(1-A_temp).*(A_temp-y(:,indexTrain));
    for j=numbLayer-1:-1:2
        A_temp=arrayA{j};
        delta_temp=array_delta{j};
        W_temp=arrayW{j};
        array_delta{j-1}= A_temp.*(1-A_temp).*(transpose(W_temp)*delta_temp);
    end

    %update using steepest descent
    %transpose of xTrains used for W matrices
    W2 = W2 - rateLearn*delta2*transpose(xTrain);
    W3 = W3 - rateLearn*delta3*transpose(A2);
    W4 = W4 - rateLearn*delta4*transpose(A3);
    b2 = b2 - rateLearn*delta2;
    b3 = b3 - rateLearn*delta3;
    b4 = b4 - rateLearn*delta4;

    %new (replaces above six lines)
    for j=1:numbLayer-1
        A_temp=arrayA{j};
        delta_temp=array_delta{j};
        arrayW{j}=arrayW{j} - rateLearn*delta_temp*transpose(A_temp);
        array_b{j}=array_b{j} - rateLearn*delta_temp;
    end

    %update cost
    costCurrent_i = getCost(x,y,arrayW,array_b,numbLayer);
    costHistory(i) = costCurrent_i;
end

%generating test data
xTest1 = 2*rand(1,numbTest)-1;
xTest2 = 2*rand(1,numbTest)-1;
xTest = [xTest1;xTest2];
booleTest = (false(2,numbTest));

%testing every point
for k=1:numbTest
    xTestSingle = xTest(:,k);
    %apply fitted model
    yTestSingle = feedForward(xTestSingle,arrayW,array_b,numbLayer);
    [~,indexTest] = max(yTestSingle);

    booleTest(1,k) = (indexTest==1);
    booleTest(2,k) = (indexTest==2);
end

if boolePlot

    %plot history of cost
    figure;
    semilogy(1:numbSteps,costHistory);
    xlabel('Number of optimization steps');
    ylabel('Value of cost function');
    title('History of the cost function')

    %plot training data
    figure;
    tiledlayout(1,2);
    nexttile;
    hold on;
    plot(x1(booleTrain(1,:)),x2(booleTrain(1,:)),'rs');
    plot(x1(booleTrain(2,:)),x2(booleTrain(2,:)),'b+');
    xlabel('x');
    ylabel('y');
    title('Training data');
    axis([-1,1,-1,1]);
    axis square;

    %plot test data

    nexttile;
    hold on;
    plot(xTest1(booleTest(1,:)),xTest2(booleTest(1,:)),'rd');
    plot(xTest1(booleTest(2,:)),xTest2(booleTest(2,:)),'bx');
    xlabel('x');
    ylabel('y');
    title('Testing data');
    axis([-1,1,-1,1]);
    axis square;

end

function costTotal = getCost(x,y,arrayW,array_b,numbLayer)
numbTrainAll = max(size(x));
%originally y, x1 and x2 were defined outside this function
costAll = zeros(numbTrainAll,1);

for i = 1:numbTrainAll
    %loop through every training point
    x_i = x(:,i);

    A_Last=feedForward(x_i,arrayW,array_b,numbLayer);
    %find difference between algorithm output and training data
    costAll(i) = norm(y(:,i) - A_Last,2);
end
costTotal = norm(costAll,2)^2;
end

function y = activate(x,W,b)
z=(W*x+b);
%evaluate sigmoid (ie logistic) function
y = 1./(1+exp(-z));

%evaluate ReLU (rectified linear unit)
%y = max(0,z);
end

function A_Last= feedForward(xSingle,arrayW,array_b,numbLayer)
%run forward pass
A_now=xSingle;
for j=2:numbLayer
    A_next=activate(A_now,arrayW{j-1},array_b{j-1});
    A_now=A_next;
end


A_Last=A_next;
end

function booleFlowerInside=checkFlowerInside(x1,x2)
% This function creates a simple binary classification problem.
% A single point (x1,x2) is either inside a flower petal or not, where the
% flower (or rose) is defined in polar coordinates by the equation:
%
% rho(theta) = cos(k*theta), where k is a positive integer.

orderFlower = 2;
[thetaFlower, rhoFlower] = cart2pol(x1(:)',x2(:)');
% check if point (x1,x2) is inside the rose
booleFlowerInside = abs(cos(orderFlower*thetaFlower)) <= rhoFlower;

end

function booleVoronoiInside=checkVoronoiInside(x1,x2)
% This function creates a simple binary classification problem.
% A single point (x1,x2) is either inside a chosen Voronoi cell or not,
% where the Voronoi cells are defined deterministically below.

numbPoints = length(x1);
% generate some deterministic seeds for the Voronoi cells
numbSeed = 6;
indexCellChosen = 4; %arbitrarily chosen cell (any number from 1 to numbSeed)

t = 1:numbSeed;
% deterministic points located on the square [-1,+1]x[-1,+1]
xSeed1 = sin(27*t.*pi.*cos(t.^2*pi+.4)+1.4);
xSeed2 = sin(4.7*t.*pi.*sin(t.^3*pi+.7)+.3);
xSeed = [xSeed1;xSeed2];

% find which Voronoi cell each point (x1,x2) belongs to
indexVoronoi = zeros(1,numbPoints);
for i = 1:numbPoints
    x_i = [x1(i);x2(i)]; %retrieve single point
    diff_x = xSeed-x_i;
    distSquared_x = diff_x(1,:).^2+diff_x(2,:).^2; %relative distance squared
    [~,indexVoronoiTemp] = min(distSquared_x); %find closest seed
    indexVoronoi(i) = indexVoronoiTemp;
end

% see if points are inside the Voronoi cell number indexCellChosen
booleVoronoiInside = (indexVoronoi==indexCellChosen);
end

Python

The use of matrices in Python can kindly be described as an afterthought. So it takes a bit of Python sleight of hand to get MATLAB code translated into working Python code.

(This stems partly from the fact that Python is row-major, like the language it’s built upon, C, whereas MATLAB is column-major, like its inspiration, Fortran.)

# This code runs a simple feedforward neural network.
#
# The neural network is a multi-layer perceptron designed for a simple
# binary classification. The classification is determining whether a
# two-dimensional (Cartesian) point is located inside some given region or
# not.
#
# For the simple binary examples used here, the code seems to work for
# sufficiently large number of optimization steps, typically in the 10s and
# 100s of thousands.
#
# This neural network only has 4 layers (so 2 hidden layers). The
# training procedure, meaning the forward and backward passes, is
# hard-coded. This means changing the number of layers requires additional
# information on the neural network architecture.
#
# It uses the sigmoid function; see the activate function.
#
# NOTE: This code is for illustration and educational purposes only. Use an
# industry-level library for real problems.
#
# For more details, see the post:
#
# https://hpaulkeeler.com/creating-a-simple-feedforward-neural-network-without-libraries/
#
# Author: H. Paul Keeler, 2019.
# Website: hpaulkeeler.com
# Repository: github.com/hpaulkeeler/posts

import numpy as np;  # NumPy package for arrays, random number generation, etc
import matplotlib.pyplot as plt  # For plotting
#from FunctionsExamples import checkFlowerInside, checkVoronoiInside

plt.close('all');  # close all figures

choiceData=1; #choose 1, 2 or 3

numbSteps = 1e5; #number of steps for the (gradient) optimization method
rateLearn = 0.05; #hyperparameter
numbTest = 1000; #number of points for testing the fitted model
boolePlot = True;

np.random.seed(42); #set seed for reproducibility of results

#helper functions for generating problems
def checkFlowerInside(x1,x2):
    # This function creates a simple binary classification problem.
    # A single point (x1,x2) is either inside a flower petal or not, where the
    # flower (or rose) is defined in polar coordinates by the equation:
    #
    # rho(theta) = cos(k*theta), where k is a positive integer.

    orderFlower = 2;
    thetaFlower = np.arctan2(x2,x1);
    rhoFlower = np.sqrt(x1**2+x2**2);
    # check if point (x1,x2) is inside the rose
    booleFlowerInside = np.abs(np.cos(orderFlower*thetaFlower)) <= rhoFlower;
    return booleFlowerInside

def checkVoronoiInside(x1,x2):
    # This function creates a simple binary classification problem.
    # A single point (x1,x2) is either inside a chosen Voronoi cell or not,
    # where the Voronoi cells are defined deterministically below.
    numbPoints = len(x1);
    # generate some deterministic seeds for the Voronoi cells
    numbSeed = 6;
    indexCellChosen = 4; #arbitrarily chosen cell (any number from 1 to numbSeed)

    t = np.arange(numbSeed)+1;
    # deterministic points located on the square [-1,+1]x[-1,+1]
    xSeed1 = np.sin(27*t*np.pi*np.cos(t**2*np.pi+.4)+1.4);
    xSeed2 = np.sin(4.7*t*np.pi*np.sin(t**3*np.pi+.7)+.3);
    xSeed = np.stack((xSeed1,xSeed2),axis=0);

    # find which Voronoi cell each point (x1,x2) belongs to
    indexVoronoi = np.zeros(numbPoints);
    for i in range(numbPoints):
        x_i = np.stack((x1[i],x2[i]),axis=0).reshape(-1, 1); #retrieve single point
        diff_x = xSeed-x_i;
        distSquared_x = diff_x[0,:]**2+diff_x[1,:]**2; #relative distance squared
        indexVoronoiTemp = np.argmin(distSquared_x); #find closest seed
        indexVoronoi[i] = indexVoronoiTemp;

    # see if points are inside the Voronoi cell number indexCellChosen
    booleVoronoiInside = (indexVoronoi==indexCellChosen);
    return booleVoronoiInside


#generating or retrieving the training data
if (choiceData==1):
    ### simple example
    #recommended minimum number of optimization steps: numbSteps = 1e5
    x1 = 2*np.array([0.1,0.3,0.1,0.6,0.4,0.6,0.5,0.9,0.4,0.7])-1;
    x2 = 2*np.array([0.1,0.4,0.5,0.9,0.2,0.3,0.6,0.2,0.4,0.6])-1;
    numbTrainAll = 10; #number of training data
    #output (binary) data
    y = np.array([[1,1,1,1,1,0,0,0,0,0],[0,0,0,0,0,1,1,1,1,1]]);

    x=np.stack((x1,x2),axis=0);

if (choiceData==2):
    ### voronoi example
    #recommended minimum number of optimization steps: numbSteps = 1e5
    numbTrainAll = 500; #number of training data
    x=2*np.random.uniform(0, 1,(2,numbTrainAll))-1;
    x1 = x[0,:];
    x2 = x[1,:];
    booleInsideVoroni=checkVoronoiInside(x1,x2);
    # output (binary) data
    y=np.stack((booleInsideVoroni,~booleInsideVoroni),axis=0);

if (choiceData==3):
    ### rose/flower example
    #recommended minimum number of optimization steps: numbSteps = 2e5
    numbTrainAll=100; #number of training data
    x=2*np.random.uniform(0, 1,(2,numbTrainAll))-1;
    x1 = x[0,:];
    x2 = x[1,:];
    booleInsideFlower=checkFlowerInside(x1,x2);
    #output (binary) data
    y=np.stack((booleInsideFlower,~booleInsideFlower),axis=0);

#helper functions for training and running the neural network
def activate(x,W,b):
    z=np.matmul(W,x)+b;
    #evaluate sigmoid (ie logistic) function
    y = 1/(1+np.exp(-z));

    #evaluate ReLU (rectified linear unit)
    #y = np.maximum(0,z);
    return y

def feedForward(xSingle,arrayW,array_b,numbLayer):
    #run forward pass
    A_now=xSingle;
    for j in np.arange(1,numbLayer):
        A_next=activate(A_now,arrayW[j-1],array_b[j-1]);
        A_now=A_next;
    A_Last=A_next;

    return A_Last;


def getCost(x,y,arrayW,array_b,numbLayer):
    numbTrainAll = np.max(x.shape);
    #originally y, x1 and x2 were defined outside this function
    costAll = np.zeros((numbTrainAll,1));

    for i in range(numbTrainAll):
        #loop through every training point
        x_i = x[:,i].reshape(-1, 1);

        A_Last=feedForward(x_i,arrayW,array_b,numbLayer);
        #find difference between algorithm output and training data
        costAll[i] = np.linalg.norm(y[:,i].reshape(-1, 1) - A_Last,2);

    costTotal = np.linalg.norm(costAll,2)**2;

    return costTotal

#dimensions/widths of neural network
numbWidth1 = 2; #must be 2 for 2-D input data
numbWidth2 = 4;
numbWidth3 = 4;
numbWidth4 = 2; #must be 2 for binary output
numbWidthAll=np.array([numbWidth1,numbWidth2,numbWidth3,numbWidth4]);
numbLayer=len(numbWidthAll);

#initialize weights and biases
scaleModel = .5;
W2 = scaleModel * np.random.normal(0, 1,(numbWidth2,numbWidth1));
b2 = scaleModel * np.random.uniform(0, 1,(numbWidth2,1));
W3 = scaleModel * np.random.normal(0, 1,(numbWidth3,numbWidth2));
b3 = scaleModel * np.random.uniform(0, 1,(numbWidth3,1));
W4 = scaleModel * np.random.normal(0, 1,(numbWidth4,numbWidth3));
b4 = scaleModel * np.random.uniform(0, 1,(numbWidth4,1));

#create lists of matrices
#w matrices
arrayW=[];
arrayW.append(W2);
arrayW.append(W3);
arrayW.append(W4);
#b vectors
array_b=[];
array_b.append(b2);
array_b.append(b3);
array_b.append(b4);
#activation matrices
arrayA=[];
for j in range(numbLayer):
    arrayA.append(np.zeros(numbWidthAll[j]).reshape(-1, 1).T);
#delta (gradient) vectors
array_delta=[];
for j in range(numbLayer-1):
    array_delta.append(np.zeros(numbWidthAll[j+1]).reshape(-1, 1).T);

numbSteps=int(numbSteps)
costHistory = np.zeros((numbSteps,1));
booleTrain = y;
for i in range(numbSteps):
    #forward and back propagation
    indexTrain=np.random.randint(numbTrainAll);
    xTrain = x[:,indexTrain].reshape(-1, 1);

    #run forward pass
    arrayA[0]=xTrain; #treat input as the A1 matrix
    for j in (range(numbLayer-1)):
        arrayA[j+1]=activate(arrayA[j],arrayW[j],array_b[j]);

    #run backward pass (to calculate the gradient) using Hadamard products
    A_temp=arrayA[-1];
    array_delta[-1]=A_temp*(1-A_temp)*(A_temp-y[:,indexTrain].reshape(-1, 1));
    for j in np.arange(numbLayer-2,0,-1):
        A_temp=arrayA[j];
        delta_temp=array_delta[j];
        W_temp=arrayW[j];
        array_delta[j-1]= A_temp*(1-A_temp)*np.matmul(np.transpose(W_temp),delta_temp);


    #update using steepest descent
    #transpose of xTrains used for W matrices
    for j in range(numbLayer-1):
        A_temp=arrayA[j];
        delta_temp=array_delta[j];
        arrayW[j]=arrayW[j] - rateLearn*np.matmul(delta_temp,np.transpose(A_temp));
        array_b[j]=array_b[j] - rateLearn*delta_temp;


    #update cost
    costCurrent_i = getCost(x,y,arrayW,array_b,numbLayer);
    costHistory[i] = costCurrent_i;

#generating test data
xTest = 2*np.random.uniform(0,1,(2,numbTest))-1;
xTest1 = xTest[0,:];
xTest2 = xTest[1,:];

booleTest = np.zeros((2,numbTest));

#testing every point
for k in range(numbTest):
    xTestSingle = xTest[:,k].reshape(-1, 1);
    #apply fitted model
    yTestSingle = feedForward(xTestSingle,arrayW,array_b,numbLayer);
    indexTest = np.argmax(yTestSingle, axis=0);
    booleTest[0,k] = ((indexTest==0)[0]);
    booleTest[1,k] = ((indexTest==1)[0]);

if boolePlot:
    #plot history of cost
    plt.figure();
    plt.semilogy(np.arange(numbSteps)+1,costHistory);
    plt.xlabel('Number of optimization steps');
    plt.ylabel('Value of cost function');
    plt.title('History of the cost function')

    #plot training data
    fig, ax = plt.subplots(1, 2);
    ax[0].plot(x1[np.where(booleTrain[0,:])],x2[np.where(booleTrain[0,:])],\
        'rs', markerfacecolor='none');
    ax[0].plot(x1[np.where(booleTrain[1,:])],x2[np.where(booleTrain[1,:])],\
        'b+');
    ax[0].set_xlabel('x');
    ax[0].set_ylabel('y');
    ax[0].set_title('Training data');
    ax[0].set_xlim([-1, 1])
    ax[0].set_ylim([-1, 1])
    ax[0].axis('equal');

    #plot test data
    ax[1].plot(xTest1[np.where(booleTest[0,:])],xTest2[np.where(booleTest[0,:])],\
        'rd', markerfacecolor='none');
    ax[1].plot(xTest1[np.where(booleTest[1,:])],xTest2[np.where(booleTest[1,:])],\
        'bx');
    ax[1].set_xlabel('x');
    ax[1].set_ylabel('y');
    ax[1].set_title('Testing data');
    ax[1].set_xlim([-1, 1])
    ax[1].set_ylim([-1, 1])
    ax[1].axis('equal');

Results

After fitting or training the models, I tested them by applying the fitted neural network to other data. (In machine learning and statistics, you famously don’t use the same data to train and test a model.)

The number of steps needed to obtain reasonable results was about 100 thousand. To get a feeling for this, look at the plot of the cost function value over the number of optimization steps. For more complicated problems, such as the flower or rose problem, even more steps were needed.

I found the Python code, which is basically a close literal translation (by me) of the MATLAB code, too slow, so I generated the results using MATLAB.

Note: These results were generated with stochastic methods (namely, the training method), so my results will not agree exactly with yours, but hopefully they’ll be close enough.

Simple

Voronoi

Rose

The rose problem was the most difficult to classify. Sometimes it was a complete fail. Sometimes the training method could only see one component of the rose. It worked a couple of times, namely with 20 thousand optimization steps and a random seed of 42.

I imagine this problem is difficult due to the obviously more complex geometry. Specifically, both ends of each petal become thin, which is where the testing breaks down.

Fail

Half-success

Success (mostly)

Further reading

There are just so many things to read on neural networks, particularly on the web. This post was inspired by this tutorial paper:

  • 2019 – Higham and Higham, Deep Learning: An Introduction for Applied Mathematicians.

I covered this paper in a previous post.

For a readable book, I recommend this book:

  • Goodfellow, Bengio, Courville – Deep Learning – MIT Press.