Creating a simple feedforward neural network (without using libraries)

Deep learning refers to neural networks, a type of statistical model, which are deep in the sense that they have many layers between the input and output. We can think of these layers as computational stages. (More precisely, you can think of them as big matrices.) These statistical models have now become interchangeable with the much-loved term artificial intelligence (AI). But how do neural networks work?

Neural networks are collections of large — often extraordinarily large — matrices with values that have been carefully chosen (or found) so that running inputs through these matrices generates accurate outputs for certain problems like classifying photos or writing readable sentences. There are many types of neural networks, but to understand them, it’s best to start with a basic one, which is an approach I covered in a previous post.

Without using any of the neural network libraries in MATLAB or Python (of which there are several), I wrote a simple four-layer feedforward (neural) network, which is also called a multilayer perceptron. This is an unadorned neural network without any of the fancy bells and whistles that they now possess.
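
To make the "running inputs through matrices" idea concrete, here is a minimal sketch of a forward pass in Python with NumPy. Each layer multiplies its input by a weight matrix, adds a bias vector, and applies a nonlinear activation function. The layer widths and random weights below are made-up placeholders for illustration, not the ones used later in this post.

import numpy as np

def sigmoid(z):
    # logistic function applied element-wise
    return 1/(1 + np.exp(-z))

# made-up weights and biases for a tiny 2-4-2 network
rng = np.random.default_rng(0)
W2, b2 = rng.normal(size=(4, 2)), np.zeros((4, 1))
W3, b3 = rng.normal(size=(2, 4)), np.zeros((2, 1))

x = np.array([[0.3], [-0.7]])  # a single two-dimensional input (column vector)
A2 = sigmoid(W2 @ x + b2)      # hidden layer
A3 = sigmoid(W3 @ A2 + b3)     # output layer: one score per class
print(A3)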

And that’s all you need to understand the general gist of deep learning.

Binary classification problem

I applied the simple neural network to the classic problem of binary classification. More specifically, I created a simple problem by dividing a two-dimensional (Cartesian) square into two non-overlapping regions. My neural network needs to identify which region a point in the square lies in. A simple problem, but it is both fundamental and challenging.

I created three such problems with increasing difficulty. Here they are:

  1. Simple: Ten points of two types located in a square, where the classification border between the two types is a curved line.
  2. Voronoi: A Voronoi tessellation is (deterministically) created with a small number of seeds, each seed defining a Voronoi cell. A point can either be in a given Voronoi cell or not, hence there are two types of points.
  3. Rose: A simple diagram of a four-petal flower or rose is created using polar coordinates, namely the curve \(\rho(\theta) = \cos(k\theta)\) with \(k = 2\). Located entirely inside a square, points can either be inside the rose or not, giving two types of points; see the short sketch after this list.
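
As a rough sketch of how the training data for the rose problem can be generated and labelled (mirroring the checkFlowerInside helper in the full code further below, with the rose curve \(\rho(\theta) = \cos(2\theta)\)), points are sampled uniformly on the square and classified by a simple membership test:

import numpy as np

def in_rose(x1, x2, k=2):
    # convert to polar coordinates and compare the radius with the
    # rose curve rho = cos(k*theta); same convention as checkFlowerInside
    theta = np.arctan2(x2, x1)
    rho = np.sqrt(x1**2 + x2**2)
    return np.abs(np.cos(k*theta)) <= rho

# sample training points uniformly on the square [-1,+1]x[-1,+1]
x = 2*np.random.uniform(size=(2, 1000)) - 1
labels = in_rose(x[0, :], x[1, :])  # True/False gives the two types of points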

Components of the neural network model

Optimization method

I fitted (or trained) my four-layer neural network model by using a simple stochastic gradient descent method. The field of optimization (or operations research) abounds with methods based on gradients (or derivatives), which usually fall under the category of convex optimization.

In the code I wrote, a random training point is chosen, and then the fitting (or optimization) method takes a step in the direction of steepest descent (that is, against the gradient), where the size of the step is dictated by a fixed constant called a learning rate. This procedure is repeated for some pre-fixed number of steps. The random choosing of the training point is done with replacement.

This is basically the simplest training method you can use. In practice, you never use such simple methods, relying instead upon more sophisticated gradient-based methods such as Adam.
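
As a minimal sketch of this training loop (with params and grad_cost as hypothetical placeholders for the network's weights and the gradient of the cost at a single training point), one run of stochastic gradient descent looks like this:

import numpy as np

def train_sgd(params, grad_cost, data, rate_learn=0.05, numb_steps=10**5):
    # params: list of weight/bias arrays; grad_cost: function returning the
    # gradient of the cost at one training point (both hypothetical here)
    numb_train = len(data)
    for _ in range(numb_steps):
        index_train = np.random.randint(numb_train)   # chosen with replacement
        grads = grad_cost(params, data[index_train])  # gradient at that point
        # step against the gradient, scaled by the fixed learning rate
        params = [p - rate_learn*g for p, g in zip(params, grads)]
    return params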

Cost function

For training, the so-called cost function or loss function is a root-mean-square (or \(L^2\) norm) function, which goes against the flow today, as most neural networks now employ cost functions based on maximum likelihood and cross-entropy. There is a good reason why these are used instead of the root-mean-square: they work much better.
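
Concretely, the cost used here is the sum of the squared Euclidean (\(L^2\)) distances between the network output and the target label over all training points. A minimal sketch, assuming a feed_forward function like the one in the code below:

import numpy as np

def get_cost(x, y, feed_forward):
    # x: 2 x n array of training points; y: 2 x n array of target labels;
    # feed_forward maps one column of x to the network's two outputs
    numb_train = x.shape[1]
    cost_all = np.zeros(numb_train)
    for i in range(numb_train):
        a_last = feed_forward(x[:, i].reshape(-1, 1))
        cost_all[i] = np.linalg.norm(y[:, i].reshape(-1, 1) - a_last, 2)
    return np.sum(cost_all**2)  # squared L2 norm of the per-point errors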

Activation function

For the activation function, the code uses the sigmoid function, but you can easily change it to use the rectified linear unit (ReLU), which is what most neural networks use today. For decades, it was commonly accepted to use the sigmoid function, partly because it’s smooth. But it seems to work well only for small networks. The rectified linear unit reigns supreme now, as it is considerably better than the sigmoid function and others in large neural networks.
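
For reference, here are the two activation functions as NumPy one-liners; swapping the sigmoid for the ReLU in the code below is a one-line change (note np.maximum, not np.max):

import numpy as np

def sigmoid(z):
    # smooth, squashes every value into the interval (0, 1)
    return 1/(1 + np.exp(-z))

def relu(z):
    # rectified linear unit: zero for negative inputs, identity otherwise
    return np.maximum(0, z)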

Code

I stress that this code is only for educational purposes. It’s designed to help you gain intuition into how (simple) neural networks do what they do.

If you have a problem that requires neural networks (or deep learning), then use one of the libraries such as PyTorch or TensorFlow. Do not use my code for such problems.

I generally found that the MATLAB code ran much faster and smoother than the Python code. Python isn’t great for speed. (Serious numerical libraries are not written in Python but in C or, as with much of the matrix functionality in Python, Fortran.)

Here’s my code.

MATLAB

% This code runs a simple feedforward neural network.
%
% The neural network is a multi-layer perceptron designed for a simple
% binary classification. The classification is determining whether a
% two-dimensional (Cartesian) point is located inside some given region or
% not.
%
% For the simple binary examples used here, the code seems to work for a
% sufficiently large number of optimization steps, typically in the tens and
% hundreds of thousands.
%
% This neural network only has 4 layers (so 2 hidden layers). The
% training procedure, meaning the forward and backward passes, is
% hard-coded. This means changing the number of layers requires additional
% information on the neural network architecture.
%
% It uses the sigmoid function; see the activate function.
%
% NOTE: This code is for illustration and educational purposes only. Use an
% industry-level library for real problems.
%
% Author: H. Paul Keeler, 2019.
% Website: hpaulkeeler.com
% Repository: github.com/hpaulkeeler/posts

clearvars; clc; close all;

choiceData=3; %choose 1, 2 or 3
numbSteps = 2e5; %number of steps for the (gradient) optimization method
rateLearn = 0.05; %hyperparameter
numbTest = 1000; %number of points for testing the fitted model
boolePlot=true;

rng(42); %set seed for reproducibility of results

%generating or retrieving the training data
switch choiceData
    case 1
        %%% simple example
        %recommended minimum number of optimization steps: numbSteps = 1e5
        x1 = 2*[0.1,0.3,0.1,0.6,0.4,0.6,0.5,0.9,0.4,0.7]-1;
        x2 = 2*[0.1,0.4,0.5,0.9,0.2,0.3,0.6,0.2,0.4,0.6]-1;
        numbTrainAll = 10; %number of training data
        %output (binary) data
        y = [ones(1,5) zeros(1,5); zeros(1,5) ones(1,5)];
    case 2
        %%% voronoi example
        %recommended minimum number of optimization steps: numbSteps = 1e5
        numbTrainAll = 1000; %number of training data
        x1 = 2*rand(1,numbTrainAll)-1;
        x2=2*rand(1,numbTrainAll)-1;
        booleInsideVoroni=checkVoronoiInside(x1,x2);
        % output (binary) data
        y=[booleInsideVoroni;~booleInsideVoroni];
    case 3
        %%% rose/flower example
        %recommended minimum number of optimization steps: numbSteps = 2e5
        numbTrainAll=1000; %number of training data
        x1=2*rand(1,numbTrainAll)-1;
        x2=2*rand(1,numbTrainAll)-1;
        booleInsideFlower=checkFlowerInside(x1,x2);
        %output (binary) data
        y=[booleInsideFlower;~booleInsideFlower];

end
x=[x1;x2];
booleTrain = logical(y);

%dimensions/widths of neural network
numbWidth1 = 2; %must be 2 for 2-D input data
numbWidth2 = 4;
numbWidth3 = 4;
numbWidth4 = 2; %must be 2 for binary output
numbWidthAll=[numbWidth1,numbWidth2,numbWidth3,numbWidth4];
numbLayer=length(numbWidthAll);

%initialize weights and biases
scaleModel = .5;
W2 = scaleModel * randn(numbWidth2,numbWidth1);
b2 = scaleModel * rand(numbWidth2,1);
W3 = scaleModel * randn(numbWidth3,numbWidth2);
b3 = scaleModel * rand(numbWidth3,1);
W4 = scaleModel * randn(numbWidth4,numbWidth3);
b4 = scaleModel * rand(numbWidth4,1);

%create arrays of matrices
%w matrices
arrayW{1}=W2;
arrayW{2}=W3;
arrayW{3}=W4;
%b vectors
array_b{1}=b2;
array_b{2}=b3;
array_b{3}=b4;
%activation matrices
arrayA=mat2cell(zeros(sum(numbWidthAll),1),numbWidthAll);
%delta (gradient) vectors
array_delta=mat2cell(zeros(sum(numbWidthAll(2:end)),1),numbWidthAll(2:end));

costHistory = zeros(numbSteps,1); %record the cost for each step
for i = 1:numbSteps
    %forward and back propagation
    indexTrain=randi(numbTrainAll);
    xTrain = x(:,indexTrain);

    %run forward pass
    A2 = activate(xTrain,W2,b2);
    A3 = activate(A2,W3,b3);
    A4 = activate(A3,W4,b4);

    %new (replaces above three lines)
    arrayA{1}=xTrain;
    for j=2:numbLayer
        arrayA{j}=activate(arrayA{j-1},arrayW{j-1},array_b{j-1});
    end

    %run backward pass (to calculate the gradient) using Hadamard products
    delta4 = A4.*(1-A4).*(A4-y(:,indexTrain));
    delta3 = A3.*(1-A3).*(transpose(W4)*delta4);
    delta2 = A2.*(1-A2).*(transpose(W3)*delta3);

    %new (replaces above three lines)
    A_temp=arrayA{numbLayer};
    array_delta{numbLayer-1}=A_temp.*(1-A_temp).*(A_temp-y(:,indexTrain));
    for j=numbLayer-1:-1:2
        A_temp=arrayA{j};
        delta_temp=array_delta{j};
        W_temp=arrayW{j};
        array_delta{j-1}= A_temp.*(1-A_temp).*(transpose(W_temp)*delta_temp);
    end

    %update using steepest descent
    %transpose of xTrains used for W matrices
    W2 = W2 - rateLearn*delta2*transpose(xTrain);
    W3 = W3 - rateLearn*delta3*transpose(A2);
    W4 = W4 - rateLearn*delta4*transpose(A3);
    b2 = b2 - rateLearn*delta2;
    b3 = b3 - rateLearn*delta3;
    b4 = b4 - rateLearn*delta4;

    %new (replaces above six lines)
    for j=1:numbLayer-1
        A_temp=arrayA{j};
        delta_temp=array_delta{j};
        arrayW{j}=arrayW{j} - rateLearn*delta_temp*transpose(A_temp);
        array_b{j}=array_b{j} - rateLearn*delta_temp;
    end

    %update cost
    costCurrent_i = getCost(x,y,arrayW,array_b,numbLayer);
    costHistory(i) = costCurrent_i;
end

%generating test data
xTest1 = 2*rand(1,numbTest)-1;
xTest2 = 2*rand(1,numbTest)-1;
xTest = [xTest1;xTest2];
booleTest = (false(2,numbTest));

%testing every point
for k=1:numbTest
    xTestSingle = xTest(:,k);
    %apply fitted model
    yTestSingle = feedForward(xTestSingle,arrayW,array_b,numbLayer);
    [~,indexTest] = max(yTestSingle);

    booleTest(1,k) = (indexTest==1);
    booleTest(2,k) = (indexTest==2);
end

if boolePlot

    %plot history of cost
    figure;
    semilogy(1:numbSteps,costHistory);
    xlabel('Number of optimization steps');
    ylabel('Value of cost function');
    title('History of the cost function')

    %plot training data
    figure;
    tiledlayout(1,2);
    nexttile;
    hold on;
    plot(x1(booleTrain(1,:)),x2(booleTrain(1,:)),'rs');
    plot(x1(booleTrain(2,:)),x2(booleTrain(2,:)),'b+');
    xlabel('x');
    ylabel('y');
    title('Training data');
    axis([-1,1,-1,1]);
    axis square;

    %plot test data

    nexttile;
    hold on;
    plot(xTest1(booleTest(1,:)),xTest2(booleTest(1,:)),'rd');
    plot(xTest1(booleTest(2,:)),xTest2(booleTest(2,:)),'bx');
    xlabel('x');
    ylabel('y');
    title('Testing data');
    axis([-1,1,-1,1]);
    axis square;

end

function costTotal = getCost(x,y,arrayW,array_b,numbLayer)
numbTrainAll = max(size(x));
%originally y, x1 and x2 were defined outside this function
costAll = zeros(numbTrainAll,1);

for i = 1:numbTrainAll
    %loop through every training point
    x_i = x(:,i);

    A_Last=feedForward(x_i,arrayW,array_b,numbLayer);
    %find difference between algorithm output and training data
    costAll(i) = norm(y(:,i) - A_Last,2);
end
costTotal = norm(costAll,2)^2;
end

function y = activate(x,W,b)
z=(W*x+b);
%evaluate sigmoid (ie logistic) function
y = 1./(1+exp(-z));

%evaluate ReLU (rectified linear unit)
%y = max(0,z);
end

function A_Last= feedForward(xSingle,arrayW,array_b,numbLayer)
%run forward pass
A_now=xSingle;
for j=2:numbLayer
    A_next=activate(A_now,arrayW{j-1},array_b{j-1});
    A_now=A_next;
end


A_Last=A_next;
end

function booleFlowerInside=checkFlowerInside(x1,x2)
% This function creates a simple binary classification problem.
% A single point (x1,x2) is either inside a flower petal or not, where the
% flower (or rose) is defined in polar coordinates by the equation:
%
% rho(theta) = cos(k*theta), where k is a positive integer.

orderFlower = 2;
[thetaFlower, rhoFlower] = cart2pol(x1(:)',x2(:)');
% check if point (x1,x2) is inside the rose
booleFlowerInside = abs(cos(orderFlower*thetaFlower)) <= rhoFlower;

end

function booleVoronoiInside=checkVoronoiInside(x1,x2)
% This function creates a simple binary classification problem.
% A single point (x1,x2) is either inside a chosen Voronoi cell or not,
% where the Voronoi cells are defined deterministically below.

numbPoints = length(x1);
% generate some deterministic seeds for the Voronoi cells
numbSeed = 6;
indexCellChosen = 4; %arbitrarily chosen cell (an index from 1 up to numbSeed)

t = 1:numbSeed;
% deterministic points located on the square [-1,+1]x[-1,+1]
xSeed1 = sin(27*t.*pi.*cos(t.^2*pi+.4)+1.4);
xSeed2 = sin(4.7*t.*pi.*sin(t.^3*pi+.7)+.3);
xSeed = [xSeed1;xSeed2];

% find which Voronoi cell each point (x1,x2) belongs to
indexVoronoi = zeros(1,numbPoints);
for i = 1:numbPoints
    x_i = [x1(i);x2(i)]; %retrieve single point
    diff_x = xSeed-x_i;
    distSquared_x = diff_x(1,:).^2+diff_x(2,:).^2; %relative distance squared
    [~,indexVoronoiTemp] = min(distSquared_x); %find closest seed
    indexVoronoi(i) = indexVoronoiTemp;
end

% see if points are inside the Voronoi cell number indexCellChosen
booleVoronoiInside = (indexVoronoi==indexCellChosen);
end

Python

The use of matrices in Python can kindly be described as an afterthought. So it takes a bit of Python sleight of hand to get MATLAB code translated into working Python code.

(This stems partly from the fact that Python is row-major, like the language it’s built upon, C, whereas MATLAB is column-major, like its inspiration, Fortran.)
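
One recurring piece of that sleight of hand in the translation below is forcing one-dimensional NumPy arrays into explicit column vectors, so that the matrix products behave like their MATLAB counterparts. A small illustration with arbitrary values:

import numpy as np

a = np.array([1.0, 2.0])   # shape (2,): neither a row nor a column vector
a_col = a.reshape(-1, 1)   # shape (2, 1): an explicit column vector
W = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]]) # shape (3, 2)
print(np.matmul(W, a_col)) # shape (3, 1), as W*a would be in MATLAB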

# This code runs a simple feedforward neural network.
#
# The neural network is a multi-layer perceptron designed for a simple
# binary classification. The classification is determining whether a
# two-dimensional (Cartesian) point is located inside some given region or
# not.
#
# For the simple binary examples used here, the code seems to work for a
# sufficiently large number of optimization steps, typically in the tens and
# hundreds of thousands.
#
# This neural network only has 4 layers (so 2 hidden layers). The
# training procedure, meaning the forward and backward passes, is
# hard-coded. This means changing the number of layers requires additional
# information on the neural network architecture.
#
# It uses the sigmoid function; see the activate function.
#
# NOTE: This code is for illustration and educational purposes only. Use an
# industry-level library for real problems.
#
# For more details, see the post:
#
# https://hpaulkeeler.com/creating-a-simple-feedforward-neural-network-without-libraries/
#
# Author: H. Paul Keeler, 2019.
# Website: hpaulkeeler.com
# Repository: github.com/hpaulkeeler/posts

import numpy as np;  # NumPy package for arrays, random number generation, etc
import matplotlib.pyplot as plt  # For plotting
#from FunctionsExamples import checkFlowerInside, checkVoronoiInside

plt.close('all');  # close all figures

choiceData=1; #choose 1, 2 or 3

numbSteps = 1e5; #number of steps for the (gradient) optimization method
rateLearn = 0.05; #hyperparameter
numbTest = 1000; #number of points for testing the fitted model
boolePlot = True;

np.random.seed(42); #set seed for reproducibility of results

#helper functions for generating problems
def checkFlowerInside(x1,x2):
    # This function creates a simple binary classification problem.
    # A single point (x1,x2) is either inside a flower petal or not, where the
    # flower (or rose) is defined in polar coordinates by the equation:
    #
    # rho(theta) = cos(k*theta), where k is a positive integer.

    orderFlower = 2;
    thetaFlower = np.arctan2(x2,x1);
    rhoFlower = np.sqrt(x1**2+x2**2);
    # check if point (x1,x2) is inside the rose
    booleFlowerInside = np.abs(np.cos(orderFlower*thetaFlower)) <= rhoFlower;
    return booleFlowerInside

def checkVoronoiInside(x1,x2):
    # This function creates a simple binary classification problem.
    # A single point (x1,x2) is either inside a chosen Voronoi cell or not,
    # where the Voronoi cells are defined deterministically below.
    numbPoints = len(x1);
    # generate some deterministic seeds for the Voronoi cells
    numbSeed = 6;
    indexCellChosen = 4; #arbitrarily chosen cell (an index from 1 up to numbSeed)

    t = np.arange(numbSeed)+1;
    # deterministic points located on the square [-1,+1]x[-1,+1]
    xSeed1 = np.sin(27*t*np.pi*np.cos(t**2*np.pi+.4)+1.4);
    xSeed2 = np.sin(4.7*t*np.pi*np.sin(t**3*np.pi+.7)+.3);
    xSeed = np.stack((xSeed1,xSeed2),axis=0);

    # find which Voronoi cell each point (x1,x2) belongs to
    indexVoronoi = np.zeros(numbPoints);
    for i in range(numbPoints):
        x_i = np.stack((x1[i],x2[i]),axis=0).reshape(-1, 1); #retrieve single point
        diff_x = xSeed-x_i;
        distSquared_x = diff_x[0,:]**2+diff_x[1,:]**2; #relative distance squared
        indexVoronoiTemp = np.argmin(distSquared_x); #find closest seed
        indexVoronoi[i] = indexVoronoiTemp;

    # see if points are inside the Voronoi cell number indexCellChosen
    booleVoronoiInside = (indexVoronoi==indexCellChosen);
    return booleVoronoiInside


#generating or retrieving the training data
if (choiceData==1):
    ### simple example
    #recommended minimum number of optimization steps: numbSteps = 1e5
    x1 = 2*np.array([0.1,0.3,0.1,0.6,0.4,0.6,0.5,0.9,0.4,0.7])-1;
    x2 = 2*np.array([0.1,0.4,0.5,0.9,0.2,0.3,0.6,0.2,0.4,0.6])-1;
    numbTrainAll = 10; #number of training data
    #output (binary) data
    y = np.array([[1,1,1,1,1,0,0,0,0,0],[0,0,0,0,0,1,1,1,1,1]]);

    x=np.stack((x1,x2),axis=0);

if (choiceData==2):
    ### voronoi example
    #recommended minimum number of optimization steps: numbSteps = 1e5
    numbTrainAll = 500; #number of training data
    x=2*np.random.uniform(0, 1,(2,numbTrainAll))-1;
    x1 = x[0,:];
    x2 = x[1,:];
    booleInsideVoroni=checkVoronoiInside(x1,x2);
    # output (binary) data
    y=np.stack((booleInsideVoroni,~booleInsideVoroni),axis=0);

if (choiceData==3):
    ### rose/flower example
    #recommended minimum number of optimization steps: numbSteps = 2e5
    numbTrainAll=100; #number of training data
    x=2*np.random.uniform(0, 1,(2,numbTrainAll))-1;
    x1 = x[0,:];
    x2 = x[1,:];
    booleInsideFlower=checkFlowerInside(x1,x2);
    #output (binary) data
    y=np.stack((booleInsideFlower,~booleInsideFlower),axis=0);

#helper functions for training and running the neural network
def activate(x,W,b):
    z=np.matmul(W,x)+b;
    #evaluate sigmoid (ie logistic) function
    y = 1/(1+np.exp(-z));

    #evaluate ReLU (rectified linear unit)
    #y = np.maximum(0,z);
    return y

def feedForward(xSingle,arrayW,array_b,numbLayer):
    #run forward pass
    A_now=xSingle;
    for j in np.arange(1,numbLayer):
        A_next=activate(A_now,arrayW[j-1],array_b[j-1]);
        A_now=A_next;
    A_Last=A_next;

    return A_Last;


def getCost(x,y,arrayW,array_b,numbLayer):
    numbTrainAll = np.max(x.shape);
    #originally y, x1 and x2 were defined outside this function
    costAll = np.zeros((numbTrainAll,1));

    for i in range(numbTrainAll):
        #loop through every training point
        x_i = x[:,i].reshape(-1, 1);

        A_Last=feedForward(x_i,arrayW,array_b,numbLayer);
        #find difference between algorithm output and training data
        costAll[i] = np.linalg.norm(y[:,i].reshape(-1, 1) - A_Last,2);

    costTotal = np.linalg.norm(costAll,2)**2;

    return costTotal

#dimensions/widths of neural network
numbWidth1 = 2; #must be 2 for 2-D input data
numbWidth2 = 4;
numbWidth3 = 4;
numbWidth4 = 2; #must be 2 for binary output
numbWidthAll=np.array([numbWidth1,numbWidth2,numbWidth3,numbWidth4]);
numbLayer=len(numbWidthAll);

#initialize weights and biases
scaleModel = .5;
W2 = scaleModel * np.random.normal(0, 1,(numbWidth2,numbWidth1));
b2 = scaleModel * np.random.uniform(0, 1,(numbWidth2,1));
W3 = scaleModel * np.random.normal(0, 1,(numbWidth3,numbWidth2));
b3 = scaleModel * np.random.uniform(0, 1,(numbWidth3,1));
W4 = scaleModel * np.random.normal(0, 1,(numbWidth4,numbWidth3));
b4 = scaleModel * np.random.uniform(0, 1,(numbWidth4,1));

#create lists of matrices
#w matrices
arrayW=[];
arrayW.append(W2);
arrayW.append(W3);
arrayW.append(W4);
#b vectors
array_b=[];
array_b.append(b2);
array_b.append(b3);
array_b.append(b4);
#activation matrices
arrayA=[];
for j in range(numbLayer):
    arrayA.append(np.zeros(numbWidthAll[j]).reshape(-1, 1).T);
#delta (gradient) vectors
array_delta=[];
for j in range(numbLayer-1):
    array_delta.append(np.zeros(numbWidthAll[j+1]).reshape(-1, 1).T);

numbSteps=int(numbSteps)
costHistory = np.zeros((numbSteps,1));
booleTrain = y;
for i in range(numbSteps):
    #forward and back propagation
    indexTrain=np.random.randint(numbTrainAll);
    xTrain = x[:,indexTrain].reshape(-1, 1);

    #run forward pass
    arrayA[0]=xTrain; #treat input as the A1 matrix
    for j in (range(numbLayer-1)):
        arrayA[j+1]=activate(arrayA[j],arrayW[j],array_b[j]);

    #run backward pass (to calculate the gradient) using Hadamard products
    A_temp=arrayA[-1];
    array_delta[-1]=A_temp*(1-A_temp)*(A_temp-y[:,indexTrain].reshape(-1, 1));
    for j in np.arange(numbLayer-2,0,-1):
        A_temp=arrayA[j];
        delta_temp=array_delta[j];
        W_temp=arrayW[j];
        array_delta[j-1]= A_temp*(1-A_temp)*np.matmul(np.transpose(W_temp),delta_temp);


    #update using steepest descent
    #transpose of xTrains used for W matrices
    for j in range(numbLayer-1):
        A_temp=arrayA[j];
        delta_temp=array_delta[j];
        arrayW[j]=arrayW[j] - rateLearn*np.matmul(delta_temp,np.transpose(A_temp));
        array_b[j]=array_b[j] - rateLearn*delta_temp;


    #update cost
    costCurrent_i = getCost(x,y,arrayW,array_b,numbLayer);
    costHistory[i] = costCurrent_i;

#generating test data
xTest = 2*np.random.uniform(0,1,(2,numbTest))-1;
xTest1 = xTest[0,:];
xTest2 = xTest[1,:];

booleTest = np.zeros((2,numbTest));

#testing every point
for k in range(numbTest):
    xTestSingle = xTest[:,k].reshape(-1, 1);
    #apply fitted model
    yTestSingle = feedForward(xTestSingle,arrayW,array_b,numbLayer);
    indexTest = np.argmax(yTestSingle, axis=0);
    booleTest[0,k] = ((indexTest==0)[0]);
    booleTest[1,k] = ((indexTest==1)[0]);

if boolePlot:
    #plot history of cost
    plt.figure();
    plt.semilogy(np.arange(numbSteps)+1,costHistory);
    plt.xlabel('Number of optimization steps');
    plt.ylabel('Value of cost function');
    plt.title('History of the cost function')

    #plot training data
    fig, ax = plt.subplots(1, 2);
    ax[0].plot(x1[np.where(booleTrain[0,:])],x2[np.where(booleTrain[0,:])],\
        'rs', markerfacecolor='none');
    ax[0].plot(x1[np.where(booleTrain[1,:])],x2[np.where(booleTrain[1,:])],\
        'b+');
    ax[0].set_xlabel('x');
    ax[0].set_ylabel('y');
    ax[0].set_title('Training data');
    ax[0].set_xlim([-1, 1])
    ax[0].set_ylim([-1, 1])
    ax[0].axis('equal');

    #plot test data
    ax[1].plot(xTest1[np.where(booleTest[0,:])],xTest2[np.where(booleTest[0,:])],\
        'rd', markerfacecolor='none');
    ax[1].plot(xTest1[np.where(booleTest[1,:])],xTest2[np.where(booleTest[1,:])],\
        'bx');
    ax[1].set_xlabel('x');
    ax[1].set_ylabel('y');
    ax[1].set_title('Testing data');
    ax[1].set_xlim([-1, 1])
    ax[1].set_ylim([-1, 1])
    ax[1].axis('equal');

Results

After fitting or training the models, I tested them by applying the fitted neural network to other data. (In machine learning and statistics, you famously don’t use the same data to train and test a model.)

The number of steps needed to obtain reasonable results was about 100 thousand. To get a feeling for this, look at the plot of the cost function value over the number of optimization steps. For more complicated problems, such as the flower or rose problem, even more steps were needed.

I found the Python code, which is basically a close literal translation (by me) of the MATLAB code, too slow, so I generated the results using MATLAB.

Note: These results were generated with stochastic methods (namely, the training method), so my results will not agree exactly with yours, but hopefully they’ll be close enough.

Simple

Voronoi

Rose

The rose problem was the most difficult to classify. Sometimes it was a complete fail. Sometimes the training method could only see one component of the rose. It worked a couple of times, namely with 200 thousand optimization steps and a random seed of 42.

I imagine this problem is difficult due to its obviously complex geometry. Specifically, both ends of each petal become thin, which is where the classification breaks down.

Fail

Half-success

Success (mostly)

Further reading

There are just so many things to read on neural networks, particularly on the web. This post was inspired by this tutorial paper:

  • 2019 – Higham and Higham, Deep Learning: An Introduction for Applied Mathematicians.

I covered this paper in a previous post.

For a readable book, I recommend this one:

  • Goodfellow, Bengio, Courville – Deep Learning – MIT Press.