Model Components

As you start building more complex machine learning models, it becomes beneficial to assemble the model from small, reusable components. For example, it makes sense to define a generic multi-layer perceptron component and use it in multiple models. In Deep.Net, a model component can contain other model components; for example, an autoencoder component could be built using two multi-layer perceptron components.

In this document we will describe how to build a simple layer of neurons and how to instantiate it in your model. You can run this example by executing FsiAnyCPU.exe docs\content\components.fsx after cloning the Deep.Net repository.

Defining a model component

A model component corresponds to an F# module that contains conventionally named types and functions. We will call our example component MyFirstPerceptron.

We will build a component for a single layer of neurons. Our neural layer will compute the function \(f(\mathbf{x}) = \mathbf{\sigma} ( W \mathbf{x} + \mathbf{b} )\) where \(\mathbf{\sigma}\) can be either the element-wise \(\mathrm{tanh}\) or the soft-max function \(\mathbf{\sigma}(\mathbf{x})_i = \exp(x_i) / \sum_{i'} \exp(x_{i'})\). \(W\) is the weight matrix and \(\mathbf{b}\) is the bias vector.

Consequently our component has two parameters (a parameter is a quantity that changes during model training): \(W\) and \(\mathbf{b}\). These two parameters give rise to two integer hyper-parameters (a hyper-parameter is fixed at model definition and does not change during training): the number of inputs NInput (number of columns in \(W\)) and the number of outputs NOutput (number of rows in \(W\)). Furthermore, we have the transfer function as a third, discrete hyper-parameter TransferFunc that can be either Tanh or SoftMax. Let us define record types for the parameters and hyper-parameters.

open ArrayNDNS
open SymTensor
open Datasets

module MyFirstPerceptron = 

    type TransferFuncs =
        | Tanh
        | SoftMax

    type HyperPars = {
        NInput:         SizeSpecT
        NOutput:        SizeSpecT
        TransferFunc:   TransferFuncs
    }

    type Pars = {
        Weights:        ExprT<single> ref
        Bias:           ExprT<single> ref
        HyperPars:      HyperPars
    }

We see that, by convention, the record type for the hyper-parameters is named HyperPars. The fields NInput and NOutput have been defined as type SizeSpecT. This is the type used by Deep.Net to represent an integral size, either symbolic or numeric.
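For illustration, symbolic sizes are typically created through the model builder (introduced further below) and later assigned numeric values. A minimal sketch, in which the builder name "Example" is hypothetical:

let mbEx = ModelBuilder<single> "Example"
let nIn  = mbEx.Size "nIn"     // a symbolic size of type SizeSpecT
mbEx.SetSize nIn 784           // later fixed to a numeric value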

Also by convention, the record type that stores the model parameters is named Pars. The weights and bias have been defined as type ExprT<single> ref.

SymTensor.ExprT<'T> is the type of a symbolic tensor expression of data type 'T. For example, 'T can be float for a tensor containing double precision floating point numbers or, as in this case, single for single precision floats. The reader might wonder why we use the generic expression type instead of the VarSpecT type that represents a symbolic variable in Deep.Net. After all, the model's parameters are variables, are they not?

While in most cases a model parameter will be a tensor variable, it makes sense to let the user pass an arbitrary expression for the model parameter. Consider, for example, an auto-encoder with tied input/output weights (this means that the weights of the output layer are given by the transposition of the weights of the input layer). The user can construct such an auto-encoder using two of our perceptron components; to tie the input and output weights together, they just need to set pOut.Weights := (!pIn.Weights).T, where pOut represents the parameters of the output layer and pIn represents the parameters of the input layer. This would not be possible if Pars.Weights were of type VarSpecT, since (!pIn.Weights).T is an expression due to the use of the transposition operation.

Furthermore, we observe that Weights and Bias have been declared as reference cells; this allows the contents of a parameter field to be replaced after the record has been constructed, as in the weight-tying example above.
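
For instance, the weight tying described above replaces the contents of the output layer's reference cell. A sketch, where pIn and pOut are hypothetical instantiated parameter records:

    let tieWeights (pIn: Pars) (pOut: Pars) =
        // make the output weights the transposition of the input weights
        pOut.Weights := (!pIn.Weights).T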

Let us now define the functions of our component's module.

We define a function pars that, by convention, returns an instance of the parameter record.

    let internal initBias (seed: int) (shp: int list) : ArrayNDHostT<single> =
        ArrayNDHost.zeros shp

    let pars (mb: ModelBuilder<_>) (hp: HyperPars) = {
        Weights   = mb.Param ("Weights", [hp.NOutput; hp.NInput])
        Bias      = mb.Param ("Bias",    [hp.NOutput], initBias)
        HyperPars = hp
    }

The function pars takes two arguments: a model builder and the hyper-parameters of the component. It constructs a parameter record and populates the weights and bias with parameter tensors obtained from the model builder by calling mb.Param with the appropriate shapes from the hyper-parameters.

For the bias we also specify the custom initialization function initBias. A custom initialization function takes two arguments: a random seed and a list of integers representing the shape of the instantiated parameter tensor. It should return the initialization value of appropriate shape for the parameter. Here, we initialize the bias with zero and thus return a zero tensor of the requested shape. If no custom initializer is specified, the parameter is initialized using random numbers from a uniform distribution with support \([-0.01, 0.01]\).
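For illustration, an initializer that fills the bias with ones instead could look as follows. This is a sketch; it assumes that ArrayNDHost.ones exists analogously to the ArrayNDHost.zeros function used above:

    let internal initBiasOnes (seed: int) (shp: int list) : ArrayNDHostT<single> =
        // hypothetical: fill the bias with ones; the seed is ignored
        // since no randomness is involved
        ArrayNDHost.ones shp

It would be passed to mb.Param in place of initBias.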

We also store a reference to the hyper-parameters in our parameter record to save ourselves the trouble of passing the hyper-parameter record to functions that require both the parameter record and the hyper-parameter record.

Now, we can define the function that returns the expression for the output of the perceptron component.

    let pred pars input =
        let activation = !pars.Weights .* input + !pars.Bias
        match pars.HyperPars.TransferFunc with
        | Tanh     -> tanh activation
        | SoftMax  -> exp activation / Expr.sumKeepingAxis 0 (exp activation)

The function computes the activation using the formula specified above and then applies the transfer function specified in the hyper-parameters. The normalization of the soft-max activation function is performed over the left-most axis. The ! operator is used to dereference the reference cells in the parameter record. Note that since the activation is computed as \(W \mathbf{x} + \mathbf{b}\), the component expects the samples to be arranged along the columns of input; this is why the inputs are transposed before being fed into the component further below.

This concludes the definition of our model component.

Predefined model components

The Models namespace of Deep.Net contains the following model components:

  • LinearRegression. A linear predictor.
  • NeuralLayer. A layer of neurons, with weights, bias and a transfer function.
  • LossLayer. A layer that calculates the loss between predictions and target values using a difference metric (for example the mean-squared-error or cross entropy).
  • MLP. A multi-layer neural network with a loss layer on top.
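
All of these follow the conventions introduced above. For instance, the LossLayer component, which we will use below, turns a prediction expression and a target expression into a scalar loss expression. A sketch, where predExpr and targetExpr stand for hypothetical expressions of matching shape:

open Models
// mean-squared error between prediction and target (hypothetical expressions)
let mseLoss = LossLayer.loss LossLayer.MSE predExpr targetExpr
// cross-entropy, as used for the classification example below
let ceLoss  = LossLayer.loss LossLayer.CrossEntropy predExpr targetExpr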

Using a model component

Let us rebuild the hand-crafted model described in the chapter Learning MNIST using the MyFirstPerceptron component and the LossLayer component from Deep.Net. As before, the model will consist of one hidden layer of neurons with a tanh transfer function and an output layer with a soft-max transfer function.

As in that chapter, we first load the MNIST dataset and declare symbolic sizes for the model.

let mnist = Mnist.load (__SOURCE_DIRECTORY__ + "../../../Data/MNIST") 0.0
            |> TrnValTst.ToCuda

let mb = ModelBuilder<single> "NeuralNetModel"

let nBatch  = mb.Size "nBatch"
let nInput  = mb.Size "nInput"
let nClass  = mb.Size "nClass"
let nHidden = mb.Size "nHidden"

Then we instantiate the parameters of our components.

let lyr1 = 
    MyFirstPerceptron.pars (mb.Module "lyr1") 
        {NInput=nInput; NOutput=nHidden; TransferFunc=MyFirstPerceptron.Tanh}
let lyr2 =
    MyFirstPerceptron.pars (mb.Module "lyr2")
        {NInput=nHidden; NOutput=nClass; TransferFunc=MyFirstPerceptron.SoftMax}

We used the mb.Module method of the model builder to create a new, subordinated model builder for each component. mb.Module takes one argument that specifies an identifier for the subordinated model builder; the name of the current model builder is combined with this identifier using a dot to construct the name of the subordinated model builder. In this example, mb has the name NeuralNetModel and we specified the identifier lyr1 when calling mb.Module, hence the subordinated model builder has the name NeuralNetModel.lyr1.

The mb.Param method combines the name of the model builder with the specified identifier to construct the full parameter name. Thus the weights parameter of lyr1 will have the full name NeuralNetModel.lyr1.Weights and the biases will be NeuralNetModel.lyr1.Bias. This automatic parameter name construction allows multiple, independent instantiations of components without name clashes.
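For instance, instantiating the component a third time under another identifier would yield a disjoint set of parameter names. A sketch, where lyr3 is hypothetical:

let lyr3 = 
    MyFirstPerceptron.pars (mb.Module "lyr3") 
        {NInput=nInput; NOutput=nHidden; TransferFunc=MyFirstPerceptron.Tanh}
// its parameters are named NeuralNetModel.lyr3.Weights and
// NeuralNetModel.lyr3.Bias, independent of those of lyr1 and lyr2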

We continue with variable definitions and model instantiation.

let input  : ExprT<single> = mb.Var "Input"  [nBatch; nInput]
let target : ExprT<single> = mb.Var "Target" [nBatch; nClass]

mb.SetSize nInput mnist.Trn.All.Img.Shape.[1]
mb.SetSize nClass mnist.Trn.All.Lbl.Shape.[1]
mb.SetSize nHidden 100

open SymTensor.Compiler.Cuda
let mi = mb.Instantiate DevCuda

Next, we use the components to generate the model's expressions.

let hiddenVal = MyFirstPerceptron.pred lyr1 input.T
let classProb = MyFirstPerceptron.pred lyr2 hiddenVal

We use the LossLayer component from Deep.Net to generate an expression for the loss.

open Models
let loss = LossLayer.loss LossLayer.CrossEntropy classProb target.T

We can now proceed to compile our model's expressions into functions and train the model using the gradient descent optimizer for a fixed number of iterations.

let opt = Optimizers.GradientDescent (loss, mi.ParameterVector, DevCuda)
let optCfg = { Optimizers.GradientDescent.Step=1e-1f }

let lossFn = mi.Func loss |> arg2 input target
let optFn = mi.Func opt.Minimize |> opt.Use |> arg2 input target

for itr = 0 to 1000 do
    optFn mnist.Trn.All.Img mnist.Trn.All.Lbl optCfg |> ignore
    if itr % 50 = 0 then
        let l = lossFn mnist.Tst.All.Img mnist.Tst.All.Lbl |> ArrayND.value
        printfn "Test loss after %5d iterations: %.4f" itr l

This should produce output similar to

Test loss after     0 iterations: 2.3013
Test loss after    50 iterations: 1.9930
Test loss after   100 iterations: 1.0479
...
Test loss after  1000 iterations: 0.2701

Nesting model components

Model components can be nested; that is, a component can contain one or more other components. For illustration, let us define an autoencoder component using our MyFirstPerceptron component.

We begin by defining the hyper-parameters and parameters.

module MyFirstAutoencoder =

    type HyperPars = {
        NInOut:         SizeSpecT
        NLatent:        SizeSpecT
    }

    type Pars = {
        InLayer:        MyFirstPerceptron.Pars
        OutLayer:       MyFirstPerceptron.Pars
        HyperPars:      HyperPars
    }

The hyper-parameters consist of the number of inputs and outputs and the number of neurons that constitute the latent representation. The parameters are made up of the parameters of the input layer and the parameters of the output layer; thus we just reuse the existing record type from the MyFirstPerceptron component.

Next, we define the pars function that instantiates a parameter record for this component.

    let pars (mb: ModelBuilder<_>) (hp: HyperPars) = 
        let hpInLayer : MyFirstPerceptron.HyperPars = {
            NInput       = hp.NInOut
            NOutput      = hp.NLatent
            TransferFunc = MyFirstPerceptron.Tanh
        }
        let hpOutLayer : MyFirstPerceptron.HyperPars = {
            NInput       = hp.NLatent
            NOutput      = hp.NInOut
            TransferFunc = MyFirstPerceptron.Tanh
        }

        {
            InLayer   = MyFirstPerceptron.pars (mb.Module "InLayer") hpInLayer
            OutLayer  = MyFirstPerceptron.pars (mb.Module "OutLayer") hpOutLayer
            HyperPars = hp
        }

The function computes the hyper-parameters for the input and output layers and calls the MyFirstPerceptron.pars function to instantiate the parameter records for the two employed perceptrons.

Now, we can define the expressions for the latent values, the reconstruction and the reconstruction error.

    let latent pars input = 
        input
        |> MyFirstPerceptron.pred pars.InLayer

    let reconst pars input =
        input
        |> MyFirstPerceptron.pred pars.InLayer
        |> MyFirstPerceptron.pred pars.OutLayer

    let loss pars input = 
        input
        |> reconst pars
        |> LossLayer.loss LossLayer.MSE input

This concludes the definition of the autoencoder model. As you have seen, it is straightforward to create more complex components by combining existing components.

Finally, let us instantiate our simple autoencoder with 100 latent units and train it on MNIST.

let mb2 = ModelBuilder<single> "AutoEncoderModel"

// define symbolic sizes
let nBatch2  = mb2.Size "nBatch2"
let nInput2  = mb2.Size "nInput2"
let nLatent2 = mb2.Size "nLatent2"

// define model parameters
let ae = 
    MyFirstAutoencoder.pars (mb2.Module "Autoencoder") {NInOut=nInput2; NLatent=nLatent2}

// instantiate model
mb2.SetSize nInput2 mnist.Trn.All.Img.Shape.[1]
mb2.SetSize nLatent2 100
let mi2 = mb2.Instantiate DevCuda

// loss function
let input2 : ExprT<single> = mb2.Var "Input"  [nBatch2; nInput2]
let loss2 = MyFirstAutoencoder.loss ae input2.T
let lossFn2 = mi2.Func loss2 |> arg1 input2 

// optimization function
let opt2 = Optimizers.GradientDescent (loss2, mi2.ParameterVector, DevCuda)
let optCfg2 = { Optimizers.GradientDescent.Step=1e-1f }
let optFn2 = mi2.Func opt2.Minimize |> opt2.Use |> arg1 input2 

// initialize parameters and train
mi2.InitPars 123
for itr = 0 to 1000 do
    optFn2 mnist.Trn.All.Img optCfg2 |> ignore
    if itr % 50 = 0 then
        let l = lossFn2 mnist.Tst.All.Img |> ArrayND.value
        printfn "Reconstruction error after %5d iterations: %.4f" itr l

This should produce output similar to

Reconstruction error after     0 iterations: 0.1139
Reconstruction error after    50 iterations: 0.1124
Reconstruction error after   100 iterations: 0.1105    
...
Reconstruction error after  1000 iterations: 0.0641
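
Having trained the autoencoder, we could also compile the latent expression to map images to their latent codes. A sketch reusing the names defined above:

let latentFn2 = mi2.Func (MyFirstAutoencoder.latent ae input2.T) |> arg1 input2
// latent representations of the test set
let testCodes = latentFn2 mnist.Tst.All.Img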

Note: Training of the autoencoder seems to be slow with the current version of Deep.Net. We are investigating the reasons for this and plan to deploy optimizations that will make training faster.

Summary

Model components provide a way to construct a model out of small building blocks. Predefined components are located in the Models namespace. Component definition and use in Deep.Net is not constrained by a fixed interface, but naming and signature conventions exist. The model builder supports the use of components through the mb.Module method, which creates a subordinated model builder with a distinct namespace to avoid name clashes between components. A component can also contain further components; thus more complex components can be constructed out of simple ones.
