Model Components
As you start building more complex machine learning models, it becomes beneficial to build the model from small, reusable components. For example, it makes sense to define a generic multi-layer perceptron component and use it in multiple models. In Deep.Net a model component can contain other model components; for example, an autoencoder component could be built from two multi-layer perceptron components.
In this document we will describe how to build a simple layer of neurons and how to instantiate it in your model.
You can run this example by executing FsiAnyCPU.exe docs\content\components.fsx after cloning the Deep.Net repository.
Defining a model component
A model component corresponds to an F# module that contains conventionally named types and functions.
We will call our example component MyFirstPerceptron.
We will build a component for a single layer of neurons. Our neural layer will compute the function \(f(\mathbf{x}) = \mathbf{\sigma} ( W \mathbf{x} + \mathbf{b} )\), where \(\mathbf{\sigma}\) can be either element-wise \(\mathrm{tanh}\) or the soft-max function \(\mathbf{\sigma}(\mathbf{x})_i = \exp(x_i) / \sum_{i'} \exp(x_{i'})\). \(W\) is the weight matrix and \(\mathbf{b}\) is the bias vector.
Consequently our component has two parameters (a parameter is a quantity that changes during model training): \(W\) and \(\mathbf{b}\).
These two parameters give rise to two integer hyper-parameters (a hyper-parameter is fixed at model definition and does not change during training): the number of inputs NInput (the number of columns in \(W\)) and the number of outputs NOutput (the number of rows in \(W\)). Furthermore, the transfer function is a third, discrete hyper-parameter TransferFunc that can be either Tanh or SoftMax.
Let us define record types for the parameters and hyper-parameters.
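A sketch of these definitions, using the SizeSpecT and ExprT types discussed below:

```fsharp
open SymTensor

module MyFirstPerceptron =

    /// transfer (activation) functions
    type TransferFuncs =
        | Tanh
        | SoftMax

    /// hyper-parameters: fixed at model definition
    type HyperPars = {
        /// number of inputs (number of columns in W)
        NInput:         SizeSpecT
        /// number of outputs (number of rows in W)
        NOutput:        SizeSpecT
        /// transfer function
        TransferFunc:   TransferFuncs
    }

    /// parameters: quantities that change during training
    type Pars = {
        /// weight matrix W
        Weights:        ExprT<single> ref
        /// bias vector b
        Bias:           ExprT<single> ref
        /// the hyper-parameters this record was created with
        HyperPars:      HyperPars
    }
```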
We see that, by convention, the record type for the hyper-parameters is named HyperPars. The fields NInput and NOutput have been defined as type SizeSpecT. This is the type used by Deep.Net to represent an integral size, either symbolic or numeric.
Also by convention, the record type that stores the model parameters is named Pars. The weights and bias have been defined as type ExprT<single> ref. SymTensor.ExprT<'T> is the type of a symbolic tensor expression of data type 'T. For example, 'T can be float for a tensor containing double precision floating point numbers or, as in this case, single for single precision floats.
The reader might wonder why we use the generic expression type instead of the VarSpecT type that represents a symbolic variable in Deep.Net. After all, the model's parameters are variables, are they not? While in most cases a model parameter will be a tensor variable, it makes sense to let the user pass an arbitrary expression for the model parameter. Consider, for example, an auto-encoder with tied input/output weights (this means that the weights of the output layer are the transposition of the weights of the input layer). The user can construct such an auto-encoder using two of our perceptron components; to tie the input and output weights together, they just need to set pOut.Weights := (!pIn.Weights).T, where pOut represents the parameters of the output layer and pIn represents the parameters of the input layer. But this would not be possible if Pars.Weights were of type VarSpecT, since (!pIn.Weights).T is an expression due to the use of the transposition operation.
Furthermore, we observe that Weights and Bias have been declared as reference cells; we will see the reason for this shortly.
Let us now define the functions of our component's module. We define a function pars that, by convention, returns an instance of the parameter record.
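A sketch of both functions; the ModelBuilder type, the mb.Param overloads, and ArrayNDHost.zeros reflect assumptions about the Deep.Net API:

```fsharp
open ArrayNDNS

// continuing module MyFirstPerceptron:

/// custom initializer that initializes the bias with zeros
let internal initBias (seed: int) (shp: int list) : ArrayNDHostT<single> =
    ArrayNDHost.zeros shp

/// creates a parameter record from the given model builder
/// and hyper-parameters
let pars (mb: ModelBuilder<_>) (hp: HyperPars) = {
    Weights   = mb.Param ("Weights", [hp.NOutput; hp.NInput])
    Bias      = mb.Param ("Bias",    [hp.NOutput], initBias)
    HyperPars = hp
}
```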
The function pars takes two arguments: a model builder and the hyper-parameters of the component. It constructs a parameter record and populates the weights and bias with parameter tensors obtained from the model builder by calling mb.Param with the appropriate shapes from the hyper-parameters. For the bias we also specify the custom initialization function initBias.
A custom initialization function takes two arguments: a random seed and a list of integers representing the shape of the instantiated parameter tensor.
It should return the initialization value of appropriate shape for the parameter.
Here, we initialize the bias with zero and thus return a zero tensor of the requested shape.
If no custom initializer is specified, the parameter is initialized using random numbers from a uniform distribution with support \([-0.01, 0.01]\).
We also store a reference to the hyper-parameters in our parameter record to save ourselves the trouble of passing the hyper-parameter record to functions that require both the parameter record and the hyper-parameter record.
Now, we can define the function that returns the expression for the output of the perceptron component.
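A sketch of this function; the `.*` dot-product operator and the Expr.sumKeepingAxis call are assumptions about the SymTensor API:

```fsharp
// continuing module MyFirstPerceptron:

/// returns the output expression of the neural layer
let pred pars (input: ExprT<single>) =
    // activation = W x + b
    let activation = !pars.Weights .* input + !pars.Bias
    match pars.HyperPars.TransferFunc with
    | Tanh    -> tanh activation
    | SoftMax ->
        // normalize over the left-most axis
        exp activation / Expr.sumKeepingAxis 0 (exp activation)
```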
The function computes the activation using the formula specified above and then applies the transfer function specified in the hyper-parameters.
The normalization of the soft-max activation function is performed over the left-most axis.
The ! operator is used to dereference the reference cells in the parameter record.
This concludes the definition of our model component.
Predefined model components
The Models namespace of Deep.Net contains the following model components:
- LinearRegression. A linear predictor.
- NeuralLayer. A layer of neurons, with weights, bias and a transfer function.
- LossLayer. A layer that calculates the loss between predictions and target values using a difference metric (for example the mean-squared-error or cross entropy).
- MLP. A multi-layer neural network with a loss layer on top.
Using a model component
Let us rebuild the hand-crafted model described in the chapter Learning MNIST using the MyFirstPerceptron component and the LossLayer component from Deep.Net.
As before, the model will consist of one hidden layer of neurons with a tanh transfer function and an output layer with a soft-max transfer function. As in that chapter, we first load the MNIST dataset and declare symbolic sizes for the model.
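A sketch of this setup; the Mnist.load call, the dataset path, and Mnist.toCuda are assumptions based on the Learning MNIST chapter:

```fsharp
open SymTensor.Compiler.Cuda
open Models
open Datasets

// load MNIST and transfer it to the GPU
// (path and loader functions are assumptions)
let mnist = Mnist.load (__SOURCE_DIRECTORY__ + "/../Data/MNIST") |> Mnist.toCuda

// create the model builder and declare symbolic sizes
let mb = ModelBuilder<single> "NeuralNetModel"
let nBatch  = mb.Size "nBatch"
let nInput  = mb.Size "nInput"
let nClass  = mb.Size "nClass"
let nHidden = mb.Size "nHidden"
```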
Then we instantiate the parameters of our components.
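A sketch of this instantiation, with the hyper-parameter values taken from the symbolic sizes declared above:

```fsharp
// instantiate the parameters of both layers, each within its own
// subordinated model builder created by mb.Module
let lyr1 =
    MyFirstPerceptron.pars (mb.Module "lyr1")
        {NInput=nInput;  NOutput=nHidden; TransferFunc=MyFirstPerceptron.Tanh}
let lyr2 =
    MyFirstPerceptron.pars (mb.Module "lyr2")
        {NInput=nHidden; NOutput=nClass;  TransferFunc=MyFirstPerceptron.SoftMax}
```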
We used the mb.Module method of the model builder to create a new, subordinated model builder for each component. mb.Module takes one argument that specifies an identifier for the subordinated model builder. The name of the current model builder is combined, using a dot, with the specified identifier to construct the name of the subordinated model builder. In this example mb has the name NeuralNetModel and we specified the identifier lyr1 when calling mb.Module; hence, the subordinated model builder will have the name NeuralNetModel.lyr1.
The mb.Param method combines the name of the model builder with the specified identifier to construct the full parameter name. Thus the weights parameter of lyr1 will have the full name NeuralNetModel.lyr1.Weights and the bias will be NeuralNetModel.lyr1.Bias.
This automatic parameter name construction allows multiple, independent instantiations of components without name clashes.
We continue with variable definitions and model instantiation.
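A sketch of this step; the mb.Var, mb.SetSize, and mb.Instantiate calls and the dataset field names are assumptions based on the Learning MNIST chapter:

```fsharp
// define the model variables
let input  : ExprT<single> = mb.Var "Input"  [nInput; nBatch]
let target : ExprT<single> = mb.Var "Target" [nClass; nBatch]

// fix the symbolic sizes from the dataset and instantiate the model
mb.SetSize nInput mnist.TrnImgsFlat.Shape.[0]
mb.SetSize nClass mnist.TrnLbls.Shape.[0]
mb.SetSize nHidden 100
let mi = mb.Instantiate DevCuda
```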
Next, we use the components to generate the model's expressions.
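We apply our component's pred function once per layer:

```fsharp
// hidden layer activations and output class probabilities
let hiddenVal = MyFirstPerceptron.pred lyr1 input
let classProb = MyFirstPerceptron.pred lyr2 hiddenVal
```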
And we use the LossLayer component from Deep.Net to generate an expression for the loss.
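Presumably along these lines; the CrossEntropy measure and the LossLayer.loss signature are assumptions:

```fsharp
// cross-entropy loss between predicted class probabilities and targets
let loss = LossLayer.loss LossLayer.CrossEntropy classProb target
```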
We can now proceed to compile our model's expressions into functions and train the model using the gradient descent optimizer for a fixed number of iterations.
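A sketch of the training setup; the GradientDescent optimizer, its configuration record, and the mi.Func, arg2, and ArrayND.value helpers are assumptions based on the Learning MNIST chapter:

```fsharp
// set up the gradient descent optimizer
let opt    = GradientDescent (loss, mi.ParameterVector, DevCuda)
let optCfg = {GradientDescent.Cfg.Step = 1e-1f}

// compile the loss and optimization functions
let lossFn = mi.Func loss |> arg2 input target
let optFn  = mi.Func opt.Minimize |> opt.Use |> arg2 input target

// train for a fixed number of iterations
for itr in 0 .. 1000 do
    optFn mnist.TrnImgsFlat mnist.TrnLbls optCfg |> ignore
    if itr % 200 = 0 then
        let l = lossFn mnist.TstImgsFlat mnist.TstLbls |> ArrayND.value
        printfn "Test loss after %5d iterations: %.4f" itr l
```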
Training should print output showing the test loss decreasing as the iterations progress.
Nesting model components
Model components can be nested; that is, a component can contain one or more other components. For illustration, let us define an autoencoder component using our MyFirstPerceptron component. We begin by defining the hyper-parameters and parameters.
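A sketch of these records, with the field names as described below:

```fsharp
module MyFirstAutoencoder =

    /// hyper-parameters of the autoencoder
    type HyperPars = {
        /// number of inputs and outputs
        NInOut:     SizeSpecT
        /// number of latent units
        NLatent:    SizeSpecT
    }

    /// parameters: we reuse the perceptron's parameter record
    type Pars = {
        InLayer:    MyFirstPerceptron.Pars
        OutLayer:   MyFirstPerceptron.Pars
        HyperPars:  HyperPars
    }
```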
The hyper-parameters consist of the number of inputs and outputs and the number of neurons that constitute the latent representation. The parameters are made up of the parameters of the input layer and the parameters of the output layer; thus we just reuse the existing record type from the MyFirstPerceptron component.
Next, we define the pars function that instantiates a parameter record for this component.
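A sketch of this function; the choice of Tanh for both layers and the module identifiers InLayer and OutLayer are assumptions:

```fsharp
// continuing module MyFirstAutoencoder:

/// creates the parameter record, instantiating one perceptron
/// for the input layer and one for the output layer
let pars (mb: ModelBuilder<_>) (hp: HyperPars) =
    let hpInLayer : MyFirstPerceptron.HyperPars = {
        NInput       = hp.NInOut
        NOutput      = hp.NLatent
        TransferFunc = MyFirstPerceptron.Tanh
    }
    let hpOutLayer : MyFirstPerceptron.HyperPars = {
        NInput       = hp.NLatent
        NOutput      = hp.NInOut
        TransferFunc = MyFirstPerceptron.Tanh
    }
    {
        InLayer   = MyFirstPerceptron.pars (mb.Module "InLayer") hpInLayer
        OutLayer  = MyFirstPerceptron.pars (mb.Module "OutLayer") hpOutLayer
        HyperPars = hp
    }
```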
The function computes the hyper-parameters for the input and output layers and calls the MyFirstPerceptron.pars function to instantiate the parameter records for the two employed perceptrons.
Now, we can define the expressions for the latent values, the reconstruction and the reconstruction error.
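Sketches of these expressions; the MSE loss measure from LossLayer is an assumption:

```fsharp
// continuing module MyFirstAutoencoder:

/// expression for the latent representation of the input
let latent pars input =
    MyFirstPerceptron.pred pars.InLayer input

/// expression for the reconstruction from the latent values
let reconst pars latentVals =
    MyFirstPerceptron.pred pars.OutLayer latentVals

/// mean-squared-error reconstruction loss
let loss pars (input: ExprT<single>) =
    let reconstruction = input |> latent pars |> reconst pars
    LossLayer.loss LossLayer.MSE reconstruction input
```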
This concludes the definition of the autoencoder model. As you have seen, it is straightforward to create more complex components by combining existing components.
Finally, let us instantiate our simple autoencoder with 100 latent units and train it on MNIST.
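A sketch of the full instantiation and training; the model builder name, the module identifier, and the optimizer and compilation calls are assumptions mirroring the classifier example above:

```fsharp
// build the autoencoder model
let mb2 = ModelBuilder<single> "AutoencoderModel"

// declare symbolic sizes
let nBatch2  = mb2.Size "nBatch"
let nInput2  = mb2.Size "nInput"
let nLatent2 = mb2.Size "nLatent"

// instantiate the component's parameters
let ae = MyFirstAutoencoder.pars (mb2.Module "Autoencoder")
             {NInOut=nInput2; NLatent=nLatent2}

// define the model variable
let input2 : ExprT<single> = mb2.Var "Input" [nInput2; nBatch2]

// fix the symbolic sizes: 100 latent units
mb2.SetSize nInput2 mnist.TrnImgsFlat.Shape.[0]
mb2.SetSize nLatent2 100
let mi2 = mb2.Instantiate DevCuda

// loss expression and compiled loss function
let loss2   = MyFirstAutoencoder.loss ae input2
let lossFn2 = mi2.Func loss2 |> arg1 input2

// gradient descent optimizer and compiled optimization function
let opt2    = GradientDescent (loss2, mi2.ParameterVector, DevCuda)
let optCfg2 = {GradientDescent.Cfg.Step = 1e-1f}
let optFn2  = mi2.Func opt2.Minimize |> opt2.Use |> arg1 input2

// train for a fixed number of iterations
for itr in 0 .. 1000 do
    optFn2 mnist.TrnImgsFlat optCfg2 |> ignore
    if itr % 200 = 0 then
        let l = lossFn2 mnist.TstImgsFlat |> ArrayND.value
        printfn "Reconstruction loss after %5d iterations: %.4f" itr l
```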
Training should print output showing the reconstruction loss decreasing as the iterations progress.
Note: Training of the autoencoder seems to be slow with the current version of Deep.Net. We are investigating the reasons for this and plan to deploy optimizations that will make training faster.
Summary
Model components provide a way to construct a model out of small building blocks:

- Predefined components are located in the Models namespace.
- Component definition and use in Deep.Net are not constrained by a fixed interface, but naming and signature conventions exist.
- The model builder supports the use of components through the mb.Module function, which creates a subordinated model builder with a distinct namespace to avoid name clashes between components.
- A component can contain further components; thus more complex components can be constructed out of simple ones.