Variance propagation

This package implements variance propagating networks.

If we really want to talk about neural networks in a probabilistic way, every number in the network has to be treated as a random variable; a standard, deterministic network then amounts to treating each of these numbers as a Dirac-distributed value.

There have been numerous attempts to model the adaptable parameters of networks as random variables, leading to so-called “Bayesian Neural Networks”.

In some applications, it also makes sense to treat the activations as random variables. This can be done very efficiently, with good approximations to the mean and the variance of the activations.

The algorithm for this was initially described in [FD] and extended to the case of recurrent networks in [FD-RNN].
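
To illustrate the idea, the following sketch (plain NumPy, not part of breze; all names are illustrative) propagates a mean and a variance through a single affine layer under dropout, using the standard fast dropout moments. The transfer functions described further below then propagate these moments through the nonlinearity.

    import numpy as np

    def affine_varprop(in_mean, in_var, weights, p_dropout):
        # Each input is kept with probability q = 1 - p_dropout.  For an input
        # with mean m and variance v, the dropped-out input has mean q * m and
        # variance q * (v + m**2) - (q * m)**2.  Means and variances of
        # independent terms then add up through the dot product.
        q = 1 - p_dropout
        dropped_mean = q * in_mean
        dropped_var = q * (in_var + in_mean ** 2) - dropped_mean ** 2
        out_mean = dropped_mean.dot(weights)
        out_var = dropped_var.dot(weights ** 2)
        return out_mean, out_var

    # Toy usage: 5 samples, 3 inputs, 4 outputs.
    rng = np.random.RandomState(0)
    in_mean, in_var = rng.randn(5, 3), rng.rand(5, 3)
    weights = rng.randn(3, 4)
    out_mean, out_var = affine_varprop(in_mean, in_var, weights, p_dropout=0.2)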

References

[FD] Wang, Sida, and Christopher Manning. “Fast dropout training.” Proceedings of the 30th International Conference on Machine Learning (ICML-13), 2013.
[FD-RNN] Bayer, Justin, et al. “On Fast Dropout and its Applicability to Recurrent Networks.” arXiv preprint arXiv:1311.0701, 2013.

Recurrent Networks

Module implementing variance propagation and fast dropout for recurrent networks.

In this module, we will often deal with multiple sequences organized into a single Theano tensor. This tensor then has the shape (t, n, d), where

  • t is the number of time steps,
  • n is the number of samples and
  • d is the dimensionality of each sample.

We call such a tensor a “sequence tensor”. Sometimes it makes sense to flatten out the time dimension in order to use better-optimized linear algebra, such as a dot product. In that case, we talk of a “flat sequence tensor”.
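
For illustration, here is how the two layouts relate (a hypothetical NumPy sketch; breze performs the equivalent on Theano tensors):

    import numpy as np

    t, n, d = 10, 32, 5               # time steps, samples, dimensionality
    seqs = np.zeros((t, n, d))        # sequence tensor

    flat = seqs.reshape((t * n, d))   # flat sequence tensor
    out = flat.dot(np.zeros((d, 7)))  # apply well-optimized linear algebra
    out = out.reshape((t, n, 7))      # back to a sequence tensor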

breze.arch.model.varprop.rnn.recurrent_layer(in_mean, in_var, weights, f, initial_hidden_mean, initial_hidden_var, p_dropout)

Return Theano variables representing a recurrent layer.

Parameters:

in_mean : Theano variable

Sequence tensor of shape (t, n, d). Represents the mean of the input to the layer.

in_var : Theano variable

Sequence tensor. Represents the variance of the input to the layer. Either (a) of the same shape as the mean or (b) a scalar.

weights : Theano variable

Theano matrix of shape (d, d). Represents the recurrent weight matrix by which the hiddens are right-multiplied.

f : function

Function that takes a Theano variable and returns a Theano variable of the same shape. Meant to be the transfer function of the layer.

initial_hidden_mean : Theano variable

Theano vector of size d, representing the mean of the initial hidden state.

initial_hidden_var : Theano variable

Theano vector of size d, representing the variance of the initial hidden state.

p_dropout : Theano variable

Scalar representing the probability that a unit is dropped out.

Returns:

hidden_in_mean_rec : Theano variable

Theano sequence tensor representing the mean of the hidden activations before the application of f.

hidden_in_var_rec : Theano variable

Theano sequence tensor representing the variance of the hidden activations before the application of f.

hidden_mean_rec : Theano variable

Theano sequence tensor representing the mean of the hidden activations after the application of f.

hidden_var_rec : Theano variable

Theano sequence tensor representing the variance of the hidden activations after the application of f.
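
A hypothetical usage sketch, assuming breze and Theano are installed; the choice of sigmoid from the varprop transfer module as f and the concrete symbolic shapes are assumptions on the sketch's part, not prescribed by the API:

    import theano.tensor as T
    from breze.arch.model.varprop.rnn import recurrent_layer
    from breze.arch.component.varprop import transfer

    in_mean = T.tensor3('in_mean')    # (t, n, d) sequence tensor of means
    in_var = T.tensor3('in_var')      # matching variances (or a scalar)
    weights = T.matrix('weights')     # (d, d) recurrent weight matrix
    initial_hidden_mean = T.vector('initial_hidden_mean')
    initial_hidden_var = T.vector('initial_hidden_var')
    p_dropout = T.scalar('p_dropout')

    (hidden_in_mean_rec, hidden_in_var_rec,
     hidden_mean_rec, hidden_var_rec) = recurrent_layer(
        in_mean, in_var, weights, transfer.sigmoid,
        initial_hidden_mean, initial_hidden_var, p_dropout)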

Transfer functions

Module that contains transfer functions for variance propagation, working on Theano variables.

Each transfer function has the signature:

m2, s2 = f(m1, s1)

where f is the transfer function, m1 and s1 are the pre-synaptic mean and variance, respectively, and m2 and s2 are the post-synaptic mean and variance.
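
As a usage sketch (assuming Theano is available), transfer functions following this signature can simply be chained:

    import theano.tensor as T
    from breze.arch.component.varprop import transfer

    m1 = T.matrix('mean')
    s1 = T.matrix('var')

    # Each transfer function maps a (mean, variance) pair to a
    # (mean, variance) pair, so layers can be stacked.
    m2, s2 = transfer.sigmoid(m1, s1)
    m3, s3 = transfer.identity(m2, s2)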

breze.arch.component.varprop.transfer.identity(mean, var)

Return the mean and variance unchanged.

Parameters:

mean : Theano variable

Theano variable of the shape s.

var : Theano variable

Theano variable of the shape s.

Returns:

mean_ : Theano variable

Theano variable of the shape r.

var_ : Theano variable

Theano variable of the shape r.

breze.arch.component.varprop.transfer.sigmoid(mean, var)

Return the mean and variance of a Gaussian distributed random variable, described by its mean and variance, after passing it through a logistic sigmoid.

Parameters:

mean : Theano variable

Theano variable of the shape s.

var : Theano variable

Theano variable of the shape s.

Returns:

mean_ : Theano variable

Theano variable of the shape r.

var_ : Theano variable

Theano variable of the shape r.
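
As a rough illustration of the kind of approximation involved (a NumPy sketch; not necessarily the exact approximation breze implements), the output mean can be approximated with the well-known Gaussian-sigmoid approximation and the output variance with a first-order Taylor expansion:

    import numpy as np

    def sigmoid(x):
        return 1 / (1 + np.exp(-x))

    def sigmoid_varprop(mean, var):
        # E[sigmoid(X)] for X ~ N(mean, var), via the probit approximation.
        mean_ = sigmoid(mean / np.sqrt(1 + np.pi * var / 8))
        # Var[sigmoid(X)] via a first-order Taylor expansion (delta method).
        slope = sigmoid(mean) * (1 - sigmoid(mean))
        var_ = slope ** 2 * var
        return mean_, var_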

breze.arch.component.varprop.transfer.rectifier(mean, var)

Return the mean and variance of a Gaussian distributed random variable, described by its mean and variance, after passing it through a rectified linear unit.

Parameters:

mean : Theano variable

Theano variable of the shape s.

var : Theano variable

Theano variable of the shape s.

Returns:

mean_ : Theano variable

Theano variable of the shape r.

var_ : Theano variable

Theano variable of the shape r.
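
For the rectifier, the moments of max(0, X) with X ~ N(mean, var) have closed forms; the sketch below (NumPy/SciPy, assuming var > 0) shows them, although whether breze uses exactly these expressions is an assumption:

    import numpy as np
    from scipy.stats import norm

    def rectifier_varprop(mean, var):
        # Exact first and second moments of max(0, X) for X ~ N(mean, var);
        # assumes var > 0.
        std = np.sqrt(var)
        a = mean / std
        mean_ = mean * norm.cdf(a) + std * norm.pdf(a)
        second_moment = (mean ** 2 + var) * norm.cdf(a) + mean * std * norm.pdf(a)
        var_ = second_moment - mean_ ** 2
        return mean_, var_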

breze.arch.component.varprop.transfer.tanh(mean, var)

Return the mean and variance of a Gaussian distributed random variable, described by its mean and variance, after passing it through a hyperbolic tangent.

Parameters:

mean : Theano variable

Theano variable of the shape s.

var : Theano variable

Theano variable of the shape s.

Returns:

mean_ : Theano variable

Theano variable of the shape r.

var_ : Theano variable

Theano variable of the shape r.
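
Since tanh(x) = 2 * sigmoid(2 * x) - 1, the tanh moments follow from the sigmoid moments of 2X ~ N(2 * mean, 4 * var). The sketch below reuses the same approximations as the sigmoid sketch above (again an illustration, not necessarily breze's exact formulas):

    import numpy as np

    def sigmoid(x):
        return 1 / (1 + np.exp(-x))

    def tanh_varprop(mean, var):
        # Moments of sigmoid(2 * X), with 2 * X ~ N(2 * mean, 4 * var).
        sig_mean = sigmoid(2 * mean / np.sqrt(1 + np.pi * 4 * var / 8))
        slope = sigmoid(2 * mean) * (1 - sigmoid(2 * mean))
        sig_var = slope ** 2 * 4 * var
        # Shift and scale back to tanh: tanh(x) = 2 * sigmoid(2 * x) - 1.
        return 2 * sig_mean - 1, 4 * sig_var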

Losses

Module containing several losses usable for supervised and unsupervised training. This differs from breze.arch.component.loss in that each prediction is also assumed to have a variance.

The losses in this module assume two inputs: a target and a prediction. If the target has dimensionality D, the prediction is assumed to have dimensionality 2D. The first D elements constitute the mean, while the latter D constitute the variance.
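
As an illustration of this convention, the following NumPy sketch splits such a prediction and computes an element-wise diagonal Gaussian negative log-likelihood; the concrete loss and its name are just an example, not necessarily one of the losses provided here:

    import numpy as np

    def diag_gauss_nll(target, prediction, eps=1e-8):
        # prediction has twice the dimensionality of target: the first half
        # holds the predictive mean, the second half the predictive variance.
        d = target.shape[-1]
        mean, var = prediction[..., :d], prediction[..., d:]
        var = var + eps   # guard against division by zero
        return 0.5 * (np.log(2 * np.pi * var) + (target - mean) ** 2 / var)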

Additionally, all losses from breze.arch.component.loss are available as well; for these, the variance part of the input to the loss is simply ignored.