Loss functions

Module containing several losses usable for supervised and unsupervised training.

A loss is of the form:

def loss(target, prediction, ...):
    ...

The result depends on the exact nature of the loss. Some examples are:

  • coordinate-wise losses, such as a sum of squares or a Bernoulli cross entropy with a one-of-k target,
  • sample-wise losses, such as neighbourhood component analysis.

In the case of coordinate-wise losses, the result has the same dimensionality as the predictions and targets. In all other cases, the sample axis (usually the first axis) must stay the same, while the coordinate axis, along which the individual data points lie, may collapse to 1.

Some examples of valid shape transformations:

(n, d) -> (n, d)
(n, d) -> (n, 1)

These are not valid:

(n, d) -> (1, d)
(n, d) -> (n,)

For some examples, consult the source code of this module.
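Breze targets Theano, but the shape contract above can be illustrated with a plain NumPy sketch (the function names here are hypothetical, not part of the module):

```python
import numpy as np

def coordwise_loss(target, prediction):
    # Coordinate-wise loss: the result keeps the full (n, d) shape.
    return (target - prediction) ** 2

def samplewise_loss(target, prediction):
    # Sample-wise loss: the sample axis stays, the coordinate
    # axis collapses to 1 -- (n, d) -> (n, 1).
    return ((target - prediction) ** 2).sum(axis=1, keepdims=True)

target = np.zeros((5, 3))
prediction = np.ones((5, 3))
coord = coordwise_loss(target, prediction)      # shape (5, 3)
sample = samplewise_loss(target, prediction)    # shape (5, 1)
```

Both transformations keep the sample axis at n, matching the valid shapes listed above.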

breze.arch.component.loss.squared(target, prediction)

Return the element-wise squared loss between the target and the prediction.

Parameters:

target : Theano variable

An array of arbitrary shape representing the targets.

prediction : Theano variable

An array of arbitrary shape representing the predictions.

Returns:

res : Theano variable

An array of the same shape as target and prediction representing the element-wise squared distances.
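In NumPy terms (Theano's symbolic expressions follow the same arithmetic), the computation is simply:

```python
import numpy as np

def squared(target, prediction):
    # Element-wise squared difference; shape is preserved.
    return (target - prediction) ** 2

res = squared(np.array([1.0, 2.0]), np.array([0.0, 4.0]))
```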

breze.arch.component.loss.absolute(target, prediction)

Return the element-wise absolute difference between the target and the prediction.

Parameters:

target : Theano variable

An array of arbitrary shape representing the targets.

prediction : Theano variable

An array of arbitrary shape representing the predictions.

Returns:

res : Theano variable

An array of the same shape as target and prediction representing the element-wise absolute differences.
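A NumPy sketch of the same computation:

```python
import numpy as np

def absolute(target, prediction):
    # Element-wise absolute difference; shape is preserved.
    return np.abs(target - prediction)

res = absolute(np.array([1.0, 2.0]), np.array([0.0, 4.0]))
```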

breze.arch.component.loss.cat_ce(target, prediction, eps=1e-08)

Return the cross entropy between the target and the prediction, where prediction is a summary of the statistics of a categorical distribution and target is an observed outcome.

Used for multiclass classification purposes.

This loss differs from ncat_ce in that target is not an array of integers but a one-of-k (one-hot) coding.

Note that predictions are clipped between eps and 1 - eps to ensure numerical stability.

Parameters:

target : Theano variable

An array of shape (n, k) where n is the number of samples and k is the number of classes. Each row is a one-of-k coding: all zeros except for a single element, which must be exactly one.

prediction : Theano variable

An array of shape (n, k). Each row is interpreted as a categorical probability. Thus, each row has to sum up to one and be non-negative.

Returns:

res : Theano variable

An array of the same size as target and prediction representing the element-wise divergences.
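A NumPy sketch of the same computation, assuming the result is the element-wise cross-entropy term -t * log(p):

```python
import numpy as np

def cat_ce(target, prediction, eps=1e-8):
    # Clip predictions for numerical stability, then compute the
    # element-wise cross-entropy term -t * log(p).
    p = np.clip(prediction, eps, 1 - eps)
    return -target * np.log(p)

target = np.array([[0.0, 1.0]])        # one-of-k coding for class 1
prediction = np.array([[0.5, 0.5]])
res = cat_ce(target, prediction)
```

Only the entry at the true class contributes, here -log(0.5).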

breze.arch.component.loss.ncat_ce(target, prediction)

Return the cross entropy between the target and the prediction, where prediction is a summary of the statistics of the categorical distribution and target is an observed outcome.

Used for classification purposes.

This loss differs from cat_ce in that target is not a one-of-k coding but an array of integers.

Parameters:

target : Theano variable

An array of shape (n,) where n is the number of samples. Each entry of the array should be an integer between 0 and k-1, where k is the number of classes.

prediction : Theano variable

An array of shape (n, k) or (t, n, k). Each row (i.e. entry in the last dimension) is interpreted as a categorical probability. Thus, each row has to sum up to one and be non-negative.

Returns:

res : Theano variable

An array of shape (n, 1) containing, for each sample in target, the log probability that the example is classified correctly.
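For the 2D case, a NumPy sketch (this assumes the returned quantity is the negative log probability of the correct class, i.e. the usual cross-entropy loss convention):

```python
import numpy as np

def ncat_ce(target, prediction, eps=1e-8):
    # Pick each row's probability of the true class via integer
    # indexing, then return the negative log probability as (n, 1).
    n = target.shape[0]
    p = np.clip(prediction[np.arange(n), target], eps, 1 - eps)
    return -np.log(p).reshape(n, 1)

target = np.array([0, 1])
prediction = np.array([[0.5, 0.5],
                       [0.25, 0.75]])
res = ncat_ce(target, prediction)
```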

breze.arch.component.loss.bern_ces(target, prediction)

Return the Bernoulli cross entropies between binary vectors target and a number of Bernoulli variables prediction.

Used in regression on binary variables, not classification.

Parameters:

target : Theano variable

An array of shape (n, k) where n is the number of samples and k is the number of outputs. Each entry should be either 0 or 1.

prediction : Theano variable

An array of shape (n, k). Each row is interpreted as a set of statistics of Bernoulli variables. Thus, each element has to lie in (0, 1).

Returns:

res : Theano variable

An array of the same size as target and prediction representing the element-wise divergences.
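A NumPy sketch of the element-wise Bernoulli cross entropy (the eps clipping is an assumption for numerical stability, mirroring cat_ce):

```python
import numpy as np

def bern_ces(target, prediction, eps=1e-8):
    # Element-wise Bernoulli cross entropy:
    # -t * log(p) - (1 - t) * log(1 - p).
    p = np.clip(prediction, eps, 1 - eps)
    return -target * np.log(p) - (1 - target) * np.log(1 - p)

target = np.array([[1.0, 0.0]])
prediction = np.array([[0.5, 0.5]])
res = bern_ces(target, prediction)
```

At p = 0.5 both entries equal log 2 regardless of the target bit.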

breze.arch.component.loss.bern_bern_kl(X, Y)

Return the Kullback-Leibler divergence between Bernoulli variables represented by their sufficient statistics.

Parameters:

X : Theano variable

An array of arbitrary shape where each element represents the statistic of a Bernoulli variable and thus should lie in (0, 1).

Y : Theano variable

An array of the same shape as X where each element represents the statistic of a Bernoulli variable and thus should lie in (0, 1).

Returns:

res : Theano variable

An array of the same shape as X and Y representing the element-wise divergences.
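A NumPy sketch of the element-wise KL divergence between Bernoulli statistics:

```python
import numpy as np

def bern_bern_kl(X, Y):
    # Element-wise KL(Ber(x) || Ber(y)):
    # x * log(x / y) + (1 - x) * log((1 - x) / (1 - y)).
    return X * np.log(X / Y) + (1 - X) * np.log((1 - X) / (1 - Y))

res_zero = bern_bern_kl(np.array([0.3]), np.array([0.3]))  # identical stats
res_pos = bern_bern_kl(np.array([0.3]), np.array([0.7]))   # differing stats
```

The divergence is zero when the statistics agree and positive otherwise.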

breze.arch.component.loss.ncac(target, embedding)

Return the NCA for classification loss.

This corresponds to the probability that a point is correctly classified with a soft k-nearest-neighbour classifier using leave-one-out. Each neighbour is weighted according to an exponential of its negative Euclidean distance. Afterwards, a probability is calculated for each class depending on the weights of the neighbours. For details, we refer you to

‘Neighbourhood Component Analysis’ by J Goldberger, S Roweis, G Hinton, R Salakhutdinov (2004).

Parameters:

target : Theano variable

An array of shape (n,) where n is the number of samples. Each entry of the array should be an integer between 0 and k - 1, where k is the number of classes.

embedding : Theano variable

An array of shape (n, d) where each row represents a point in d-dimensional space.

Returns:

res : Theano variable

Array of shape (n, 1) holding the probability that each point is classified correctly.
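Theano aside, the computation can be sketched in NumPy; this is an illustrative reimplementation of the leave-one-out soft-kNN probability, not the library's exact code:

```python
import numpy as np

def ncac(target, embedding):
    # Pairwise squared Euclidean distances between embedded points.
    diff = embedding[:, None, :] - embedding[None, :, :]
    dist = (diff ** 2).sum(axis=-1)
    # Soft-neighbour weights exp(-d), with each point itself excluded
    # (leave-one-out), normalized per row.
    w = np.exp(-dist)
    np.fill_diagonal(w, 0.0)
    w /= w.sum(axis=1, keepdims=True)
    # Probability mass on neighbours that share the point's label.
    same = (target[:, None] == target[None, :]).astype(float)
    return (w * same).sum(axis=1, keepdims=True)

# Two well-separated clusters whose labels match their cluster.
target = np.array([0, 0, 1, 1])
embedding = np.array([[0.0, 0.0], [0.1, 0.0],
                      [10.0, 0.0], [10.1, 0.0]])
probs = ncac(target, embedding)
```

With well-separated clusters, nearly all neighbour weight falls on same-class points, so each probability is close to 1.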

breze.arch.component.loss.ncar(target, embedding)

Return the NCA for regression loss.

This is similar to NCA for classification, except that regression performance, rather than soft KNN classification performance, is maximized. (Actually, the negative performance is minimized.)

For details, we refer you to

‘Pose-sensitive embedding by nonlinear nca regression’ by Taylor, G. and Fergus, R. and Williams, G. and Spiro, I. and Bregler, C. (2010)

Parameters:

target : Theano variable

An array of shape (n, d) where n is the number of samples and d the dimensionality of the target space.

embedding : Theano variable

An array of shape (n, d) where each row represents a point in d-dimensional space.

Returns:

res : Theano variable

Array of shape (n, 1).
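As an illustrative NumPy sketch only: this assumes the loss is the per-sample squared error between each target and the soft-neighbour weighted average of the other targets, which may differ in detail from the library's implementation:

```python
import numpy as np

def ncar(target, embedding):
    # Soft-neighbour weights as in NCA for classification:
    # exp(-d), self excluded, normalized per row.
    diff = embedding[:, None, :] - embedding[None, :, :]
    w = np.exp(-(diff ** 2).sum(axis=-1))
    np.fill_diagonal(w, 0.0)
    w /= w.sum(axis=1, keepdims=True)
    # Predict each target as the weighted average of the others and
    # return the per-sample squared error as an (n, 1) array.
    pred = w @ target
    return ((target - pred) ** 2).sum(axis=1, keepdims=True)

# Points whose nearest neighbours share their target value.
target = np.array([[0.0], [0.0], [1.0], [1.0]])
embedding = np.array([[0.0, 0.0], [0.1, 0.0],
                      [10.0, 0.0], [10.1, 0.0]])
res = ncar(target, embedding)
```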

breze.arch.component.loss.drlim(push_margin, pull_margin, c_contrastive, push_loss='squared', pull_loss='squared')

Return a function that implements the loss from

‘Dimensionality reduction by learning an invariant mapping’ by Hadsell, R. and Chopra, S. and LeCun, Y. (2006).

For an example of such a function, see drlim1, which uses a margin of 1.

Parameters:

push_margin : Float

The minimum margin by which negative pairs should be separated. Pairs separated by a distance greater than push_margin do not contribute to the loss.

pull_margin : Float

The maximum margin by which positive pairs may be separated. Pairs separated by a lower distance do not contribute to the loss.

c_contrastive : Float

Coefficient to weight the contrastive (push) term relative to the positive (pull) term.

push_loss : One of {‘squared’, ‘absolute’}, optional, default: ‘squared’

Loss used to encourage large Euclidean distances between negative pairs.

pull_loss : One of {‘squared’, ‘absolute’}, optional, default: ‘squared’

Loss used to penalize Euclidean distances between positive pairs.

Returns:

loss : callable

Function that takes two arguments, a target and an embedding.
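How the returned function pairs up samples is not specified here, so the NumPy sketch below assumes consecutive rows of the embedding form pairs and that target flags each pair as positive (1) or negative (0); only the squared push/pull variant is shown:

```python
import numpy as np

def drlim(push_margin, pull_margin, c_contrastive):
    # Build a contrastive loss in the spirit of Hadsell et al. (2006),
    # using squared push and pull losses.
    def loss(target, embedding):
        # Assumption: consecutive rows form pairs.
        a, b = embedding[::2], embedding[1::2]
        dist = np.sqrt(((a - b) ** 2).sum(axis=1))
        # Positive pairs are pulled together beyond pull_margin ...
        pull = np.maximum(dist - pull_margin, 0.0) ** 2
        # ... negative pairs are pushed apart up to push_margin.
        push = np.maximum(push_margin - dist, 0.0) ** 2
        return target * pull + c_contrastive * (1 - target) * push
    return loss

loss = drlim(push_margin=1.0, pull_margin=0.0, c_contrastive=1.0)
target = np.array([1.0, 0.0])  # first pair positive, second negative
embedding = np.array([[0.0, 0.0], [0.0, 0.0],   # identical positive pair
                      [0.0, 0.0], [5.0, 0.0]])  # well-separated negative pair
res = loss(target, embedding)
```

An identical positive pair and a negative pair already separated by more than push_margin both incur zero loss.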