Loss functions¶
Module containing several losses usable for supervised and unsupervised training.
A loss is of the form:
    def loss(target, prediction, ...):
        ...
The result depends on the exact nature of the loss. Some examples are:
- coordinate-wise losses, such as a sum of squares or a Bernoulli cross entropy with a one-of-k target,
- sample-wise losses, such as neighbourhood component analysis.
In the case of coordinate-wise losses, the dimensionality of the result is the same as that of the predictions and targets. In all other cases, it is important that the sample axis (usually the first axis) stays the same. The individual data points lie along the coordinate axis, which might collapse to 1.
Some examples of valid shape transformations:
(n, d) -> (n, d)
(n, d) -> (n, 1)
These are not valid:
(n, d) -> (1, d)
(n, d) -> (n,)
For some examples, consult the source code of this module.
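To make the shape rules concrete, here is a small NumPy sketch (the module itself works on Theano variables) contrasting a coordinate-wise with a sample-wise loss:

```python
import numpy as np

target = np.array([[0., 1.], [1., 0.], [0.5, 0.5]])         # shape (n, d) = (3, 2)
prediction = np.array([[0.1, 0.9], [0.8, 0.2], [0.4, 0.6]])

# Coordinate-wise loss: the shape is preserved, (n, d) -> (n, d).
coord_wise = (target - prediction) ** 2

# Sample-wise loss: the coordinate axis collapses to 1, (n, d) -> (n, 1).
# The sample axis (first axis) stays the same.
sample_wise = coord_wise.sum(axis=1, keepdims=True)
```

The invalid transformations above are exactly the ones that change the sample axis: `(n, d) -> (1, d)` aggregates over samples, and `(n, d) -> (n,)` drops the coordinate axis entirely.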
-
breze.arch.component.loss.squared(target, prediction)¶
Return the element-wise squared loss between the target and the prediction.
Parameters: target : Theano variable
An array of arbitrary shape representing the targets.
prediction : Theano variable
An array of arbitrary shape representing the predictions.
Returns: res : Theano variable
An array of the same shape as target and prediction representing the pairwise distances.
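The math behind this function can be sketched in NumPy (the actual module operates on Theano variables):

```python
import numpy as np

def squared(target, prediction):
    # Element-wise squared difference; the result has the same
    # shape as the inputs.
    return (target - prediction) ** 2

t = np.array([[1., 2.], [3., 4.]])
p = np.array([[1., 0.], [0., 4.]])
res = squared(t, p)
```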
-
breze.arch.component.loss.absolute(target, prediction)¶
Return the element-wise absolute difference between the target and the prediction.
Parameters: target : Theano variable
An array of arbitrary shape representing the targets.
prediction : Theano variable
An array of arbitrary shape representing the predictions.
Returns: res : Theano variable
An array of the same shape as target and prediction representing the pairwise distances.
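A NumPy sketch of the same computation (the module itself uses Theano variables):

```python
import numpy as np

def absolute(target, prediction):
    # Element-wise absolute difference, same shape as the inputs.
    return abs(target - prediction)

res = absolute(np.array([[1., -2.]]), np.array([[3., 1.]]))
```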
-
breze.arch.component.loss.cat_ce(target, prediction, eps=1e-08)¶
Return the cross entropy between the target and the prediction, where prediction is a summary of the statistics of a categorical distribution and target is some outcome.
Used for multiclass classification purposes.
The loss differs from ncat_ce in that target is not an array of integers but a one-of-k coding.
Note that predictions are clipped between eps and 1 - eps to ensure numerical stability.
Parameters: target : Theano variable
An array of shape (n, k) where n is the number of samples and k is the number of classes. Each row represents a one-of-k coding; it should be zero except for a single element, which has to be exactly one.
prediction : Theano variable
An array of shape (n, k). Each row is interpreted as a categorical probability; thus, each row has to sum to one and be non-negative.
Returns: res : Theano variable
An array of the same size as target and prediction representing the pairwise divergences.
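Reading the docstring as an element-wise cross entropy (a NumPy sketch; the clipping step matches the numerical-stability note above, but this is not the module's Theano code):

```python
import numpy as np

def cat_ce(target, prediction, eps=1e-8):
    # Clip predictions away from 0 and 1 for numerical stability.
    prediction = np.clip(prediction, eps, 1 - eps)
    # Element-wise cross entropy; rows of `target` are one-of-k codings,
    # so only the entry of the correct class contributes.
    return -target * np.log(prediction)

target = np.array([[0., 1., 0.]])          # class 1 in one-of-k coding
prediction = np.array([[0.2, 0.7, 0.1]])
res = cat_ce(target, prediction)
```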
-
breze.arch.component.loss.ncat_ce(target, prediction)¶
Return the cross entropy between the target and the prediction, where prediction is a summary of the statistics of the categorical distribution and target is some outcome.
Used for classification purposes.
The loss differs from cat_ce in that target is not a one-of-k coding but an array of integers.
Parameters: target : Theano variable
An array of shape (n,) where n is the number of samples. Each entry of the array should be an integer between 0 and k - 1, where k is the number of classes.
prediction : Theano variable
An array of shape (n, k) or (t, n, k). Each row (i.e. entry in the last dimension) is interpreted as a categorical probability; thus, each row has to sum to one and be non-negative.
Returns: res : Theano variable
An array of shape (n, 1) containing the log probability that each example is classified correctly.
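A NumPy sketch of the underlying negative log likelihood for the (n, k) case (the sign convention as a loss is an assumption here; the module itself operates on Theano variables):

```python
import numpy as np

def ncat_ce(target, prediction):
    # `target` holds integer class labels; pick out the predicted
    # probability of the correct class for each sample.
    n = target.shape[0]
    p_correct = prediction[np.arange(n), target]
    # One value per sample, shape (n, 1).
    return -np.log(p_correct).reshape(n, 1)

target = np.array([1, 0])
prediction = np.array([[0.2, 0.8], [0.9, 0.1]])
res = ncat_ce(target, prediction)
```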
-
breze.arch.component.loss.bern_ces(target, prediction)¶
Return the Bernoulli cross entropies between binary vectors target and a number of Bernoulli variables prediction.
Used in regression on binary variables, not classification.
Parameters: target : Theano variable
An array of shape (n, k) where n is the number of samples and k is the number of outputs. Each entry should be either 0 or 1.
prediction : Theano variable
An array of shape (n, k). Each row is interpreted as a set of statistics of Bernoulli variables; thus, each element has to lie in (0, 1).
Returns: res : Theano variable
An array of the same size as target and prediction representing the pairwise divergences.
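The Bernoulli cross entropy can be sketched in NumPy as follows (the module itself operates on Theano variables):

```python
import numpy as np

def bern_ces(target, prediction):
    # Element-wise Bernoulli cross entropy; targets are 0/1,
    # predictions lie in the open interval (0, 1).
    return -(target * np.log(prediction)
             + (1 - target) * np.log(1 - prediction))

target = np.array([[1., 0.]])
prediction = np.array([[0.9, 0.2]])
res = bern_ces(target, prediction)
```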
-
breze.arch.component.loss.bern_bern_kl(X, Y)¶
Return the Kullback-Leibler divergence between Bernoulli variables represented by their sufficient statistics.
Parameters: X : Theano variable
An array of arbitrary shape where each element represents the statistic of a Bernoulli variable and thus should lie in (0, 1).
Y : Theano variable
An array of the same shape as X where each element represents the statistic of a Bernoulli variable and thus should lie in (0, 1).
Returns: res : Theano variable
An array of the same size as X and Y representing the pairwise divergences.
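The KL divergence between two Bernoulli distributions with success probabilities X and Y has a closed form, sketched here in NumPy (the module itself operates on Theano variables):

```python
import numpy as np

def bern_bern_kl(X, Y):
    # Element-wise KL(Bern(X) || Bern(Y)) for statistics in (0, 1).
    return X * np.log(X / Y) + (1 - X) * np.log((1 - X) / (1 - Y))

X = np.array([0.3, 0.5])
Y = np.array([0.3, 0.25])
res = bern_bern_kl(X, Y)
```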
-
breze.arch.component.loss.ncac(target, embedding)¶
Return the NCA for classification loss.
This corresponds to the probability that a point is correctly classified with a soft knn classifier using leave-one-out. Each neighbour is weighted according to an exponential of its negative Euclidean distance. Afterwards, a probability is calculated for each class depending on the weights of the neighbours. For details, we refer you to
‘Neighbourhood Component Analysis’ by J. Goldberger, S. Roweis, G. Hinton, R. Salakhutdinov (2004).
Parameters: target : Theano variable
An array of shape (n,) where n is the number of samples. Each entry of the array should be an integer between 0 and k - 1, where k is the number of classes.
embedding : Theano variable
An array of shape (n, d) where each row represents a point in d-dimensional space.
Returns: res : Theano variable
Array of shape (n, 1) holding the probability that a point is classified correctly.
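The leave-one-out soft knn probability described above can be sketched in NumPy (a sketch of the math, not the module's Theano implementation; the squared Euclidean distance inside the exponential is an assumption):

```python
import numpy as np

def ncac(target, embedding):
    # Pairwise squared Euclidean distances between embedded points.
    diff = embedding[:, None, :] - embedding[None, :, :]
    dist = (diff ** 2).sum(axis=-1)
    # Soft neighbour weights: exponential of the negative distance,
    # excluding each point itself (leave-one-out).
    w = np.exp(-dist)
    np.fill_diagonal(w, 0)
    w /= w.sum(axis=1, keepdims=True)
    # Probability mass assigned to neighbours of the same class.
    same = (target[:, None] == target[None, :]).astype(float)
    return (w * same).sum(axis=1).reshape(-1, 1)

target = np.array([0, 0, 1])
embedding = np.array([[0., 0.], [0.1, 0.], [5., 5.]])
p = ncac(target, embedding)
```

In this toy example the two class-0 points lie close together, so each is almost certainly classified correctly, while the lone class-1 point has no same-class neighbour at all.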
-
breze.arch.component.loss.ncar(target, embedding)¶
Return the NCA for regression loss.
This is similar to NCA for classification, except that not soft knn classification but regression performance is maximized. (Actually, the negative performance is minimized.)
For details, we refer you to
‘Pose-sensitive embedding by nonlinear NCA regression’ by Taylor, G., Fergus, R., Williams, G., Spiro, I. and Bregler, C. (2010).
Parameters: target : Theano variable
An array of shape (n, d) where n is the number of samples and d is the dimensionality of the target space.
embedding : Theano variable
An array of shape (n, d) where each row represents a point in d-dimensional space.
Returns: res : Theano variable
Array of shape (n, 1).
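One plausible NumPy sketch of the regression variant, analogous to the classification case: each point's target is predicted as the neighbour-weighted mean of the other targets, and the squared error is penalized (the exact form the module uses may differ):

```python
import numpy as np

def ncar(target, embedding):
    # Soft neighbour weights as in NCA, leave-one-out.
    diff = embedding[:, None, :] - embedding[None, :, :]
    w = np.exp(-(diff ** 2).sum(axis=-1))
    np.fill_diagonal(w, 0)
    w /= w.sum(axis=1, keepdims=True)
    # Predict each target as the weighted mean of its neighbours'
    # targets; the loss is the squared error per sample, shape (n, 1).
    pred = w.dot(target)
    return ((target - pred) ** 2).sum(axis=1, keepdims=True)

target = np.array([[0.], [0.], [1.]])
embedding = np.array([[0., 0.], [0.1, 0.], [5., 5.]])
res = ncar(target, embedding)
```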
-
breze.arch.component.loss.drlim(push_margin, pull_margin, c_contrastive, push_loss='squared', pull_loss='squared')¶
Return a function that implements the loss from
‘Dimensionality reduction by learning an invariant mapping’ by Hadsell, R., Chopra, S. and LeCun, Y. (2006).
For an example of such a function, see drlim1, which uses a margin of 1.
Parameters: push_margin : Float
The minimum margin that negative pairs should be separated by. Pairs separated by a distance higher than push_margin do not contribute to the loss.
pull_margin : Float
The maximum margin that positive pairs may be separated by. Pairs separated by lower distances do not contribute to the loss.
c_contrastive : Float
Coefficient to weigh the contrastive term relative to the positive term.
push_loss : One of {‘squared’, ‘absolute’}, optional, default: ‘squared’
Loss to encourage Euclidean distances between negative pairs.
pull_loss : One of {‘squared’, ‘absolute’}, optional, default: ‘squared’
Loss to penalize Euclidean distances between positive pairs.
Returns: loss : callable
Function that takes two arguments, a target and an embedding.
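The contrastive idea can be sketched in NumPy. Note that the pair layout below (consecutive rows of the embedding form a pair, and a target entry of 1 marks a positive pair) is an assumption for illustration, not necessarily the layout the module uses:

```python
import numpy as np

def drlim(push_margin, pull_margin, c_contrastive,
          push_loss='squared', pull_loss='squared'):
    # Hypothetical layout: rows 2i and 2i+1 of `embedding` form pair i,
    # and target[i] == 1 marks a positive (similar) pair.
    def loss(target, embedding):
        a, b = embedding[::2], embedding[1::2]
        dist = np.sqrt(((a - b) ** 2).sum(axis=1))
        # Positive pairs are pulled together beyond pull_margin ...
        pull = np.maximum(dist - pull_margin, 0)
        # ... negative pairs are pushed apart up to push_margin.
        push = np.maximum(push_margin - dist, 0)
        if pull_loss == 'squared':
            pull = pull ** 2
        if push_loss == 'squared':
            push = push ** 2
        return target * pull + c_contrastive * (1 - target) * push
    return loss

loss_fn = loss_fn = drlim(push_margin=1., pull_margin=0., c_contrastive=2.)
embedding = np.array([[0., 0.], [0.5, 0.],
                      [0., 0.], [0.5, 0.]])
target = np.array([1., 0.])   # first pair positive, second negative
res = loss_fn(target, embedding)
```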