# Pytorch 中可以直接调用的Loss Functions总结:
Pytorch 中可以直接调用的Loss Functions总结:
这里,我们想对Pytorch中可以直接调用的Loss Functions做一个简单的梳理,对于每个Loss Functions,标记了它的使用方法,并对一些不那么常见的Loss FunctionsLink了一些介绍它的Blogs,方便我们学习与使用这些Loss Functions。
文章目录
- Pytorch 中可以直接调用的Loss Functions总结:
- L1Loss
- MSELoss
- CROSSENTROPYLOSS
- CTCLoss
- NLLLoss
- PoissonNLLLoss
- GAUSSIANNLLLOSS
- KLDIVLOSS
- BCELOSS
- BCEWITHLOGITSLOSS
- MARGINRANKINGLOSS
- HingeEmbeddingLoss
- COSINEEMBEDDINGLOSS
- MultiLabelMarginLoss
- HuberLoss
- SmoothL1Loss
- SoftMarginLoss
- MultiLabelSoftMarginLoss
- MULTIMARGINLOSS
- TripletMarginLoss
- TripletMarginWithDistanceLoss
- nn.xx 与 nn.functional .xx区别:
文档链接: Loss Functions
L1Loss
用于测量输入中每个元素之间的平均绝对误差 (MAE)。
Creates a criterion that measures the mean absolute error (MAE) between each element in the input x* and target y.
torch.nn.L1Loss(size_average=None, reduce=None, reduction='mean')
参数:
- size_average (bool, optional) – Deprecated (see ). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field is set to , the losses are instead summed for each minibatch. Ignored when is . Default:
reduction``size_average``False``reduce``False``True
- reduce (bool, optional) – Deprecated (see ). By default, the losses are averaged or summed over observations for each minibatch depending on . When is , returns a loss per batch element instead and ignores . Default:
reduction``size_average``reduce``False``size_average``True
- reduction (string*,* optional) – Specifies the reduction to apply to the output: | | . : no reduction will be applied, : the sum of the output will be divided by the number of elements in the output, : the output will be summed. Note: and are in the process of being deprecated, and in the meantime, specifying either of those two args will override . Default:
'none'``'mean'``'sum'``'none'``'mean'``'sum'``size_average``reduce``reduction``'mean'
使用:
>>> loss = nn.L1Loss()
>>> input = torch.randn(3, 5, requires_grad=True)
>>> target = torch.randn(3, 5)
>>> output = loss(input, target)
>>> output.backward()
MSELoss
用于测量输入中每个元素之间的均方误差(L2 范数)
torch.nn.MSELoss(size_average=None, reduce=None, reduction='mean')
参数:
- size_average (bool, optional) – Deprecated (see
reduction
). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the fieldsize_average
is set toFalse
, the losses are instead summed for each minibatch. Ignored whenreduce
isFalse
. Default:True
- reduce (bool, optional) – Deprecated (see
reduction
). By default, the losses are averaged or summed over observations for each minibatch depending onsize_average
. Whenreduce
isFalse
, returns a loss per batch element instead and ignoressize_average
. Default:True
- reduction (string*,* optional) – Specifies the reduction to apply to the output:
'none'
|'mean'
|'sum'
.'none'
: no reduction will be applied,'mean'
: the sum of the output will be divided by the number of elements in the output,'sum'
: the output will be summed. Note:size_average
andreduce
are in the process of being deprecated, and in the meantime, specifying either of those two args will overridereduction
. Default:'mean'
使用:
loss = nn.MSELoss()
input = torch.randn(3, 5, requires_grad=True)
target = torch.randn(3, 5)
output = loss(input, target)
output.backward()
CROSSENTROPYLOSS
此标准计算输入和目标之间的交叉熵损失
torch.nn.CrossEntropyLoss(weight=None, size_average=None, ignore_index=- 100, reduce=None, reduction='mean', label_smoothing=0.0)
The input is expected to contain raw, unnormalized scores for each class. input has to be a Tensor of size ©(C) for unbatched input,(minibatc**h,C) or (minibatch, C, d_1, d_2, …, d_K)(minibatc**h,C,d1,d2,…,d**K) with K \geq 1K≥1 for the K-dimensional case. The last being useful for higher dimension inputs, such as computing cross entropy loss per-pixel for 2D images.
参数:
- weight (Tensor, optional) – a manual rescaling weight given to each class. If given, has to be a Tensor of size C
- size_average (bool, optional) – Deprecated (see
reduction
). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the fieldsize_average
is set toFalse
, the losses are instead summed for each minibatch. Ignored whenreduce
isFalse
. Default:True
- ignore_index (int, optional) – Specifies a target value that is ignored and does not contribute to the input gradient. When
size_average
isTrue
, the loss is averaged over non-ignored targets. Note thatignore_index
is only applicable when the target contains class indices.- reduce (bool, optional) – Deprecated (see
reduction
). By default, the losses are averaged or summed over observations for each minibatch depending onsize_average
. Whenreduce
isFalse
, returns a loss per batch element instead and ignoressize_average
. Default:True
- reduction (string*,* optional) – Specifies the reduction to apply to the output:
'none'
|'mean'
|'sum'
.'none'
: no reduction will be applied,'mean'
: the weighted mean of the output is taken,'sum'
: the output will be summed. Note:size_average
andreduce
are in the process of being deprecated, and in the meantime, specifying either of those two args will overridereduction
. Default:'mean'
- label_smoothing (float, optional) – A float in [0.0, 1.0]. Specifies the amount of smoothing when computing the loss, where 0.0 means no smoothing. The targets become a mixture of the original ground truth and a uniform distribution as described in Rethinking the Inception Architecture for Computer Vision. Default: 0.00.0.
使用:
# Example of target with class indices
loss = nn.CrossEntropyLoss()
input = torch.randn(3, 5, requires_grad=True)
target = torch.empty(3, dtype=torch.long).random_(5)
output = loss(input, target)
output.backward()
# Example of target with class probabilities
input = torch.randn(3, 5, requires_grad=True)
target = torch.randn(3, 5).softmax(dim=1)
output = loss(input, target)
output.backward()
CTCLoss
CTC loss 理解_代码款款的博客-CSDN博客_ctc loss
CTC Loss原理 - 知乎 (zhihu.com)
计算连续(未分段)时间序列和目标序列之间的损失。CTCLoss 对输入与目标可能对齐的概率求和,生成一个相对于每个输入节点可微分的损失值。假定输入与目标的对齐方式为"多对一"
torch.nn.CTCLoss(blank=0, reduction='mean', zero_infinity=False)
参数:
- Log_probs: Tensor of size (T, N, C)(T,N,C) or (T, C)(T,C), where T = \text{input length}T=input length, N = \text{batch size}N=batch size, and C = \text{number of classes (including blank)}C=number of classes (including blank). The logarithmized probabilities of the outputs (e.g. obtained with
torch.nn.functional.log_softmax()
).- Targets: Tensor of size (N, S)(N,S) or (\operatorname{sum}(\text{target_lengths}))(sum(target_lengths)), where N = \text{batch size}N=batch size and S = \text{max target length, if shape is } (N, S)S=max target length, if shape is (N,S). It represent the target sequences. Each element in the target sequence is a class index. And the target index cannot be blank (default=0). In the (N, S)(N,S) form, targets are padded to the length of the longest sequence, and stacked. In the (\operatorname{sum}(\text{target_lengths}))(sum(target_lengths)) form, the targets are assumed to be un-padded and concatenated within 1 dimension.
- Input_lengths: Tuple or tensor of size (N)(N) or ()(), where N = \text{batch size}N=batch size. It represent the lengths of the inputs (must each be \leq T≤T). And the lengths are specified for each sequence to achieve masking under the assumption that sequences are padded to equal lengths.
- Target_lengths: Tuple or tensor of size (N)(N) or ()(), where N = \text{batch size}N=batch size. It represent lengths of the targets. Lengths are specified for each sequence to achieve masking under the assumption that sequences are padded to equal lengths. If target shape is (N,S)(N,S), target_lengths are effectively the stop index s_ns**n for each target sequence, such that
target_n = targets[n,0:s_n]
for each target in a batch. Lengths must each be \leq S≤S If the targets are given as a 1d tensor that is the concatenation of individual targets, the target_lengths must add up to the total length of the tensor.- Output: scalar. If
reduction
is'none'
, then (N)(N) if input is batched or ()() if input is unbatched, where N = \text{batch size}N=batch size.
使用:
# Target are to be padded
T = 50 # Input sequence length
C = 20 # Number of classes (including blank)
N = 16 # Batch size
S = 30 # Target sequence length of longest target in batch (padding length)
S_min = 10 # Minimum target length, for demonstration purposes
# Initialize random batch of input vectors, for *size = (T,N,C)
input = torch.randn(T, N, C).log_softmax(2).detach().requires_grad_()
# Initialize random batch of targets (0 = blank, 1:C = classes)
target = torch.randint(low=1, high=C, size=(N, S), dtype=torch.long)
input_lengths = torch.full(size=(N,), fill_value=T, dtype=torch.long)
target_lengths = torch.randint(low=S_min, high=S, size=(N,), dtype=torch.long)
ctc_loss = nn.CTCLoss()
loss = ctc_loss(input, target, input_lengths, target_lengths)
loss.backward()
# Target are to be un-padded
T = 50 # Input sequence length
C = 20 # Number of classes (including blank)
N = 16 # Batch size
# Initialize random batch of input vectors, for *size = (T,N,C)
input = torch.randn(T, N, C).log_softmax(2).detach().requires_grad_()
input_lengths = torch.full(size=(N,), fill_value=T, dtype=torch.long)
# Initialize random batch of targets (0 = blank, 1:C = classes)
target_lengths = torch.randint(low=1, high=T, size=(N,), dtype=torch.long)
target = torch.randint(low=1, high=C, size=(sum(target_lengths),), dtype=torch.long)
ctc_loss = nn.CTCLoss()
loss = ctc_loss(input, target, input_lengths, target_lengths)
loss.backward()
# Target are to be un-padded and unbatched (effectively N=1)
T = 50 # Input sequence length
C = 20 # Number of classes (including blank)
# Initialize random batch of input vectors, for *size = (T,C)
input = torch.randn(T, C).log_softmax(2).detach().requires_grad_()
input_lengths = torch.tensor(T, dtype=torch.long)
# Initialize random batch of targets (0 = blank, 1:C = classes)
target_lengths = torch.randint(low=1, high=T, size=(), dtype=torch.long)
target = torch.randint(low=1, high=C, size=(target_lengths,), dtype=torch.long)
ctc_loss = nn.CTCLoss()
loss = ctc_loss(input, target, input_lengths, target_lengths)
loss.backward()
NLLLoss
详解torch.nn.NLLLOSS - 知乎 (zhihu.com)
log_softmax与softmax的区别在哪里? - 知乎 (zhihu.com)
使用:
m = nn.LogSoftmax(dim=1)
loss = nn.NLLLoss()
# input is of size N x C = 3 x 5
input = torch.randn(3, 5, requires_grad=True)
# each element in target has to have 0 <= value < C
target = torch.tensor([1, 0, 4])
output = loss(m(input), target)
output.backward()
# 2D loss example (used, for example, with image inputs)
N, C = 5, 4
loss = nn.NLLLoss()
# input is of size N x C x height x width
data = torch.randn(N, 16, 10, 10)
conv = nn.Conv2d(16, C, (3, 3))
m = nn.LogSoftmax(dim=1)
# each element in target has to have 0 <= value < C
target = torch.empty(N, 8, 8, dtype=torch.long).random_(0, C)
output = loss(m(conv(data)), target)
output.backward()
PoissonNLLLoss
目标泊松分布的负对数似然损失。
使用:
loss = nn.PoissonNLLLoss()
log_input = torch.randn(5, 2, requires_grad=True)
target = torch.randn(5, 2)
output = loss(log_input, target)
output.backward()
GAUSSIANNLLLOSS
真实标签服从高斯分布的负对数似然损失,神经网络的输出作为高斯分布的均值和方差。
对于包含[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-q4kc1uHT-1664029582934)(https://math.jianshu.com/math?formula=N)]个样本的batch数据 [外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-h83QFnzx-1664029582935)(https://math.jianshu.com/math?formula=D(x%2C%20var%2C%20y)]),[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-HqpRvEU0-1664029582936)(https://math.jianshu.com/math?formula=x)]神经网络的输出,作为高斯分布的均值,[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-rx7lIhr4-1664029582937)(https://math.jianshu.com/math?formula=var)]神经网络的输出,作为高斯分布的方差,[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-j8SxYO5A-1664029582939)(https://math.jianshu.com/math?formula=y)]是样本对应的标签,服从高斯分布。[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-5TpBJomt-1664029582940)(https://math.jianshu.com/math?formula=x)]与[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-i9xZLG0R-1664029582941)(https://math.jianshu.com/math?formula=y)]的维度相同,[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-J9stE5rL-1664029582942)(https://math.jianshu.com/math?formula=var)]和[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-xavLHkx4-1664029582943)(https://math.jianshu.com/math?formula=x)]的维度相同,或者最后一个维度不同且最后一个维度为1,可以进行broadcast。
参考链接:
loss函数之PoissonNLLLoss,GaussianNLLLoss - 简书 (jianshu.com)
使用:
loss = nn.GaussianNLLLoss()
input = torch.randn(5, 2, requires_grad=True)
target = torch.randn(5, 2)
var = torch.ones(5, 2, requires_grad=True) #heteroscedastic
output = loss(input, target, var)
output.backward()
KLDIVLOSS
KL散度,又叫相对熵,用于衡量两个分布(离散分布和连续分布)之间的距离。
设[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-4ZRB6KpE-1664029582944)(https://math.jianshu.com/math?formula=p(x)]) 、[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-WFmovzOi-1664029582945)(https://math.jianshu.com/math?formula=q(x)]) 是离散随机变量[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-Xle4sPs0-1664029582947)(https://math.jianshu.com/math?formula=X)]的两个概率分布,则[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-A3Ob8twF-1664029582948)(https://math.jianshu.com/math?formula=p)] 对[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-DsIYvCCP-1664029582948)(https://math.jianshu.com/math?formula=q)] 的KL散度是:
D
K
L
(
p
∥
q
)
=
E
p
(
x
)
log
p
(
x
)
q
(
x
)
=
∑
i
=
1
N
p
(
x
i
)
⋅
(
log
p
(
x
i
)
−
log
q
(
x
i
)
)
D_{K L}(p \| q)=E_{p(x)} \log \frac{p(x)}{q(x)}=\sum_{i=1}^{N} p\left(x_{i}\right) \cdot\left(\log p\left(x_{i}\right)-\log q\left(x_{i}\right)\right)
DKL(p∥q)=Ep(x)logq(x)p(x)=i=1∑Np(xi)⋅(logp(xi)−logq(xi))
参考链接:
loss函数之KLDivLoss - 简书 (jianshu.com)
使用:
kl_loss = nn.KLDivLoss(reduction="batchmean")
# input should be a distribution in the log space
input = F.log_softmax(torch.randn(3, 5, requires_grad=True))
# Sample a batch of distributions. Usually this would come from the dataset
target = F.softmax(torch.rand(3, 5))
output = kl_loss(input, target)
kl_loss = nn.KLDivLoss(reduction="batchmean", log_target=True)
log_target = F.log_softmax(torch.rand(3, 5))
output = kl_loss(input, log_target)
BCELOSS
loss函数之BCELoss - 简书 (jianshu.com)
使用:
m = nn.Sigmoid()
loss = nn.BCELoss()
input = torch.randn(3, requires_grad=True)
target = torch.empty(3).random_(2)
output = loss(m(input), target)
output.backward()
BCEWITHLOGITSLOSS
这个东西,本质上和nn.BCELoss()没有区别,只是在BCELoss上加了个logits函数(也就是sigmoid函数)
使用:
loss = nn.BCEWithLogitsLoss()
input = torch.randn(3, requires_grad=True)
target = torch.empty(3).random_(2)
output = loss(input, target)
output.backward()
MARGINRANKINGLOSS
loss函数之MarginRankingLoss - 简书 (jianshu.com)
使用:
loss = nn.MarginRankingLoss()
input1 = torch.randn(3, requires_grad=True)
input2 = torch.randn(3, requires_grad=True)
target = torch.randn(3).sign()
output = loss(input1, input2, target)
output.backward()
HingeEmbeddingLoss
用于判断两个向量是否相似,输入是两个向量之间的距离。 常用于非线性词向量学习以及半监督学习。
loss函数之CosineEmbeddingLoss,HingeEmbeddingLoss_ltochange的博客-CSDN博客_余弦相似度损失函数
COSINEEMBEDDINGLOSS
余弦相似度损失函数,用于判断输入的两个向量是否相似。 常用于非线性词向量学习以及半监督学习。
loss函数之CosineEmbeddingLoss,HingeEmbeddingLoss_ltochange的博客-CSDN博客_余弦相似度损失函数
MultiLabelMarginLoss
多分类合页损失函数(hinge loss),对于一个样本不是考虑样本输出与真实类别之间的误差,而是考虑对应真实类别与其他类别之间的误差
loss函数之MultiMarginLoss, MultiLabelMarginLoss_ltochange的博客-CSDN博客
使用:
loss = nn.MultiLabelMarginLoss()
x = torch.FloatTensor([[0.1, 0.2, 0.4, 0.8]])
# for target y, only consider labels 3 and 0, not after label -1
y = torch.LongTensor([[3, 0, -1, 1]])
loss(x, y)
# 0.25 * ((1-(0.1-0.2)) + (1-(0.1-0.4)) + (1-(0.8-0.2)) + (1-(0.8-0.4)))
HuberLoss
回归损失函数:Huber Loss_Peanut_范的博客-CSDN博客_huber loss
一个损失函数,y是真实值,f(x)是预测值,δ是HuberLoss的参数,当预测偏差小于δ时,它采用平方误差,当预测偏差大于δ,采用线性误差。相比于最小二乘的线性回归,Huber Loss降低了对异常点的惩罚程度,是一种常用的robust regression的损失函数
SmoothL1Loss
创建一个条件,如果绝对元素误差低于 beta,则使用平方项,否则使用 L1 项。它对异常值的敏感度低于torch.nn.MSELoss
,并且在某些情况下可以防止梯度爆炸(例如,参见Ross Girshick的论文Fast R-CNN)。
SoftMarginLoss
loss函数之SoftMarginLoss - 简书 (jianshu.com)
MultiLabelSoftMarginLoss
MultiLabelSoftMarginLoss函数_Coding-Prince的博客-CSDN博客_multilabelsoftmarginloss
MULTIMARGINLOSS
多分类合页损失函数(hinge loss),对于一个样本不是考虑样本输出与真实类别之间的误差,而是考虑对应真实类别与其他类别之间的误差
loss函数之MultiMarginLoss, MultiLabelMarginLoss_旺旺棒棒冰的博客-CSDN博客
使用:
loss = nn.MultiMarginLoss()
x = torch.tensor([[0.1, 0.2, 0.4, 0.8]])
y = torch.tensor([3])
loss(x, y)
# 0.25 * ((1-(0.8-0.1)) + (1-(0.8-0.2)) + (1-(0.8-0.4)))
TripletMarginLoss
PyTorch TripletMarginLoss(三元损失)_zj134_的博客-CSDN博客_pytorch 三元组损失
TripletMarginWithDistanceLoss
loss函数之TripletMarginLoss与TripletMarginWithDistanceLoss_ltochange的博客-CSDN博客
nn.xx 与 nn.functional .xx区别:
参考回答:
作者:肥波喇齐
链接:https://www.zhihu.com/question/66782101/answer/579393790
我们经常看到,二者有很多相同的loss函数,他们使用时有什么区别呢?
两者的相同之处:
nn.Xxx
和nn.functional.xxx
的实际功能是相同的,即nn.Conv2d
和nn.functional.conv2d
都是进行卷积,nn.Dropout
和nn.functional.dropout
都是进行dropout,。。。。。;- 运行效率也是近乎相同。
nn.functional.xxx
是函数接口,而nn.Xxx
是nn.functional.xxx
的类封装,并且**nn.Xxx
都继承于一个共同祖先nn.Module
。**这一点导致nn.Xxx
除了具有nn.functional.xxx
功能之外,内部附带了nn.Module
相关的属性和方法,例如train(), eval(),load_state_dict, state_dict
等。
什么时候使用nn.functional.xxx
,什么时候使用nn.Xxx
?
这个问题依赖于你要解决你问题的复杂度和个人风格喜好。在
nn.Xxx
不能满足你的功能需求时,nn.functional.xxx
是更佳的选择,因为nn.functional.xxx
更加的灵活(更加接近底层),你可以在其基础上定义出自己想要的功能。
个人偏向于在能使用nn.Xxx
情况下尽量使用,不行再换nn.functional.xxx
,感觉这样更能显示出网络的层次关系,也更加的纯粹(所有layer和model本身都是Module,一种和谐统一的感觉)。
一点导致nn.Xxx
除了具有nn.functional.xxx
功能之外,内部附带了nn.Module
相关的属性和方法,例如train(), eval(),load_state_dict, state_dict
等。
什么时候使用nn.functional.xxx
,什么时候使用nn.Xxx
?
这个问题依赖于你要解决你问题的复杂度和个人风格喜好。在
nn.Xxx
不能满足你的功能需求时,nn.functional.xxx
是更佳的选择,因为nn.functional.xxx
更加的灵活(更加接近底层),你可以在其基础上定义出自己想要的功能。
个人偏向于在能使用nn.Xxx
情况下尽量使用,不行再换nn.functional.xxx
,感觉这样更能显示出网络的层次关系,也更加的纯粹(所有layer和model本身都是Module,一种和谐统一的感觉)。