PyTorch learning rate (lr) decay (scheduler)
1. Manually modifying the lr in the optimizer
import matplotlib.pyplot as plt
from torch import nn
import torch

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.net = nn.Linear(10, 10)

    def forward(self, input):
        out = self.net(input)
        return out

model = Net()
LR = 0.01
optimizer = torch.optim.Adam(model.parameters(), lr=LR)
lr_list = []
for epoch in range(100):
    if epoch % 5 == 0:
        for p in optimizer.param_groups:
            p['lr'] *= 0.9  # note: modify the lr of every param group in place
    lr_list.append(optimizer.state_dict()['param_groups'][0]['lr'])
plt.plot(range(100), lr_list, color='r')
plt.show()
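The same manual approach also works when the optimizer has several param groups (for example, a different lr per layer). A minimal sketch, assuming a toy two-layer model and the helper name decay_lr purely for illustration:

import torch
from torch import nn

# Toy model with two layers so that the optimizer gets two param groups.
model = nn.Sequential(nn.Linear(10, 10), nn.Linear(10, 1))
optimizer = torch.optim.Adam([
    {'params': model[0].parameters(), 'lr': 0.01},
    {'params': model[1].parameters(), 'lr': 0.001},
])

def decay_lr(optimizer, factor=0.9):
    # Scale the lr of every param group by the same factor.
    for p in optimizer.param_groups:
        p['lr'] *= factor

for epoch in range(100):
    # ... forward / backward / optimizer.step() would go here ...
    if epoch % 5 == 0:
        decay_lr(optimizer)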
2. Dynamically adjusting the learning rate
torch.optim.lr_scheduler wraps several dynamic learning-rate adjustment methods for us; each of them computes the lr from the current epoch value.

① lr_scheduler.LambdaLR
torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda, last_epoch=-1)
lr_lambda receives an int argument, the epoch, and returns a multiplicative factor; the scheduler sets the lr to the initial lr times that factor. If you pass a list of lambda functions, each one is applied to the corresponding param_group of the optimizer.

import matplotlib.pyplot as plt
from torch import nn
import torch
from torch import optim
import numpy as np

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.net = nn.Linear(10, 10)

    def forward(self, input):
        out = self.net(input)
        return out

lr_list = []
model = Net()
LR = 0.01
optimizer = optim.Adam(model.parameters(), lr=LR)
# Guard against division by zero at epoch 0; afterwards the factor is sin(epoch) / epoch.
lambda1 = lambda epoch: np.sin(epoch) / epoch if epoch > 0 else 1.0
scheduler = optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lambda1)
for epoch in range(100):
    scheduler.step()
    lr_list.append(optimizer.state_dict()['param_groups'][0]['lr'])
plt.plot(range(100), lr_list, color='r')
plt.show()
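For the multi-lambda case, a minimal sketch, assuming an optimizer built from two param groups purely for illustration; LambdaLR then expects one lambda per group:

from torch import nn, optim
from torch.optim import lr_scheduler

# Two param groups, e.g. a backbone and an output layer.
model = nn.Sequential(nn.Linear(10, 10), nn.Linear(10, 1))
optimizer = optim.Adam([
    {'params': model[0].parameters()},
    {'params': model[1].parameters()},
], lr=0.01)

# One lambda per param group: the first halves the lr every 10 epochs,
# the second decays it smoothly every epoch.
lambda_a = lambda epoch: 0.5 ** (epoch // 10)
lambda_b = lambda epoch: 0.95 ** epoch
scheduler = lr_scheduler.LambdaLR(optimizer, lr_lambda=[lambda_a, lambda_b])

for epoch in range(30):
    # ... forward / backward / optimizer.step() would go here ...
    scheduler.step()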
② lr_scheduler.StepLR: step decay
torch.optim.lr_scheduler.StepLR(optimizer, step_size, gamma=0.1, last_epoch=-1)
Every step_size epochs, the lr is automatically multiplied by gamma, producing a staircase-shaped decay. Note: since PyTorch 1.1.0, scheduler.step() must be called after optimizer.step() (see the training-loop sketch after the example below).

import matplotlib.pyplot as plt
from torch import nn
import torch
from torch import optim

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.net = nn.Linear(10, 10)

    def forward(self, input):
        out = self.net(input)
        return out

lr_list = []
model = Net()
LR = 0.01
optimizer = optim.Adam(model.parameters(), lr=LR)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.8)
for epoch in range(100):
    scheduler.step()
    lr_list.append(optimizer.state_dict()['param_groups'][0]['lr'])
plt.plot(range(100), lr_list, color='r')
plt.show()
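The plotting examples here only call scheduler.step() because no real training happens; in an actual loop the order matters. A minimal sketch of the PyTorch >= 1.1.0 ordering, with hypothetical random data just to make it runnable:

import torch
from torch import nn, optim

model = nn.Linear(10, 1)
optimizer = optim.Adam(model.parameters(), lr=0.01)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.8)
criterion = nn.MSELoss()

# Hypothetical random data, only there to make the loop runnable.
x = torch.randn(64, 10)
y = torch.randn(64, 1)

for epoch in range(20):
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()   # update the weights first
    scheduler.step()   # then advance the lr schedule (PyTorch >= 1.1.0 order)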
③ lr_scheduler.MultiStepLR: multi-step decay
The lr stays piecewise constant: each time the epoch reaches one of the milestones, the lr is multiplied by gamma, so two milestones give a three-stage schedule. This is the decay scheme most commonly seen in papers, and the one usually imitated when adjusting the lr by hand.
import matplotlib.pyplot as plt
from torch import nn
import torch
from torch import optim

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.net = nn.Linear(10, 10)

    def forward(self, input):
        out = self.net(input)
        return out

lr_list = []
model = Net()
LR = 0.01
optimizer = optim.Adam(model.parameters(), lr=LR)
scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=[20, 80], gamma=0.9)
for epoch in range(100):
    scheduler.step()
    lr_list.append(optimizer.state_dict()['param_groups'][0]['lr'])
plt.plot(range(100), lr_list, color='r')
plt.show()
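As a concrete check of the schedule above (initial lr 0.01, milestones=[20, 80], gamma=0.9), the expected values can be computed in closed form; a small sketch assuming MultiStepLR's standard semantics:

from bisect import bisect_right

LR, gamma, milestones = 0.01, 0.9, [20, 80]

def expected_lr(epoch):
    # lr = LR * gamma ** (number of milestones already reached)
    return LR * gamma ** bisect_right(milestones, epoch)

print(expected_lr(0))    # 0.01   (epochs 0-19)
print(expected_lr(20))   # 0.009  (epochs 20-79)
print(expected_lr(80))   # 0.0081 (epochs 80 and later)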
④ lr_scheduler.ExponentialLR: continuous exponential decay
torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma, last_epoch=-1)
The lr is multiplied by gamma every epoch, i.e. after epoch e the lr equals the initial lr times gamma**e.
import matplotlib.pyplot as plt
from torch import nn
import torch
from torch import optim

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.net = nn.Linear(10, 10)

    def forward(self, input):
        out = self.net(input)
        return out

lr_list = []
model = Net()
LR = 0.01
optimizer = optim.Adam(model.parameters(), lr=LR)
scheduler = optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.9)
for epoch in range(100):
    scheduler.step()
    lr_list.append(optimizer.state_dict()['param_groups'][0]['lr'])
plt.plot(range(100), lr_list, color='r')
plt.show()
⑤ ReduceLROnPlateau
Reduce the learning rate once the loss stops decreasing (or the accuracy stops improving).
torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', factor=0.1, patience=10, verbose=False, threshold=0.0001, threshold_mode='rel', cooldown=0, min_lr=0, eps=1e-08)
The parameters mean the following:
mode: 'min' checks whether the metric has stopped decreasing, 'max' checks whether it has stopped increasing;
factor: once triggered, lr *= factor;
patience: the number of consecutive epochs without a decrease (or increase) to tolerate before reducing the lr;
verbose: print a message when the condition is triggered;
threshold: only changes larger than this threshold count as significant;
threshold_mode: the threshold can be computed in 'rel' or 'abs' mode. 'rel' rule: in 'max' mode a value above best * (1 + threshold) is significant, in 'min' mode a value below best * (1 - threshold) is significant; 'abs' rule: in 'max' mode a value above best + threshold is significant, in 'min' mode a value below best - threshold is significant;
cooldown: after a reduction is triggered, wait this many epochs before resuming the checks, to keep the lr from dropping too fast;
min_lr: the minimum lr allowed;
eps: if the difference between the new and old lr is smaller than eps, the update is ignored.
# patience=10 means a patience value of 10:
# once the loss has shown no improvement for 10 epochs in a row,
# the learning rate decay kicks in.
optimizer = torch.optim.SGD(model.parameters(), args.lr,
                            momentum=args.momentum,
                            weight_decay=args.weight_decay)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, 'min')  # 'min': we expect the monitored loss to decrease
scheduler.step(loss_val)  # pass the monitored loss to step()
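The snippet above still relies on an external model, args and loss_val. A minimal, self-contained sketch of how the pieces fit together in a loop (the model and random data here are hypothetical and only serve to produce a loss to monitor):

import torch
from torch import nn, optim
from torch.optim.lr_scheduler import ReduceLROnPlateau

model = nn.Linear(10, 1)
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
scheduler = ReduceLROnPlateau(optimizer, mode='min', factor=0.1, patience=10)
criterion = nn.MSELoss()

# Hypothetical random data standing in for the train / validation sets.
x_train, y_train = torch.randn(64, 10), torch.randn(64, 1)
x_val, y_val = torch.randn(16, 10), torch.randn(16, 1)

for epoch in range(100):
    model.train()
    optimizer.zero_grad()
    loss = criterion(model(x_train), y_train)
    loss.backward()
    optimizer.step()

    model.eval()
    with torch.no_grad():
        val_loss = criterion(model(x_val), y_val)
    scheduler.step(val_loss)  # ReduceLROnPlateau needs the monitored metric passed to step()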