Getting Started with PyTorch (Part 2) -- Building Neural Networks

1 Basic Skeleton of a Neural Network

PyTorch's tools for building neural networks live mainly in torch.nn (Neural Network).

Official documentation: https://docs.pytorch.org/docs/stable/nn.html

  • Containers # define the skeletons (structures) of a neural network; adding different contents to a skeleton composes the network

  • Convolution Layers # convolution layers

  • Pooling layers # pooling layers

  • Padding Layers # padding layers

  • Non-linear Activations (weighted sum, nonlinearity) # non-linear activations

  • Non-linear Activations (other)

  • Normalization Layers # normalization layers

1.1 Skeleton: Containers

Containers offers six classes. Module is the most commonly used one; it provides the basic skeleton for every neural network.

Module         Base class for all neural network modules.
Sequential     A sequential container.
ModuleList     Holds submodules in a list.
ModuleDict     Holds submodules in a dictionary.
ParameterList  Holds parameters in a list.
ParameterDict  Holds parameters in a dictionary.
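
Only Module and Sequential are demonstrated in the subsections below. As a quick, hedged illustration of ModuleList, here is a minimal sketch (the class name ToyStack and the layer sizes are made up for this example):

import torch
from torch import nn

class ToyStack(nn.Module):  # hypothetical model, only for illustration
    def __init__(self):
        super().__init__()
        # ModuleList registers the sub-layers so their parameters are tracked by the parent Module
        self.layers = nn.ModuleList([nn.Linear(4, 4) for _ in range(3)])

    def forward(self, x):
        for layer in self.layers:  # can be iterated like an ordinary Python list
            x = layer(x)
        return x

print(ToyStack()(torch.randn(2, 4)).shape)  # torch.Size([2, 4])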

1.1.1 Using Module

import torch
import torch.nn as nn

class TdModel(nn.Module):       # a custom model must subclass nn.Module
    def __init__(self):
        super().__init__()      # call the parent class initializer
        self.conv1 = nn.Conv2d(1, 20, 5)
        self.conv2 = nn.Conv2d(20, 20, 5)

    def forward(self, x):       # forward pass: defines the computation (here it simply adds 1)
        output = x + 1
        return output

td = TdModel()                  # instantiate the model
x = torch.tensor(1.0)           # set the input value
output = td(x)                  # calling the model runs forward()
print(output)

1.1.2 Building a Model with Sequential

Sequential runs the steps of a PyTorch model in order.

from collections import OrderedDict
from torch import nn

# Using Sequential to create a small model. When `model` is run,
# input will first be passed to `Conv2d(1,20,5)`. The output of
# `Conv2d(1,20,5)` will be used as the input to the first
# `ReLU`; the output of the first `ReLU` will become the input
# for `Conv2d(20,64,5)`. Finally, the output of
# `Conv2d(20,64,5)` will be used as input to the second `ReLU`
model = nn.Sequential(
    nn.Conv2d(1, 20, 5),
    nn.ReLU(),
    nn.Conv2d(20, 64, 5),
    nn.ReLU()
)

# Using Sequential with OrderedDict. This is functionally the
# same as the above code
model = nn.Sequential(OrderedDict([
    ('conv1', nn.Conv2d(1, 20, 5)),
    ('relu1', nn.ReLU()),
    ('conv2', nn.Conv2d(20, 64, 5)),
    ('relu2', nn.ReLU())
]))

Take the CIFAR-10 network structure as an example.

(Figure: CIFAR-10 model structure)

Building the model structure without Sequential:

from torch import nn
import torch

class TdModule(nn.Module):
    def __init__(self):
        super(TdModule, self).__init__()
        self.conv1 = nn.Conv2d(3, 32, kernel_size=5, padding=2)
        self.maxpool1 = nn.MaxPool2d(2)
        self.conv2 = nn.Conv2d(32, 32, kernel_size=5, padding=2)
        self.maxpool2 = nn.MaxPool2d(2)
        self.conv3 = nn.Conv2d(32, 64, kernel_size=5, padding=2)
        self.maxpool3 = nn.MaxPool2d(2)
        self.flatten = nn.Flatten()
        self.linear1 = nn.Linear(1024, 64)
        self.linear2 = nn.Linear(64, 10)

    def forward(self, x):
        x = self.conv1(x)
        x = self.maxpool1(x)
        x = self.conv2(x)
        x = self.maxpool2(x)
        x = self.conv3(x)
        x = self.maxpool3(x)
        x = self.flatten(x)
        x = self.linear1(x)
        x = self.linear2(x)
        return x

td = TdModule()
print(td)

# check the model output
input = torch.randn(64, 3, 32, 32)  # test data
output = td(input)
print(output.size())

Building the same model with Sequential:

from torch import nn
import torch

class TdModule(nn.Module):
    def __init__(self):
        super(TdModule, self).__init__()
        self.model1 = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=5, padding=2),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 32, kernel_size=5, padding=2),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=5, padding=2),
            nn.MaxPool2d(2),
            nn.Flatten(),
            nn.Linear(1024, 64),
            nn.Linear(64, 10),
        )

    def forward(self, x):
        x = self.model1(x)
        return x

td = TdModule()

# check the model output
input = torch.randn(64, 3, 32, 32)  # test data
output = td(input)
print(output.size())

Visualizing the model architecture with TensorBoard:

from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter('logs')
writer.add_graph(TdModule(), input)
writer.close()

1.2 Convolution Layers

torch.nn is a higher-level wrapper around torch.nn.functional.

nn.Conv1d is for one-dimensional data, nn.Conv2d for two-dimensional data (images are 2-D), and nn.Conv3d for three-dimensional data.

Convolution animations: https://github.com/vdumoulin/conv_arithmetic/blob/master/README.md
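
As a small illustration of that wrapper relationship, the module interface and the functional interface should give the same result; a minimal sketch:

import torch
from torch import nn
import torch.nn.functional as F

x = torch.tensor([-1.0, 0.0, 2.0])

relu_module = nn.ReLU()       # module (class) interface from torch.nn
out_module = relu_module(x)
out_functional = F.relu(x)    # functional interface from torch.nn.functional

print(torch.equal(out_module, out_functional))  # True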

1.2.1 nn.functional.conv2d

Usage of torch.nn.functional.conv2d; note that the data must be tensors!

torch.nn.functional.conv2d(
    input, weight, bias=None, stride=1,
    padding=0, dilation=1, groups=1
)
  • input – the input, with shape (minibatch, in_channels, iH, iW)
  • weight – the weights / convolution kernel, with shape (out_channels, in_channels/groups, kH, kW)
  • bias – the bias, with shape (out_channels); default: None
  • stride – how far the kernel moves at each step; for 2-D convolution it can be a single number or a tuple (sH, sW); default: 1
  • padding – padding added to the sides of the input image; it can be a single number or a tuple (padH, padW); default: 0 (no padding)
    • without padding, features at the edges of the image are partially lost

Example code

import torch
import torch.nn.functional as F

input = torch.tensor(  # input image
    [[1, 2, 0, 3, 1],
     [0, 1, 2, 3, 1],
     [1, 2, 1, 0, 0],
     [5, 2, 3, 1, 1],
     [2, 1, 0, 1, 1]]
)
kernel = torch.tensor(  # convolution kernel, i.e. the weight
    [[1, 2, 1],
     [0, 1, 0],
     [2, 1, 0]]
)

input = torch.reshape(input, (1, 1, 5, 5))  # minibatch 1, 1 channel, 5H x 5W
kernel = torch.reshape(kernel, (1, 1, 3, 3))

output = F.conv2d(input, kernel, stride=1)  # stride 1
print(output)
output = F.conv2d(input, kernel, stride=2)  # stride 2
print(output)

1.2.2 nn.Conv2d

Usage of nn.Conv2d

torch.nn.Conv2d(
    in_channels, out_channels, kernel_size, stride=1, padding=0,  # the five commonly used parameters
    dilation=1, groups=1, bias=True, padding_mode='zeros',
    device=None, dtype=None
)
  • in_channels (int) – number of channels in the input image
  • out_channels (int) – number of channels produced by the convolution
  • kernel_size (int or tuple) – size of the convolution kernel; a single number n gives an n*n kernel, a non-square kernel is written as a tuple (height, width)
  • stride (int or tuple, optional) – stride of the convolution; for 2-D it can be a single number or a tuple (sH, sW); default: 1
  • padding (int, tuple or str, optional) – padding added to all four sides of the input; default: 0 (no padding)
  • padding_mode (str, optional) – how the padding is filled: 'zeros', 'reflect', 'replicate' or 'circular'; default: 'zeros'
  • dilation (int or tuple, optional) – spacing between kernel elements; default: 1
  • groups (int, optional) – number of blocked connections from input channels to output channels; default: 1
  • bias (bool, optional) – if True, adds a learnable bias to the output; default: True

Example code

import torch
import torchvision
from torch import nn
from torch.utils.data import DataLoader
from torch.utils.tensorboard import SummaryWriter

dataset = torchvision.datasets.CIFAR10(root='./dataset', train=False, transform=torchvision.transforms.ToTensor(), download=True)
dataloader = DataLoader(dataset, batch_size=64)

class TdModule(nn.Module):  # custom model
    def __init__(self):
        super(TdModule, self).__init__()  # run the parent class initializer
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=6, kernel_size=3, stride=1, padding=0)

    def forward(self, x):
        x = self.conv1(x)
        return x

td = TdModule()
writer = SummaryWriter('./logs')  # visualize the model output with TensorBoard
step = 0
for data in dataloader:
    imgs, targets = data
    output = td(imgs)
    # print(imgs.shape)    # torch.Size([64, 3, 32, 32])
    # print(output.shape)  # torch.Size([64, 6, 30, 30])
    writer.add_image('input', imgs, step, dataformats='NCHW')  # input images
    output = torch.reshape(output, (-1, 3, 30, 30))  # the output has 6 channels; reshape to 3 channels for display
    writer.add_image('output', output, step, dataformats='NCHW')
    step += 1

writer.close()

Some notes on convolution layers, as illustrated in the figure below.

The input image is 224 x 224 x 3; after one convolution plus a non-linear activation it becomes 224 x 224 x 64.

The formula for the output height/width is in the official PyTorch docs (a small helper reproducing it is sketched below): https://docs.pytorch.org/docs/stable/generated/torch.nn.Conv2d.html#torch.nn.Conv2d

(Figure: convolution layer turning a 224 x 224 x 3 input into a 224 x 224 x 64 output)
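
The formula from that page can be wrapped in a small helper. The sketch below is ours (the helper name conv2d_out_size is not part of PyTorch) and reproduces the 32x32 -> 30x30 change printed in the example above:

import math

def conv2d_out_size(h_in, kernel_size, stride=1, padding=0, dilation=1):
    # H_out = floor((H_in + 2*padding - dilation*(kernel_size - 1) - 1) / stride + 1)
    return math.floor((h_in + 2 * padding - dilation * (kernel_size - 1) - 1) / stride + 1)

print(conv2d_out_size(32, kernel_size=3))             # 30, matching the 32x32 -> 30x30 change above
print(conv2d_out_size(32, kernel_size=5, padding=2))  # 32, the "same-size" setting used in the CIFAR-10 model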

1.3 Pooling Layers

Max pooling (MaxPool) is also called downsampling, and its inverse (MaxUnpool) upsampling; there are also average pooling (AvgPool) and adaptive max pooling (AdaptiveMaxPool), sketched briefly below.
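
Only MaxPool2d is walked through below; as a brief, hedged sketch of the other two flavours (shapes assume the usual 4-D NCHW input):

import torch
from torch import nn

x = torch.randn(1, 1, 6, 6)

avg = nn.AvgPool2d(kernel_size=2)        # averages each 2x2 window instead of taking the max
adaptive = nn.AdaptiveMaxPool2d((2, 2))  # chooses window sizes so the output is always 2x2

print(avg(x).shape)       # torch.Size([1, 1, 3, 3])
print(adaptive(x).shape)  # torch.Size([1, 1, 2, 2])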

1.3.1 nn.MaxPool2d

torch.nn.MaxPool2d(
    kernel_size, stride=None, padding=0,
    dilation=1, return_indices=False, ceil_mode=False
)
  • kernel_size (Union[int, tuple[int, int]]) – size of the window to take the max over, similar to a convolution kernel; a value of 3 gives a 3*3 window
  • stride (Union[int, tuple[int, int]]) – stride of the window; default: kernel_size
  • padding (Union[int, tuple[int, int]]) – padding added on both sides
  • dilation (Union[int, tuple[int, int]]) – spacing between elements within the window
  • return_indices (bool) – if True, returns the max indices along with the outputs, useful for torch.nn.MaxUnpool2d later
  • ceil_mode (bool) – if True, uses ceil instead of floor to compute the output shape; ceil still pools the leftover part at the border, while floor discards it (see the short comparison sketch after this list)
    • floor rounds down, ceil rounds up, as shown in the figure below
    • (Figure: ceil vs floor when the pooling window overhangs the input)
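
A minimal sketch of the ceil/floor difference referenced in the list above:

import torch
from torch import nn

x = torch.randn(1, 1, 5, 5)

# 5x5 input, 3x3 window, stride defaults to kernel_size (3)
print(nn.MaxPool2d(kernel_size=3, ceil_mode=False)(x).shape)  # torch.Size([1, 1, 1, 1]): the leftover 2-wide border is dropped
print(nn.MaxPool2d(kernel_size=3, ceil_mode=True)(x).shape)   # torch.Size([1, 1, 2, 2]): the leftover border is still pooled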

Example code

import torch
from torch import nn

input = torch.tensor(
    [[1, 2, 0, 3, 1],
     [0, 1, 2, 3, 1],
     [1, 2, 1, 0, 0],
     [5, 2, 3, 1, 1],
     [2, 1, 0, 1, 1]], dtype=torch.float32
)
input = torch.reshape(input, (-1, 1, 5, 5))  # reshape to [minibatch size, channels, height, width]

class TdModel(nn.Module):
    def __init__(self):
        super(TdModel, self).__init__()
        self.maxpool1 = nn.MaxPool2d(kernel_size=3, ceil_mode=False)

    def forward(self, input):
        output = self.maxpool1(input)
        return output

td = TdModel()
output = td(input)
print(output)

1.3.2 What Max Pooling Does

The goal of max pooling is to retain the salient features of the input while shrinking the amount of data.

Visualizing the effect of max pooling with TensorBoard:

import torch
import torchvision
from torch import nn
from torch.utils.tensorboard import SummaryWriter

dataset = torchvision.datasets.CIFAR10(root='./dataset', train=False, download=True,
                                       transform=torchvision.transforms.ToTensor())
dataloader = torch.utils.data.DataLoader(dataset, batch_size=64)

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.maxpool1 = nn.MaxPool2d(kernel_size=3, ceil_mode=False)

    def forward(self, x):
        output = self.maxpool1(x)
        return output

td = Net()
writer = SummaryWriter('logs')
step = 0
for data in dataloader:
    imgs, targets = data
    writer.add_image('input', imgs, step, dataformats='NCHW')
    outputs = td(imgs)
    writer.add_image('outputs', outputs, step, dataformats='NCHW')
    step += 1

writer.close()

1.4 Non-linear Activations

Non-linear activations introduce non-linear behaviour into the network. Common choices include:

  • ReLU: outputs the input unchanged when it is greater than 0, and 0 otherwise
  • Sigmoid: computed by the formula Sigmoid(x) = 1 / (1 + e^(-x)), squashing the output into (0, 1); commonly used for binary classification (see the small check after this list)
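
As a small sanity check of the Sigmoid formula (a sketch, not part of the original example code):

import torch

x = torch.tensor([-1.0, 0.0, 2.0])
manual = 1 / (1 + torch.exp(-x))  # Sigmoid(x) = 1 / (1 + e^(-x))
print(manual)                     # tensor([0.2689, 0.5000, 0.8808])
print(torch.sigmoid(x))           # same values, all inside (0, 1)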

1.4.1 ReLU

PyTorch docs: https://docs.pytorch.org/docs/stable/generated/torch.nn.ReLU.html#torch.nn.ReLU

torch.nn.ReLU(inplace=False)

m = nn.ReLU(inplace=True)
x = torch.tensor([-1.0])
m(x)             # x is modified in place and becomes tensor([0.])

m = nn.ReLU(inplace=False)
x = torch.tensor([-1.0])
output = m(x)    # x stays tensor([-1.]); output is tensor([0.])
  • inplace=True modifies the data at the input's memory address directly (an in-place operation), so no return value is needed
  • inplace=False keeps the original data intact, so the changed data must be received through the return value

Example code

import torch
from torch import nn

input = torch.tensor(
    [[1, -0.5],
     [-1, 3]]
)
# add the batch-size dimension: 1 channel, height and width both 2
input = torch.reshape(input, (-1, 1, 2, 2))

class TdModule(nn.Module):
    def __init__(self):
        super(TdModule, self).__init__()
        self.relu1 = nn.ReLU()

    def forward(self, x):
        output = self.relu1(x)
        return output

Td = TdModule()
output = Td(input)
print(output)

1.4.2 Sigmoid

PyTorch docs: https://docs.pytorch.org/docs/stable/generated/torch.nn.Sigmoid.html#torch.nn.Sigmoid

torch.nn.Sigmoid(*args, **kwargs)

Visualizing the result with TensorBoard:

from torch.utils.tensorboard import SummaryWriter
import torch
from torch import nn
import torchvision

dataset = torchvision.datasets.CIFAR10(root='./dataset', train=False, download=True, transform=torchvision.transforms.ToTensor())
dataloader = torch.utils.data.DataLoader(dataset, batch_size=64)

class TdModule(nn.Module):
    def __init__(self):
        super(TdModule, self).__init__()
        self.relu1 = nn.ReLU()
        self.sigmoid1 = nn.Sigmoid()

    def forward(self, x):
        output = self.sigmoid1(x)
        return output

Td = TdModule()
writer = SummaryWriter('logs')
step = 0
for data in dataloader:
    imgs, targets = data
    writer.add_image('input', imgs, step, dataformats='NCHW')
    outputs = Td(imgs)
    writer.add_image('outputs', outputs, step, dataformats='NCHW')
    step += 1
writer.close()

1.5 Loss Functions

  • Measure the gap between the actual output and the target
  • Provide a basis for updating the parameters (backpropagation)

PyTorch docs: https://docs.pytorch.org/docs/stable/nn.html#loss-functions

1.5.1 L1Loss

torch.nn.L1Loss(
    size_average=None, reduce=None, reduction='mean'
)
  • size_average (bool, optional) – Deprecated (see reduction). By default the losses are averaged over each loss element in the batch; note that for some losses there are multiple elements per sample. If size_average is set to False, the losses are summed for each minibatch. Ignored when reduce is False. Default: True
  • reduce (bool, optional) – Deprecated (see reduction). By default the losses are averaged or summed over observations for each minibatch depending on size_average. When reduce is False, a loss per batch element is returned and size_average is ignored. Default: True
  • reduction (str, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction is applied, 'mean': the sum of the output is divided by the number of elements in the output, 'sum': the output is summed. Note: size_average and reduce are in the process of being deprecated, and specifying either of them overrides reduction. Default: 'mean'

Example code

import torch
from torch import nn

inputs = torch.tensor([1, 2, 3], dtype=torch.float)
targets = torch.tensor([1, 2, 5], dtype=torch.float)

inputs = torch.reshape(inputs, (1, 1, 1, 3))  # reshape to batch size 1, 1 channel, 1 row, 3 columns
targets = torch.reshape(targets, (1, 1, 1, 3))

loss = nn.L1Loss()
result = loss(inputs, targets)
print(result)  # tensor(0.6667): mean of |0| + |0| + |2|

1.5.2 MSELoss

Computes the mean squared error.

torch.nn.MSELoss(
    size_average=None, reduce=None, reduction='mean'
)

Example code

import torch
from torch import nn

inputs = torch.tensor([1, 2, 3], dtype=torch.float)
targets = torch.tensor([1, 2, 5], dtype=torch.float)

inputs = torch.reshape(inputs, (1, 1, 1, 3))  # reshape to batch size 1, 1 channel, 1 row, 3 columns
targets = torch.reshape(targets, (1, 1, 1, 3))

loss_mse = nn.MSELoss()
result_mse = loss_mse(inputs, targets)
print(result_mse)  # tensor(1.3333): mean of 0 + 0 + 2^2

1.5.3 CrossEntropyLoss

Cross-entropy loss, commonly used for classification.

torch.nn.CrossEntropyLoss(
    weight=None, size_average=None, ignore_index=-100,
    reduce=None, reduction='mean', label_smoothing=0.0
)

Example code

import torch
from torch import nn

x = torch.tensor([0.1, 0.2, 0.3], dtype=torch.float)  # scores for 3 classes
y = torch.tensor([1])                                 # the target class index
x = torch.reshape(x, (1, 3))                          # reshape to (batch size, number of classes)

loss_cross = nn.CrossEntropyLoss()
result_cross = loss_cross(x, y)
print(result_cross)
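
As a sanity check, the same number can be computed by hand from the cross-entropy formula loss = -x[class] + log(sum_j exp(x[j])); a small sketch:

import torch

x = torch.tensor([0.1, 0.2, 0.3])
target_class = 1
manual = -x[target_class] + torch.log(torch.exp(x).sum())
print(manual)  # tensor(1.1019), should match result_cross above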

2 Backpropagation: backward

Using CrossEntropyLoss to drive the backward pass.

Every node in the network (i.e. every parameter to be updated) has a gradient; the parameters are then adjusted according to these gradients, which is what ultimately lowers the loss.

from torch import nn
import torch
import torchvision
from torch.utils.data import DataLoader

class TdModule(nn.Module):
    def __init__(self):
        super(TdModule, self).__init__()
        self.model1 = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=5, padding=2),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 32, kernel_size=5, padding=2),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=5, padding=2),
            nn.MaxPool2d(2),
            nn.Flatten(),
            nn.Linear(1024, 64),
            nn.Linear(64, 10),
        )

    def forward(self, x):
        x = self.model1(x)
        return x

dataset = torchvision.datasets.CIFAR10(root='./dataset', train=False, transform=torchvision.transforms.ToTensor(), download=True)
dataloader = DataLoader(dataset, batch_size=1)

td = TdModule()
loss = nn.CrossEntropyLoss()
for data in dataloader:
    imgs, targets = data
    outputs = td(imgs)
    # print(outputs)  # 10 values, one score per class
    # print(targets)  # the class index of the image
    result_loss = loss(outputs, targets)  # cross-entropy loss
    print(result_loss)
    result_loss.backward()  # compute the gradients

.backward() only computes the gradients; a suitable optimizer is still needed to update the parameters and actually bring the overall error down.

3 Optimizers: optim

PyTorch docs: https://docs.pytorch.org/docs/stable/optim.html

  • params – the model's parameters
  • lr – the learning rate

Constructing an optimizer

from torch import optim

optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
# optimizer = optim.Adam([var1, var2], lr=0.0001)

First hand the model's parameters to the optimizer; the optimization loop then looks like this:

for input, target in dataset:
    optimizer.zero_grad()           # clear the gradients
    output = model(input)
    loss = loss_fn(output, target)  # measure the error between the output and the target
    loss.backward()                 # compute the gradient of every parameter to update
    optimizer.step()                # update each parameter using the gradient from the previous step

Example code

from torch import nn
import torch
import torchvision
from torch.utils.data import DataLoader

# build the custom model
class TdModule(nn.Module):
    def __init__(self):
        super(TdModule, self).__init__()
        self.model1 = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=5, padding=2),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 32, kernel_size=5, padding=2),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=5, padding=2),
            nn.MaxPool2d(2),
            nn.Flatten(),
            nn.Linear(1024, 64),
            nn.Linear(64, 10),
        )

    def forward(self, x):
        x = self.model1(x)
        return x

# load the dataset
dataset = torchvision.datasets.CIFAR10(root='./dataset', train=False, transform=torchvision.transforms.ToTensor(), download=True)

dataloader = DataLoader(dataset, batch_size=1)

# instantiate the model; .to('cuda') moves it to the GPU -- drop the .to('cuda') calls if no GPU is available
td = TdModule().to('cuda')
loss = nn.CrossEntropyLoss().to('cuda')

# use the stochastic gradient descent optimizer (SGD)
optim = torch.optim.SGD(td.parameters(), lr=0.01)  # define the optimizer
for epoch in range(20):
    running_loss = 0.0
    for data in dataloader:
        imgs, targets = data
        outputs = td(imgs.to('cuda'))
        # print(outputs)  # 10 values, one score per class
        # print(targets)  # the class index of the image
        result_loss = loss(outputs, targets.to('cuda'))  # cross-entropy loss
        optim.zero_grad()       # clear the gradients; last step's gradients are useless for this update
        result_loss.backward()  # compute the gradient of every tunable parameter
        optim.step()            # update every parameter
        running_loss = running_loss + result_loss  # accumulate the overall error
    print(running_loss)

4 Existing Network Models

PyTorch docs: https://docs.pytorch.org/vision/stable/models.html

torchvision provides image-related models, torchaudio audio-related models, torchtext text-related models, and so on.
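
In newer torchvision releases (roughly 0.14 and later, if memory serves) the available architectures can also be listed programmatically; a small sketch:

import torchvision

print(torchvision.models.list_models()[:5])          # a few of the available architecture names
print('vgg16' in torchvision.models.list_models())   # True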

4.1 The VGG Classification Model

The commonly used variants are VGG16 and VGG19.

torchvision.models.vgg16(
    *, weights: Optional[VGG16_Weights] = None,
    progress: bool = True, **kwargs: Any
)
  • weights (VGG16_Weights, optional) – the pretrained weights to use; see VGG16_Weights. By default no pretrained weights are used.
  • progress (bool, optional) – if True, displays a download progress bar on stderr; default: True.
  • **kwargs – parameters passed on to torchvision.models.vgg.VGG; see the source code for details.

4.2 Inspecting the VGG Architecture

import torchvision

vgg16_false = torchvision.models.vgg16(pretrained=False)  # do not load pretrained weights
vgg16_true = torchvision.models.vgg16(pretrained=True)    # load pretrained weights

print(vgg16_true)  # inspect the architecture of the pretrained model
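
Note that the pretrained argument is deprecated in newer torchvision releases in favour of a weights argument; an equivalent sketch using that API (assuming torchvision >= 0.13):

import torchvision
from torchvision.models import VGG16_Weights

vgg16_false = torchvision.models.vgg16(weights=None)                        # random initialization
vgg16_true = torchvision.models.vgg16(weights=VGG16_Weights.IMAGENET1K_V1)  # ImageNet-pretrained weights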

The VGG architecture:

VGG(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU(inplace=True)
    (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU(inplace=True)
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (6): ReLU(inplace=True)
    (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (8): ReLU(inplace=True)
    (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace=True)
    (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (13): ReLU(inplace=True)
    (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (15): ReLU(inplace=True)
    (16): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (17): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (18): ReLU(inplace=True)
    (19): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (20): ReLU(inplace=True)
    (21): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (22): ReLU(inplace=True)
    (23): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (24): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (25): ReLU(inplace=True)
    (26): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (27): ReLU(inplace=True)
    (28): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (29): ReLU(inplace=True)
    (30): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(7, 7))
  (classifier): Sequential(
    (0): Linear(in_features=25088, out_features=4096, bias=True)
    (1): ReLU(inplace=True)
    (2): Dropout(p=0.5, inplace=False)
    (3): Linear(in_features=4096, out_features=4096, bias=True)
    (4): ReLU(inplace=True)
    (5): Dropout(p=0.5, inplace=False)
    (6): Linear(in_features=4096, out_features=1000, bias=True)
  )
)

4.3 Modifying a Network Model

The VGG model is trained on the ImageNet dataset and has 1000 output classes.

The CIFAR10 dataset has only 10 classes.

So how do we take the existing network and adapt its structure?

Adding a layer at the end of the network

# append a layer at the very end of the model
vgg16_true.add_module('add_linear', nn.Linear(1000, 10))

# append a layer at the end of the classifier block instead
vgg16_true.classifier.add_module('add_linear', nn.Linear(1000, 10))
print(vgg16_true)

Modifying the parameters of an existing layer

# replace layer 6 of the classifier block
vgg16_false.classifier[6] = nn.Linear(4096, 10)
print(vgg16_false)

5 Saving and Loading Models

5.1 Saving a Model (Structure / Parameters)

import torchvision
import torch

vgg16 = torchvision.models.vgg16(weights=None)  # build an untrained model

Save method 1: save the model structure together with its parameters

torch.save(vgg16, 'vgg16_method1.pth')

Save method 2: save only the model's parameters as a dictionary (officially recommended)

torch.save(vgg16.state_dict(), 'vgg16_method2.pth')

5.2 Loading a Model

import torch

Load method 1 (matches save method 1): load the whole model

model = torch.load('vgg16_method1.pth', weights_only=False)
print(model)

Load method 2 (matches save method 2): rebuild the model, then load the state dict

import torchvision

vgg16 = torchvision.models.vgg16(weights=None)
vgg16.load_state_dict(torch.load('vgg16_method2.pth'))
print(vgg16)

# model = torch.load('vgg16_method2.pth')  # loading the file directly just gives the weights as a dictionary
# print(model)
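
One pitfall with save method 1 and a custom model: torch.load needs the class definition to be importable at load time. A minimal sketch of what that means (the class and file names here are illustrative):

import torch
from torch import nn

class TdModule(nn.Module):  # the same class definition must be visible wherever the model is loaded
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 64, kernel_size=3)

    def forward(self, x):
        return self.conv1(x)

torch.save(TdModule(), 'td_method1.pth')

# In a separate script this line only works if TdModule is importable there
# (e.g. `from model_file import TdModule`); otherwise unpickling fails.
model = torch.load('td_method1.pth', weights_only=False)
print(model)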