Getting Started with PyTorch (Part 2) -- Building Neural Networks

1 Basic Skeleton of a Neural Network

PyTorch's tools for building neural networks live mainly in torch.nn (Neural Network).

Official documentation: https://docs.pytorch.org/docs/stable/nn.html

  • Containers # define the skeletons (structures) of a neural network; adding different contents to a skeleton composes the network

  • Convolution Layers # convolution layers

  • Pooling layers # pooling layers

  • Padding Layers # padding layers

  • Non-linear Activations (weighted sum, nonlinearity) # non-linear activations

  • Non-linear Activations (other)

  • Normalization Layers # normalization layers

1.1 Skeleton: Containers

Containers offers six classes. Module is the most commonly used one; it provides the basic skeleton for every neural network.

Module         Base class for all neural network modules.
Sequential     A sequential container.
ModuleList     Holds submodules in a list.
ModuleDict     Holds submodules in a dictionary.
ParameterList  Holds parameters in a list.
ParameterDict  Holds parameters in a dictionary.
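
Only Module and Sequential are demonstrated in the subsections below. As a quick, hedged illustration of ModuleList, here is a minimal sketch (the class name ToyStack and the layer sizes are made up for this example):

import torch
from torch import nn

class ToyStack(nn.Module):  # hypothetical model, only for illustration
    def __init__(self):
        super().__init__()
        # ModuleList registers the sub-layers so their parameters are tracked by the parent Module
        self.layers = nn.ModuleList([nn.Linear(4, 4) for _ in range(3)])

    def forward(self, x):
        for layer in self.layers:  # can be iterated like an ordinary Python list
            x = layer(x)
        return x

print(ToyStack()(torch.randn(2, 4)).shape)  # torch.Size([2, 4])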

1.1.1 Using Module

import torch
import torch.nn as nn

class TdModel(nn.Module):       # a custom model must subclass nn.Module
    def __init__(self):
        super().__init__()      # call the parent class initializer
        self.conv1 = nn.Conv2d(1, 20, 5)
        self.conv2 = nn.Conv2d(20, 20, 5)

    def forward(self, x):       # forward pass: defines the computation (here it simply adds 1)
        output = x + 1
        return output

td = TdModel()                  # instantiate the model
x = torch.tensor(1.0)           # set the input value
output = td(x)                  # calling the model runs forward()
print(output)

1.1.2 Building a Model with Sequential

Sequential runs the steps of a PyTorch model in order.

from collections import OrderedDict
from torch import nn

# Using Sequential to create a small model. When `model` is run,
# input will first be passed to `Conv2d(1,20,5)`. The output of
# `Conv2d(1,20,5)` will be used as the input to the first
# `ReLU`; the output of the first `ReLU` will become the input
# for `Conv2d(20,64,5)`. Finally, the output of
# `Conv2d(20,64,5)` will be used as input to the second `ReLU`
model = nn.Sequential(
    nn.Conv2d(1, 20, 5),
    nn.ReLU(),
    nn.Conv2d(20, 64, 5),
    nn.ReLU()
)

# Using Sequential with OrderedDict. This is functionally the
# same as the above code
model = nn.Sequential(OrderedDict([
    ('conv1', nn.Conv2d(1, 20, 5)),
    ('relu1', nn.ReLU()),
    ('conv2', nn.Conv2d(20, 64, 5)),
    ('relu2', nn.ReLU())
]))

Take the CIFAR-10 network structure as an example.

(Figure: CIFAR-10 model structure)

Building the model structure without Sequential:

from torch import nn
import torch

class TdModule(nn.Module):
    def __init__(self):
        super(TdModule, self).__init__()
        self.conv1 = nn.Conv2d(3, 32, kernel_size=5, padding=2)
        self.maxpool1 = nn.MaxPool2d(2)
        self.conv2 = nn.Conv2d(32, 32, kernel_size=5, padding=2)
        self.maxpool2 = nn.MaxPool2d(2)
        self.conv3 = nn.Conv2d(32, 64, kernel_size=5, padding=2)
        self.maxpool3 = nn.MaxPool2d(2)
        self.flatten = nn.Flatten()
        self.linear1 = nn.Linear(1024, 64)
        self.linear2 = nn.Linear(64, 10)

    def forward(self, x):
        x = self.conv1(x)
        x = self.maxpool1(x)
        x = self.conv2(x)
        x = self.maxpool2(x)
        x = self.conv3(x)
        x = self.maxpool3(x)
        x = self.flatten(x)
        x = self.linear1(x)
        x = self.linear2(x)
        return x

td = TdModule()
print(td)

# check the model output
input = torch.randn(64, 3, 32, 32)  # test data
output = td(input)
print(output.size())

Building the same model with Sequential:

from torch import nn
import torch

class TdModule(nn.Module):
    def __init__(self):
        super(TdModule, self).__init__()
        self.model1 = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=5, padding=2),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 32, kernel_size=5, padding=2),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=5, padding=2),
            nn.MaxPool2d(2),
            nn.Flatten(),
            nn.Linear(1024, 64),
            nn.Linear(64, 10),
        )

    def forward(self, x):
        x = self.model1(x)
        return x

td = TdModule()

# check the model output
input = torch.randn(64, 3, 32, 32)  # test data
output = td(input)
print(output.size())

Visualizing the model architecture with TensorBoard:

from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter('logs')
writer.add_graph(TdModule(), input)
writer.close()

1.2 Convolution Layers

torch.nn is a higher-level wrapper around torch.nn.functional.

nn.Conv1d is for one-dimensional data, nn.Conv2d for two-dimensional data (images are 2-D), and nn.Conv3d for three-dimensional data.

Convolution animations: https://github.com/vdumoulin/conv_arithmetic/blob/master/README.md
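
As a small illustration of that wrapper relationship, the module interface and the functional interface should give the same result; a minimal sketch:

import torch
from torch import nn
import torch.nn.functional as F

x = torch.tensor([-1.0, 0.0, 2.0])

relu_module = nn.ReLU()       # module (class) interface from torch.nn
out_module = relu_module(x)
out_functional = F.relu(x)    # functional interface from torch.nn.functional

print(torch.equal(out_module, out_functional))  # True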

1.2.1 nn.functional.conv2d

Usage of torch.nn.functional.conv2d; note that the data must be tensors!

torch.nn.functional.conv2d(
    input, weight, bias=None, stride=1,
    padding=0, dilation=1, groups=1
)
  • input – the input, with shape (minibatch, in_channels, iH, iW)
  • weight – the weights / convolution kernel, with shape (out_channels, in_channels/groups, kH, kW)
  • bias – the bias, with shape (out_channels); default: None
  • stride – how far the kernel moves at each step; for 2-D convolution it can be a single number or a tuple (sH, sW); default: 1
  • padding – padding added to the sides of the input image; it can be a single number or a tuple (padH, padW); default: 0 (no padding)
    • without padding, features at the edges of the image are partially lost

Example code

import torch
import torch.nn.functional as F

input = torch.tensor(  # input image
    [[1, 2, 0, 3, 1],
     [0, 1, 2, 3, 1],
     [1, 2, 1, 0, 0],
     [5, 2, 3, 1, 1],
     [2, 1, 0, 1, 1]]
)
kernel = torch.tensor(  # convolution kernel, i.e. the weight
    [[1, 2, 1],
     [0, 1, 0],
     [2, 1, 0]]
)

input = torch.reshape(input, (1, 1, 5, 5))  # minibatch 1, 1 channel, 5H x 5W
kernel = torch.reshape(kernel, (1, 1, 3, 3))

output = F.conv2d(input, kernel, stride=1)  # stride 1
print(output)
output = F.conv2d(input, kernel, stride=2)  # stride 2
print(output)

1.2.2 nn.Conv2d

Usage of nn.Conv2d

torch.nn.Conv2d(
    in_channels, out_channels, kernel_size, stride=1, padding=0,  # the five commonly used parameters
    dilation=1, groups=1, bias=True, padding_mode='zeros',
    device=None, dtype=None
)
  • in_channels (int) – number of channels in the input image
  • out_channels (int) – number of channels produced by the convolution
  • kernel_size (int or tuple) – size of the convolution kernel; a single number n gives an n*n kernel, a non-square kernel is written as a tuple (height, width)
  • stride (int or tuple, optional) – stride of the convolution; for 2-D it can be a single number or a tuple (sH, sW); default: 1
  • padding (int, tuple or str, optional) – padding added to all four sides of the input; default: 0 (no padding)
  • padding_mode (str, optional) – how the padding is filled: 'zeros', 'reflect', 'replicate' or 'circular'; default: 'zeros'
  • dilation (int or tuple, optional) – spacing between kernel elements; default: 1
  • groups (int, optional) – number of blocked connections from input channels to output channels; default: 1
  • bias (bool, optional) – if True, adds a learnable bias to the output; default: True

Example code

import torch
import torchvision
from torch import nn
from torch.utils.data import DataLoader
from torch.utils.tensorboard import SummaryWriter

dataset = torchvision.datasets.CIFAR10(root='./dataset', train=False, transform=torchvision.transforms.ToTensor(), download=True)
dataloader = DataLoader(dataset, batch_size=64)

class TdModule(nn.Module):  # custom model
    def __init__(self):
        super(TdModule, self).__init__()  # run the parent class initializer
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=6, kernel_size=3, stride=1, padding=0)

    def forward(self, x):
        x = self.conv1(x)
        return x

td = TdModule()
writer = SummaryWriter('./logs')  # visualize the model output with TensorBoard
step = 0
for data in dataloader:
    imgs, targets = data
    output = td(imgs)
    # print(imgs.shape)    # torch.Size([64, 3, 32, 32])
    # print(output.shape)  # torch.Size([64, 6, 30, 30])
    writer.add_image('input', imgs, step, dataformats='NCHW')  # input images
    output = torch.reshape(output, (-1, 3, 30, 30))  # the output has 6 channels; reshape to 3 channels for display
    writer.add_image('output', output, step, dataformats='NCHW')
    step += 1

writer.close()

Some notes on convolution layers, as illustrated in the figure below.

The input image is 224 x 224 x 3; after one convolution plus a non-linear activation it becomes 224 x 224 x 64.

The formula for the output height/width is in the official PyTorch docs (a small helper reproducing it is sketched below): https://docs.pytorch.org/docs/stable/generated/torch.nn.Conv2d.html#torch.nn.Conv2d

(Figure: convolution layer turning a 224 x 224 x 3 input into a 224 x 224 x 64 output)
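
The formula from that page can be wrapped in a small helper. The sketch below is ours (the helper name conv2d_out_size is not part of PyTorch) and reproduces the 32x32 -> 30x30 change printed in the example above:

import math

def conv2d_out_size(h_in, kernel_size, stride=1, padding=0, dilation=1):
    # H_out = floor((H_in + 2*padding - dilation*(kernel_size - 1) - 1) / stride + 1)
    return math.floor((h_in + 2 * padding - dilation * (kernel_size - 1) - 1) / stride + 1)

print(conv2d_out_size(32, kernel_size=3))             # 30, matching the 32x32 -> 30x30 change above
print(conv2d_out_size(32, kernel_size=5, padding=2))  # 32, the "same-size" setting used in the CIFAR-10 model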

1.3 Pooling Layers

Max pooling (MaxPool) is also called downsampling, and its inverse (MaxUnpool) upsampling; there are also average pooling (AvgPool) and adaptive max pooling (AdaptiveMaxPool), sketched briefly below.
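
Only MaxPool2d is walked through below; as a brief, hedged sketch of the other two flavours (shapes assume the usual 4-D NCHW input):

import torch
from torch import nn

x = torch.randn(1, 1, 6, 6)

avg = nn.AvgPool2d(kernel_size=2)        # averages each 2x2 window instead of taking the max
adaptive = nn.AdaptiveMaxPool2d((2, 2))  # chooses window sizes so the output is always 2x2

print(avg(x).shape)       # torch.Size([1, 1, 3, 3])
print(adaptive(x).shape)  # torch.Size([1, 1, 2, 2])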

1.3.1 nn.MaxPool2d

torch.nn.MaxPool2d(
    kernel_size, stride=None, padding=0,
    dilation=1, return_indices=False, ceil_mode=False
)
  • kernel_size (Union[int, tuple[int, int]]) – size of the window to take the max over, similar to a convolution kernel; a value of 3 gives a 3*3 window
  • stride (Union[int, tuple[int, int]]) – stride of the window; default: kernel_size
  • padding (Union[int, tuple[int, int]]) – padding added on both sides
  • dilation (Union[int, tuple[int, int]]) – spacing between elements within the window
  • return_indices (bool) – if True, returns the max indices along with the outputs, useful for torch.nn.MaxUnpool2d later
  • ceil_mode (bool) – if True, uses ceil instead of floor to compute the output shape; ceil still pools the leftover part at the border, while floor discards it (see the short comparison sketch after this list)
    • floor rounds down, ceil rounds up, as shown in the figure below
    • (Figure: ceil vs floor when the pooling window overhangs the input)
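
A minimal sketch of the ceil/floor difference referenced in the list above:

import torch
from torch import nn

x = torch.randn(1, 1, 5, 5)

# 5x5 input, 3x3 window, stride defaults to kernel_size (3)
print(nn.MaxPool2d(kernel_size=3, ceil_mode=False)(x).shape)  # torch.Size([1, 1, 1, 1]): the leftover 2-wide border is dropped
print(nn.MaxPool2d(kernel_size=3, ceil_mode=True)(x).shape)   # torch.Size([1, 1, 2, 2]): the leftover border is still pooled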

Example code

import torch
from torch import nn

input = torch.tensor(
    [[1, 2, 0, 3, 1],
     [0, 1, 2, 3, 1],
     [1, 2, 1, 0, 0],
     [5, 2, 3, 1, 1],
     [2, 1, 0, 1, 1]], dtype=torch.float32
)
input = torch.reshape(input, (-1, 1, 5, 5))  # reshape to [minibatch size, channels, height, width]

class TdModel(nn.Module):
    def __init__(self):
        super(TdModel, self).__init__()
        self.maxpool1 = nn.MaxPool2d(kernel_size=3, ceil_mode=False)

    def forward(self, input):
        output = self.maxpool1(input)
        return output

td = TdModel()
output = td(input)
print(output)

1.3.2 What Max Pooling Does

The goal of max pooling is to retain the salient features of the input while shrinking the amount of data.

Visualizing the effect of max pooling with TensorBoard:

import torch
import torchvision
from torch import nn
from torch.utils.tensorboard import SummaryWriter

dataset = torchvision.datasets.CIFAR10(root='./dataset', train=False, download=True,
                                       transform=torchvision.transforms.ToTensor())
dataloader = torch.utils.data.DataLoader(dataset, batch_size=64)

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.maxpool1 = nn.MaxPool2d(kernel_size=3, ceil_mode=False)

    def forward(self, x):
        output = self.maxpool1(x)
        return output

td = Net()
writer = SummaryWriter('logs')
step = 0
for data in dataloader:
    imgs, targets = data
    writer.add_image('input', imgs, step, dataformats='NCHW')
    outputs = td(imgs)
    writer.add_image('outputs', outputs, step, dataformats='NCHW')
    step += 1

writer.close()

1.4 Non-linear Activations

Non-linear activations introduce non-linear behaviour into the network. Common choices include:

  • ReLU: outputs the input unchanged when it is greater than 0, and 0 otherwise
  • Sigmoid: computed by the formula Sigmoid(x) = 1 / (1 + e^(-x)), squashing the output into (0, 1); commonly used for binary classification (see the small check after this list)
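
As a small sanity check of the Sigmoid formula (a sketch, not part of the original example code):

import torch

x = torch.tensor([-1.0, 0.0, 2.0])
manual = 1 / (1 + torch.exp(-x))  # Sigmoid(x) = 1 / (1 + e^(-x))
print(manual)                     # tensor([0.2689, 0.5000, 0.8808])
print(torch.sigmoid(x))           # same values, all inside (0, 1)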

1.4.1 ReLU

PyTorch docs: https://docs.pytorch.org/docs/stable/generated/torch.nn.ReLU.html#torch.nn.ReLU

torch.nn.ReLU(inplace=False)

m = nn.ReLU(inplace=True)
x = torch.tensor([-1.0])
m(x)             # x is modified in place and becomes tensor([0.])

m = nn.ReLU(inplace=False)
x = torch.tensor([-1.0])
output = m(x)    # x stays tensor([-1.]); output is tensor([0.])
  • inplace=True modifies the data at the input's memory address directly (an in-place operation), so no return value is needed
  • inplace=False keeps the original data intact, so the changed data must be received through the return value

Example code

import torch
from torch import nn

input = torch.tensor(
    [[1, -0.5],
     [-1, 3]]
)
# add the batch-size dimension: 1 channel, height and width both 2
input = torch.reshape(input, (-1, 1, 2, 2))

class TdModule(nn.Module):
    def __init__(self):
        super(TdModule, self).__init__()
        self.relu1 = nn.ReLU()

    def forward(self, x):
        output = self.relu1(x)
        return output

Td = TdModule()
output = Td(input)
print(output)

1.4.2 Sigmoid

PyTorch docs: https://docs.pytorch.org/docs/stable/generated/torch.nn.Sigmoid.html#torch.nn.Sigmoid

torch.nn.Sigmoid(*args, **kwargs)

Visualizing the result with TensorBoard:

from torch.utils.tensorboard import SummaryWriter
import torch
from torch import nn
import torchvision

dataset = torchvision.datasets.CIFAR10(root='./dataset', train=False, download=True, transform=torchvision.transforms.ToTensor())
dataloader = torch.utils.data.DataLoader(dataset, batch_size=64)

class TdModule(nn.Module):
    def __init__(self):
        super(TdModule, self).__init__()
        self.relu1 = nn.ReLU()
        self.sigmoid1 = nn.Sigmoid()

    def forward(self, x):
        output = self.sigmoid1(x)
        return output

Td = TdModule()
writer = SummaryWriter('logs')
step = 0
for data in dataloader:
    imgs, targets = data
    writer.add_image('input', imgs, step, dataformats='NCHW')
    outputs = Td(imgs)
    writer.add_image('outputs', outputs, step, dataformats='NCHW')
    step += 1
writer.close()

1.5 Loss Functions

  • Measure the gap between the actual output and the target
  • Provide a basis for updating the parameters (backpropagation)

PyTorch docs: https://docs.pytorch.org/docs/stable/nn.html#loss-functions

1.5.1 L1Loss

torch.nn.L1Loss(
    size_average=None, reduce=None, reduction='mean'
)
  • size_average (bool, optional) – Deprecated (see reduction). By default the losses are averaged over each loss element in the batch; note that for some losses there are multiple elements per sample. If size_average is set to False, the losses are summed for each minibatch. Ignored when reduce is False. Default: True
  • reduce (bool, optional) – Deprecated (see reduction). By default the losses are averaged or summed over observations for each minibatch depending on size_average. When reduce is False, a loss per batch element is returned and size_average is ignored. Default: True
  • reduction (str, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction is applied, 'mean': the sum of the output is divided by the number of elements in the output, 'sum': the output is summed. Note: size_average and reduce are in the process of being deprecated, and specifying either of them overrides reduction. Default: 'mean'

Example code

import torch
from torch import nn

inputs = torch.tensor([1, 2, 3], dtype=torch.float)
targets = torch.tensor([1, 2, 5], dtype=torch.float)

inputs = torch.reshape(inputs, (1, 1, 1, 3))  # reshape to batch size 1, 1 channel, 1 row, 3 columns
targets = torch.reshape(targets, (1, 1, 1, 3))

loss = nn.L1Loss()
result = loss(inputs, targets)
print(result)  # tensor(0.6667): mean of |0| + |0| + |2|

1.5.2 MSELoss

Computes the mean squared error.

torch.nn.MSELoss(
    size_average=None, reduce=None, reduction='mean'
)

Example code

import torch
from torch import nn

inputs = torch.tensor([1, 2, 3], dtype=torch.float)
targets = torch.tensor([1, 2, 5], dtype=torch.float)

inputs = torch.reshape(inputs, (1, 1, 1, 3))  # reshape to batch size 1, 1 channel, 1 row, 3 columns
targets = torch.reshape(targets, (1, 1, 1, 3))

loss_mse = nn.MSELoss()
result_mse = loss_mse(inputs, targets)
print(result_mse)  # tensor(1.3333): mean of 0 + 0 + 2^2

1.5.3 CrossEntropyLoss

Cross-entropy loss, commonly used for classification.

torch.nn.CrossEntropyLoss(
    weight=None, size_average=None, ignore_index=-100,
    reduce=None, reduction='mean', label_smoothing=0.0
)

Example code

import torch
from torch import nn

x = torch.tensor([0.1, 0.2, 0.3], dtype=torch.float)  # scores for 3 classes
y = torch.tensor([1])                                 # the target class index
x = torch.reshape(x, (1, 3))                          # reshape to (batch size, number of classes)

loss_cross = nn.CrossEntropyLoss()
result_cross = loss_cross(x, y)
print(result_cross)
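
As a sanity check, the same number can be computed by hand from the cross-entropy formula loss = -x[class] + log(sum_j exp(x[j])); a small sketch:

import torch

x = torch.tensor([0.1, 0.2, 0.3])
target_class = 1
manual = -x[target_class] + torch.log(torch.exp(x).sum())
print(manual)  # tensor(1.1019), should match result_cross above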

2 Backpropagation: backward

Using CrossEntropyLoss to drive the backward pass.

Every node in the network (i.e. every parameter to be updated) has a gradient; the parameters are then adjusted according to these gradients, which is what ultimately lowers the loss.

from torch import nn
import torch
import torchvision
from torch.utils.data import DataLoader

class TdModule(nn.Module):
    def __init__(self):
        super(TdModule, self).__init__()
        self.model1 = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=5, padding=2),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 32, kernel_size=5, padding=2),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=5, padding=2),
            nn.MaxPool2d(2),
            nn.Flatten(),
            nn.Linear(1024, 64),
            nn.Linear(64, 10),
        )

    def forward(self, x):
        x = self.model1(x)
        return x

dataset = torchvision.datasets.CIFAR10(root='./dataset', train=False, transform=torchvision.transforms.ToTensor(), download=True)
dataloader = DataLoader(dataset, batch_size=1)

td = TdModule()
loss = nn.CrossEntropyLoss()
for data in dataloader:
    imgs, targets = data
    outputs = td(imgs)
    # print(outputs)  # 10 values, one score per class
    # print(targets)  # the class index of the image
    result_loss = loss(outputs, targets)  # cross-entropy loss
    print(result_loss)
    result_loss.backward()  # compute the gradients

.backward() only computes the gradients; a suitable optimizer is still needed to update the parameters and actually bring the overall error down.

3 Optimizers: optim

PyTorch docs: https://docs.pytorch.org/docs/stable/optim.html

  • params – the model's parameters
  • lr – the learning rate

Constructing an optimizer

from torch import optim

optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
# optimizer = optim.Adam([var1, var2], lr=0.0001)

First hand the model's parameters to the optimizer; the optimization loop then looks like this:

for input, target in dataset:
    optimizer.zero_grad()           # clear the gradients
    output = model(input)
    loss = loss_fn(output, target)  # measure the error between the output and the target
    loss.backward()                 # compute the gradient of every parameter to update
    optimizer.step()                # update each parameter using the gradient from the previous step

Example code

from torch import nn
import torch
import torchvision
from torch.utils.data import DataLoader

# build the custom model
class TdModule(nn.Module):
    def __init__(self):
        super(TdModule, self).__init__()
        self.model1 = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=5, padding=2),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 32, kernel_size=5, padding=2),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=5, padding=2),
            nn.MaxPool2d(2),
            nn.Flatten(),
            nn.Linear(1024, 64),
            nn.Linear(64, 10),
        )

    def forward(self, x):
        x = self.model1(x)
        return x

# load the dataset
dataset = torchvision.datasets.CIFAR10(root='./dataset', train=False, transform=torchvision.transforms.ToTensor(), download=True)

dataloader = DataLoader(dataset, batch_size=1)

# instantiate the model; .to('cuda') moves it to the GPU -- drop the .to('cuda') calls if no GPU is available
td = TdModule().to('cuda')
loss = nn.CrossEntropyLoss().to('cuda')

# use the stochastic gradient descent optimizer (SGD)
optim = torch.optim.SGD(td.parameters(), lr=0.01)  # define the optimizer
for epoch in range(20):
    running_loss = 0.0
    for data in dataloader:
        imgs, targets = data
        outputs = td(imgs.to('cuda'))
        # print(outputs)  # 10 values, one score per class
        # print(targets)  # the class index of the image
        result_loss = loss(outputs, targets.to('cuda'))  # cross-entropy loss
        optim.zero_grad()       # clear the gradients; last step's gradients are useless for this update
        result_loss.backward()  # compute the gradient of every tunable parameter
        optim.step()            # update every parameter
        running_loss = running_loss + result_loss  # accumulate the overall error
    print(running_loss)

4 Existing Network Models

PyTorch docs: https://docs.pytorch.org/vision/stable/models.html

torchvision provides image-related models, torchaudio audio-related models, torchtext text-related models, and so on.
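
In newer torchvision releases (roughly 0.14 and later, if memory serves) the available architectures can also be listed programmatically; a small sketch:

import torchvision

print(torchvision.models.list_models()[:5])          # a few of the available architecture names
print('vgg16' in torchvision.models.list_models())   # True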

4.1 The VGG Classification Model

The commonly used variants are VGG16 and VGG19.

torchvision.models.vgg16(
    *, weights: Optional[VGG16_Weights] = None,
    progress: bool = True, **kwargs: Any
)
  • weights (VGG16_Weights, optional) – the pretrained weights to use; see VGG16_Weights. By default no pretrained weights are used.
  • progress (bool, optional) – if True, displays a download progress bar on stderr; default: True.
  • **kwargs – parameters passed on to torchvision.models.vgg.VGG; see the source code for details.

4.2 Inspecting the VGG Architecture

import torchvision

vgg16_false = torchvision.models.vgg16(pretrained=False)  # do not load pretrained weights
vgg16_true = torchvision.models.vgg16(pretrained=True)    # load pretrained weights

print(vgg16_true)  # inspect the architecture of the pretrained model
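
Note that the pretrained argument is deprecated in newer torchvision releases in favour of a weights argument; an equivalent sketch using that API (assuming torchvision >= 0.13):

import torchvision
from torchvision.models import VGG16_Weights

vgg16_false = torchvision.models.vgg16(weights=None)                        # random initialization
vgg16_true = torchvision.models.vgg16(weights=VGG16_Weights.IMAGENET1K_V1)  # ImageNet-pretrained weights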

The VGG architecture:

VGG(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU(inplace=True)
    (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU(inplace=True)
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (6): ReLU(inplace=True)
    (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (8): ReLU(inplace=True)
    (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace=True)
    (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (13): ReLU(inplace=True)
    (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (15): ReLU(inplace=True)
    (16): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (17): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (18): ReLU(inplace=True)
    (19): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (20): ReLU(inplace=True)
    (21): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (22): ReLU(inplace=True)
    (23): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (24): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (25): ReLU(inplace=True)
    (26): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (27): ReLU(inplace=True)
    (28): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (29): ReLU(inplace=True)
    (30): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(7, 7))
  (classifier): Sequential(
    (0): Linear(in_features=25088, out_features=4096, bias=True)
    (1): ReLU(inplace=True)
    (2): Dropout(p=0.5, inplace=False)
    (3): Linear(in_features=4096, out_features=4096, bias=True)
    (4): ReLU(inplace=True)
    (5): Dropout(p=0.5, inplace=False)
    (6): Linear(in_features=4096, out_features=1000, bias=True)
  )
)

4.3 Modifying a Network Model

The VGG model is trained on the ImageNet dataset and has 1000 output classes.

The CIFAR10 dataset has only 10 classes.

So how do we take the existing network and adapt its structure?

Adding a layer at the end of the network

# append a layer at the very end of the model
vgg16_true.add_module('add_linear', nn.Linear(1000, 10))

# append a layer at the end of the classifier block instead
vgg16_true.classifier.add_module('add_linear', nn.Linear(1000, 10))
print(vgg16_true)

Modifying the parameters of an existing layer

# replace layer 6 of the classifier block
vgg16_false.classifier[6] = nn.Linear(4096, 10)
print(vgg16_false)

5 Saving and Loading Models

5.1 Saving a Model (Structure / Parameters)

import torchvision
import torch

vgg16 = torchvision.models.vgg16(weights=None)  # build an untrained model

Save method 1: save the model structure together with its parameters

torch.save(vgg16, 'vgg16_method1.pth')

Save method 2: save only the model's parameters as a dictionary (officially recommended)

torch.save(vgg16.state_dict(), 'vgg16_method2.pth')

5.2 Loading a Model

import torch

Load method 1 (matches save method 1): load the whole model

model = torch.load('vgg16_method1.pth', weights_only=False)
print(model)

Load method 2 (matches save method 2): rebuild the model, then load the state dict

import torchvision

vgg16 = torchvision.models.vgg16(weights=None)
vgg16.load_state_dict(torch.load('vgg16_method2.pth'))
print(vgg16)

# model = torch.load('vgg16_method2.pth')  # loading the file directly just gives the weights as a dictionary
# print(model)
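
One pitfall with save method 1 and a custom model: torch.load needs the class definition to be importable at load time. A minimal sketch of what that means (the class and file names here are illustrative):

import torch
from torch import nn

class TdModule(nn.Module):  # the same class definition must be visible wherever the model is loaded
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 64, kernel_size=3)

    def forward(self, x):
        return self.conv1(x)

torch.save(TdModule(), 'td_method1.pth')

# In a separate script this line only works if TdModule is importable there
# (e.g. `from model_file import TdModule`); otherwise unpickling fails.
model = torch.load('td_method1.pth', weights_only=False)
print(model)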