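Constructing Tensors in PyTorch

May 17, 2024

Summary

When working with PyTorch, we often need to create a Tensor filled with 0s or 1s, or one whose values follow a given rule. PyTorch offers several ways to construct such Tensors, and this post walks through them.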

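Copying a Tensor

Because a Tensor is a Python object, ordinary assignment passes a reference, much like a shortcut: there is one more variable, but it still points to the same object. For example: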

import torch

a = torch.tensor([1, 2, 3])
b = a

a += 1

print(b)
# tensor([2, 3, 4])
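
The aliasing can also be checked directly instead of being inferred from the printed values. The following is a minimal sketch; the names same_object and same_storage are illustrative, and data_ptr() is used here only as a convenient way to compare the underlying memory address.

```python
import torch

a = torch.tensor([1, 2, 3])
b = a  # plain assignment: b is just another name for the same object

same_object = a is b                          # identical Python object
same_storage = a.data_ptr() == b.data_ptr()   # identical underlying memory

print(same_object, same_storage)
# True True
```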

So to obtain a genuinely new copy, we need to copy the Tensor explicitly.

Three method calls can do this, as summarized in the table below:

Method	Memory	Gradient
`.clone()`	Independent	Shared
`.detach()`	Shared	Independent
`.clone().detach()`	Independent	Independent
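
The table can be reproduced with a short check. This is a sketch under two assumptions: comparing data_ptr() stands in for "shares memory", and requires_grad stands in for "still attached to the computation graph". The dictionary names are illustrative.

```python
import torch

a = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

copies = {
    "clone": a.clone(),
    "detach": a.detach(),
    "clone_detach": a.clone().detach(),
}

# For each copy: (shares memory with a, still tracks gradients)
summary = {
    name: (t.data_ptr() == a.data_ptr(), t.requires_grad)
    for name, t in copies.items()
}

for name, (shares_memory, tracks_grad) in summary.items():
    print(name, shares_memory, tracks_grad)
# clone False True
# detach True False
# clone_detach False False
```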

Copy with shared gradients

The .clone() method copies a Tensor into freshly allocated memory, but it does not detach the copy from the computation graph:

a = torch.tensor([1, 2, 3], dtype=torch.float32, requires_grad=True)
b = a.clone()

result = b * 2
result.backward(torch.ones_like(result))

print(a.grad)
# tensor([2., 2., 2.])

b += 1

print(a)
# tensor([1., 2., 3.], requires_grad=True)

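As shown, even though the multiplication by 2 was applied to b, the gradient stored in a became 2. The data of the two Tensors is independent, yet b remains connected to a in the computation graph: .clone() is itself a differentiable operation, so gradients that reach b flow straight back to a.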

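⚠️ Notes:

A Tensor must be created with requires_grad=True before .backward() can compute gradients for it.

When backpropagating from a non-scalar Tensor, .backward() needs a starting gradient as its argument. For a scalar this is unnecessary; the default is 1.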
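
The difference between the scalar and non-scalar cases can be sketched as follows. The variable names are illustrative; the example only assumes that reducing with .sum() is an acceptable way to obtain a scalar.

```python
import torch

# Scalar output: backward() needs no argument; the seed gradient defaults to 1.
a = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
(a * 2).sum().backward()
grad_from_scalar = a.grad.clone()

# Non-scalar output: calling backward() without a seed raises a RuntimeError.
c = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
try:
    (c * 2).backward()
    raised = False
except RuntimeError:
    raised = True

# Supplying a seed of the same shape makes the call valid.
out = c * 2
out.backward(torch.ones_like(out))
grad_from_vector = c.grad.clone()

print(raised)
# True
print(grad_from_scalar, grad_from_vector)
# tensor([2., 2., 2.]) tensor([2., 2., 2.])
```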

One more concept needs explaining here. First, look at an example:
a = torch.tensor([1, 2, 3], dtype=torch.float32, requires_grad=True)
b = a.clone()

result = b * 2
result.backward(torch.ones_like(result))

print(b.grad)
# UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations.
As shown, PyTorch emits a warning about leaf Tensors, and b.grad itself is None.

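A leaf node is a concept from PyTorch's computation graph. Every Tensor that takes part in the graph (requires_grad=True) is a node, but only Tensors created directly by user code are leaf nodes; Tensors produced by copying or by computation are not. To save memory, only leaf nodes keep their gradient after the backward pass, while gradients of non-leaf nodes are released once backpropagation finishes.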
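
Whether a Tensor is a leaf can be inspected through its is_leaf attribute. A minimal sketch, with illustrative variable names:

```python
import torch

a = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)  # created directly by user code
b = a.clone()                                           # produced by an operation
c = b * 2                                               # produced by an operation

print(a.is_leaf, b.is_leaf, c.is_leaf)
# True False False

# Non-leaf Tensors record the operation that produced them.
has_grad_fn = b.grad_fn is not None
print(has_grad_fn)
# True
```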

To keep the gradient of a particular non-leaf node, we have to opt in explicitly by calling b.retain_grad():
a = torch.tensor([1, 2, 3], dtype=torch.float32, requires_grad=True)
b = a.clone()
b.retain_grad()

result = b * 2

result.backward(torch.ones_like(result))

print(b.grad)
# tensor([2., 2., 2.])
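Even so, although the two Tensors share gradients, they do not behave identically:

The gradient of a is updated by computations on the copy b.

The gradient of b is not updated by computations on the original a.

Therefore: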
a = torch.tensor([1, 2, 3], dtype=torch.float32, requires_grad=True)
b = a.clone()
b.retain_grad()

result = a * 2

result.backward(torch.ones_like(result))

print(b.grad)
# None
This version prints None: result depends only on a, so b is not on the path that backpropagation follows.

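Shared memory, detached from the computation graph

The .detach() method gives a new Tensor that is detached from the computation graph but still shares the underlying data memory: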

a = torch.tensor([1, 2, 3], dtype=torch.float32, requires_grad=True)
b = a.detach()

print(b.requires_grad)
# False

b += 1

print(a)
# tensor([2., 3., 4.], requires_grad=True)

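As shown, after detach() even the basic requires_grad attribute of b is False, so b is fully detached from the computation graph. Changes to the values of b still affect a, however, so the memory is still shared.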
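
Being detached also means that autograd treats the Tensor as a constant. The sketch below illustrates this with a made-up loss: without detach(), the gradient of a * a would be 2a, but with one factor detached only the other factor contributes.

```python
import torch

a = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# a.detach() carries the same values as a but is invisible to autograd,
# so d(loss)/da equals the detached values rather than 2 * a.
loss = (a * a.detach()).sum()
loss.backward()

print(a.grad)
# tensor([1., 2., 3.])
```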

Fully independent

Chaining .clone().detach() gives a new Tensor that is detached from the computation graph and also has its own data memory:

a = torch.tensor([1, 2, 3], dtype=torch.float32, requires_grad=True)
b = a.clone().detach()

print(b.requires_grad)
# False

b += 1

print(a)
# tensor([1., 2., 3.], requires_grad=True)

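As shown, after .clone().detach() even the basic requires_grad attribute of b is False, so b is fully detached from the computation graph. Changes to the values of b no longer affect a either, so the memory is independent as well.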

Creating Tensors

Creating `0/1` Tensors

In PyTorch, torch.zeros() and torch.ones() create Tensors filled with 0s or 1s; only the dimensions need to be given:

a = torch.ones(2, 3)
# tensor([[ 1.,  1.,  1.],
#         [ 1.,  1.,  1.]])

a = torch.ones(5)
# tensor([ 1.,  1.,  1.,  1.,  1.])

b = torch.zeros(2, 3)
# tensor([[ 0.,  0.,  0.],
#         [ 0.,  0.,  0.]])

b = torch.zeros(5)
# tensor([ 0.,  0.,  0.,  0.,  0.])

Other optional settings can be passed as well:

a = torch.ones(2, 3, dtype=torch.float32, device='cpu', requires_grad=True)
b = torch.zeros(2, 3, dtype=torch.float32, device='cpu', requires_grad=True)
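
As a small illustration of these options, the sketch below checks the resulting dtype and shape. It assumes the global default dtype has not been changed with torch.set_default_dtype(); the shape is passed once as separate integers and once as a tuple, both of which are accepted.

```python
import torch

a = torch.ones(2, 3)                          # no dtype given: uses the default dtype
b = torch.zeros((2, 3), dtype=torch.int64)    # shape as a tuple, explicit dtype

print(a.dtype, tuple(a.shape))
# torch.float32 (2, 3)
print(b.dtype, tuple(b.shape))
# torch.int64 (2, 3)
```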

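Creating a Tensor from values and attributes

Sometimes we need a new Tensor whose attributes (such as dtype) are taken from tensor while its values and device are copied from data. This can be done as follows: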

tensor = torch.tensor([1.2, 2.3, 3.4], dtype=torch.float32, device='cpu')
data = torch.tensor([100, 200, 300], dtype=torch.int32, device='mps')

another = tensor.new_tensor(data)
print(another)
# tensor([100., 200., 300.], device='mps:0')
print(another.dtype)
# torch.float32
print(another.device)
# mps:0

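⚠️ As of PyTorch 2.3, the official documentation is still wrong on this point: it says the device is taken from tensor, whereas in practice, as the output above shows, it is taken from data.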
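
The dtype half of this behavior can be checked on a machine without an mps device. The sketch below is a CPU-only variant that passes a plain Python list as data and uses float64 so the inherited dtype is visible in the output; it does not exercise the device behavior described above.

```python
import torch

tensor = torch.tensor([1.2, 2.3, 3.4], dtype=torch.float64)

# The values come from the list, the dtype comes from `tensor`.
another = tensor.new_tensor([100, 200, 300])

print(another)
# tensor([100., 200., 300.], dtype=torch.float64)
print(another.dtype)
# torch.float64
```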

Similarly, we can create a Tensor whose attributes (dtype, device, and so on) come from a, with values that are all 0 or all 1:

a = torch.tensor([1.2, 2.3, 3.4], dtype=torch.float32, device='mps')

zero = a.new_zeros(3, 2)
print(zero)
# tensor([[0., 0.],
#         [0., 0.],
#         [0., 0.]], device='mps:0')
print(zero.device)
# mps:0

one = a.new_ones(3, 2)
print(one)
# tensor([[1., 1.],
#         [1., 1.],
#         [1., 1.]], device='mps:0')
print(one.device)
# mps:0
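
For readers without an mps device, the same inheritance can be observed on CPU. This sketch uses an int16 source Tensor, an arbitrary choice made only so that the inherited dtype differs from the default.

```python
import torch

a = torch.tensor([1, 2, 3], dtype=torch.int16)

# Only the shape is supplied; dtype and device are taken from `a`.
zero = a.new_zeros(3, 2)
one = a.new_ones(3, 2)

print(zero.dtype, zero.device, tuple(zero.shape))
# torch.int16 cpu (3, 2)
print(one.dtype, one.device, tuple(one.shape))
# torch.int16 cpu (3, 2)
```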

Creating Tensors from a range of values

These are fairly simple, with no special rules; a quick look at the signatures is enough:

torch.arange(start=0, end, step=1, *, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) → Tensor

torch.linspace(start, end, steps, *, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) → Tensor

torch.logspace(start, end, steps, base=10.0, *, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) → Tensor

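The parameters are mostly self-explanatory; only step and steps need to be told apart:

step - used by torch.arange(); the spacing between adjacent values
steps - used by torch.linspace() and torch.logspace(); the number of values to generate in the interval

Also pay attention to whether the interval is open or closed:

[start, end) - torch.arange()
[start, end] - torch.linspace() and torch.logspace() (for torch.logspace() the interval applies to the exponents, so the values run from base^start to base^end)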
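
A quick comparison of the three functions, as a sketch with arbitrarily chosen arguments. The logspace values are compared approximately rather than printed, since they are computed as floating-point powers.

```python
import torch

r = torch.arange(0, 10, 2)             # step = 2, end is excluded
lin = torch.linspace(0, 10, steps=5)   # 5 values, both ends included
log = torch.logspace(0, 2, steps=3)    # 10**0, 10**1, 10**2 (base defaults to 10)

print(r.tolist())
# [0, 2, 4, 6, 8]
print(lin.tolist())
# [0.0, 2.5, 5.0, 7.5, 10.0]

log_matches = torch.allclose(log, torch.tensor([1.0, 10.0, 100.0]))
print(log_matches)
# True
```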

彭某的技术折腾笔记
