How to Fix Common PyTorch Errors

Published: 2021-12-04 18:44:37 | Source: 亿速云 (Yisu Cloud) | Views: 221 | Author: 柒染 | Category: Big Data

Many newcomers are unsure how to deal with the errors PyTorch raises. This article goes through several common ones and explains how to fix each.

PyTorch runtime error: RuntimeError: cuDNN error: CUDNN_STATUS_INTERNAL_ERROR

Fix: add the following line to your code:

torch.cuda.set_device(0)
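As a minimal sketch of where that call could go, the snippet below selects the GPU once at the top of the script, before any CUDA tensor is created. The constant GPU_INDEX and the CPU fallback are assumptions added for illustration; they are not part of the original fix.

```python
import torch

# Assumption: GPU index 0 is the card you want; adjust on multi-GPU machines.
GPU_INDEX = 0

if torch.cuda.is_available():
    # Bind this process to one GPU before any CUDA tensor or model is created.
    torch.cuda.set_device(GPU_INDEX)
    device = torch.device("cuda", GPU_INDEX)
else:
    # Fallback so the script still runs on a machine without a GPU.
    device = torch.device("cpu")

x = torch.ones(2, 2, device=device)
```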

Fixing a NaN loss when training an RNN

(1) If the cause is exploding gradients, apply gradient clipping:

GRAD_CLIP = 5
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), GRAD_CLIP)
optimizer.step()

(2) In testModel and evaluate, run the forward pass inside:

with torch.no_grad():

(3) Lower the learning rate.
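The three points can be combined into one training and evaluation step, sketched below. The toy model TinyRNN, the random data, the MSE loss and the value of LEARNING_RATE are all hypothetical and only serve to make the example self-contained.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

GRAD_CLIP = 5.0       # maximum allowed gradient norm
LEARNING_RATE = 1e-3  # hypothetical value; lower it if the loss still diverges


class TinyRNN(nn.Module):
    """Hypothetical toy model used only for this illustration."""

    def __init__(self):
        super().__init__()
        self.rnn = nn.RNN(input_size=4, hidden_size=8, batch_first=True)
        self.head = nn.Linear(8, 1)

    def forward(self, x):
        out, _ = self.rnn(x)          # out: (batch, seq_len, hidden)
        return self.head(out[:, -1])  # predict from the last time step


model = TinyRNN()
optimizer = torch.optim.SGD(model.parameters(), lr=LEARNING_RATE)
criterion = nn.MSELoss()

inputs = torch.randn(16, 10, 4)  # random stand-in data
targets = torch.randn(16, 1)

# (1) Training step: clip gradients between backward() and step().
model.train()
optimizer.zero_grad()
loss = criterion(model(inputs), targets)
loss.backward()
total_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), GRAD_CLIP)
optimizer.step()

# (2) Evaluation: no autograd graph is built inside no_grad().
model.eval()
with torch.no_grad():
    eval_loss = criterion(model(inputs), targets)
```

clip_grad_norm_ returns the gradient norm measured before clipping, so logging total_norm is one way to see whether gradients are in fact growing.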

RuntimeError: Expected object of device type cuda but got device type cpu for argument #1 'self' in call to _th_addmm

There are three places in the code where something must be moved to CUDA:

Is the model on CUDA? model = model.to(device)
Is the input data on CUDA? data = data.to(device)
Are tensors created inside the model on CUDA? p = torch.tensor([1]).to(device)

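Regarding the first point: model = model.to(device) only moves layers that were instantiated in __init__() and assigned as attributes of the model. A layer that is instantiated and used directly inside forward is not registered with the model, so it is not moved to CUDA.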
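The three placements can be sketched together as follows. The variable device and its CPU fallback are assumptions for illustration, and a single nn.Linear layer stands in for a real model.

```python
import torch
import torch.nn as nn

# Assumption: fall back to the CPU so the sketch also runs without a GPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(10, 2).to(device)   # 1. the model
data = torch.rand(1, 10).to(device)   # 2. the input data
p = torch.tensor([1.0]).to(device)    # 3. any tensor created along the way

out = model(data) * p
```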

Here is an example of incorrect code:

import torch
import torch.nn as nn


data = torch.rand(1, 10).cuda()


class TestMoule(nn.Module):
    def __init__(self):
        super(TestMoule, self).__init__()
        # self.linear = torch.nn.Linear(10, 2)

    def forward(self, x):
        # return self.linear(x)
        return torch.nn.Linear(10, 2)(x)


model = TestMoule()
model = model.cuda()

print(model(data))
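For comparison, a corrected version of the example above might look like the sketch below: the layer is created in __init__ so that moving the model also moves its parameters. The class name TestModule, the extra scale tensor and the CPU fallback are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Assumption: fall back to the CPU so the sketch also runs without a GPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


class TestModule(nn.Module):
    def __init__(self):
        super().__init__()
        # Registered as a submodule, so .to(device) moves its parameters too.
        self.linear = nn.Linear(10, 2)

    def forward(self, x):
        # Tensors created inside forward should follow the input's device.
        scale = torch.ones(1, device=x.device)
        return self.linear(x) * scale


model = TestModule().to(device)
data = torch.rand(1, 10, device=device)
output = model(data)
```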

RuntimeError: CUDA error: an illegal memory access was encountered

One cause of this error is passing GPU data to an nn module that has not itself been moved to the GPU, as in the incorrect code below:

import torch

data = torch.randn(1, 10).cuda()

layernorm = torch.nn.LayerNorm(10)
# layernorm = torch.nn.LayerNorm(10).cuda()

re_data = layernorm(data)
print(re_data)
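A corrected sketch of the example above moves the layer to the device of the data before calling it. The CPU fallback and the explicit device check are assumptions added for illustration.

```python
import torch

# Assumption: fall back to the CPU so the sketch also runs without a GPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

data = torch.randn(1, 10, device=device)

# Move the layer to the same device as the data before calling it.
layernorm = torch.nn.LayerNorm(10).to(data.device)

# Hypothetical guard: fail early if the layer and the data ever disagree.
assert layernorm.weight.device == data.device

re_data = layernorm(data)
```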

RuntimeError: CUDA error: device-side assert triggered

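The class labels in target do not line up with the indices of the model's softmax output. Take a three-class problem as an example:

The targets take the values 1-3, but the softmax output is indexed 0-2, which triggers the error above. The fix is to remap the labels so that they start at 0, as in the following code, which maps review scores to the classes 0, 1 and 2: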

import pandas as pd

df = pd.read_csv('data/reviews.csv')

def to_sentiment(score):
    score = int(score)
    if score <= 2:
        return 0
    elif score == 3:
        return 1
    else:
        return 2

df['sentiment'] = df.score.apply(to_sentiment)
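Because the CUDA assert message says little about the cause, it can help to validate the labels before computing the loss. The sketch below is one way to do that; the helper check_targets, the sample labels and the random logits are hypothetical.

```python
import torch
import torch.nn as nn

NUM_CLASSES = 3


def check_targets(targets: torch.Tensor, num_classes: int) -> None:
    """Hypothetical helper: fail early, with a readable message, on bad labels."""
    lo, hi = int(targets.min()), int(targets.max())
    if lo < 0 or hi >= num_classes:
        raise ValueError(
            f"labels span [{lo}, {hi}] but must lie in [0, {num_classes - 1}]"
        )


raw_labels = torch.tensor([1, 2, 3, 3])  # 1-based labels, as in the text
targets = raw_labels - 1                 # shift to 0-based class indices
check_targets(targets, NUM_CLASSES)      # passes; raw_labels would raise

logits = torch.randn(4, NUM_CLASSES)     # stand-in for the model output
loss = nn.CrossEntropyLoss()(logits, targets)
```

When the assert has already fired, rerunning the same batch on the CPU, or setting the environment variable CUDA_LAUNCH_BLOCKING=1, usually produces an error that points at the offending call.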
