Single-machine multi-GPU training of rt-detrv2-r101: the loss backward pass fails with ValueError: (InvalidArgument) Required tensor shall not be nullptr, but received nullptr. #9094
Comments
Hello, could you try switching to release/2.7.1? |
Hello, after switching to the PaddleDetection release/2.7.1 branch, I found that configs contains no rtdetrv2 folder. When I ran the following command, as used on the release/2.7.0 branch:
python -m paddle.distributed.launch --gpus 0,1,2 tools/train.py -c configs/rtdetrv2/rtdetrv2_r101vd_6x_coco.yml --fleet --eval
I got the following error:
Traceback (most recent call last):
File "/home/zqy/zqy/Codes/PaddleDetection-release-2.7.1/tools/train.py", line 213, in <module>
main()
File "/home/zqy/zqy/Codes/PaddleDetection-release-2.7.1/tools/train.py", line 209, in main
run(FLAGS, cfg)
File "/home/zqy/zqy/Codes/PaddleDetection-release-2.7.1/tools/train.py", line 149, in run
trainer = Trainer(cfg, mode='train')
File "/home/zqy/zqy/Codes/PaddleDetection-release-2.7.1/ppdet/engine/trainer.py", line 116, in __init__
self.model = create(cfg.architecture)
File "/home/zqy/zqy/Codes/PaddleDetection-release-2.7.1/ppdet/core/workspace.py", line 255, in create
cls_kwargs.update(cls.from_config(config, **kwargs))
File "/home/zqy/zqy/Codes/PaddleDetection-release-2.7.1/ppdet/modeling/architectures/detr.py", line 63, in from_config
transformer = create(cfg['transformer'], **kwargs)
File "/home/zqy/zqy/Codes/PaddleDetection-release-2.7.1/ppdet/core/workspace.py", line 229, in create
raise ValueError("The module {} is not registered".format(name))
ValueError: The module RTDETRTransformerv2 is not registered |
For now, use |
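Sorry, I made a mistake when I filled in the runtime environment earlier. The corrected version is as follows: Environment
After testing, multi-GPU training of rtdetr works fine in this environment, but multi-GPU training of rtdetr v2 fails with the error reported above: ValueError: (InvalidArgument) Required tensor shall not be nullptr, but received nullptr.
[Hint: tensor should not be null.] (at ../paddle/phi/core/device_context.cc:142) |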
Is there a short-term fix for this problem? Thank you for your help. |
Got it. I will find time to look into this soon; actually |
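Could you please look at the export problem as well? With v2, training and inference both work, but export raises an error. Setup: paddle 3.0b1 + PaddleDetection develop |
Does multi-GPU training also work for you? |
@lyuwenyu could you take a look? |
I have not tested that. I have only been using a single GPU on Windows. |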
Search before asking
Bug Component
Training
Describe the Bug
When I train rt-detr with the following command:
the following error occurs:
When I train on a single GPU, the error does not occur.
Environment
Bug description confirmation
Are you willing to submit a PR?