Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Training on BookSum #170

Open
Aishwarya-Kumaran opened this issue Nov 2, 2023 · 1 comment
Open

Training on BookSum #170

Aishwarya-Kumaran opened this issue Nov 2, 2023 · 1 comment

Comments

@Aishwarya-Kumaran
Copy link

Aishwarya-Kumaran commented Nov 2, 2023

output_and_error.log
Describe the bug
A clear and concise description of what the bug is.

To Reproduce
Steps to reproduce the behavior:

  1. Go to '...'
  2. Click on '....'
  3. Scroll down to '....'
  4. See error

Expected behavior
A clear and concise description of what you expected to happen.

Screenshots
If applicable, add screenshots to help explain your problem.

Desktop (please complete the following information):

  • OS: [e.g. iOS]
  • Browser [e.g. chrome, safari]
  • Version [e.g. 22]

Smartphone (please complete the following information):

  • Device: [e.g. iPhone6]
  • OS: [e.g. iOS8.1]
  • Browser [e.g. stock browser, safari]
  • Version [e.g. 22]

Additional context
Add any other context about the problem here.

@orangetin
Copy link
Member

File output

Traceback (most recent call last):
  File "/mnt/camelot/Team4/fall23/llama2/Book/OpenChatKit-main/training/dist_clm_train.py", line 478, in <module>
    main()
  File "/mnt/camelot/Team4/fall23/llama2/Book/OpenChatKit-main/training/dist_clm_train.py", line 397, in main
    init_communicators(args)
  File "/mnt/camelot/Team4/fall23/llama2/Book/OpenChatKit-main/training/comm/comm_utils.py", line 103, in init_communicators
    _PIPELINE_PARALLEL_COMM = NCCLCommunicator(_PIPELINE_PARALLEL_RANK, args.cuda_id, args.pipeline_group_size,
  File "/mnt/camelot/Team4/fall23/llama2/Book/OpenChatKit-main/training/comm/nccl_backend.py", line 31, in __init__
    cupy.cuda.Device(cuda_id).use()
  File "cupy/cuda/device.pyx", line 192, in cupy.cuda.device.Device.use
  File "cupy/cuda/device.pyx", line 198, in cupy.cuda.device.Device.use
  File "cupy_backends/cuda/api/runtime.pyx", line 375, in cupy_backends.cuda.api.runtime.setDevice
  File "cupy_backends/cuda/api/runtime.pyx", line 144, in cupy_backends.cuda.api.runtime.check_status
cupy_backends.cuda.api.runtime.CUDARuntimeError: cudaErrorInvalidDevice: invalid device ordinal
Traceback (most recent call last):
  File "/mnt/camelot/Team4/fall23/llama2/Book/OpenChatKit-main/training/dist_clm_train.py", line 478, in <module>
    main()
  File "/mnt/camelot/Team4/fall23/llama2/Book/OpenChatKit-main/training/dist_clm_train.py", line 397, in main
    init_communicators(args)
  File "/mnt/camelot/Team4/fall23/llama2/Book/OpenChatKit-main/training/comm/comm_utils.py", line 103, in init_communicators
    _PIPELINE_PARALLEL_COMM = NCCLCommunicator(_PIPELINE_PARALLEL_RANK, args.cuda_id, args.pipeline_group_size,
  File "/mnt/camelot/Team4/fall23/llama2/Book/OpenChatKit-main/training/comm/nccl_backend.py", line 31, in __init__
    cupy.cuda.Device(cuda_id).use()
  File "cupy/cuda/device.pyx", line 192, in cupy.cuda.device.Device.use
  File "cupy/cuda/device.pyx", line 198, in cupy.cuda.device.Device.use
  File "cupy_backends/cuda/api/runtime.pyx", line 375, in cupy_backends.cuda.api.runtime.setDevice
  File "cupy_backends/cuda/api/runtime.pyx", line 144, in cupy_backends.cuda.api.runtime.check_status
cupy_backends.cuda.api.runtime.CUDARuntimeError: cudaErrorInvalidDevice: invalid device ordinal
Traceback (most recent call last):
  File "/mnt/camelot/Team4/fall23/llama2/Book/OpenChatKit-main/training/dist_clm_train.py", line 478, in <module>
    main()
  File "/mnt/camelot/Team4/fall23/llama2/Book/OpenChatKit-main/training/dist_clm_train.py", line 397, in main
    init_communicators(args)
  File "/mnt/camelot/Team4/fall23/llama2/Book/OpenChatKit-main/training/comm/comm_utils.py", line 103, in init_communicators
    _PIPELINE_PARALLEL_COMM = NCCLCommunicator(_PIPELINE_PARALLEL_RANK, args.cuda_id, args.pipeline_group_size,
  File "/mnt/camelot/Team4/fall23/llama2/Book/OpenChatKit-main/training/comm/nccl_backend.py", line 31, in __init__
    cupy.cuda.Device(cuda_id).use()
  File "cupy/cuda/device.pyx", line 192, in cupy.cuda.device.Device.use
  File "cupy/cuda/device.pyx", line 198, in cupy.cuda.device.Device.use
  File "cupy_backends/cuda/api/runtime.pyx", line 375, in cupy_backends.cuda.api.runtime.setDevice
  File "cupy_backends/cuda/api/runtime.pyx", line 144, in cupy_backends.cuda.api.runtime.check_status
cupy_backends.cuda.api.runtime.CUDARuntimeError: cudaErrorInvalidDevice: invalid device ordinal
Traceback (most recent call last):
  File "/mnt/camelot/Team4/fall23/llama2/Book/OpenChatKit-main/training/dist_clm_train.py", line 478, in <module>
    main()
  File "/mnt/camelot/Team4/fall23/llama2/Book/OpenChatKit-main/training/dist_clm_train.py", line 397, in main
    init_communicators(args)
  File "/mnt/camelot/Team4/fall23/llama2/Book/OpenChatKit-main/training/comm/comm_utils.py", line 103, in init_communicators
    _PIPELINE_PARALLEL_COMM = NCCLCommunicator(_PIPELINE_PARALLEL_RANK, args.cuda_id, args.pipeline_group_size,
  File "/mnt/camelot/Team4/fall23/llama2/Book/OpenChatKit-main/training/comm/nccl_backend.py", line 31, in __init__
    cupy.cuda.Device(cuda_id).use()
  File "cupy/cuda/device.pyx", line 192, in cupy.cuda.device.Device.use
  File "cupy/cuda/device.pyx", line 198, in cupy.cuda.device.Device.use
  File "cupy_backends/cuda/api/runtime.pyx", line 375, in cupy_backends.cuda.api.runtime.setDevice
  File "cupy_backends/cuda/api/runtime.pyx", line 144, in cupy_backends.cuda.api.runtime.check_status
cupy_backends.cuda.api.runtime.CUDARuntimeError: cudaErrorInvalidDevice: invalid device ordinal
Traceback (most recent call last):
  File "/mnt/camelot/Team4/fall23/llama2/Book/OpenChatKit-main/training/dist_clm_train.py", line 478, in <module>
    main()
  File "/mnt/camelot/Team4/fall23/llama2/Book/OpenChatKit-main/training/dist_clm_train.py", line 397, in main
    init_communicators(args)
  File "/mnt/camelot/Team4/fall23/llama2/Book/OpenChatKit-main/training/comm/comm_utils.py", line 103, in init_communicators
    _PIPELINE_PARALLEL_COMM = NCCLCommunicator(_PIPELINE_PARALLEL_RANK, args.cuda_id, args.pipeline_group_size,
  File "/mnt/camelot/Team4/fall23/llama2/Book/OpenChatKit-main/training/comm/nccl_backend.py", line 31, in __init__
    cupy.cuda.Device(cuda_id).use()
  File "cupy/cuda/device.pyx", line 192, in cupy.cuda.device.Device.use
  File "cupy/cuda/device.pyx", line 198, in cupy.cuda.device.Device.use
  File "cupy_backends/cuda/api/runtime.pyx", line 375, in cupy_backends.cuda.api.runtime.setDevice
  File "cupy_backends/cuda/api/runtime.pyx", line 144, in cupy_backends.cuda.api.runtime.check_status
cupy_backends.cuda.api.runtime.CUDARuntimeError: cudaErrorInvalidDevice: invalid device ordinal
Traceback (most recent call last):
  File "/mnt/camelot/Team4/fall23/llama2/Book/OpenChatKit-main/training/dist_clm_train.py", line 478, in <module>
    main()
  File "/mnt/camelot/Team4/fall23/llama2/Book/OpenChatKit-main/training/dist_clm_train.py", line 397, in main
    init_communicators(args)
  File "/mnt/camelot/Team4/fall23/llama2/Book/OpenChatKit-main/training/comm/comm_utils.py", line 103, in init_communicators
    _PIPELINE_PARALLEL_COMM = NCCLCommunicator(_PIPELINE_PARALLEL_RANK, args.cuda_id, args.pipeline_group_size,
  File "/mnt/camelot/Team4/fall23/llama2/Book/OpenChatKit-main/training/comm/nccl_backend.py", line 31, in __init__
    cupy.cuda.Device(cuda_id).use()
  File "cupy/cuda/device.pyx", line 192, in cupy.cuda.device.Device.use
  File "cupy/cuda/device.pyx", line 198, in cupy.cuda.device.Device.use
  File "cupy_backends/cuda/api/runtime.pyx", line 375, in cupy_backends.cuda.api.runtime.setDevice
  File "cupy_backends/cuda/api/runtime.pyx", line 144, in cupy_backends.cuda.api.runtime.check_status
cupy_backends.cuda.api.runtime.CUDARuntimeError: cudaErrorInvalidDevice: invalid device ordinal

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants