RedPanda training inference error #115

qrpike · 2023-05-10T20:48:08Z

Describe the bug
After following the RedPanda fine-tuning tutorial, running the bot inference script on the output model fails with the error below.

$ python ./inference/bot.py --model=model_ckpts/hf/
Loading model_ckpts/hf/ to cuda:0...
Welcome to OpenChatKit shell.   Type /help or /? to list commands.

>>> who is allen turing?
Traceback (most recent call last):
  File "/home/ubuntu/OpenChatKit/./inference/bot.py", line 285, in <module>
    main()
  File "/home/ubuntu/OpenChatKit/./inference/bot.py", line 281, in main
    ).cmdloop()
  File "/home/ubuntu/miniconda3/envs/OpenChatKit/lib/python3.10/cmd.py", line 138, in cmdloop
    stop = self.onecmd(line)
  File "/home/ubuntu/miniconda3/envs/OpenChatKit/lib/python3.10/cmd.py", line 217, in onecmd
    return func(arg)
  File "/home/ubuntu/OpenChatKit/./inference/bot.py", line 150, in do_say
    output = self._model.do_inference(
  File "/home/ubuntu/OpenChatKit/./inference/bot.py", line 92, in do_inference
    outputs = self._model.generate(
  File "/home/ubuntu/miniconda3/envs/OpenChatKit/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/ubuntu/miniconda3/envs/OpenChatKit/lib/python3.10/site-packages/transformers/generation_utils.py", line 1326, in generate
    return self.sample(
  File "/home/ubuntu/miniconda3/envs/OpenChatKit/lib/python3.10/site-packages/transformers/generation_utils.py", line 1981, in sample
    next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
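
The last frame shows `torch.multinomial` rejecting the sampling distribution because it contains `inf`, `nan`, or a negative entry. As a rough illustration of that condition only (pure Python, not the PyTorch or transformers code; `softmax` and `invalid_entries` are hypothetical helper names introduced here), the sketch below shows that a single non-finite logit is enough to turn every probability into `nan`:

```python
import math


def softmax(logits):
    """Softmax over a list of floats, shifted by the max for stability."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def invalid_entries(probs):
    """Indices of entries a sampler would reject: nan, inf, or negative."""
    return [
        i for i, p in enumerate(probs)
        if math.isnan(p) or math.isinf(p) or p < 0
    ]


# Finite logits give a valid distribution: nothing is flagged.
healthy = softmax([1.0, 2.0, 3.0])
print(invalid_entries(healthy))

# One nan logit poisons the normalizing sum, so every entry becomes nan.
broken = softmax([1.0, float("nan"), 2.0])
print(invalid_entries(broken))
```

If this is what is happening here, the non-finite values would already be present in the logits produced by the fine-tuned checkpoint, before sampling starts; that is an assumption, not something the traceback alone confirms.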

To Reproduce
Steps to reproduce the behavior:

Follow this script: 4877e61?short_path=4b1c3f4#diff-4b1c3f4ad52d26ce38d74a383894c53c0d7d0e40ec39e36e2b48a165f3f5d3f8
Then run: python ./inference/bot.py --model=model_ckpts/hf/

Expected behavior
Inference to run properly.

Environment:
The code is running on a Lambda Labs 8x A100 40GB SXM4 instance.

ChengYen-Tang · 2023-08-07T05:24:04Z

#86 (comment)
