GPU memory is sufficient but cannot be used; only a small amount of VRAM shows as in use #5878

Open
1 task done
Lgugeng opened this issue Oct 30, 2024 · 1 comment
Labels
pending This problem is yet to be addressed

Comments

Lgugeng commented Oct 30, 2024

Reminder

  • I have read the README and searched the existing issues.

System Info

  • llamafactory version: 0.9.1.dev0
  • Platform: Linux-6.5.0-28-generic-x86_64-with-glibc2.35
  • Python version: 3.11.7
  • PyTorch version: 2.4.0+cu121 (GPU)
  • Transformers version: 4.43.4
  • Datasets version: 2.20.0
  • Accelerate version: 1.0.1
  • PEFT version: 0.12.0
  • TRL version: 0.9.6
  • GPU type: NVIDIA GeForce RTX 4080

Reproduction

10/31/2024 01:03:08 - INFO - llamafactory.hparams.parser - Process rank: 0, device: cuda:0, n_gpu: 1, distributed training: False, compute dtype: torch.bfloat16
[INFO|configuration_utils.py:731] 2024-10-31 01:03:08,680 >> loading configuration file /home/wladmin/ai/Qwen2.5-7B-Instruct/config.json
[INFO|configuration_utils.py:800] 2024-10-31 01:03:08,681 >> Model config Qwen2Config {
  "_name_or_path": "/home/wladmin/ai/Qwen2.5-7B-Instruct",
  "architectures": [
    "Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "hidden_act": "silu",
  "hidden_size": 3584,
  "initializer_range": 0.02,
  "intermediate_size": 18944,
  "max_position_embeddings": 32768,
  "max_window_layers": 28,
  "model_type": "qwen2",
  "num_attention_heads": 28,
  "num_hidden_layers": 28,
  "num_key_value_heads": 4,
  "rms_norm_eps": 1e-06,
  "rope_theta": 1000000.0,
  "sliding_window": null,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.43.4",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 152064
}

[INFO|tokenization_utils_base.py:2287] 2024-10-31 01:03:08,681 >> loading file vocab.json
[INFO|tokenization_utils_base.py:2287] 2024-10-31 01:03:08,681 >> loading file merges.txt
[INFO|tokenization_utils_base.py:2287] 2024-10-31 01:03:08,681 >> loading file tokenizer.json
[INFO|tokenization_utils_base.py:2287] 2024-10-31 01:03:08,681 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:2287] 2024-10-31 01:03:08,681 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:2287] 2024-10-31 01:03:08,681 >> loading file tokenizer_config.json
[INFO|tokenization_utils_base.py:2533] 2024-10-31 01:03:08,780 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[INFO|configuration_utils.py:731] 2024-10-31 01:03:08,781 >> loading configuration file /home/wladmin/ai/Qwen2.5-7B-Instruct/config.json
[INFO|configuration_utils.py:800] 2024-10-31 01:03:08,781 >> Model config Qwen2Config {
  "_name_or_path": "/home/wladmin/ai/Qwen2.5-7B-Instruct",
  "architectures": [
    "Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "hidden_act": "silu",
  "hidden_size": 3584,
  "initializer_range": 0.02,
  "intermediate_size": 18944,
  "max_position_embeddings": 32768,
  "max_window_layers": 28,
  "model_type": "qwen2",
  "num_attention_heads": 28,
  "num_hidden_layers": 28,
  "num_key_value_heads": 4,
  "rms_norm_eps": 1e-06,
  "rope_theta": 1000000.0,
  "sliding_window": null,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.43.4",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 152064
}

[INFO|tokenization_utils_base.py:2287] 2024-10-31 01:03:08,781 >> loading file vocab.json
[INFO|tokenization_utils_base.py:2287] 2024-10-31 01:03:08,781 >> loading file merges.txt
[INFO|tokenization_utils_base.py:2287] 2024-10-31 01:03:08,781 >> loading file tokenizer.json
[INFO|tokenization_utils_base.py:2287] 2024-10-31 01:03:08,781 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:2287] 2024-10-31 01:03:08,781 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:2287] 2024-10-31 01:03:08,781 >> loading file tokenizer_config.json
[INFO|tokenization_utils_base.py:2533] 2024-10-31 01:03:08,872 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
10/31/2024 01:03:08 - WARNING - llamafactory.model.loader - Processor was not found: 'Qwen2Config' object has no attribute 'vision_config'.
10/31/2024 01:03:08 - INFO - llamafactory.data.template - Replace eos token: <|im_end|>
10/31/2024 01:03:08 - INFO - llamafactory.data.loader - Loading dataset 英中_专利_记忆库.json...
training example:
input_ids:
[151644, 8948, 198, 2610, 525, 264, 10950, 17847, 13, 151645, 198, 151644, 872, 198, 14880, 108965, 44063, 107083, 105205, 105395, 17714, 104811, 198, 1944, 6730, 369, 1818, 332, 89244, 553, 81345, 9299, 354, 2283, 66848, 6988, 151645, 198, 151644, 77091, 198, 100359, 38212, 99272, 44956, 103697, 24339, 115391, 120143, 9370, 102360, 39907, 151645]
inputs:
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
请帮我将这段英文翻译为中文
test Method for Plutonium by Controlled-Potential Coulometry<|im_end|>
<|im_start|>assistant
控制电势库仑法测定钚的试验方法<|im_end|>
label_ids:
[-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 100359, 38212, 99272, 44956, 103697, 24339, 115391, 120143, 9370, 102360, 39907, 151645]
labels:
控制电势库仑法测定钚的试验方法<|im_end|>
[INFO|configuration_utils.py:731] 2024-10-31 01:03:09,836 >> loading configuration file /home/wladmin/ai/Qwen2.5-7B-Instruct/config.json
[INFO|configuration_utils.py:800] 2024-10-31 01:03:09,836 >> Model config Qwen2Config {
  "_name_or_path": "/home/wladmin/ai/Qwen2.5-7B-Instruct",
  "architectures": [
    "Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "hidden_act": "silu",
  "hidden_size": 3584,
  "initializer_range": 0.02,
  "intermediate_size": 18944,
  "max_position_embeddings": 32768,
  "max_window_layers": 28,
  "model_type": "qwen2",
  "num_attention_heads": 28,
  "num_hidden_layers": 28,
  "num_key_value_heads": 4,
  "rms_norm_eps": 1e-06,
  "rope_theta": 1000000.0,
  "sliding_window": null,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.43.4",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 152064
}

[INFO|modeling_utils.py:3641] 2024-10-31 01:03:09,847 >> loading weights file /home/wladmin/ai/Qwen2.5-7B-Instruct/model.safetensors.index.json
[INFO|modeling_utils.py:1572] 2024-10-31 01:03:09,847 >> Instantiating Qwen2ForCausalLM model under default dtype torch.bfloat16.
[INFO|configuration_utils.py:1038] 2024-10-31 01:03:09,848 >> Generate config GenerationConfig {
  "bos_token_id": 151643,
  "eos_token_id": 151645
}

Loading checkpoint shards: 100%|██████████████████| 4/4 [00:01<00:00, 2.23it/s]
[INFO|modeling_utils.py:4473] 2024-10-31 01:03:11,758 >> All model checkpoint weights were used when initializing Qwen2ForCausalLM.

[INFO|modeling_utils.py:4481] 2024-10-31 01:03:11,758 >> All the weights of Qwen2ForCausalLM were initialized from the model checkpoint at /home/wladmin/ai/Qwen2.5-7B-Instruct.
If your task is similar to the task the model of the checkpoint was trained on, you can already use Qwen2ForCausalLM for predictions without further training.
[INFO|configuration_utils.py:991] 2024-10-31 01:03:11,765 >> loading configuration file /home/wladmin/ai/Qwen2.5-7B-Instruct/generation_config.json
[INFO|configuration_utils.py:1038] 2024-10-31 01:03:11,765 >> Generate config GenerationConfig {
  "bos_token_id": 151643,
  "do_sample": true,
  "eos_token_id": [
    151645,
    151643
  ],
  "pad_token_id": 151643,
  "repetition_penalty": 1.05,
  "temperature": 0.7,
  "top_k": 20,
  "top_p": 0.8
}

Traceback (most recent call last):
  File "/home/wladmin/anaconda3/bin/llamafactory-cli", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/wladmin/ai/LLaMA-Factory/src/llamafactory/cli.py", line 111, in main
    run_exp()
  File "/home/wladmin/ai/LLaMA-Factory/src/llamafactory/train/tuner.py", line 50, in run_exp
    run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
  File "/home/wladmin/ai/LLaMA-Factory/src/llamafactory/train/sft/workflow.py", line 48, in run_sft
    model = load_model(tokenizer, model_args, finetuning_args, training_args.do_train)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/wladmin/ai/LLaMA-Factory/src/llamafactory/model/loader.py", line 160, in load_model
    model = load_class.from_pretrained(**init_kwargs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/wladmin/anaconda3/lib/python3.11/site-packages/transformers/models/auto/auto_factory.py", line 564, in from_pretrained
    return model_class.from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/wladmin/anaconda3/lib/python3.11/site-packages/transformers/modeling_utils.py", line 4000, in from_pretrained
    dispatch_model(model, **device_map_kwargs)
  File "/home/wladmin/anaconda3/lib/python3.11/site-packages/accelerate/big_modeling.py", line 494, in dispatch_model
    model.to(device)
  File "/home/wladmin/anaconda3/lib/python3.11/site-packages/transformers/modeling_utils.py", line 2871, in to
    return super().to(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/wladmin/anaconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1174, in to
    return self._apply(convert)
           ^^^^^^^^^^^^^^^^^^^^
  File "/home/wladmin/anaconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 780, in _apply
    module._apply(fn)
  File "/home/wladmin/anaconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 780, in _apply
    module._apply(fn)
  File "/home/wladmin/anaconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 780, in _apply
    module._apply(fn)
  [Previous line repeated 2 more times]
  File "/home/wladmin/anaconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 854, in _apply
    self._buffers[key] = fn(buf)
                         ^^^^^^^
  File "/home/wladmin/anaconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1160, in convert
    return t.to(
           ^^^^^
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB. GPU 0 has a total capacity of 15.69 GiB of which 22.50 MiB is free. Process 1548 has 239.88 MiB memory in use. Process 23735 has 362.46 MiB memory in use. Including non-PyTorch memory, this process has 14.76 GiB memory in use. Of the allocated memory 14.38 GiB is allocated by PyTorch, and 148.86 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

nvidia-smi
Thu Oct 31 01:06:37 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.171.04 Driver Version: 535.171.04 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce RTX 4080 Off | 00000000:01:00.0 On | N/A |
| 0% 40C P2 51W / 340W | 939MiB / 16376MiB | 4% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 1406 G /usr/lib/xorg/Xorg 127MiB |
| 0 N/A N/A 1548 C+G ...libexec/gnome-remote-desktop-daemon 239MiB |
| 0 N/A N/A 1601 G /usr/bin/gnome-shell 41MiB |
| 0 N/A N/A 23735 C+G /opt/todesk/bin/ToDesk_Session 362MiB |
| 0 N/A N/A 468326 G ...irefox/5134/usr/lib/firefox/firefox 138MiB |
+---------------------------------------------------------------------------------------+
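The traceback's final message suggests setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True. Below is a minimal diagnostic sketch (not part of the original report, assuming PyTorch 2.x): it applies that allocator hint and checks how much VRAM is actually free before loading.

# Minimal diagnostic sketch (assumption: PyTorch 2.x).
# The allocator hint must be set before CUDA is initialized,
# i.e. before the first torch CUDA call.
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

import torch

free, total = torch.cuda.mem_get_info(0)  # (free, total) in bytes for GPU 0
print(f"GPU 0: {free / 2**30:.2f} GiB free of {total / 2**30:.2f} GiB")

Note that per the nvidia-smi output above, Xorg, gnome-shell, gnome-remote-desktop and ToDesk already hold roughly 900 MiB, so the usable headroom is below the nominal 16 GiB.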

Expected behavior

How should this be handled? Please give specific steps.

Others

No response

github-actions bot added the pending label (This problem is yet to be addressed) on Oct 30, 2024
@Kuangdd01
Contributor

A 16 GB 4080 is not particularly ample for a 7B model like Qwen: without quantization, merely loading the model already takes about 14 GB of VRAM.
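As a minimal sketch of the quantized route (an illustration, not the reporter's actual config): with bitsandbytes installed, transformers can load the checkpoint in 4-bit so the weights fit well under 16 GiB; in LLaMA-Factory the equivalent switch is quantization_bit: 4 in the training config (QLoRA). The model path below is taken from the log above, and the bnb settings are common choices rather than prescribed values.

# Minimal sketch: 4-bit (NF4) loading to fit a 16 GiB GPU.
# Assumes the `bitsandbytes` package is installed; settings are illustrative.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "/home/wladmin/ai/Qwen2.5-7B-Instruct",  # path from the log above
    quantization_config=bnb_config,
    device_map="auto",
)

At 4-bit the 7B weights occupy roughly 4-5 GiB instead of ~14 GiB in bf16, leaving headroom for activations, LoRA adapter gradients, and optimizer state.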
