GPU memory is sufficient but cannot be used; only a small amount of VRAM shows as in use #5878

Open
1 task done
Lgugeng opened this issue Oct 30, 2024 · 1 comment
Labels
pending This problem is yet to be addressed

Comments

Lgugeng commented Oct 30, 2024

Reminder

  • I have read the README and searched the existing issues.

System Info

  • llamafactory version: 0.9.1.dev0
  • Platform: Linux-6.5.0-28-generic-x86_64-with-glibc2.35
  • Python version: 3.11.7
  • PyTorch version: 2.4.0+cu121 (GPU)
  • Transformers version: 4.43.4
  • Datasets version: 2.20.0
  • Accelerate version: 1.0.1
  • PEFT version: 0.12.0
  • TRL version: 0.9.6
  • GPU type: NVIDIA GeForce RTX 4080

Reproduction

10/31/2024 01:03:08 - INFO - llamafactory.hparams.parser - Process rank: 0, device: cuda:0, n_gpu: 1, distributed training: False, compute dtype: torch.bfloat16
[INFO|configuration_utils.py:731] 2024-10-31 01:03:08,680 >> loading configuration file /home/wladmin/ai/Qwen2.5-7B-Instruct/config.json
[INFO|configuration_utils.py:800] 2024-10-31 01:03:08,681 >> Model config Qwen2Config {
  "_name_or_path": "/home/wladmin/ai/Qwen2.5-7B-Instruct",
  "architectures": [
    "Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "hidden_act": "silu",
  "hidden_size": 3584,
  "initializer_range": 0.02,
  "intermediate_size": 18944,
  "max_position_embeddings": 32768,
  "max_window_layers": 28,
  "model_type": "qwen2",
  "num_attention_heads": 28,
  "num_hidden_layers": 28,
  "num_key_value_heads": 4,
  "rms_norm_eps": 1e-06,
  "rope_theta": 1000000.0,
  "sliding_window": null,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.43.4",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 152064
}

[INFO|tokenization_utils_base.py:2287] 2024-10-31 01:03:08,681 >> loading file vocab.json
[INFO|tokenization_utils_base.py:2287] 2024-10-31 01:03:08,681 >> loading file merges.txt
[INFO|tokenization_utils_base.py:2287] 2024-10-31 01:03:08,681 >> loading file tokenizer.json
[INFO|tokenization_utils_base.py:2287] 2024-10-31 01:03:08,681 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:2287] 2024-10-31 01:03:08,681 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:2287] 2024-10-31 01:03:08,681 >> loading file tokenizer_config.json
[INFO|tokenization_utils_base.py:2533] 2024-10-31 01:03:08,780 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[INFO|configuration_utils.py:731] 2024-10-31 01:03:08,781 >> loading configuration file /home/wladmin/ai/Qwen2.5-7B-Instruct/config.json
[INFO|configuration_utils.py:800] 2024-10-31 01:03:08,781 >> Model config Qwen2Config {
  "_name_or_path": "/home/wladmin/ai/Qwen2.5-7B-Instruct",
  "architectures": [
    "Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "hidden_act": "silu",
  "hidden_size": 3584,
  "initializer_range": 0.02,
  "intermediate_size": 18944,
  "max_position_embeddings": 32768,
  "max_window_layers": 28,
  "model_type": "qwen2",
  "num_attention_heads": 28,
  "num_hidden_layers": 28,
  "num_key_value_heads": 4,
  "rms_norm_eps": 1e-06,
  "rope_theta": 1000000.0,
  "sliding_window": null,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.43.4",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 152064
}

[INFO|tokenization_utils_base.py:2287] 2024-10-31 01:03:08,781 >> loading file vocab.json
[INFO|tokenization_utils_base.py:2287] 2024-10-31 01:03:08,781 >> loading file merges.txt
[INFO|tokenization_utils_base.py:2287] 2024-10-31 01:03:08,781 >> loading file tokenizer.json
[INFO|tokenization_utils_base.py:2287] 2024-10-31 01:03:08,781 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:2287] 2024-10-31 01:03:08,781 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:2287] 2024-10-31 01:03:08,781 >> loading file tokenizer_config.json
[INFO|tokenization_utils_base.py:2533] 2024-10-31 01:03:08,872 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
10/31/2024 01:03:08 - WARNING - llamafactory.model.loader - Processor was not found: 'Qwen2Config' object has no attribute 'vision_config'.
10/31/2024 01:03:08 - INFO - llamafactory.data.template - Replace eos token: <|im_end|>
10/31/2024 01:03:08 - INFO - llamafactory.data.loader - Loading dataset 英中_专利_记忆库.json...
training example:
input_ids:
[151644, 8948, 198, 2610, 525, 264, 10950, 17847, 13, 151645, 198, 151644, 872, 198, 14880, 108965, 44063, 107083, 105205, 105395, 17714, 104811, 198, 1944, 6730, 369, 1818, 332, 89244, 553, 81345, 9299, 354, 2283, 66848, 6988, 151645, 198, 151644, 77091, 198, 100359, 38212, 99272, 44956, 103697, 24339, 115391, 120143, 9370, 102360, 39907, 151645]
inputs:
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
请帮我将这段英文翻译为中文
test Method for Plutonium by Controlled-Potential Coulometry<|im_end|>
<|im_start|>assistant
控制电势库仑法测定钚的试验方法<|im_end|>
label_ids:
[-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 100359, 38212, 99272, 44956, 103697, 24339, 115391, 120143, 9370, 102360, 39907, 151645]
labels:
控制电势库仑法测定钚的试验方法<|im_end|>
[INFO|configuration_utils.py:731] 2024-10-31 01:03:09,836 >> loading configuration file /home/wladmin/ai/Qwen2.5-7B-Instruct/config.json
[INFO|configuration_utils.py:800] 2024-10-31 01:03:09,836 >> Model config Qwen2Config {
  "_name_or_path": "/home/wladmin/ai/Qwen2.5-7B-Instruct",
  "architectures": [
    "Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "hidden_act": "silu",
  "hidden_size": 3584,
  "initializer_range": 0.02,
  "intermediate_size": 18944,
  "max_position_embeddings": 32768,
  "max_window_layers": 28,
  "model_type": "qwen2",
  "num_attention_heads": 28,
  "num_hidden_layers": 28,
  "num_key_value_heads": 4,
  "rms_norm_eps": 1e-06,
  "rope_theta": 1000000.0,
  "sliding_window": null,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.43.4",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 152064
}

[INFO|modeling_utils.py:3641] 2024-10-31 01:03:09,847 >> loading weights file /home/wladmin/ai/Qwen2.5-7B-Instruct/model.safetensors.index.json
[INFO|modeling_utils.py:1572] 2024-10-31 01:03:09,847 >> Instantiating Qwen2ForCausalLM model under default dtype torch.bfloat16.
[INFO|configuration_utils.py:1038] 2024-10-31 01:03:09,848 >> Generate config GenerationConfig {
  "bos_token_id": 151643,
  "eos_token_id": 151645
}

Loading checkpoint shards: 100%|██████████████████| 4/4 [00:01<00:00, 2.23it/s]
[INFO|modeling_utils.py:4473] 2024-10-31 01:03:11,758 >> All model checkpoint weights were used when initializing Qwen2ForCausalLM.

[INFO|modeling_utils.py:4481] 2024-10-31 01:03:11,758 >> All the weights of Qwen2ForCausalLM were initialized from the model checkpoint at /home/wladmin/ai/Qwen2.5-7B-Instruct.
If your task is similar to the task the model of the checkpoint was trained on, you can already use Qwen2ForCausalLM for predictions without further training.
[INFO|configuration_utils.py:991] 2024-10-31 01:03:11,765 >> loading configuration file /home/wladmin/ai/Qwen2.5-7B-Instruct/generation_config.json
[INFO|configuration_utils.py:1038] 2024-10-31 01:03:11,765 >> Generate config GenerationConfig {
  "bos_token_id": 151643,
  "do_sample": true,
  "eos_token_id": [
    151645,
    151643
  ],
  "pad_token_id": 151643,
  "repetition_penalty": 1.05,
  "temperature": 0.7,
  "top_k": 20,
  "top_p": 0.8
}

Traceback (most recent call last):
  File "/home/wladmin/anaconda3/bin/llamafactory-cli", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/wladmin/ai/LLaMA-Factory/src/llamafactory/cli.py", line 111, in main
    run_exp()
  File "/home/wladmin/ai/LLaMA-Factory/src/llamafactory/train/tuner.py", line 50, in run_exp
    run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
  File "/home/wladmin/ai/LLaMA-Factory/src/llamafactory/train/sft/workflow.py", line 48, in run_sft
    model = load_model(tokenizer, model_args, finetuning_args, training_args.do_train)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/wladmin/ai/LLaMA-Factory/src/llamafactory/model/loader.py", line 160, in load_model
    model = load_class.from_pretrained(**init_kwargs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/wladmin/anaconda3/lib/python3.11/site-packages/transformers/models/auto/auto_factory.py", line 564, in from_pretrained
    return model_class.from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/wladmin/anaconda3/lib/python3.11/site-packages/transformers/modeling_utils.py", line 4000, in from_pretrained
    dispatch_model(model, **device_map_kwargs)
  File "/home/wladmin/anaconda3/lib/python3.11/site-packages/accelerate/big_modeling.py", line 494, in dispatch_model
    model.to(device)
  File "/home/wladmin/anaconda3/lib/python3.11/site-packages/transformers/modeling_utils.py", line 2871, in to
    return super().to(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/wladmin/anaconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1174, in to
    return self._apply(convert)
           ^^^^^^^^^^^^^^^^^^^^
  File "/home/wladmin/anaconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 780, in _apply
    module._apply(fn)
  File "/home/wladmin/anaconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 780, in _apply
    module._apply(fn)
  File "/home/wladmin/anaconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 780, in _apply
    module._apply(fn)
  [Previous line repeated 2 more times]
  File "/home/wladmin/anaconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 854, in _apply
    self._buffers[key] = fn(buf)
                         ^^^^^^^
  File "/home/wladmin/anaconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1160, in convert
    return t.to(
           ^^^^^
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB. GPU 0 has a total capacity of 15.69 GiB of which 22.50 MiB is free. Process 1548 has 239.88 MiB memory in use. Process 23735 has 362.46 MiB memory in use. Including non-PyTorch memory, this process has 14.76 GiB memory in use. Of the allocated memory 14.38 GiB is allocated by PyTorch, and 148.86 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

nvidia-smi
Thu Oct 31 01:06:37 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.171.04 Driver Version: 535.171.04 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce RTX 4080 Off | 00000000:01:00.0 On | N/A |
| 0% 40C P2 51W / 340W | 939MiB / 16376MiB | 4% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 1406 G /usr/lib/xorg/Xorg 127MiB |
| 0 N/A N/A 1548 C+G ...libexec/gnome-remote-desktop-daemon 239MiB |
| 0 N/A N/A 1601 G /usr/bin/gnome-shell 41MiB |
| 0 N/A N/A 23735 C+G /opt/todesk/bin/ToDesk_Session 362MiB |
| 0 N/A N/A 468326 G ...irefox/5134/usr/lib/firefox/firefox 138MiB |
+---------------------------------------------------------------------------------------+
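The traceback's final message suggests setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True. Below is a minimal diagnostic sketch (not part of the original report, assuming PyTorch 2.x): it applies that allocator hint and checks how much VRAM is actually free before loading.

# Minimal diagnostic sketch (assumption: PyTorch 2.x).
# The allocator hint must be set before CUDA is initialized,
# i.e. before the first torch CUDA call.
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

import torch

free, total = torch.cuda.mem_get_info(0)  # (free, total) in bytes for GPU 0
print(f"GPU 0: {free / 2**30:.2f} GiB free of {total / 2**30:.2f} GiB")

Note that per the nvidia-smi output above, Xorg, gnome-shell, gnome-remote-desktop and ToDesk already hold roughly 900 MiB, so the usable headroom is below the nominal 16 GiB.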

Expected behavior

How should this be handled? Please give specific steps.

Others

No response

github-actions bot added the pending label (This problem is yet to be addressed) on Oct 30, 2024
@Kuangdd01
Contributor

A 16 GB 4080 is not particularly ample for a 7B model like Qwen: without quantization, merely loading the model already takes about 14 GB of VRAM.
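As a minimal sketch of the quantized route (an illustration, not the reporter's actual config): with bitsandbytes installed, transformers can load the checkpoint in 4-bit so the weights fit well under 16 GiB; in LLaMA-Factory the equivalent switch is quantization_bit: 4 in the training config (QLoRA). The model path below is taken from the log above, and the bnb settings are common choices rather than prescribed values.

# Minimal sketch: 4-bit (NF4) loading to fit a 16 GiB GPU.
# Assumes the `bitsandbytes` package is installed; settings are illustrative.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "/home/wladmin/ai/Qwen2.5-7B-Instruct",  # path from the log above
    quantization_config=bnb_config,
    device_map="auto",
)

At 4-bit the 7B weights occupy roughly 4-5 GiB instead of ~14 GiB in bf16, leaving headroom for activations, LoRA adapter gradients, and optimizer state.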
