model.generate and llamafactory-cli train do_predict give inconsistent results #5845
Labels: pending (This problem is yet to be addressed)
Comments
I ran into the same problem. I LoRA fine-tuned deepseek-coder-1.3B on a binary classification task. The output format is correct both when I run do_predict with llamafactory-cli and when I predict with model.generate myself, but the classification performance with llamafactory is much better. I have already confirmed at the prompt level that the two methods receive the same input prompt, and I also tried the gen_kwargs that llamafactory uses for prediction, so I am not sure where else the two paths are misaligned. Below is my own implementation of the inference code:

from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch
from tqdm import tqdm
import argparse
import sys
import json
MAX_LENGTH = 1024
INSTRUCTION = '******\n'
def infer(model, tokenizer, code):
    # input_text = INSTRUCTION + code
    input_text = code
    messages = [
        {
            'role': 'user',
            'content': input_text
        }
    ]
    input_ids = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        return_tensors="pt",
        max_length=MAX_LENGTH,
        truncation=True).to(model.device)
    input_prompt = tokenizer.decode(input_ids[0], skip_special_tokens=True)
    # outputs = model.generate(input_ids, max_new_tokens=MAX_LENGTH, eos_token_id=tokenizer.eos_token_id, pad_token_id=tokenizer.pad_token_id, do_sample=False)
    # decoding parameters obtained from LLaMA-Factory's gen_kwargs
    # (note: `default_system` from gen_kwargs is a template option, not a generate() argument, so it is not passed here)
    outputs = model.generate(input_ids, do_sample=True, temperature=0.95, top_p=0.7, top_k=50, num_beams=1,
                             max_new_tokens=MAX_LENGTH, repetition_penalty=1.0, length_penalty=1.0,
                             eos_token_id=tokenizer.eos_token_id, pad_token_id=tokenizer.pad_token_id)
    output_text = tokenizer.decode(outputs[0][len(input_ids[0]):], skip_special_tokens=True)
    return input_prompt, output_text
def test(args):
    tokenizer = AutoTokenizer.from_pretrained(args.model_path, trust_remote_code=True)
    test_dataset = load_dataset('json', data_files=args.test_dataset_file)['train']
    model = AutoModelForCausalLM.from_pretrained(args.model_path, trust_remote_code=True, torch_dtype=torch.float32, device_map='auto').eval()
    model = PeftModel.from_pretrained(model, args.output_dir).eval()
    tp, tn, fp, fn = 0, 0, 0, 0
    with open(f'{args.output_dir}/generated_predictions.jsonl', 'w', encoding='utf-8') as f:
        for example in tqdm(test_dataset, total=len(test_dataset)):
            input_prompt, output_text = infer(model, tokenizer, example['instruction'])
            f.write(json.dumps({"prompt": input_prompt, "label": example["output"], "predict": output_text}, ensure_ascii=False) + '\n')
            label = example['output'].strip()
            output_text = output_text.strip()
            if label == output_text:
                if output_text == 'yes':
                    tp += 1
                else:
                    tn += 1
            else:
                if output_text == 'yes':
                    fp += 1
                else:
                    fn += 1
            acc = (tp + tn) / (tp + tn + fp + fn)
            print(f'Accuracy: {acc}')
            sys.stdout.flush()
    acc = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp > 0 else float('nan')
    recall = tp / (tp + fn) if tp + fn > 0 else float('nan')
    f1 = 2 * precision * recall / (precision + recall) if precision + recall > 0 else float('nan')
    fpr = fp / (fp + tn) if fp + tn > 0 else float('nan')
    print(f'Accuracy: {acc}')
    print(f'Precision: {precision}')
    print(f'Recall: {recall}')
    print(f'F1: {f1}')
    print(f'False Positive Rate: {fpr}')
def main(args):
    test(args)

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--output_dir', type=str)
    parser.add_argument('--test_dataset_file', type=str)
    parser.add_argument('--model_path', type=str)
    args = parser.parse_args()
    main(args)
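One thing I notice in my own snippet above: do_sample=True with temperature=0.95 makes every run stochastic, whereas a do_predict run configured with do_sample=False decodes greedily, so the two are not directly comparable. A minimal sketch of a deterministic call I could use instead (the helper name `greedy_infer` is made up for illustration; it reuses the `model`, `tokenizer`, and `input_ids` from the code above):

```python
import torch

def greedy_infer(model, tokenizer, input_ids, max_new_tokens=1024):
    """Greedy decoding: deterministic, so it can be compared against a do_sample=False predict run."""
    with torch.no_grad():
        outputs = model.generate(
            input_ids,
            do_sample=False,            # disable sampling: no run-to-run randomness
            num_beams=1,                # plain greedy search
            max_new_tokens=max_new_tokens,
            eos_token_id=tokenizer.eos_token_id,
            pad_token_id=tokenizer.pad_token_id,
        )
    # Print the exact prompt tokens so they can be diffed against the "prompt"
    # column of generated_predictions.jsonl.
    print(tokenizer.convert_ids_to_tokens(input_ids[0].tolist()))
    return tokenizer.decode(outputs[0][input_ids.shape[1]:], skip_special_tokens=True)
```

If the printed prompt tokens already differ from what llamafactory-cli feeds the model, the mismatch is in the template/tokenization step rather than in generation itself.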
Reminder
System Info
Hello, I am running inference tests after the SFT-stage LoRA training, using the default template. I run llamafactory-cli train with do_predict and do_sample=False set in the config YAML, and it writes the predictions to generated_predictions.jsonl. The results from my own model.generate implementation are roughly the same in meaning, but differ quite a bit at the token level. What should I do to make the model.generate output exactly match what is written to generated_predictions.jsonl?
Below are my data example and code.
Data example (an actual question used in my fine-tuning):
{
    "instruction": "What are the three primary colors?",
    "input": "",
    "output": "The three primary colors are red, blue, and yellow."
}
Given this data example and the default template, my input should be:
user_input = "Human: What are the three primary colors?\nAssistant:"
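As a sanity check, here is a small sketch (assuming the prompt format above is exactly what the default template produces, which may not hold for special tokens such as BOS or a system prefix) for inspecting the tokens of a hand-built prompt:

```python
from transformers import AutoTokenizer

# Placeholder path: replace with the base model used for fine-tuning.
tokenizer = AutoTokenizer.from_pretrained("path/to/base_model", trust_remote_code=True)

# Prompt assembled by hand following the default-template format shown above.
user_input = "Human: What are the three primary colors?\nAssistant:"

encoded = tokenizer(user_input, return_tensors="pt")
print(tokenizer.convert_ids_to_tokens(encoded.input_ids[0].tolist()))  # exact tokens fed to generate()
```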
Reproduction
Expected behavior
The predict built into llamafactory-cli goes through trainer.predict, and its output differs from what model.generate gives. How can I make the two produce identical results?
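One way I could check this concretely is to diff the two prediction files; the sketch below assumes both runs write JSONL with a "predict" field, and the file paths are placeholders:

```python
import json

def load_predictions(path):
    """Read a JSONL prediction file and return its 'predict' strings."""
    with open(path, encoding='utf-8') as f:
        return [json.loads(line)['predict'] for line in f]

cli_preds = load_predictions('cli_run/generated_predictions.jsonl')  # from llamafactory-cli do_predict
my_preds = load_predictions('my_run/generated_predictions.jsonl')    # from the model.generate script

mismatches = [i for i, (a, b) in enumerate(zip(cli_preds, my_preds)) if a.strip() != b.strip()]
print(f'{len(mismatches)} / {len(cli_preds)} predictions differ; first indices: {mismatches[:5]}')
```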
Others
None