We provide a user-friendly way to evaluate your own model.
You should add your model config to `eval_configs/main_results_all_tasks.yaml`, for example:
```yaml
codellama-34b:
  name: vllm
  engine: [PATH TO YOUR MODEL]
  max_tokens: 100
  temperature: 0
  top_p: 1
  stop:
  context_length: 16384
  dtype: float32
  ngpu: 4
  use_parser: True
```
Arguments for the config are as follows:

- `name`: name of the inference framework (e.g., `vllm`, `hg`, `gpt`, `gpt_azure`, `claude`)
- `engine`: path to your model or the Hugging Face model name
- `max_tokens`: the maximum number of newly generated tokens
- `temperature`: temperature for generation
- `top_p`: top-$p$ for generation
- `stop`: stop tokens for generation
- `context_length`: the maximum context length of the LLM
- `dtype`: `float32` or `float16`
- `ngpu`: the number of GPUs to use; this argument only takes effect with the vLLM framework
- `use_parser`: bool, whether to post-process the generated actions
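For reference, here is a minimal sketch of how these config fields typically map onto vLLM's `LLM` and `SamplingParams` objects. The mapping and the model path below are illustrative assumptions, not AgentBoard's actual wiring:

```python
# Minimal sketch (not AgentBoard's actual code): how the YAML fields
# above typically map onto vLLM's API.
from vllm import LLM, SamplingParams

config = {
    "engine": "/path/to/codellama-34b",  # [PATH TO YOUR MODEL]
    "max_tokens": 100,
    "temperature": 0,
    "top_p": 1,
    "stop": None,
    "context_length": 16384,
    "dtype": "float32",
    "ngpu": 4,
}

llm = LLM(
    model=config["engine"],
    dtype=config["dtype"],
    tensor_parallel_size=config["ngpu"],     # ngpu -> tensor parallelism
    max_model_len=config["context_length"],  # context_length -> max context
)
params = SamplingParams(
    max_tokens=config["max_tokens"],
    temperature=config["temperature"],
    top_p=config["top_p"],
    stop=config["stop"],
)
outputs = llm.generate(["<s>[INST]You are an agent.[/INST]"], params)
print(outputs[0].outputs[0].text)
```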
Please check whether your model is supported by vLLM.
We recommend using vLLM because it is usually much faster than the naive `model.generate()` in Hugging Face.
If you decide to use vLLM, set the `name` argument above to `vllm`; otherwise, set it to `hg`.

Note: Inference results from Hugging Face and vLLM can differ because of their different implementations.
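For comparison, here is a minimal sketch of what the `hg` path looks like with the plain Hugging Face `model.generate()` call. The model path and decoding settings are illustrative assumptions:

```python
# Minimal sketch of the naive Hugging Face path (name: hg), for comparison.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "/path/to/codellama-34b"  # the `engine` field from the config
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.float32, device_map="auto"
)

prompt = "<s>[INST]You are an agent.[/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(
    **inputs,
    max_new_tokens=100,  # max_tokens
    do_sample=False,     # temperature 0 -> greedy decoding
    top_p=1.0,
)
new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```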
If your model needs a customized input template, you should add it to `agentboard/prompts/prompt_template`, for example:
"codellama-34b":
"""
<s>[INST]{system_prompt}{prompt}[/INST]
""",
Arguments for this template are as follows:

- `system_prompt`: the system prompt of your agent
- `prompt`: the user prompt of your agent
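At inference time the template's two placeholders are filled in roughly like this (a hypothetical snippet with made-up prompt text, just to show the substitution):

```python
# Hypothetical usage: the template placeholders are filled via str.format.
template = """
<s>[INST]{system_prompt}{prompt}[/INST]
"""

full_prompt = template.format(
    system_prompt="You are a helpful agent solving household tasks. ",
    prompt="Observation: You are in the kitchen. What is your next action?",
)
print(full_prompt)
```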