Evaluation for humaneval #3

Open
renmengjie7 opened this issue Aug 9, 2024 · 1 comment

Comments

@renmengjie7

What is the procedure for inference and answer extraction on HumanEval?
My test score is only 20.73 on HumanEval (k=1, model = CodeLlama-7b-Instruct-hf + DPO).

@martin-wey (Owner) commented Dec 10, 2024

Hi! My sincere apologies for the delayed response.

For HumanEval+, we used the https://github.com/evalplus/evalplus implementation. You can clone the repo and add generate.py to the /evalplus subfolder. You also need to add the evalplus/templates.py file.
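
A minimal setup sketch, assuming you clone the upstream evalplus repository and copy generate.py and templates.py from this repository into its evalplus/ package subfolder (the source paths below are illustrative, not actual paths from this repository):

git clone https://github.com/evalplus/evalplus.git
cd evalplus
# copy the two files provided in this repository (illustrative paths)
cp /path/to/this-repo/generate.py evalplus/generate.py
cp /path/to/this-repo/templates.py evalplus/templates.py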

Next, you can run generation for HumanEval+ as follows:

python evalplus/generate.py \
  --model_name_or_path coseal/CodeLlama-7B-Instruct-sft-dpo-qlora \
  --model codellama-7b-instruct-sft-dpo \
  --temperature 0.8 \
  --n_samples 10 

In the paper, we use --temperature 0.8 and --n_samples 10 to report Pass@1 and Pass@10.
After generating the responses, you can use evalplus/evaluate.py to obtain the metrics.
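
A hedged example of the evaluation step; the flag names below (--dataset, --samples) mirror the upstream evalplus evaluator and are assumptions here, so check the argument parser in evaluate.py for the actual options:

python evalplus/evaluate.py \
  --dataset humaneval \
  --samples /path/to/generated/samples.jsonl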
