Evaluation for humaneval #3

Open
renmengjie7 opened this issue Aug 9, 2024 · 1 comment

Comments

@renmengjie7

What is the procedure for inference and answer extraction on HumanEval?
My test score is only 20.73 on HumanEval (k=1, model = CodeLlama-7b-Instruct-hf + DPO).

@martin-wey (Owner) commented Dec 10, 2024

Hi! My sincere apologies for the delayed response.

For HumanEval+, we used the https://github.com/evalplus/evalplus implementation. You can clone the repo and add generate.py to the /evalplus subfolder. You also need to add the evalplus/templates.py file.
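
A minimal setup sketch, assuming you clone the upstream evalplus repository and copy generate.py and templates.py from this repository into its evalplus/ package subfolder (the source paths below are illustrative, not actual paths from this repository):

git clone https://github.com/evalplus/evalplus.git
cd evalplus
# copy the two files provided in this repository (illustrative paths)
cp /path/to/this-repo/generate.py evalplus/generate.py
cp /path/to/this-repo/templates.py evalplus/templates.py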

Next, you can run generation for HumanEval+ as follows:

python evalplus/generate.py \
  --model_name_or_path coseal/CodeLlama-7B-Instruct-sft-dpo-qlora \
  --model codellama-7b-instruct-sft-dpo \
  --temperature 0.8 \
  --n_samples 10 

In the paper, we use --temperature 0.8 and --n_samples 10 to report Pass@1 and Pass@10.
After generating the responses, you can use evalplus/evaluate.py to obtain the metrics.
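
A hedged example of the evaluation step; the flag names below (--dataset, --samples) mirror the upstream evalplus evaluator and are assumptions here, so check the argument parser in evaluate.py for the actual options:

python evalplus/evaluate.py \
  --dataset humaneval \
  --samples /path/to/generated/samples.jsonl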
