mirror of https://github.com/inclusionAI/AReaL
# Evaluate
This evaluation package is adapted from Qwen2.5-Math.
Install the following packages:
```bash
cd latex2sympy
pip install -e .
cd ..
pip install -r requirements.txt
pip install vllm --no-build-isolation
pip install transformers==4.47.0
pip install prettytable timeout_decorator
```
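To confirm that the pinned dependencies resolved correctly, you can print the installed versions before running anything. This is an optional sanity check, not part of the original setup; `transformers` should report 4.47.0:

```bash
# Optional sanity check: report installed versions without importing the packages
python -c "from importlib.metadata import version; print('transformers', version('transformers')); print('vllm', version('vllm'))"
```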
Run evaluation:
```bash
python eval_and_aggregate.py \
    --model_path ${MODEL_PATH} \
    --output_path ${OUTPUT_PATH} \
    --data_names aime24 \
    --max_gen_tokens 32768  # maximum number of tokens to generate; defaults to 32768
```
The results are saved in `${OUTPUT_PATH}/math_eval_32768`.
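As a concrete end-to-end sketch (the paths below are placeholders; substitute your own checkpoint and output directories):

```bash
# Placeholder paths: replace with your own checkpoint and output directory
MODEL_PATH=/path/to/your/model
OUTPUT_PATH=./outputs

python eval_and_aggregate.py \
    --model_path ${MODEL_PATH} \
    --output_path ${OUTPUT_PATH} \
    --data_names aime24 \
    --max_gen_tokens 32768

# Aggregated results are written under ${OUTPUT_PATH}/math_eval_32768
ls ${OUTPUT_PATH}/math_eval_32768
```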
Evaluate AReaL-boba-RL-7B:
```bash
python eval_and_aggregate.py \
    --model_path ${MODEL_PATH} \
    --output_path ${OUTPUT_PATH} \
    --data_names aime24,aime25,gpqa_diamond \
    --prompt_type AReaL-boba \
    --temperature 1.0
```
Evaluate AReaL-boba-SFT-32B:
```bash
python eval_and_aggregate.py \
    --model_path ${MODEL_PATH} \
    --output_path ${OUTPUT_PATH} \
    --data_names aime24,aime25,gpqa_diamond \
    --prompt_type AReaL-boba-SFT \
    --samples_per_node 2 --num_sample_nodes 16 \
    --temperature 0.6
```
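If you need to sweep several checkpoints with the same settings, a simple shell loop works. The sketch below is only illustrative: the checkpoint paths and the per-model output subdirectories are hypothetical, and the flags are the RL-7B settings from above.

```bash
# Illustrative sketch: evaluate multiple checkpoints in sequence (paths are placeholders)
OUTPUT_PATH=./outputs
for MODEL_PATH in /path/to/AReaL-boba-RL-7B /path/to/another-checkpoint; do
    python eval_and_aggregate.py \
        --model_path ${MODEL_PATH} \
        --output_path ${OUTPUT_PATH}/$(basename ${MODEL_PATH}) \
        --data_names aime24,aime25,gpqa_diamond \
        --prompt_type AReaL-boba \
        --temperature 1.0
done
```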