mirror of https://github.com/inclusionAI/AReaL
Evaluate
This evaluation package was modified from Qwen2.5-Math.
Install the following packages:
cd latex2sympy
pip install -e .
cd ..
pip install -r requirements.txt
pip install vllm --no-build-isolation
pip install transformers==4.47.0
pip install prettytable timeout_decorator
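After installing, it can help to confirm that the packages are importable before starting a long evaluation run. The following is a minimal sketch, not part of the evaluation package: `check_pkg` is a hypothetical helper, and the `PYTHON` variable (defaulting to `python3`) is an assumption about your environment.

```shell
# Hypothetical helper: report whether a Python module can be found.
# PYTHON is an assumed override; point it at the interpreter you installed into.
PYTHON="${PYTHON:-python3}"

check_pkg() {
    # find_spec returns None for a missing top-level module, so nothing is imported here.
    if "$PYTHON" -c "import importlib.util, sys; sys.exit(0 if importlib.util.find_spec(sys.argv[1]) else 1)" "$1"; then
        echo "ok: $1"
    else
        echo "missing: $1"
    fi
}

# Import names of the packages installed with pip above.
for pkg in vllm transformers prettytable timeout_decorator; do
    check_pkg "$pkg"
done
```

Any line starting with `missing:` suggests the corresponding `pip install` step did not complete in the active environment.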
Run evaluation:
python eval_and_aggregate.py \
--model_path {MODEL_PATH} \
--output_path {OUTPUT_PATH} \
--data_names math_500,aime24,amc23 \
--max_gen_tokens 32768  # max number of tokens to generate, defaults to 32768
The results are saved in {OUTPUT_PATH}/math_eval_32768.
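To inspect the output after a run, a small sketch like the one below can locate the results directory. `result_dir` is a hypothetical helper, and it assumes the numeric suffix of the directory name matches the value passed to `--max_gen_tokens`; this README only documents the default case (`math_eval_32768`).

```shell
# Hypothetical helper: build the results directory path.
# Assumption: the suffix equals the --max_gen_tokens value (only 32768 is documented).
result_dir() {
    printf '%s/math_eval_%s\n' "$1" "$2"
}

# Placeholder output path; substitute the value you passed as --output_path.
OUT="$(result_dir "./eval_output" 32768)"

if [ -d "$OUT" ]; then
    ls -l "$OUT"    # list whatever files the evaluation wrote
else
    echo "no results yet at $OUT"
fi
```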