AReaL


README.md

Evaluate

This evaluation package was modified from Qwen2.5-Math.

Install the following packages:

cd latex2sympy
pip install -e .
cd ..
pip install -r requirements.txt 
pip install vllm --no-build-isolation
pip install transformers==4.47.0
pip install prettytable timeout_decorator
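
After installing, a quick sanity check can confirm that the packages are importable and that the pinned version is in place. The following is a minimal sketch, not part of the package: the helper names `missing_modules` and `version_mismatches` are hypothetical, and the module list is assumed to mirror the `pip install` lines above (`latex2sympy` and the contents of `requirements.txt` are not checked here).

```python
import importlib.util
from importlib import metadata

# Assumption: these import names correspond to the packages installed above.
REQUIRED_MODULES = ["vllm", "transformers", "prettytable", "timeout_decorator"]
# Version pin taken from the install instructions.
PINNED = {"transformers": "4.47.0"}


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def version_mismatches(pinned=PINNED):
    """Return {distribution: installed_version_or_None} for every unmet pin."""
    problems = {}
    for dist, wanted in pinned.items():
        try:
            found = metadata.version(dist)
        except metadata.PackageNotFoundError:
            found = None
        if found != wanted:
            problems[dist] = found
    return problems


if __name__ == "__main__":
    print("missing modules:", missing_modules())
    print("version mismatches:", version_mismatches())
```

Both functions return empty containers when the environment matches the instructions, so any non-empty output points at the package to reinstall.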

Run evaluation:

# --max_gen_tokens: max number of tokens to generate, defaults to 32768
python eval_and_aggregate.py \
--model_path {MODEL_PATH} \
--output_path {OUTPUT_PATH} \
--data_names math_500,aime24,amc23 \
--max_gen_tokens 32768

The results are saved in {OUTPUT_PATH}/math_eval_32768.
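
To launch the evaluation from Python rather than the shell, the command can be assembled as an argument list. This is a sketch under stated assumptions: the helper names `build_eval_command`, `expected_results_dir`, and `run_eval` are hypothetical, only the flags shown in this README are used, and the `math_eval_<max_gen_tokens>` directory name is inferred from the single example path above rather than confirmed by the source.

```python
import subprocess
import sys
from pathlib import Path


def build_eval_command(model_path, output_path,
                       data_names=("math_500", "aime24", "amc23"),
                       max_gen_tokens=32768):
    """Build the argv list for eval_and_aggregate.py using the README's flags."""
    return [
        sys.executable, "eval_and_aggregate.py",
        "--model_path", str(model_path),
        "--output_path", str(output_path),
        "--data_names", ",".join(data_names),
        "--max_gen_tokens", str(max_gen_tokens),
    ]


def expected_results_dir(output_path, max_gen_tokens=32768):
    """Guess where results land.

    ASSUMPTION: the directory suffix follows --max_gen_tokens; the README
    only shows the default case (math_eval_32768).
    """
    return Path(output_path) / f"math_eval_{max_gen_tokens}"


def run_eval(model_path, output_path, **kwargs):
    """Run the evaluation from the package directory and return the results path."""
    cmd = build_eval_command(model_path, output_path, **kwargs)
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return expected_results_dir(output_path, kwargs.get("max_gen_tokens", 32768))
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces. `run_eval` must be called from the directory that contains `eval_and_aggregate.py`.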