AReaL: A fully open-sourced and inclusive RL project for large reasoning models


AReaL (Ant Reasoning RL) is an open-source, efficient reinforcement learning training system for large reasoning models, developed at the RL Lab, Ant Research, and built upon the open-source project ReaLHF. We are fully committed to open source: we release the training details, data, and infrastructure required to reproduce our results, along with the models themselves. AReaL aims to help everyone build their own AI agents easily and affordably. Our team loves milk tea because it is delicious, customizable, and affordable. We hope you enjoy our project just as you enjoy a cup of real-world milk tea (cheers!).

AReaL Highlights

  • 🛠️ Open & Reproducible: We will continuously release all code, datasets, and training recipes for RL training of LLMs.
  • 🚀 Scalable: AReaL seamlessly adapts to different computational resource settings, from a single node to 1K GPUs.
  • 🔪 Cutting-Edge Performance: AReaL produces models with cutting-edge reasoning capabilities. We are also actively working on other domains, such as coding and agentic tasks.

News

[2025/03/31] (v0.2, nickname Boba) Our milestone release, Boba! Please call it A-ReaL-Boba! This release features significantly accelerated training with SGLang support, plus SOTA 7B and 32B models for math reasoning.

[2025/02/24] (v0.1) Our initial release includes reproducible results for 1.5B and 7B LRMs. Check our v0.1 technical blog.

AReaL-boba Milestones and Highlights

In our boba release, we highlight three key milestones:

  • SGLang support with a 1.5x speedup on 7B training
  • A SOTA 7B model trained with RL on math reasoning
  • Approaching QwQ-32B performance with only 200 SFT samples

For the complete training and model details, please check our v0.2 technical blog.

SGLang support with 1.5x speedup on 7B training

[Figure: throughput comparison with v0.1.0]

Thanks to a series of system-level optimizations, AReaL v0.2 improves its end-to-end training performance by up to 73%.

In the following table, we show the convergence time under different resource settings:

| Model Size | 1.5B | 1.5B | 1.5B | 7B   | 7B   | 32B (SFT) |
|------------|------|------|------|------|------|-----------|
| #GPU       | 8    | 32   | 128  | 32   | 128  | 64        |
| Step       | 250  | 250  | 250  | 400  | 400  | 300       |
| Time (h)   | ~230 | ~70  | ~25  | ~290 | ~80  | ~3.5      |
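
As a rough sanity check on the 1.5B numbers above, scaling from 8 to 128 GPUs cuts wall-clock time nearly proportionally. The short script below is illustrative only; it simply replays the table's approximate timings to compute the implied speedup and parallel efficiency:

```python
# Approximate wall-clock hours from the table above (1.5B model, 250 steps).
hours_by_gpus = {8: 230, 32: 70, 128: 25}
base_gpus = 8
base_hours = hours_by_gpus[base_gpus]

for gpus, hours in sorted(hours_by_gpus.items()):
    speedup = base_hours / hours  # e.g. 230 / 70 ≈ 3.3x on 32 GPUs
    ideal = gpus / base_gpus      # linear-scaling upper bound
    print(f"{gpus:4d} GPUs: {speedup:4.1f}x speedup, "
          f"{100 * speedup / ideal:3.0f}% parallel efficiency")
```

As expected, efficiency decreases as a job spans more nodes and communication overhead grows.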

SOTA 7B model using RL on math reasoning

| Model                | AIME 2024 | AIME 2025 | GPQA |
|----------------------|-----------|-----------|------|
| O1-Preview           | 56.7      | -         | -    |
| R1-Distill-Qwen-7B   | 55.0      | 39.7      | 47.1 |
| Light-R1-7B-DS       | 56.7      | 44.9      | 40.9 |
| AReaL-boba-RL-7B 🤗  | 61.9      | 48.3      | 47.6 |

We used R1-Distill-Qwen-7B as our base model. After RL training, the pass@1 scores on AIME 2024 and AIME 2025 improve by 6.9 and 8.6 points, respectively, achieving SOTA performance among 7B models in mathematical reasoning. We have released the training data at AReaL-boba-106k.
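
For reference, pass@1 here is the standard sampled-completion metric: generate n completions per problem, count the c that are correct, and average c/n over problems. Below is a minimal sketch of the general unbiased pass@k estimator (Chen et al., 2021), of which pass@1 is the k=1 special case:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: n sampled completions, c of them correct."""
    if n - c < k:
        return 1.0  # every size-k subset contains at least one correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# For k=1 this reduces to the fraction of correct samples:
assert abs(pass_at_k(32, 20, 1) - 20 / 32) < 1e-12
```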

Approaching QwQ-32B performances using only 200 data samples

|           | R1-Distill-Qwen-32B | QwQ-32B | AReaL-boba-SFT-32B 🤗 |
|-----------|---------------------|---------|-----------------------|
| AIME 2024 | 72.6                | 78.9    | 78.8                  |

Building upon R1-Distill-Qwen-32B, we replicate QwQ-32B's inference performance on AIME 2024 using just 200 data points via Supervised Fine-Tuning (SFT). We have released the training data at AReaL-boba-SFT-200.

Getting Started

Quick Start

```bash
git clone https://github.com/inclusionAI/AReaL
cd AReaL

# Train the distilled 7B model
REAL_NO_EXT=1 pip3 install -e . --no-build-isolation
python3 -m realhf.apps.quickstart ppo-math --config examples/configs/7B-distill/areal-7B-distill-gpus-128.yaml

# Evaluate the 7B model
python3 evaluation/eval_and_aggregate.py --model_path $MODEL_PATH --max_gen_tokens 32768
```
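
For intuition on the reward signal behind math RL training, the sketch below shows a minimal rule-based verifier in the spirit of the repository's grading utilities. It is a hypothetical illustration, not the project's actual grading API, which handles far more answer formats:

```python
import re

def extract_boxed(completion: str) -> str | None:
    """Pull the last \\boxed{...} answer out of a model completion.

    Note: this simple regex does not handle nested braces.
    """
    matches = re.findall(r"\\boxed\{([^{}]*)\}", completion)
    return matches[-1].strip() if matches else None

def math_reward(completion: str, reference: str) -> float:
    """Binary outcome reward: 1.0 iff the extracted answer matches exactly."""
    answer = extract_boxed(completion)
    return 1.0 if answer is not None and answer == reference else 0.0

print(math_reward(r"... so the final answer is \boxed{42}.", "42"))  # 1.0
print(math_reward(r"... the answer is \boxed{41}.", "42"))           # 0.0
```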

Resources

Future Plan

AReaL is under active development, with major releases planned on a weekly basis. We also highly appreciate contributions from the community. Here we highlight our future research and development plans.

System Development

  • Support for SGLang.
  • Support for the latest vLLM and megatron-core packages.
  • RL training with coding problems.
  • Asynchronous generation and RL training.
  • Optimizations for distributed training: expert parallel and zero-bubble pipelining.
  • RL for vision-language models (VLM).
  • Function calling and agent capabilities.

Algorithm Development

  • The training recipe for 32B models.
  • Multi-task RL training.
  • Improving agentic capabilities with end-to-end RL.
  • Stable RL training for larger MoE models.

Acknowledgement

The major contributors are from the RL Lab at Ant Research and the Institute for Interdisciplinary Information Sciences, Tsinghua University.

Our team has also received invaluable assistance from the Super Computing Technology (SCT) team at Ant Group, particularly in the realm of large-scale cluster operations and maintenance.

We also appreciate all the pioneering work from the community, particularly the ReaLHF project from OpenPsi Inc. and other projects including, but not limited to, DeepScaleR, Open-Reasoner-Zero, OpenRLHF, veRL, SGLang, QwQ, Light-R1, and DAPO.

Citation

@inproceedings{mei2024realhf,
  author       = {Mei, Zhiyu and Fu, Wei and Li, Kaiwei and Wang, Guangju and Zhang, Huanchen and Wu, Yi},
  title        = {ReaL: Efficient RLHF Training of Large Language Models with Parameter Reallocation},
  booktitle    = {Proceedings of the Eighth Conference on Machine Learning and Systems,
                  MLSys 2025, Santa Clara, CA, USA, May 12-15, 2025},
  publisher    = {mlsys.org},
  year         = {2025},
}
@misc{areal2025,
  author = {RL Lab, Ant Research},
  title = {AReaL: Ant Reasoning RL},
  year = {2025},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/inclusionAI/AReaL}},
}