.

2025-03-28 16:44:56 +08:00 · 2025-03-28 16:44:56 +08:00 · 111211b302
parent 1649c9fcf1
commit 111211b302
1 changed files with 42 additions and 37 deletions
--- a/README.md
+++ b/README.md
@ -1,49 +1,53 @@
-<h1 id="BQDMY">AReaL: Ant Reasoning RL </h1>
-![](https://intranetproxy.alipay.com/skylark/lark/0/2025/jpeg/163056495/1743133178438-df2b3c40-9d0f-4c7b-8118-7327df90ccc8.jpeg)
+<h1 align="center">
+<em>AReaL</em>: A fully open-sourced and inclusive RL project for large reasoning models
+</h1>

-<font style="color:rgb(31, 35, 40);">AReaL (Ant Reasoning RL) is an open-sourced and efficient reinforcement learning training system for large reasoning models developed at </font>**<font style="color:rgb(31, 35, 40);">the RL Lab, Ant Research</font>**<font style="color:rgb(31, 35, 40);">, built upon the open-source project </font>[RealHF](https://github.com/openpsi-project/ReaLHF)<font style="color:rgb(31, 35, 40);">. With a 100% open-source commitment, including data, training details, infra, and models, AReaL aims to help everyone build their own AI agents easily with a low cost.  Our team likes milk tea. We hope people will like our project just like a-real-milk tea. </font>
+![](/assets/logo.jpeg)

-<font style="color:rgb(31, 35, 40);"></font>
+AReaL (Ant Reasoning RL) is an open-sourced and efficient reinforcement learning training system for large reasoning models developed at **the RL Lab, Ant Research**, built upon the open-source project [RealHF](https://github.com/openpsi-project/ReaLHF). With a 100% open-source commitment, including data, training details, infra, and models, AReaL aims to help everyone build their own AI agents easily with a low cost.  Our team likes milk tea. We hope people will like our project just like a-real-milk tea.

-**<font style="color:rgb(31, 35, 40);">AReaL Highlights</font>**
+**AReaL Highlights**

-+ <font style="color:rgb(31, 35, 40);">🛠️</font><font style="color:rgb(31, 35, 40);"> </font>**<font style="color:rgb(31, 35, 40);">Open & Reproducible</font>**<font style="color:rgb(31, 35, 40);">: We will continuously release </font>_<font style="color:rgb(31, 35, 40);">all code, datasets, and training recipes</font>_<font style="color:rgb(31, 35, 40);"> for RL training LLMs .</font>
-+ <font style="color:rgb(31, 35, 40);">🚀</font><font style="color:rgb(31, 35, 40);"> </font>**<font style="color:rgb(31, 35, 40);">Scalability</font>**<font style="color:rgb(31, 35, 40);">: AReaL can seamlessly adapt to different computational resource settings, ranging from 1 single node to 1K GPUs.</font>
-+ 🔪 **<font style="color:rgb(31, 35, 40);">Cutting-Edge Performances:</font>**<font style="color:rgb(31, 35, 40);"> AReaL can produce models with cutting-edge reasoning capabilities. We are actively working on other domains, such as coding and agent, as well. </font>
+ 🛠️ **Open & Reproducible**: We will continuously release _all code, datasets, and training recipes_ for RL training LLMs .
+ 🚀 **Scalability**: AReaL can seamlessly adapt to different computational resource settings, ranging from 1 single node to 1K GPUs.
+ 🔪 **Cutting-Edge Performances:** AReaL can produce models with cutting-edge reasoning capabilities. We are actively working on other domains, such as coding and agent, as well. 

-<h2 id="news"><font style="color:rgb(31, 35, 40);">News</font></h2>
-**<font style="color:rgb(31, 35, 40);">[2025/03/31] (v0.2, nickname Boba)</font>**<font style="color:rgb(31, 35, 40);"> Our milestone release Boba! Please call it A-ReaL-Boba! This release includes much accelerated training with SGLang support and SOTA 7B and 32B models on math reasoning. </font>
+## News

-**<font style="color:rgb(31, 35, 40);">[2025/02/24] (v0.1)</font>**<font style="color:rgb(31, 35, 40);"> Our initial release includes reproducible results for 1.5B and 7B LRMs. Check our </font>[v0.1 technical blog](/blog/AReaL_v0_1.md)<font style="color:rgb(31, 35, 40);">.</font>
+**[2025/03/31] (v0.2, nickname Boba)** Our milestone release Boba! Please call it A-ReaL-Boba! This release includes much accelerated training with SGLang support and SOTA 7B and 32B models on math reasoning. 

-<h2 id="training-a-1.5b-lrm-from-the-distilled-model"><font style="color:rgb(31, 35, 40);">AReaL-boba Milestones</font></h2>
-<font style="color:rgb(31, 35, 40);">In our boba release, we highlight the 3 most important milestones:</font>
+**[2025/02/24] (v0.1)** Our initial release includes reproducible results for 1.5B and 7B LRMs. Check our [v0.1 technical blog](/blog/AReaL_v0_1.md).

-+ <font style="color:rgb(31, 35, 40);">SGLang support</font>
-+ <font style="color:rgb(31, 35, 40);">A SOTA 7B math reasoning model</font>
-+ <font style="color:rgb(31, 35, 40);">A particularly competitive 32B model that can be trained with extremely low cost.</font>
+## AReaL-boba Milestones
+
+In our boba release, we highlight the 3 most important milestones:
+
+ SGLang support
+ A SOTA 7B math reasoning model
+ A particularly competitive 32B model that can be trained with extremely low cost.

 For the complete training and model details, please check [our v0.2 technical blog](/blog/AReaL_v0_2.md). 

-<h4 id="kEJjy"><font style="color:rgb(31, 35, 40);">SGLang support with 1.5x speedup on 7B Training</font></h4>
-<font style="color:rgb(31, 35, 40);">note: vs v0.1</font>
+### SGLang support with 1.5x speedup on 7B Training
+note: vs v0.1

-<h4 id="k4xsF"><font style="color:rgb(31, 35, 40);">SOTA 7B model using RL on math reasoning</font></h4>
-| <font style="color:rgb(31, 35, 40);">Model </font> | <font style="color:rgb(31, 35, 40);">AIME 2024</font> | <font style="color:rgb(31, 35, 40);">AIME 2025</font> | <font style="color:rgb(31, 35, 40);">GPQA</font> |
+### SOTA 7B model using RL on math reasoning
+| Model  | AIME 2024 | AIME 2025 | GPQA |
 | :---: | :---: | :---: | :---: |
-| <font style="color:rgb(31, 35, 40);">R1-Distill-Qwen-7B</font> | <font style="color:rgb(31, 35, 40);">55.0</font> | <font style="color:rgb(31, 35, 40);">39.7</font> | <font style="color:rgb(31, 35, 40);"></font> |
-| <font style="color:rgb(31, 35, 40);">Light-R1-7B-DS</font> | <font style="color:rgb(31, 35, 40);">59.1</font> | <font style="color:rgb(31, 35, 40);">44.3</font> | <font style="color:rgb(31, 35, 40);"></font> |
-| <font style="color:rgb(31, 35, 40);">AReaL-boba-RL-7B (ours)</font> | **61.9** | **48.3** | **** |
+| R1-Distill-Qwen-7B | 55.0 | 39.7 |  |
+| Light-R1-7B-DS | 59.1 | 44.3 |  |
+| AReaL-boba-RL-7B (ours) | **61.9** | **48.3** | **** |


-<h4 id="orhVc"><font style="color:rgb(31, 35, 40);">Approaching QwQ-32B performances using only 200 data samples</font></h4>
-|  | <font style="color:rgb(31, 35, 40);">QwQ-32B</font> | <font style="color:rgb(31, 35, 40);">AReaL-boba-SFT-32B</font> |
+### Approaching QwQ-32B performances using only 200 data samples
+|  | QwQ-32B | AReaL-boba-SFT-32B |
 | --- | :---: | :---: |
-| <font style="color:rgb(31, 35, 40);">AIME 2024</font> | <font style="color:rgb(31, 35, 40);">78.9</font> | 78.8 |
+| AIME 2024 | 78.9 | 78.8 |


-<h2 id="getting-started"><font style="color:rgb(31, 35, 40);">Getting Started</font></h2>
-<h3 id="Trych">Quick Start</h3>
+## Getting Started
+
+### Quick Start
 ```markdown
 git clone https://github.com/inclusionAI/AReaL
 cd AReaL
@ -56,14 +60,14 @@ python3 -m realhf.apps.quickstart ppo-math --config examples/configs/7B-distill/
 python3 evaluation/eval_and_aggregate.py --model_path $MODEL_PATH --max_gen_tokens 32768
 ```

-<h3 id="jWpIm">Resources</h3>
+### Resources
 + [Tutorial](/examples/README.md)
 + [Tutorial (中文)](/examples/README_zh.md)

-<h2 id="future-plan">Future Plan</h2>
+## Future Plan
 AReaL is under active development. We will have major releases in a weekly manner. We also highly appreciate efforts from the community as well. Here we highlight our future research and development plan. 

-<h3 id="fM6oh">System Development</h3>
+### System Development
 - [x] Support for SGLang.
 - [ ] Support for the latest vLLM and megatron-core packages.
 - [ ] RL training with coding problems.
@ -72,20 +76,21 @@ AReaL is under active development. We will have major releases in a weekly manne
 - [ ] RL for vision-language models (VLM).
 - [ ] Function calling and agent capabilities.

-<h3 id="PCbLW">Algorithm Development</h3>
+### Algorithm Development
 - [ ] The training receipe for 32B models.
 - [ ] Multi-task RL training.
 - [ ] Agentic capabilities with end-to-end RL.
 - [ ] Stable RL training for larger MOE models.

-<h2 id="acknowledgement"><font style="color:rgb(31, 35, 40);">Acknowledgement</font></h2>
-<font style="color:rgb(31, 35, 40);">We would like to remark that major contributors are from </font>**<font style="color:rgb(31, 35, 40);">RL Lab at Ant Research</font>**<font style="color:rgb(31, 35, 40);"> and </font>**<font style="color:rgb(31, 35, 40);">Institute for Interdisciplinary Information Sciences, Tsinghua University</font>**<font style="color:rgb(31, 35, 40);">.</font>
+## Acknowledgement

-<font style="color:rgb(31, 35, 40);">Our team has also received invaluable assistance from the Super Computing Technology (SCT) team at Ant Group, particularly in the realm of large-scale cluster operations and maintenance. </font>
+We would like to remark that major contributors are from **RL Lab at Ant Research** and **Institute for Interdisciplinary Information Sciences, Tsinghua University**.

-<font style="color:rgb(31, 35, 40);">We also appreciate all the pioneer works from the community, particularly the </font>[ReaLHF](https://github.com/openpsi-project/ReaLHF)<font style="color:rgb(31, 35, 40);"> project from OpenPsi Inc. and those other projects, including but not limited to, </font>[DeepScaleR](https://github.com/agentica-project/deepscaler)<font style="color:rgb(31, 35, 40);">, </font>[Open-Reasoner-Zero](https://github.com/Open-Reasoner-Zero/Open-Reasoner-Zero/tree/main)<font style="color:rgb(31, 35, 40);">, </font>[OpenRLHF](https://github.com/OpenRLHF/OpenRLHF)<font style="color:rgb(31, 35, 40);">, </font>[VeRL](https://github.com/volcengine/verl)<font style="color:rgb(31, 35, 40);">, and </font>[SGLang](https://github.com/sgl-project/sglang)<font style="color:rgb(31, 35, 40);">.</font>
+Our team has also received invaluable assistance from the Super Computing Technology (SCT) team at Ant Group, particularly in the realm of large-scale cluster operations and maintenance. 

-<h2 id="citation"><font style="color:rgb(31, 35, 40);">Citation</font></h2>
+We also appreciate all the pioneer works from the community, particularly the [ReaLHF](https://github.com/openpsi-project/ReaLHF) project from OpenPsi Inc. and those other projects, including but not limited to, [DeepScaleR](https://github.com/agentica-project/deepscaler), [Open-Reasoner-Zero](https://github.com/Open-Reasoner-Zero/Open-Reasoner-Zero/tree/main), [OpenRLHF](https://github.com/OpenRLHF/OpenRLHF), [VeRL](https://github.com/volcengine/verl), and [SGLang](https://github.com/sgl-project/sglang).
+
+## Citation
 ```plain
@article{mei2024realhf,
  title={ReaLHF: Optimized RLHF Training for Large Language Models through Parameter Reallocation},