Update time estimation in README (#23)

* fix: `self.tasks_ids` should also be filtered

* PullRequest: 67 Update v0.2.0 Dockerfile

Merge branch fw/v0.2.0-dockerfile of git@code.alipay.com:inclusionAI/AReaL.git into main
https://code.alipay.com/inclusionAI/AReaL/pull_requests/67

Signed-off-by: 温差 <xushusheng.xss@antgroup.com>


* fw/v0.2.0-dockerfile

* PullRequest: 66 Update v0.2.0 cover letter

Merge branch fw/v0.2.0-readme of git@code.alipay.com:inclusionAI/AReaL.git into main
https://code.alipay.com/inclusionAI/AReaL/pull_requests/66

Signed-off-by: 温差 <xushusheng.xss@antgroup.com>


* .
* .
* .
* .
* .
* update thpt fig
* update readme 20250329-20:16
* update
* update tutorial
* .

* upload 7B zero and 32B sft config

* clean ci

* PullRequest: 72 change the condition of using etcd

Merge branch fw/fix-etcd of git@code.alipay.com:inclusionAI/AReaL.git into main
https://code.alipay.com/inclusionAI/AReaL/pull_requests/72

Signed-off-by: 晓雷 <meizhiyu.mzy@antgroup.com>


* change the condition of using etcd

* PullRequest: 60 Change the default SGLang parameters to avoid precision issues.

Merge branch fw/fix-sglang of git@code.alipay.com:inclusionAI/AReaL.git into main
https://code.alipay.com/inclusionAI/AReaL/pull_requests/60

Signed-off-by: 晓雷 <meizhiyu.mzy@antgroup.com>


* change vllm config
* .
* .

* PullRequest: 73 Fix a setup issue when using ETCD

Merge branch fw/fix-etcd of git@code.alipay.com:inclusionAI/AReaL.git into main
https://code.alipay.com/inclusionAI/AReaL/pull_requests/73

Signed-off-by: 晓雷 <meizhiyu.mzy@antgroup.com>


* fix etcd
* .
* .

* PullRequest: 75 Fix epoch counter before model function call execution.

Merge branch fw/fix-epoch-counter of git@code.alipay.com:inclusionAI/AReaL.git into main
https://code.alipay.com/inclusionAI/AReaL/pull_requests/75

Signed-off-by: 晓雷 <meizhiyu.mzy@antgroup.com>


* .

* PullRequest: 76 Update from opensource repository.

Merge branch mzy/update-from-opensource of git@code.alipay.com:inclusionAI/AReaL.git into main
https://code.alipay.com/inclusionAI/AReaL/pull_requests/76

Signed-off-by: 博惟 <bowei.fw@antgroup.com>


* .
* .
* .
* .
* .
* update thpt fig
* update readme 20250329-20:16
* update
* update tutorial
* fw/v0.2.0-dockerfile
* .
* V0.2.0 prerelease (#8)
* V0.2.0 prerelease (#9)
* Update README.md
* Clean up CI (#11)
* Update README.md citation (#15)
* Xss/readme (#16)
* Merge updates from ant repository. (#18)

* update readme
update readme

update readme

* update readme

* .

* .

---------

Signed-off-by: 博惟 <bowei.fw@antgroup.com>
Co-authored-by: wanghuaijie.whj <wanghuaijie.whj@antgroup.com>
Co-authored-by: meijun <meijun.mei@antgroup.com>
Co-authored-by: 晓雷 <meizhiyu.mzy@antgroup.com>
Co-authored-by: chucai.dzq <chucai.dzq@alibaba-inc.com>
Wei Fu 2025-04-02 08:09:11 +08:00 committed by GitHub
parent 1c33379c93
commit aff05c2544
3 changed files with 31 additions and 24 deletions


@@ -34,11 +34,11 @@ Thanks to a series of system-level optimizations, AReaL v0.2 improves its end-to
In the following table, we show the convergence time under different resource settings:
| **Model Size** | **1.5B** | **1.5B** | **1.5B** | **7B** | **7B** | **32B (SFT)** |
| --- | :---: | :---: | :---: | :---: | :---: | :---: |
| #GPU | 8 | 32 | 128 | 32 | 128 | 64 |
| Step | 250 | 250 | 250 | 400 | 400 | 300 |
| Time (h) | ~230 | ~70 | ~25 | ~290 | ~80 | ~3.5 |
| **Model Size** | **1.5B** | **1.5B** | **1.5B** | **7B** | **7B** |**32B (SFT)** |
| --- |:--------:|:--------:|:--------:|:------:|:------:|:-------:|
| #GPU | 8 | 32 | 128 | 32 | 128 | 64 |
| Step | 250 | 250 | 250 | 400 | 400 | 300 |
| Time (h) | ~240 | ~69 | ~27 | ~252 | ~90 | ~3.5 |
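As a rough sanity check on these estimates, the snippet below (an illustrative sketch, not part of the repository) recomputes the 1.5B scaling behavior from the table: moving from 8 to 32 to 128 GPUs cuts wall-clock time from ~240 h to ~69 h to ~27 h.

```python
# Scaling check for the 1.5B runs listed above (numbers copied from the table).
baseline_gpus, baseline_hours = 8, 240

for gpus, hours in [(32, 69), (128, 27)]:
    speedup = baseline_hours / hours               # wall-clock speedup vs. the 8-GPU run
    efficiency = speedup / (gpus / baseline_gpus)  # fraction of ideal linear scaling
    print(f"{gpus} GPUs: {speedup:.1f}x speedup, {efficiency:.0%} scaling efficiency")
```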
#### SOTA 7B model using RL in math reasoning


@@ -5,19 +5,23 @@
Check if your hardware meets these minimum requirements:
|**Model Size**| **1.5B** |**1.5B**|**1.5B**| **7B** | **7B** |
|---|:---:|:---:|:---:|:---:|:---:|
| **Nodes** | **1** | **4** | **16** | **4** | **16** |
| GPU | 8x H800 |8x H800 per node| 8x H800 per node |8x H800 per node| 8x H800 per node |
| CPU | 48 cores |48 cores per node|48 cores per node| 48 cores per node |48 cores per node|
| Memory | 1 TB |1 TB per node|1 TB per node| 1 TB per node |1 TB per node|
| Network | NVSwitch |NVSwitch + RoCE 3.2 Tbps|NVSwitch + RoCE 3.2 Tbps| NVSwitch + RoCE 3.2 Tbps |NVSwitch + RoCE 3.2 Tbps|
| Storage | 1TB |Shared storage (NAS) 10TB|Shared storage (NAS) 10TB| Shared storage (NAS) 10TB |Shared storage (NAS) 10TB|
| **Total Time (Hours)** | **230** | **70** | **25** | **290** | **80** |
|**Model Size**| **1.5B** |**1.5B**|**1.5B**| **7B** | **7B** | **32B** |
|---|:---:|:---:|:---:|:-------------------------:|:---:|:---:|
| **Nodes** | **1** | **4** | **16** | **4** | **16** | **16** |
| GPU | 8x H800 |8x H800 per node| 8x H800 per node | 8x H800 per node | 8x H800 per node | 8x H800 per node |
| CPU | 48 cores |48 cores per node|48 cores per node| 48 cores per node | 48 cores per node| 48 cores per node|
| Memory | 1 TB |1 TB per node|1 TB per node| 1 TB per node | 1 TB per node| 1 TB per node|
| Network | NVSwitch |NVSwitch + RoCE 3.2 Tbps|NVSwitch + RoCE 3.2 Tbps| NVSwitch + RoCE 3.2 Tbps | NVSwitch + RoCE 3.2 Tbps| NVSwitch + RoCE 3.2 Tbps|
| Storage | 1TB |Shared storage (NAS) 10TB|Shared storage (NAS) 10TB| Shared storage (NAS) 10TB |Shared storage (NAS) 10TB| Shared storage (NAS) 10TB|
| BatchSize x GroupSize | 512x16 | 512x16 | 512x16 | 512x16 | 512x16 | 512x32|
| **Single-step Time (seconds)** | **3461** | **997** | **391** | **2275** | **815** | **6707**|
| **#Steps Until Convergence** | **~250** |**~250** |**~250** |**~400** |**~400** | - |
| **Total Time (Hours)** | **~240** | **~69** | **~27** | **~252** | **~90** | - |
Notes:
- GPUs need to have 80GB memory. Other GPU models with similar specs are acceptable.
- Single-node training can use local storage, but multi-node training requires shared storage.
- We haven't successfully trained a powerful 32B model, so we cannot estimate the required steps and time.
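The **Total Time (Hours)** row is consistent with **Single-step Time (seconds)** × **#Steps Until Convergence**; below is a minimal sketch of that arithmetic using the table's own numbers (small differences come from rounding):

```python
# Reproduce the "Total Time (Hours)" estimates from per-step time and step count.
# Values are taken from the table above; the 32B column has no step estimate yet.
runs = {
    "1.5B on 8 GPUs":   (3461, 250),
    "1.5B on 32 GPUs":  (997, 250),
    "1.5B on 128 GPUs": (391, 250),
    "7B on 32 GPUs":    (2275, 400),
    "7B on 128 GPUs":   (815, 400),
}

for name, (step_seconds, steps) in runs.items():
    total_hours = step_seconds * steps / 3600
    print(f"{name}: ~{total_hours:.0f} hours to convergence")
```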
## Software Requirements
This tutorial provides a Docker image. Below are the tested software versions:


@@ -6,15 +6,18 @@
To complete the training workflow successfully, check the table below to confirm that your hardware meets the requirements:
|**Model Size**| **1.5B** | **1.5B** |**1.5B** |**7B** |**7B** |
|---|---|---|---|---|
| Nodes | 1 | 4 | 16 | 4 | 16 |
| GPU | 8x H800 | 8x H800 per node | 8x H800 per node | 8x H800 per node | 8x H800 per node |
| CPU | 48 cores | 48 cores per node | 48 cores per node | 48 cores per node | 48 cores per node |
| Memory | 1 TB | 1 TB per node | 1 TB per node | 1 TB per node | 1 TB per node |
| Network | NVSwitch | NVSwitch + RoCE 3.2 Tbps | NVSwitch + RoCE 3.2 Tbps | NVSwitch + RoCE 3.2 Tbps | NVSwitch + RoCE 3.2 Tbps |
| Storage | 1TB | Shared storage (NAS) 10TB | Shared storage (NAS) 10TB | Shared storage (NAS) 10TB | Shared storage (NAS) 10TB |
| Total Time (Hours) | **230** | **70** | **25** | **290** | **80** |
| **Model Size** | **1.5B** | **1.5B** |**1.5B** | **7B** |**7B** | **32B** |
|---|---|---|---|---|---|---|
| Nodes | 1 | 4 | 16 | 4 | 16 | 16 |
| GPU | 8x H800 | 8x H800 per node | 8x H800 per node | 8x H800 per node | 8x H800 per node | 8x H800 per node |
| CPU | 48 cores | 48 cores per node | 48 cores per node | 48 cores per node | 48 cores per node | 48 cores per node |
| Memory | 1 TB | 1 TB per node | 1 TB per node | 1 TB per node | 1 TB per node | 1 TB per node |
| Network | NVSwitch | NVSwitch + RoCE 3.2 Tbps | NVSwitch + RoCE 3.2 Tbps | NVSwitch + RoCE 3.2 Tbps | NVSwitch + RoCE 3.2 Tbps | NVSwitch + RoCE 3.2 Tbps |
| Storage | 1TB | Shared storage (NAS) 10TB | Shared storage (NAS) 10TB | Shared storage (NAS) 10TB | Shared storage (NAS) 10TB | Shared storage (NAS) 10TB |
| BatchSize x GroupSize | 512x16 | 512x16 | 512x16 | 512x16 | 512x16 | 512x32 |
| Single-step Time (seconds) | **3461** | **997** | **391** | **2275** | **815** | **6707** |
| #Steps Until Convergence | **~250** | **~250** | **~250** | **~400** | **~400** | - |
| Total Time (Hours) | **~240** | **~69** | **~27** | **~252** | **~90** | - |
Notes on the hardware requirements:
@@ -22,7 +25,7 @@
- Single-node training can use local storage, but multi-node training requires shared storage; without it, training cannot proceed.
- All training runs use a 16K context length.
- The 32B model has not yet produced meaningful training results, so we cannot estimate the number of steps or the time needed to converge.
## Software Requirements