[Feature] Amend yaml configurations for Ray experiments (#53)

* feat: one buffer for each task

* feat: support "one buffer for each task" for async

* make kv_cache_dtype configurable

Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>

* style: use plural form

fix: use _seed_from_key to set different seeds for data loaders
fix: call load_data for one buffer each time

* PullRequest: 125 Support running async experiments in the 2407 image.

Merge branch fw/async2407 of git@code.alipay.com:inclusionAI/AReaL.git into main
https://code.alipay.com/inclusionAI/AReaL/pull_requests/125

Signed-off-by: 晓雷 <meizhiyu.mzy@antgroup.com>


* .

* fix: handle multiple datasets in recover indices
fix: `isinstance(self.__datasets, PullerStreamDataset)`
feat: use the "spec" request to obtain the number of datasets
fix: revert rollout worker

* fix: revert async_rl_exp.py

* fix flag for list (cuda_graph_bs)

* format

* [FIX] fix async task reward [sglang bf16-> fp16]

* fix: define `self.__datasets` in advance

* PullRequest: 130 [Refactor] Remove deprecated search related code

Merge branch mzy/remove-search of git@code.alipay.com:inclusionAI/AReaL.git into main
https://code.alipay.com/inclusionAI/AReaL/pull_requests/130

Signed-off-by: 博惟 <bowei.fw@antgroup.com>


* remove search related

* PullRequest: 131 [Refactor] Change terminology "model parallel" into "tensor parallel" to align with megatron.

Merge branch mzy/mp-to-tp of git@code.alipay.com:inclusionAI/AReaL.git into main
https://code.alipay.com/inclusionAI/AReaL/pull_requests/131?tab=comment

Signed-off-by: 博惟 <bowei.fw@antgroup.com>


* change mp to tp
* .
* .

* PullRequest: 142 Fix an error for megatron backend destroy

Merge branch fw/fix-meagatron-destroy of git@code.alipay.com:inclusionAI/AReaL.git into main
https://code.alipay.com/inclusionAI/AReaL/pull_requests/142

Reviewed-by: 晓雷 <meizhiyu.mzy@antgroup.com>


* .

* PullRequest: 143 Fix the port conflict issue of generation servers

Merge branch fw/fix-gen-port of git@code.alipay.com:inclusionAI/AReaL.git into main
https://code.alipay.com/inclusionAI/AReaL/pull_requests/143?tab=comment

Reviewed-by: 晓雷 <meizhiyu.mzy@antgroup.com>


* somehow fix the port issue
* add clearance period
* .
* .

* PullRequest: 145 Add code environment

Merge branch fw/code-env of git@code.alipay.com:inclusionAI/AReaL.git into main
https://code.alipay.com/inclusionAI/AReaL/pull_requests/145?tab=comment

Reviewed-by: 晓雷 <meizhiyu.mzy@antgroup.com>


* add code env
* somehow fix the port issue
* fix

* PullRequest: 144 Add decoupled PPO loss

Merge branch fw/decoupled-ppo-loss of git@code.alipay.com:inclusionAI/AReaL.git into main
https://code.alipay.com/inclusionAI/AReaL/pull_requests/144?tab=comment

Reviewed-by: 晓雷 <meizhiyu.mzy@antgroup.com>


* fix ppo step logging, nan in stats tracker, and add decoupled loss
* .
* somehow fix the port issue
* fix typo

* PullRequest: 146 Merge SLURM logs and save experiment configs in yaml format.

Merge branch fw/better-logging of git@code.alipay.com:inclusionAI/AReaL.git into main
https://code.alipay.com/inclusionAI/AReaL/pull_requests/146

Reviewed-by: 晓雷 <meizhiyu.mzy@antgroup.com>


* merge all slurm logs into one
* write config to yaml

* PullRequest: 141 Merge changes during NeurIPS submission

Merge branch fw/async-dev of git@code.alipay.com:inclusionAI/AReaL.git into main
https://code.alipay.com/inclusionAI/AReaL/pull_requests/141

Reviewed-by: 晓雷 <meizhiyu.mzy@antgroup.com>


* .
* .
* .
* .
* .
* .
* .
* .
* .
* update script
* .
* .
* .
* .
* [ADD] add least req scheduling
* fix test genreq
* .
* .
* fix stats tracker nan
* .
* .
* .
* .
* .
* .
* .
* upper clip decoupled objective
* add throughput exp script
* .
* remove behav upper clip param
* .
* .
* .
* plot curve
* update thpt script
* .
* master worker raise error when exiting
* update script
* add gen throughput logging
* .
* .
* add decoupled wandb data
* .
* fix port issue and add no training option
* .
* enlarge ttl
* remove gserver manager await staled
* update weights in groups
* .
* .
* .
* add port clearance period
* .
* .
* .
* add plot script
* add sft throughput eval
* .
* log tokens in null interface
* ablation experiments and interruptible generation
* plotting scripts / run scripts / data results
* .
* remove scripts
* add port test
* remove force_sync_reward
* revert some changes
* .
* revert
* revert fix
* fix
* revert
* fix typo

* support qwen3 training

* PullRequest: 147 Support interruption in SGLang and fix a KeyError in gather-scatter communication

Merge branch fw/sglang046-with-abort-request of git@code.alipay.com:inclusionAI/AReaL.git into main
https://code.alipay.com/inclusionAI/AReaL/pull_requests/147?tab=diff

Reviewed-by: 晓雷 <meizhiyu.mzy@antgroup.com>


* fix ppo step logging, nan in stats tracker, and add decoupled loss
* .
* somehow fix the port issue
* initial commit
* add interrupt request
* fix data transfer issue
* max concurrent rollouts defaults to train batch size
* merge main
* add patch
* fix patch typo
* revert sglang
* fix typo
* fix minor typo
* .
* pip show editable sglang path

* PullRequest: 149 fix: code faas max_retries

Merge branch xss/fix_code_verifier of git@code.alipay.com:inclusionAI/AReaL.git into main
https://code.alipay.com/inclusionAI/AReaL/pull_requests/149

Reviewed-by: 博惟 <bowei.fw@antgroup.com>


* fix: code faas max_retries

* PullRequest: 150 [Bug Fix] Fix key errors in `_run_scatter` in data transfer

Merge branch mzy/fix-scatter-groups of git@code.alipay.com:inclusionAI/AReaL.git into main
https://code.alipay.com/inclusionAI/AReaL/pull_requests/150

Reviewed-by: 博惟 <bowei.fw@antgroup.com>


* fix scatter groups key error

* fix test

* .

* PullRequest: 151 Fix Qwen3 import error when using transformers with a lower version

Merge branch fw/fix-qwen3 of git@code.alipay.com:inclusionAI/AReaL.git into main
https://code.alipay.com/inclusionAI/AReaL/pull_requests/151

Reviewed-by: 温差 <xushusheng.xss@antgroup.com>


* merge all slurm logs into one
* write config to yaml
* .

* PullRequest: 152 Support sglang0.4.6 and fix master_worker import error

Merge branch adopt_sglang046 of git@code.alipay.com:inclusionAI/AReaL.git into main
https://code.alipay.com/inclusionAI/AReaL/pull_requests/152

Reviewed-by: 博惟 <bowei.fw@antgroup.com>


* Support sglang0.4.6 and fix master_worker import error
* remove disable_mla option

* PullRequest: 155 [FIX] reduce port conflicts

Merge branch sxj/reduce_port_conflict of git@code.alipay.com:inclusionAI/AReaL.git into main
https://code.alipay.com/inclusionAI/AReaL/pull_requests/155

Reviewed-by: 博惟 <bowei.fw@antgroup.com>


* [FIX] reduce port conflicts

* PullRequest: 153 Fix stuck and recover issues for async experiments

Merge branch fw/stable-async of git@code.alipay.com:inclusionAI/AReaL.git into main
https://code.alipay.com/inclusionAI/AReaL/pull_requests/153

Reviewed-by: 晓雷 <meizhiyu.mzy@antgroup.com>


* fix sample cnt stuck
* fix recover
* code cleanup
* merge all slurm logs into one
* write config to yaml
* .
* .
* .
* revert birth time change
* .
* enlarge sock connect timeout

* PullRequest: 158 [Fix] Fix the error where "accepted" is not defined

Merge branch fw/fix-rollout-accepted of git@code.alipay.com:inclusionAI/AReaL.git into main
https://code.alipay.com/inclusionAI/AReaL/pull_requests/158

Reviewed-by: 温差 <xushusheng.xss@antgroup.com>


* .

* PullRequest: 154 Fix unit tests and simplify package installation

Merge branch fw/v0.3.0-tests of git@code.alipay.com:inclusionAI/AReaL.git into main
https://code.alipay.com/inclusionAI/AReaL/pull_requests/154?tab=comment

Reviewed-by: 晓雷 <meizhiyu.mzy@antgroup.com>


* fix some tests
* fix tests except for experiments
* fix tests
* fix tests
* .
* .

* PullRequest: 159 [fix] Enlarge the default aiohttp connection timeout and fix a recover error in model worker

Merge branch fw/stable-async of git@code.alipay.com:inclusionAI/AReaL.git into main
https://code.alipay.com/inclusionAI/AReaL/pull_requests/159

Reviewed-by: 晓雷 <meizhiyu.mzy@antgroup.com>


* fix sample cnt stuck
* fix recover
* code cleanup
* merge all slurm logs into one
* write config to yaml
* .
* .
* .
* revert birth time change
* .
* enlarge sock connect timeout
* .

* PullRequest: 160 set sock_connect as rollout_request_timeout in partial_rollout.py

Merge branch xss/rollout_timeout of git@code.alipay.com:inclusionAI/AReaL.git into main
https://code.alipay.com/inclusionAI/AReaL/pull_requests/160

Reviewed-by: 博惟 <bowei.fw@antgroup.com>


* set sock_connect as rollout_request_timeout in partial_rollout.py

* PullRequest: 161 Prioritize rollouts that are submitted earlier rather than arrived earlier

Merge branch fw/birth-time of git@code.alipay.com:inclusionAI/AReaL.git into main
https://code.alipay.com/inclusionAI/AReaL/pull_requests/161

Reviewed-by: 晓雷 <meizhiyu.mzy@antgroup.com>


* .
* blocking push

* PullRequest: 163 [bugfix] Fix synchronized training when birth time is absent

Merge branch fw/fix-sync-birthtime of git@code.alipay.com:inclusionAI/AReaL.git into main
https://code.alipay.com/inclusionAI/AReaL/pull_requests/163

Reviewed-by: 晓雷 <meizhiyu.mzy@antgroup.com>


* .

* PullRequest: 164 [Refactor] Move cluster spec into CLI args

Merge branch fw/refactor-cluster-spec of git@code.alipay.com:inclusionAI/AReaL.git into main
https://code.alipay.com/inclusionAI/AReaL/pull_requests/164?tab=comment

Reviewed-by: 晓雷 <meizhiyu.mzy@antgroup.com>


* set cluster spec path in args
* .
* fix
* add default cluster spec

* PullRequest: 165 Normally exit all workers after experiment completion

Merge branch fw/exit-all-workers of git@code.alipay.com:inclusionAI/AReaL.git into main
https://code.alipay.com/inclusionAI/AReaL/pull_requests/165

Reviewed-by: 晓雷 <meizhiyu.mzy@antgroup.com>


* .
* .

* PullRequest: 167 [Feature] Use chunked logits computation to alleviate SGLang OOM

Merge branch fw/patch-sglang-oom of git@code.alipay.com:inclusionAI/AReaL.git into main
https://code.alipay.com/inclusionAI/AReaL/pull_requests/167

Reviewed-by: 晓雷 <meizhiyu.mzy@antgroup.com>


* .

* PullRequest: 166 [Feature] Support single-script experiment launch with Ray

Merge branch fw/turbolaunch of git@code.alipay.com:inclusionAI/AReaL.git into main
https://code.alipay.com/inclusionAI/AReaL/pull_requests/166?tab=comment

Reviewed-by: 晓雷 <meizhiyu.mzy@antgroup.com>


* add training script without ray name resolve
* add ray name resolve
* ray worker
* run
* run async
* local run
* set cluster spec path in args
* .
* .
* fix
* .
* .
* .
* .
* .
* update config
* .
* minor renaming

* PullRequest: 169 [Doc] Add v0.3.0 docs based on jupyter-book

Merge branch fw/doc of git@code.alipay.com:inclusionAI/AReaL.git into main
https://code.alipay.com/inclusionAI/AReaL/pull_requests/169

Reviewed-by: 晓雷 <meizhiyu.mzy@antgroup.com>


* add docs
* refine doc
* refine doc

* PullRequest: 170 [Feature] Amend configs for ray scripts

Merge branch fw/ray-configs of git@code.alipay.com:inclusionAI/AReaL.git into main
https://code.alipay.com/inclusionAI/AReaL/pull_requests/170

Reviewed-by: 晓雷 <meizhiyu.mzy@antgroup.com>


* .

---------

Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Co-authored-by: wanghuaijie.whj <wanghuaijie.whj@antgroup.com>
Co-authored-by: Tiwei Bie <tiwei.btw@antgroup.com>
Co-authored-by: kira.gw <kira.gw@antgroup.com>
Co-authored-by: shenxujie.sxj <shenxujie.sxj@antgroup.com>
Co-authored-by: 晓雷 <meizhiyu.mzy@antgroup.com>
Co-authored-by: sam.gjx <sam.gjx@antgroup.com>
Co-authored-by: 温差 <xushusheng.xss@antgroup.com>
Co-authored-by: 履渊 <yuhong.gyh@antgroup.com>
Wei Fu 2025-05-28 19:32:42 +08:00 committed by GitHub
parent cf46993a30
commit 7826fdbb87
GPG Key ID: B5690EEEBB952194
4 changed files with 232 additions and 0 deletions

@ -0,0 +1,65 @@
max_head_offpolicyness: 4
experiment_name: async-ppo-1.7b-gpu32
trial_name: my-trial
mode: ray
cluster:
  fileroot: /storage/ray/experiments
wandb:
  mode: disabled
recover_mode: auto
recover_retries: 10
allocation_mode: sglang.d24p1m1+d4p2m1
n_nodes: 4
n_gpus_per_node: 8
cache_clear_freq: 1
exp_ctrl:
  total_train_epochs: 5
  save_freq_epochs: 1
  ckpt_freq_secs: 600
torch_cache_mysophobia: true
dataset:
  path: /storage/datasets/boba_106k_0319.jsonl
  max_prompt_len: 1024
  train_bs_n_seqs: 512
group_size: 16
group_adv_norm: false
actor:
  type:
    _class: qwen3
  path: /storage/openpsi/models/Qwen3-1.7B/
  optimizer:
    lr: 2e-05
    lr_scheduler_type: constant
    eps: 1e-5
    warmup_steps_proportion: 0.001
    hysteresis: 2
  sglang:
    mem_fraction_static: 0.8
actor_train:
  mb_spec:
    max_tokens_per_mb: 30720
actor_gen:
  mb_spec:
    max_tokens_per_mb: 30720
actor_inf:
  mb_spec:
    max_tokens_per_mb: 30720
ppo:
  gen:
    max_new_tokens: 27648
    min_new_tokens: 0
    top_p: 1.0
    top_k: 1000000
    temperature: 1.0
  ppo_n_minibatches: 4
  kl_ctl: 0.0
  discount: 1.0
  value_eps_clip: 0.2
  disable_value: true
  reward_output_scaling: 5
  reward_output_bias: 0.0
  adv_norm: true
  value_norm: true
  recompute_logprob: true
  use_decoupled_loss: true
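
The `allocation_mode` strings in these configs appear to encode how GPUs are split between generation and training. The sketch below is a hypothetical helper (`parse_allocation_mode` and `total_gpus` are illustrative names, not AReaL functions). It assumes that `d`, `p`, and `m` are the data-, pipeline-, and tensor-parallel degrees, that a `sglang.` prefix marks the generation servers, and that the segment after `+` is the trainer. Under that reading, each string should multiply out to `n_nodes * n_gpus_per_node`.

```python
import re

# Assumed segment grammar: d<data>p<pipeline>m<tensor>, e.g. "d4p2m1".
_SEGMENT = re.compile(r"d(\d+)p(\d+)m(\d+)")


def parse_allocation_mode(mode: str) -> dict:
    """Split an allocation string into per-role parallel degrees.

    Hypothetical helper: the meaning of d/p/m and of the "sglang." prefix
    is an assumption, not something this commit states.
    """
    roles = {}
    for part in mode.split("+"):
        role = "train"  # segments without a backend prefix are treated as the trainer
        if "." in part:
            role, part = part.split(".", 1)
        match = _SEGMENT.fullmatch(part)
        if match is None:
            raise ValueError(f"unrecognized allocation segment: {part!r}")
        d, p, m = (int(x) for x in match.groups())
        roles[role] = {"d": d, "p": p, "m": m, "gpus": d * p * m}
    return roles


def total_gpus(mode: str) -> int:
    """Sum the GPUs claimed by every role in the allocation string."""
    return sum(role["gpus"] for role in parse_allocation_mode(mode).values())


if __name__ == "__main__":
    # n_nodes=4, n_gpus_per_node=8 in the 32-GPU async config.
    print(total_gpus("sglang.d24p1m1+d4p2m1") == 4 * 8)
```

Under these assumptions, `sglang.d24p1m1+d4p2m1` accounts for 24 generation GPUs plus 8 training GPUs, which matches 4 nodes of 8 GPUs each.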

@ -0,0 +1,65 @@
max_head_offpolicyness: 4
experiment_name: async-ppo-1.7b-gpu8
trial_name: my-trial
mode: ray
cluster:
  fileroot: /storage/ray/experiments
wandb:
  mode: disabled
recover_mode: auto
recover_retries: 10
allocation_mode: sglang.d4p1m1+d2p2m1
n_nodes: 1
n_gpus_per_node: 8
cache_clear_freq: 1
exp_ctrl:
  total_train_epochs: 5
  save_freq_epochs: 1
  ckpt_freq_secs: 600
torch_cache_mysophobia: true
dataset:
  path: /storage/datasets/boba_106k_0319.jsonl
  max_prompt_len: 1024
  train_bs_n_seqs: 512
group_size: 16
group_adv_norm: false
actor:
  type:
    _class: qwen3
  path: /storage/openpsi/models/Qwen3-1.7B/
  optimizer:
    lr: 2e-05
    lr_scheduler_type: constant
    eps: 1e-5
    warmup_steps_proportion: 0.001
    hysteresis: 2
  sglang:
    mem_fraction_static: 0.8
actor_train:
  mb_spec:
    max_tokens_per_mb: 30720
actor_gen:
  mb_spec:
    max_tokens_per_mb: 30720
actor_inf:
  mb_spec:
    max_tokens_per_mb: 30720
ppo:
  gen:
    max_new_tokens: 27648
    min_new_tokens: 0
    top_p: 1.0
    top_k: 1000000
    temperature: 1.0
  ppo_n_minibatches: 4
  kl_ctl: 0.0
  discount: 1.0
  value_eps_clip: 0.2
  disable_value: true
  reward_output_scaling: 5
  reward_output_bias: 0.0
  adv_norm: true
  value_norm: true
  recompute_logprob: true
  use_decoupled_loss: true
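
The PPO configs pair `max_prompt_len: 1024` and `max_new_tokens: 27648` with `max_tokens_per_mb: 30720`. A quick sanity check, sketched below, is that the longest possible sequence (prompt plus generated tokens) fits in one micro-batch. This rests on two assumptions: that a micro-batch has to hold at least one whole sequence, and that the keys nest as shown in the `cfg` dict. The helper names `longest_sequence` and `mb_headroom` are hypothetical.

```python
# Values copied from the PPO configs in this commit; the nesting is assumed.
cfg = {
    "dataset": {"max_prompt_len": 1024},
    "ppo": {"gen": {"max_new_tokens": 27648}},
    "actor_train": {"mb_spec": {"max_tokens_per_mb": 30720}},
    "actor_gen": {"mb_spec": {"max_tokens_per_mb": 30720}},
    "actor_inf": {"mb_spec": {"max_tokens_per_mb": 30720}},
}


def longest_sequence(cfg: dict) -> int:
    """Upper bound on tokens in one sequence: full prompt plus full generation."""
    return cfg["dataset"]["max_prompt_len"] + cfg["ppo"]["gen"]["max_new_tokens"]


def mb_headroom(cfg: dict) -> dict:
    """Return spare tokens per micro-batch for each role, or raise if one is too small."""
    longest = longest_sequence(cfg)
    headroom = {}
    for role in ("actor_train", "actor_gen", "actor_inf"):
        budget = cfg[role]["mb_spec"]["max_tokens_per_mb"]
        if budget < longest:
            raise ValueError(
                f"{role}: max_tokens_per_mb={budget} cannot hold a {longest}-token sequence"
            )
        headroom[role] = budget - longest
    return headroom


if __name__ == "__main__":
    print(longest_sequence(cfg))
    print(mb_headroom(cfg))
```

With the values above, the longest sequence is 28672 tokens, leaving 2048 tokens of slack in each micro-batch budget.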

@ -0,0 +1,62 @@
experiment_name: ppo-1.7b-gpu32
trial_name: my-trial
mode: ray
cluster:
  fileroot: /storage/ray/experiments
wandb:
  mode: disabled
recover_mode: auto
recover_retries: 10
allocation_mode: sglang.d16p1m1+d8p2m1
n_nodes: 4
n_gpus_per_node: 8
cache_clear_freq: 1
exp_ctrl:
  total_train_epochs: 5
  save_freq_epochs: 1
  ckpt_freq_secs: 600
torch_cache_mysophobia: true
dataset:
  path: /storage/datasets/boba_106k_0319.jsonl
  max_prompt_len: 1024
  train_bs_n_seqs: 512
group_size: 16
group_adv_norm: false
actor:
  type:
    _class: qwen3
  path: /storage/openpsi/models/Qwen3-1.7B/
  optimizer:
    lr: 2e-05
    lr_scheduler_type: constant
    eps: 1e-5
    warmup_steps_proportion: 0.001
    hysteresis: 2
  sglang:
    mem_fraction_static: 0.8
actor_train:
  mb_spec:
    max_tokens_per_mb: 30720
actor_gen:
  mb_spec:
    max_tokens_per_mb: 30720
actor_inf:
  mb_spec:
    max_tokens_per_mb: 30720
ppo:
  gen:
    max_new_tokens: 27648
    min_new_tokens: 0
    top_p: 1.0
    top_k: 1000000
    temperature: 1.0
  ppo_n_minibatches: 4
  kl_ctl: 0.0
  discount: 1.0
  value_eps_clip: 0.2
  disable_value: true
  reward_output_scaling: 5
  reward_output_bias: 0.0
  adv_norm: true
  value_norm: true

@ -0,0 +1,40 @@
experiment_name: sft-7b-gpu8
trial_name: my-trial
mode: ray
wandb:
  mode: disabled
recover_mode: auto
recover_retries: 10
allocation_mode: d2p4m1
n_nodes: 1
n_gpus_per_node: 8
exp_ctrl:
  total_train_epochs: 200
  save_freq_epochs: 1
  ckpt_freq_secs: 600
torch_cache_mysophobia: true
dataset:
  train_path: /storage/datasets/boba-sft_200_0319.jsonl
  valid_path: /storage/datasets/boba-sft_200_0319.jsonl
  max_seqlen: 32768
  train_bs_n_seqs: 16
  valid_bs_n_seqs: 16
model:
  type:
    _class: qwen2
  path: /storage/models/DeepSeek-R1-Distill-Qwen-7B
  optimizer:
    type: adam
    lr_scheduler_type: constant
    lr: 1e-5
    warmup_steps_proportion: 0.03
    initial_loss_scale: 262144.0
    loss_scale_window: 10
    hysteresis: 2
    weight_decay: 0.1
    eps: 1e-5
  bf16: true
allocation:
  mb_spec:
    max_tokens_per_mb: 32768