mirror of https://github.com/inclusionAI/AReaL
[Feature] Amend yaml configurations for Ray experiments (#53)
* feat: one buffer for each task * feat: support "one buffer for each task" for async * make kv_cache_dtype configurable Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com> * style: use plural form fix: use _seed_from_key to set different seeds for data loaders fix: call load_data for one buffer each time * PullRequest: 125 Support running async experiments in the 2407 image. Merge branch fw/async2407 of git@code.alipay.com:inclusionAI/AReaL.git into main https://code.alipay.com/inclusionAI/AReaL/pull_requests/125 Signed-off-by: 晓雷 <meizhiyu.mzy@antgroup.com> * . * fix: handle multiple datasets in recover indices fix: `isinstance(self.__datasets, PullerStreamDataset)` feat: use the "spec" request to obtain the number of datasets fix: revert rollout worker * fix: revert async_rl_exp.py * fix flag for list (cuda_graph_bs) * format * [FIX] fix async task reward [sglang bf16-> fp16] * fix: define `self.__datasets` in advance * PullRequest: 130 [Refactor] Remove deprecated search related code Merge branch mzy/remove-search of git@code.alipay.com:inclusionAI/AReaL.git into main https://code.alipay.com/inclusionAI/AReaL/pull_requests/130 Signed-off-by: 博惟 <bowei.fw@antgroup.com> * remove search related * PullRequest: 131 [Refactor] Change terminology "model parallel" into "tensor parallel" to align with megatron. Merge branch mzy/mp-to-tp of git@code.alipay.com:inclusionAI/AReaL.git into main https://code.alipay.com/inclusionAI/AReaL/pull_requests/131?tab=comment Signed-off-by: 博惟 <bowei.fw@antgroup.com> * change mp to tp * . * . * PullRequest: 142 Fix an error for megatron backend destroy Merge branch fw/fix-meagatron-destroy of git@code.alipay.com:inclusionAI/AReaL.git into main https://code.alipay.com/inclusionAI/AReaL/pull_requests/142 Reviewed-by: 晓雷 <meizhiyu.mzy@antgroup.com> * . 
* PullRequest: 143 Fix the port conflict issue of generation servers Merge branch fw/fix-gen-port of git@code.alipay.com:inclusionAI/AReaL.git into main https://code.alipay.com/inclusionAI/AReaL/pull_requests/143?tab=comment Reviewed-by: 晓雷 <meizhiyu.mzy@antgroup.com> * somehow fix the port issue * add clearance period * . * . * PullRequest: 145 Add code environment Merge branch fw/code-env of git@code.alipay.com:inclusionAI/AReaL.git into main https://code.alipay.com/inclusionAI/AReaL/pull_requests/145?tab=comment Reviewed-by: 晓雷 <meizhiyu.mzy@antgroup.com> * add code env * somehow fix the port issue * fix * PullRequest: 144 Add decoupled PPO loss Merge branch fw/decoupled-ppo-loss of git@code.alipay.com:inclusionAI/AReaL.git into main https://code.alipay.com/inclusionAI/AReaL/pull_requests/144?tab=comment Reviewed-by: 晓雷 <meizhiyu.mzy@antgroup.com> * fix ppo step logging, nan in stats tracker, and add decoupled loss * . * somehow fix the port issue * fix typo * PullRequest: 146 Merge SLURM logs and save experiment configs in yaml format. Merge branch fw/better-logging of git@code.alipay.com:inclusionAI/AReaL.git into main https://code.alipay.com/inclusionAI/AReaL/pull_requests/146 Reviewed-by: 晓雷 <meizhiyu.mzy@antgroup.com> * merge all slurm logs into one * write config to yaml * PullRequest: 141 Merge changes during NeurIPS submission Merge branch fw/async-dev of git@code.alipay.com:inclusionAI/AReaL.git into main https://code.alipay.com/inclusionAI/AReaL/pull_requests/141 Reviewed-by: 晓雷 <meizhiyu.mzy@antgroup.com> * . * . * . * . * . * . * . * . * . * update script * . * . * . * . * [ADD] add least req scheduling * fix test genreq * . * . * fix stats tracker nan * . * . * . * . * . * . * . * upper clip decoupled objective * add throughput exp script * . * remove behav upper clip param * . * . * . * plot curve * update thpt script * . * master worker raise error when exiting * update script * add gen throughput logging * . * . * add decoupled wandb data * . 
* fix port issue and add no training option * . * enlarge ttl * remove gserver manager await staled * update weights in groups * . * . * . * add port clearance period * . * . * . * add plot script * add sft throughput eval * . * log tokens in null interface * ablation experiments and interruptible generation * plotting scripts/run scripts/data results * . * remove scripts * add port test * remove force_sync_reward * revert some changes * . * revert * revert fix * fix * revert * fix typo * support qwen3 training * PullRequest: 147 Support interruption in SGLang and fix a KeyError in gather-scatter communication Merge branch fw/sglang046-with-abort-request of git@code.alipay.com:inclusionAI/AReaL.git into main https://code.alipay.com/inclusionAI/AReaL/pull_requests/147?tab=diff Reviewed-by: 晓雷 <meizhiyu.mzy@antgroup.com> * fix ppo step logging, nan in stats tracker, and add decoupled loss * . * somehow fix the port issue * initial commit * add interrupt request * fix data transfer issue * max concurrent rollouts defaults to train batch size * merge main * add patch * fix patch typo * revert sglang * fix typo * fix minor typo * . * pip show editable sglang path * PullRequest: 149 fix: code faas max_retries Merge branch xss/fix_code_verifier of git@code.alipay.com:inclusionAI/AReaL.git into main https://code.alipay.com/inclusionAI/AReaL/pull_requests/149 Reviewed-by: 博惟 <bowei.fw@antgroup.com> * fix: code faas max_retries * PullRequest: 150 [Bug Fix] Fix key errors in `_run_scatter` in data transfer Merge branch mzy/fix-scatter-groups of git@code.alipay.com:inclusionAI/AReaL.git into main https://code.alipay.com/inclusionAI/AReaL/pull_requests/150 Reviewed-by: 博惟 <bowei.fw@antgroup.com> * fix scatter groups key error * fix test * . 
* PullRequest: 151 Fix Qwen3 import error when using transformers with a lower version Merge branch fw/fix-qwen3 of git@code.alipay.com:inclusionAI/AReaL.git into main https://code.alipay.com/inclusionAI/AReaL/pull_requests/151 Reviewed-by: 温差 <xushusheng.xss@antgroup.com> * merge all slurm logs into one * write config to yaml * . * PullRequest: 152 Support sglang0.4.6 and fix master_worker import error Merge branch adopt_sglang046 of git@code.alipay.com:inclusionAI/AReaL.git into main https://code.alipay.com/inclusionAI/AReaL/pull_requests/152 Reviewed-by: 博惟 <bowei.fw@antgroup.com> * Support sglang0.4.6 and fix master_worker import error * remove disable_mla option * PullRequest: 155 [FIX] reduce port conflicts Merge branch sxj/reduce_port_conflict of git@code.alipay.com:inclusionAI/AReaL.git into main https://code.alipay.com/inclusionAI/AReaL/pull_requests/155 Reviewed-by: 博惟 <bowei.fw@antgroup.com> * [FIX] reduce port conflicts * PullRequest: 153 Fix stuck and recover issues for async experiments Merge branch fw/stable-async of git@code.alipay.com:inclusionAI/AReaL.git into main https://code.alipay.com/inclusionAI/AReaL/pull_requests/153 Reviewed-by: 晓雷 <meizhiyu.mzy@antgroup.com> * fix sample cnt stuck * fix recover * code cleanup * merge all slurm logs into one * write config to yaml * . * . * . * revert birth time change * . * enlarge sock connect timeout * PullRequest: 158 [Fix] Fix the error where "accepted" is not defined Merge branch fw/fix-rollout-accepted of git@code.alipay.com:inclusionAI/AReaL.git into main https://code.alipay.com/inclusionAI/AReaL/pull_requests/158 Reviewed-by: 温差 <xushusheng.xss@antgroup.com> * . 
* PullRequest: 154 Fix unit tests and simplify package installation Merge branch fw/v0.3.0-tests of git@code.alipay.com:inclusionAI/AReaL.git into main https://code.alipay.com/inclusionAI/AReaL/pull_requests/154?tab=comment Reviewed-by: 晓雷 <meizhiyu.mzy@antgroup.com> * fix some tests * fix tests except for experiments * fix tests * fix tests * . * . * PullRequest: 159 [fix] Enlarge the default aiohttp connection timeout and fix a recover error in model worker Merge branch fw/stable-async of git@code.alipay.com:inclusionAI/AReaL.git into main https://code.alipay.com/inclusionAI/AReaL/pull_requests/159 Reviewed-by: 晓雷 <meizhiyu.mzy@antgroup.com> * fix sample cnt stuck * fix recover * code cleanup * merge all slurm logs into one * write config to yaml * . * . * . * revert birth time change * . * enlarge sock connect timeout * . * PullRequest: 160 set sock_connect as rollout_request_timeout in partial_rollout.py Merge branch xss/rollout_timeout of git@code.alipay.com:inclusionAI/AReaL.git into main https://code.alipay.com/inclusionAI/AReaL/pull_requests/160 Reviewed-by: 博惟 <bowei.fw@antgroup.com> * set sock_connect as rollout_request_timeout in partial_rollout.py * PullRequest: 161 Prioritize rollouts that are submitted earlier rather than arrived earlier Merge branch fw/birth-time of git@code.alipay.com:inclusionAI/AReaL.git into main https://code.alipay.com/inclusionAI/AReaL/pull_requests/161 Reviewed-by: 晓雷 <meizhiyu.mzy@antgroup.com> * . * blocking push * PullRequest: 163 [bugfix] Fix synchronized training when birth time is absent Merge branch fw/fix-sync-birthtime of git@code.alipay.com:inclusionAI/AReaL.git into main https://code.alipay.com/inclusionAI/AReaL/pull_requests/163 Reviewed-by: 晓雷 <meizhiyu.mzy@antgroup.com> * . 
* PullRequest: 164 [Refactor] Move cluster spec into CLI args Merge branch fw/refactor-cluster-spec of git@code.alipay.com:inclusionAI/AReaL.git into main https://code.alipay.com/inclusionAI/AReaL/pull_requests/164?tab=comment Reviewed-by: 晓雷 <meizhiyu.mzy@antgroup.com> * set cluster spec path in args * . * fix * add default cluster spec * PullRequest: 165 Normally exit all workers after experiment completion Merge branch fw/exit-all-workers of git@code.alipay.com:inclusionAI/AReaL.git into main https://code.alipay.com/inclusionAI/AReaL/pull_requests/165 Reviewed-by: 晓雷 <meizhiyu.mzy@antgroup.com> * . * . * PullRequest: 167 [Feature] Use chunked logits computation to alleviate SGLang OOM Merge branch fw/patch-sglang-oom of git@code.alipay.com:inclusionAI/AReaL.git into main https://code.alipay.com/inclusionAI/AReaL/pull_requests/167 Reviewed-by: 晓雷 <meizhiyu.mzy@antgroup.com> * . * PullRequest: 166 [Feature] Support single-script experiment launch with Ray Merge branch fw/turbolaunch of git@code.alipay.com:inclusionAI/AReaL.git into main https://code.alipay.com/inclusionAI/AReaL/pull_requests/166?tab=comment Reviewed-by: 晓雷 <meizhiyu.mzy@antgroup.com> * add training script without ray name resolve * add ray name resolve * ray worker * run * run async * local run * set cluster spec path in args * . * . * fix * . * . * . * . * . * update config * . * minor renaming * PullRequest: 169 [Doc] Add v0.3.0 docs based on jupyter-book Merge branch fw/doc of git@code.alipay.com:inclusionAI/AReaL.git into main https://code.alipay.com/inclusionAI/AReaL/pull_requests/169 Reviewed-by: 晓雷 <meizhiyu.mzy@antgroup.com> * add docs * refine doc * refine doc * PullRequest: 170 [Feature] Amend configs for ray scripts Merge branch fw/ray-configs of git@code.alipay.com:inclusionAI/AReaL.git into main https://code.alipay.com/inclusionAI/AReaL/pull_requests/170 Reviewed-by: 晓雷 <meizhiyu.mzy@antgroup.com> * . 
--------- Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com> Co-authored-by: wanghuaijie.whj <wanghuaijie.whj@antgroup.com> Co-authored-by: Tiwei Bie <tiwei.btw@antgroup.com> Co-authored-by: kira.gw <kira.gw@antgroup.com> Co-authored-by: shenxujie.sxj <shenxujie.sxj@antgroup.com> Co-authored-by: 晓雷 <meizhiyu.mzy@antgroup.com> Co-authored-by: sam.gjx <sam.gjx@antgroup.com> Co-authored-by: 温差 <xushusheng.xss@antgroup.com> Co-authored-by: 履渊 <yuhong.gyh@antgroup.com>
parent cf46993a30
commit 7826fdbb87
@@ -0,0 +1,65 @@
max_head_offpolicyness: 4
experiment_name: async-ppo-1.7b-gpu32
trial_name: my-trial
mode: ray
cluster:
  fileroot: /storage/ray/experiments
wandb:
  mode: disabled
recover_mode: auto
recover_retries: 10
allocation_mode: sglang.d24p1m1+d4p2m1
n_nodes: 4
n_gpus_per_node: 8
cache_clear_freq: 1
exp_ctrl:
  total_train_epochs: 5
  save_freq_epochs: 1
  ckpt_freq_secs: 600
torch_cache_mysophobia: true
dataset:
  path: /storage/datasets/boba_106k_0319.jsonl
  max_prompt_len: 1024
  train_bs_n_seqs: 512
group_size: 16
group_adv_norm: false
actor:
  type:
    _class: qwen3
  path: /storage/openpsi/models/Qwen3-1.7B/
  optimizer:
    lr: 2e-05
    lr_scheduler_type: constant
    eps: 1e-5
    warmup_steps_proportion: 0.001
    hysteresis: 2
  sglang:
    mem_fraction_static: 0.8
actor_train:
  mb_spec:
    max_tokens_per_mb: 30720
actor_gen:
  mb_spec:
    max_tokens_per_mb: 30720
actor_inf:
  mb_spec:
    max_tokens_per_mb: 30720
ppo:
  gen:
    max_new_tokens: 27648
    min_new_tokens: 0
    top_p: 1.0
    top_k: 1000000
    temperature: 1.0
  ppo_n_minibatches: 4
  kl_ctl: 0.0
  discount: 1.0
  value_eps_clip: 0.2
  disable_value: true
  reward_output_scaling: 5
  reward_output_bias: 0.0
  adv_norm: true
  value_norm: true
  recompute_logprob: true
  use_decoupled_loss: true
@@ -0,0 +1,65 @@
max_head_offpolicyness: 4
experiment_name: async-ppo-1.7b-gpu8
trial_name: my-trial
mode: ray
cluster:
  fileroot: /storage/ray/experiments
wandb:
  mode: disabled
recover_mode: auto
recover_retries: 10
allocation_mode: sglang.d4p1m1+d2p2m1
n_nodes: 1
n_gpus_per_node: 8
cache_clear_freq: 1
exp_ctrl:
  total_train_epochs: 5
  save_freq_epochs: 1
  ckpt_freq_secs: 600
torch_cache_mysophobia: true
dataset:
  path: /storage/datasets/boba_106k_0319.jsonl
  max_prompt_len: 1024
  train_bs_n_seqs: 512
group_size: 16
group_adv_norm: false
actor:
  type:
    _class: qwen3
  path: /storage/openpsi/models/Qwen3-1.7B/
  optimizer:
    lr: 2e-05
    lr_scheduler_type: constant
    eps: 1e-5
    warmup_steps_proportion: 0.001
    hysteresis: 2
  sglang:
    mem_fraction_static: 0.8
actor_train:
  mb_spec:
    max_tokens_per_mb: 30720
actor_gen:
  mb_spec:
    max_tokens_per_mb: 30720
actor_inf:
  mb_spec:
    max_tokens_per_mb: 30720
ppo:
  gen:
    max_new_tokens: 27648
    min_new_tokens: 0
    top_p: 1.0
    top_k: 1000000
    temperature: 1.0
  ppo_n_minibatches: 4
  kl_ctl: 0.0
  discount: 1.0
  value_eps_clip: 0.2
  disable_value: true
  reward_output_scaling: 5
  reward_output_bias: 0.0
  adv_norm: true
  value_norm: true
  recompute_logprob: true
  use_decoupled_loss: true
@@ -0,0 +1,62 @@
experiment_name: ppo-1.7b-gpu32
trial_name: my-trial
mode: ray
cluster:
  fileroot: /storage/ray/experiments
wandb:
  mode: disabled
recover_mode: auto
recover_retries: 10
allocation_mode: sglang.d16p1m1+d8p2m1
n_nodes: 4
n_gpus_per_node: 8
cache_clear_freq: 1
exp_ctrl:
  total_train_epochs: 5
  save_freq_epochs: 1
  ckpt_freq_secs: 600
torch_cache_mysophobia: true
dataset:
  path: /storage/datasets/boba_106k_0319.jsonl
  max_prompt_len: 1024
  train_bs_n_seqs: 512
group_size: 16
group_adv_norm: false
actor:
  type:
    _class: qwen3
  path: /storage/openpsi/models/Qwen3-1.7B/
  optimizer:
    lr: 2e-05
    lr_scheduler_type: constant
    eps: 1e-5
    warmup_steps_proportion: 0.001
    hysteresis: 2
  sglang:
    mem_fraction_static: 0.8
actor_train:
  mb_spec:
    max_tokens_per_mb: 30720
actor_gen:
  mb_spec:
    max_tokens_per_mb: 30720
actor_inf:
  mb_spec:
    max_tokens_per_mb: 30720
ppo:
  gen:
    max_new_tokens: 27648
    min_new_tokens: 0
    top_p: 1.0
    top_k: 1000000
    temperature: 1.0
  ppo_n_minibatches: 4
  kl_ctl: 0.0
  discount: 1.0
  value_eps_clip: 0.2
  disable_value: true
  reward_output_scaling: 5
  reward_output_bias: 0.0
  adv_norm: true
  value_norm: true
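The three PPO configs above share the same length settings: `max_prompt_len: 1024`, `max_new_tokens: 27648`, and `max_tokens_per_mb: 30720`. A minimal sketch of the arithmetic follows, assuming (this is not stated in the commit) that the longest possible sequence, prompt plus generated tokens, is meant to fit inside one micro-batch token budget. The helper names `longest_sequence` and `fits_microbatch` are hypothetical and are not part of AReaL.

```python
# Hypothetical sanity check; the "one sequence must fit in one micro-batch"
# rule is an assumption made for illustration, not a documented AReaL constraint.

def longest_sequence(max_prompt_len: int, max_new_tokens: int) -> int:
    """Upper bound on the tokens in a single rollout: prompt plus generation."""
    return max_prompt_len + max_new_tokens


def fits_microbatch(max_prompt_len: int, max_new_tokens: int, max_tokens_per_mb: int) -> bool:
    """True when the longest possible sequence fits in the micro-batch budget."""
    return longest_sequence(max_prompt_len, max_new_tokens) <= max_tokens_per_mb


# Values copied from the PPO configs in this commit.
MAX_PROMPT_LEN = 1024
MAX_NEW_TOKENS = 27648
MAX_TOKENS_PER_MB = 30720

longest = longest_sequence(MAX_PROMPT_LEN, MAX_NEW_TOKENS)
headroom = MAX_TOKENS_PER_MB - longest
print(f"longest sequence: {longest} tokens, headroom: {headroom} tokens")
print("fits:", fits_microbatch(MAX_PROMPT_LEN, MAX_NEW_TOKENS, MAX_TOKENS_PER_MB))
```

Under that assumption the longest sequence is 28672 tokens, which leaves 2048 tokens of headroom in each 30720-token micro-batch.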
@@ -0,0 +1,40 @@
experiment_name: sft-7b-gpu8
trial_name: my-trial
mode: ray
wandb:
  mode: disabled
recover_mode: auto
recover_retries: 10
allocation_mode: d2p4m1
n_nodes: 1
n_gpus_per_node: 8
exp_ctrl:
  total_train_epochs: 200
  save_freq_epochs: 1
  ckpt_freq_secs: 600
torch_cache_mysophobia: true
dataset:
  train_path: /storage/datasets/boba-sft_200_0319.jsonl
  valid_path: /storage/datasets/boba-sft_200_0319.jsonl
  max_seqlen: 32768
  train_bs_n_seqs: 16
  valid_bs_n_seqs: 16
model:
  type:
    _class: qwen2
  path: /storage/models/DeepSeek-R1-Distill-Qwen-7B
  optimizer:
    type: adam
    lr_scheduler_type: constant
    lr: 1e-5
    warmup_steps_proportion: 0.03
    initial_loss_scale: 262144.0
    loss_scale_window: 10
    hysteresis: 2
    weight_decay: 0.1
    eps: 1e-5
  bf16: true
allocation:
  mb_spec:
    max_tokens_per_mb: 32768
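Each config in this commit pairs an `allocation_mode` string with `n_nodes` and `n_gpus_per_node`. The sketch below checks that the two agree, assuming that `d`, `p`, and `m` are parallelism degrees whose product is the GPU count of one part, and that parts joined by `+` occupy disjoint GPUs. That reading is an assumption, not something the commit states, and `gpus_requested` is a hypothetical helper rather than AReaL's own parser.

```python
import re

# Matches one allocation part such as "d24p1m1" (assumed degree notation).
_PART = re.compile(r"d(\d+)p(\d+)m(\d+)")


def gpus_requested(allocation_mode: str) -> int:
    """Sum d*p*m over every '+'-separated part of an allocation string."""
    total = 0
    for part in allocation_mode.split("+"):
        spec = part.split(".")[-1]  # drop an optional backend prefix such as "sglang."
        match = _PART.fullmatch(spec)
        if match is None:
            raise ValueError(f"unrecognized allocation part: {part!r}")
        d, p, m = (int(group) for group in match.groups())
        total += d * p * m
    return total


# (allocation_mode, n_nodes, n_gpus_per_node) copied from the four configs.
CONFIGS = {
    "async-ppo-1.7b-gpu32": ("sglang.d24p1m1+d4p2m1", 4, 8),
    "async-ppo-1.7b-gpu8": ("sglang.d4p1m1+d2p2m1", 1, 8),
    "ppo-1.7b-gpu32": ("sglang.d16p1m1+d8p2m1", 4, 8),
    "sft-7b-gpu8": ("d2p4m1", 1, 8),
}

for name, (mode, n_nodes, n_gpus_per_node) in CONFIGS.items():
    requested = gpus_requested(mode)
    available = n_nodes * n_gpus_per_node
    status = "ok" if requested == available else "MISMATCH"
    print(f"{name}: requested={requested} available={available} {status}")
```

Under these assumptions all four configs request exactly the GPUs they declare: 32, 8, 32, and 8 respectively.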