[Feature] Amend yaml configurations for Ray experiments (#53)

* feat: one buffer for each task * feat: support "one buffer for each task" for async * make kv_cache_dtype configurable Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com> * style: use plural form fix: use _seed_from_key to set different seeds for data loaders fix: call load_data for one buffer each time * PullRequest: 125 Support running async experiments in the 2407 image. Merge branch fw/async2407 of git@code.alipay.com:inclusionAI/AReaL.git into main https://code.alipay.com/inclusionAI/AReaL/pull_requests/125 Signed-off-by: 晓雷 <meizhiyu.mzy@antgroup.com> * . * fix: handle multiple datasets in recover indices fix: `isinstance(self.__datasets, PullerStreamDataset)` feat: use the "spec" request to obtain the number of datasets fix: revert rollout worker * fix: revert async_rl_exp.py * fix flag for list (cuda_graph_bs) * format * [FIX] fix async task reward [sglang bf16-> fp16] * fix: define `self.__datasets` in advance * PullRequest: 130 [Refactor] Remove deprecated search related code Merge branch mzy/remove-search of git@code.alipay.com:inclusionAI/AReaL.git into main https://code.alipay.com/inclusionAI/AReaL/pull_requests/130 Signed-off-by: 博惟 <bowei.fw@antgroup.com> * remove search related * PullRequest: 131 [Refactor] Change terminology "model parallel" into "tensor parallel" to align with megatron. Merge branch mzy/mp-to-tp of git@code.alipay.com:inclusionAI/AReaL.git into main https://code.alipay.com/inclusionAI/AReaL/pull_requests/131?tab=comment Signed-off-by: 博惟 <bowei.fw@antgroup.com> * change mp to tp * . * . * PullRequest: 142 Fix an error for megatron backend destroy Merge branch fw/fix-meagatron-destroy of git@code.alipay.com:inclusionAI/AReaL.git into main https://code.alipay.com/inclusionAI/AReaL/pull_requests/142 Reviewed-by: 晓雷 <meizhiyu.mzy@antgroup.com> * . 
* PullRequest: 143 Fix the port conflict issue of generation servers Merge branch fw/fix-gen-port of git@code.alipay.com:inclusionAI/AReaL.git into main https://code.alipay.com/inclusionAI/AReaL/pull_requests/143?tab=comment Reviewed-by: 晓雷 <meizhiyu.mzy@antgroup.com> * somehow fix the port issue * add clearance period * . * . * PullRequest: 145 Add code environment Merge branch fw/code-env of git@code.alipay.com:inclusionAI/AReaL.git into main https://code.alipay.com/inclusionAI/AReaL/pull_requests/145?tab=comment Reviewed-by: 晓雷 <meizhiyu.mzy@antgroup.com> * add code env * somehow fix the port issue * fix * PullRequest: 144 Add decoupled PPO loss Merge branch fw/decoupled-ppo-loss of git@code.alipay.com:inclusionAI/AReaL.git into main https://code.alipay.com/inclusionAI/AReaL/pull_requests/144?tab=comment Reviewed-by: 晓雷 <meizhiyu.mzy@antgroup.com> * fix ppo step logging, nan in stats tracker, and add decoupled loss * . * somehow fix the port issue * fix typo * PullRequest: 146 Merge SLURM logs and save experiment configs in yaml format. Merge branch fw/better-logging of git@code.alipay.com:inclusionAI/AReaL.git into main https://code.alipay.com/inclusionAI/AReaL/pull_requests/146 Reviewed-by: 晓雷 <meizhiyu.mzy@antgroup.com> * merge all slurm logs into one * write config to yaml * PullRequest: 141 Merge changes during NeurIPS submission Merge branch fw/async-dev of git@code.alipay.com:inclusionAI/AReaL.git into main https://code.alipay.com/inclusionAI/AReaL/pull_requests/141 Reviewed-by: 晓雷 <meizhiyu.mzy@antgroup.com> * . * . * . * . * . * . * . * . * . * update script * . * . * . * . * [ADD] add least req scheduling * fix test genreq * . * . * fix stats tracker nan * . * . * . * . * . * . * . * uppper clip decoupled objective * add throughput exp script * . * remove behav upper clip param * . * . * . * plot curve * update thpt script * . * master worker raise error when exiting * update script * add gen throughput logging * . * . * add decoupled wandb data * . 
* fix port issue and add no training option * . * enlarge ttl * remove gserver manager await staled * update weights in groups * . * . * . * add port clearance period * . * . * . * add plot script * add sft throughput eval * . * log tokens in null interface * ablation experiments and interruptible generation * plotting scripts/run scripts/data results * . * remove scripts * add port test * remove force_sync_reward * revert some changes * . * revert * revert fix * fix * revert * fix typo * support qwen3 training * PullRequest: 147 Support interruption in SGLang and fix a KeyError in gather-scatter communication Merge branch fw/sglang046-with-abort-request of git@code.alipay.com:inclusionAI/AReaL.git into main https://code.alipay.com/inclusionAI/AReaL/pull_requests/147?tab=diff Reviewed-by: 晓雷 <meizhiyu.mzy@antgroup.com> * fix ppo step logging, nan in stats tracker, and add decoupled loss * . * somehow fix the port issue * initial commit * add interrupt request * fix data transfer issue * max concurrent rollouts defaults to train batch size * merge main * add patch * fix patch typo * revert sglang * fix typo * fix minor typo * . * pip show editable sglang path * PullRequest: 149 fix: code faas max_retries Merge branch xss/fix_code_verifier of git@code.alipay.com:inclusionAI/AReaL.git into main https://code.alipay.com/inclusionAI/AReaL/pull_requests/149 Reviewed-by: 博惟 <bowei.fw@antgroup.com> * fix: code faas max_retries * PullRequest: 150 [Bug Fix] Fix key errors in `_run_scatter` in data transfer Merge branch mzy/fix-scatter-groups of git@code.alipay.com:inclusionAI/AReaL.git into main https://code.alipay.com/inclusionAI/AReaL/pull_requests/150 Reviewed-by: 博惟 <bowei.fw@antgroup.com> * fix scatter groups key error * fix test * . 
* PullRequest: 151 Fix Qwen3 import error when using transformers with a lower version Merge branch fw/fix-qwen3 of git@code.alipay.com:inclusionAI/AReaL.git into main https://code.alipay.com/inclusionAI/AReaL/pull_requests/151 Reviewed-by: 温差 <xushusheng.xss@antgroup.com> * merge all slurm logs into one * write config to yaml * . * PullRequest: 152 Support sglang0.4.6 and fix master_worker import error Merge branch adopt_sglang046 of git@code.alipay.com:inclusionAI/AReaL.git into main https://code.alipay.com/inclusionAI/AReaL/pull_requests/152 Reviewed-by: 博惟 <bowei.fw@antgroup.com> * Support sglang0.4.6 and fix master_worker import error * remove disable_mla option * PullRequest: 155 [FIX] reduce port conflicts Merge branch sxj/reduce_port_conflict of git@code.alipay.com:inclusionAI/AReaL.git into main https://code.alipay.com/inclusionAI/AReaL/pull_requests/155 Reviewed-by: 博惟 <bowei.fw@antgroup.com> * [FIX] reduce port conflicts * PullRequest: 153 Fix stuck and recover issues for async experiments Merge branch fw/stable-async of git@code.alipay.com:inclusionAI/AReaL.git into main https://code.alipay.com/inclusionAI/AReaL/pull_requests/153 Reviewed-by: 晓雷 <meizhiyu.mzy@antgroup.com> * fix sample cnt stuck * fix recover * code cleanup * merge all slurm logs into one * write config to yaml * . * . * . * revert birth time change * . * enlarge sock connect timeout * PullRequest: 158 [Fix] Fix the error where "accepted" is not defined Merge branch fw/fix-rollout-accepted of git@code.alipay.com:inclusionAI/AReaL.git into main https://code.alipay.com/inclusionAI/AReaL/pull_requests/158 Reviewed-by: 温差 <xushusheng.xss@antgroup.com> * . 
* PullRequest: 154 Fix unit tests and simplify package installation Merge branch fw/v0.3.0-tests of git@code.alipay.com:inclusionAI/AReaL.git into main https://code.alipay.com/inclusionAI/AReaL/pull_requests/154?tab=comment Reviewed-by: 晓雷 <meizhiyu.mzy@antgroup.com> * fix some tests * fix tests except for experiments * fix tests * fix tests * . * . * PullRequest: 159 [fix] Enlarge the default aiohttp connection timeout and fix a recover error in model worker Merge branch fw/stable-async of git@code.alipay.com:inclusionAI/AReaL.git into main https://code.alipay.com/inclusionAI/AReaL/pull_requests/159 Reviewed-by: 晓雷 <meizhiyu.mzy@antgroup.com> * fix sample cnt stuck * fix recover * code cleanup * merge all slurm logs into one * write config to yaml * . * . * . * revert birth time change * . * enlarge sock connect timeout * . * PullRequest: 160 set sock_connect as rollout_request_timeout in partial_rollout.py Merge branch xss/rollout_timeout of git@code.alipay.com:inclusionAI/AReaL.git into main https://code.alipay.com/inclusionAI/AReaL/pull_requests/160 Reviewed-by: 博惟 <bowei.fw@antgroup.com> * set sock_connect as rollout_request_timeout in partial_rollout.py * PullRequest: 161 Prioritize rollouts that are submitted earlier rather than arrived earlier Merge branch fw/birth-time of git@code.alipay.com:inclusionAI/AReaL.git into main https://code.alipay.com/inclusionAI/AReaL/pull_requests/161 Reviewed-by: 晓雷 <meizhiyu.mzy@antgroup.com> * . * blocking push * PullRequest: 163 [bugfix] Fix synchronized training when birth time is absent Merge branch fw/fix-sync-birthtime of git@code.alipay.com:inclusionAI/AReaL.git into main https://code.alipay.com/inclusionAI/AReaL/pull_requests/163 Reviewed-by: 晓雷 <meizhiyu.mzy@antgroup.com> * . 
* PullRequest: 164 [Refactor] Move cluster spec into CLI args Merge branch fw/refactor-cluster-spec of git@code.alipay.com:inclusionAI/AReaL.git into main https://code.alipay.com/inclusionAI/AReaL/pull_requests/164?tab=comment Reviewed-by: 晓雷 <meizhiyu.mzy@antgroup.com> * set cluster spec path in args * . * fix * add default cluster spec * PullRequest: 165 Normally exit all workers after experiment completion Merge branch fw/exit-all-workers of git@code.alipay.com:inclusionAI/AReaL.git into main https://code.alipay.com/inclusionAI/AReaL/pull_requests/165 Reviewed-by: 晓雷 <meizhiyu.mzy@antgroup.com> * . * . * PullRequest: 167 [Feature] Use chunked logits computation to alleviate SGLang OOM Merge branch fw/patch-sglang-oom of git@code.alipay.com:inclusionAI/AReaL.git into main https://code.alipay.com/inclusionAI/AReaL/pull_requests/167 Reviewed-by: 晓雷 <meizhiyu.mzy@antgroup.com> * . * PullRequest: 166 [Feature] Support single-script experiment launch with Ray Merge branch fw/turbolaunch of git@code.alipay.com:inclusionAI/AReaL.git into main https://code.alipay.com/inclusionAI/AReaL/pull_requests/166?tab=comment Reviewed-by: 晓雷 <meizhiyu.mzy@antgroup.com> * add training script without ray name resolve * add ray name resolve * ray worker * run * run async * local run * set cluster spec path in args * . * . * fix * . * . * . * . * . * update config * . * minor renaming * PullRequest: 169 [Doc] Add v0.3.0 docs based on jupyter-book Merge branch fw/doc of git@code.alipay.com:inclusionAI/AReaL.git into main https://code.alipay.com/inclusionAI/AReaL/pull_requests/169 Reviewed-by: 晓雷 <meizhiyu.mzy@antgroup.com> * add docs * refine doc * refine doc * PullRequest: 170 [Feature] Amend configs for ray scripts Merge branch fw/ray-configs of git@code.alipay.com:inclusionAI/AReaL.git into main https://code.alipay.com/inclusionAI/AReaL/pull_requests/170 Reviewed-by: 晓雷 <meizhiyu.mzy@antgroup.com> * . 
--------- Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com> Co-authored-by: wanghuaijie.whj <wanghuaijie.whj@antgroup.com> Co-authored-by: Tiwei Bie <tiwei.btw@antgroup.com> Co-authored-by: kira.gw <kira.gw@antgroup.com> Co-authored-by: shenxujie.sxj <shenxujie.sxj@antgroup.com> Co-authored-by: 晓雷 <meizhiyu.mzy@antgroup.com> Co-authored-by: sam.gjx <sam.gjx@antgroup.com> Co-authored-by: 温差 <xushusheng.xss@antgroup.com> Co-authored-by: 履渊 <yuhong.gyh@antgroup.com>
2025-05-28 19:32:42 +08:00 · 7826fdbb87
parent cf46993a30
commit 7826fdbb87
4 changed files with 232 additions and 0 deletions
--- a/training/configs/async-ppo/async-ppo-1.7b-gpu32.yaml
+++ b/training/configs/async-ppo/async-ppo-1.7b-gpu32.yaml
@@ -0,0 +1,65 @@
+max_head_offpolicyness: 4
+experiment_name: async-ppo-1.7b-gpu32
+trial_name: my-trial
+mode: ray
+cluster:
+  fileroot: /storage/ray/experiments
+wandb:
+  mode: disabled
+recover_mode: auto
+recover_retries: 10
+allocation_mode: sglang.d24p1m1+d4p2m1
+n_nodes: 4
+n_gpus_per_node: 8
+cache_clear_freq: 1
+exp_ctrl:
+  total_train_epochs: 5
+  save_freq_epochs: 1
+  ckpt_freq_secs: 600
+torch_cache_mysophobia: true
+dataset:
+  path: /storage/datasets/boba_106k_0319.jsonl
+  max_prompt_len: 1024
+  train_bs_n_seqs: 512
+group_size: 16
+group_adv_norm: false
+actor:
+  type:
+    _class: qwen3
+  path: /storage/openpsi/models/Qwen3-1.7B/
+  optimizer:
+    lr: 2e-05
+    lr_scheduler_type: constant
+    eps: 1e-5
+    warmup_steps_proportion: 0.001
+    hysteresis: 2
+  sglang:
+    mem_fraction_static: 0.8
+actor_train:
+  mb_spec:
+    max_tokens_per_mb: 30720
+actor_gen:
+  mb_spec:
+    max_tokens_per_mb: 30720
+actor_inf:
+  mb_spec:
+    max_tokens_per_mb: 30720
+ppo:
+  gen:
+    max_new_tokens: 27648
+    min_new_tokens: 0
+    top_p: 1.0
+    top_k: 1000000
+    temperature: 1.0
+  ppo_n_minibatches: 4
+  kl_ctl: 0.0
+  discount: 1.0
+  value_eps_clip: 0.2
+  disable_value: true
+  reward_output_scaling: 5
+  reward_output_bias: 0.0
+  adv_norm: true
+  value_norm: true
+  recompute_logprob: true
+  use_decoupled_loss: true
+
--- a/training/configs/async-ppo/async-ppo-1.7b-gpu8.yaml
+++ b/training/configs/async-ppo/async-ppo-1.7b-gpu8.yaml
@@ -0,0 +1,65 @@
+max_head_offpolicyness: 4
+experiment_name: async-ppo-1.7b-gpu8
+trial_name: my-trial
+mode: ray
+cluster:
+  fileroot: /storage/ray/experiments
+wandb:
+  mode: disabled
+recover_mode: auto
+recover_retries: 10
+allocation_mode: sglang.d4p1m1+d2p2m1
+n_nodes: 1
+n_gpus_per_node: 8
+cache_clear_freq: 1
+exp_ctrl:
+  total_train_epochs: 5
+  save_freq_epochs: 1
+  ckpt_freq_secs: 600
+torch_cache_mysophobia: true
+dataset:
+  path: /storage/datasets/boba_106k_0319.jsonl
+  max_prompt_len: 1024
+  train_bs_n_seqs: 512
+group_size: 16
+group_adv_norm: false
+actor:
+  type:
+    _class: qwen3
+  path: /storage/openpsi/models/Qwen3-1.7B/
+  optimizer:
+    lr: 2e-05
+    lr_scheduler_type: constant
+    eps: 1e-5
+    warmup_steps_proportion: 0.001
+    hysteresis: 2
+  sglang:
+    mem_fraction_static: 0.8
+actor_train:
+  mb_spec:
+    max_tokens_per_mb: 30720
+actor_gen:
+  mb_spec:
+    max_tokens_per_mb: 30720
+actor_inf:
+  mb_spec:
+    max_tokens_per_mb: 30720
+ppo:
+  gen:
+    max_new_tokens: 27648
+    min_new_tokens: 0
+    top_p: 1.0
+    top_k: 1000000
+    temperature: 1.0
+  ppo_n_minibatches: 4
+  kl_ctl: 0.0
+  discount: 1.0
+  value_eps_clip: 0.2
+  disable_value: true
+  reward_output_scaling: 5
+  reward_output_bias: 0.0
+  adv_norm: true
+  value_norm: true
+  recompute_logprob: true
+  use_decoupled_loss: true
+
--- a/training/configs/ppo/ppo-1.5b-gpu32.yaml
+++ b/training/configs/ppo/ppo-1.5b-gpu32.yaml
@@ -0,0 +1,62 @@
+experiment_name: ppo-1.7b-gpu32
+trial_name: my-trial
+mode: ray
+cluster:
+  fileroot: /storage/ray/experiments
+wandb:
+  mode: disabled
+recover_mode: auto
+recover_retries: 10
+allocation_mode: sglang.d16p1m1+d8p2m1
+n_nodes: 4
+n_gpus_per_node: 8
+cache_clear_freq: 1
+exp_ctrl:
+  total_train_epochs: 5
+  save_freq_epochs: 1
+  ckpt_freq_secs: 600
+torch_cache_mysophobia: true
+dataset:
+  path: /storage/datasets/boba_106k_0319.jsonl
+  max_prompt_len: 1024
+  train_bs_n_seqs: 512
+group_size: 16
+group_adv_norm: false
+actor:
+  type:
+    _class: qwen3
+  path: /storage/openpsi/models/Qwen3-1.7B/
+  optimizer:
+    lr: 2e-05
+    lr_scheduler_type: constant
+    eps: 1e-5
+    warmup_steps_proportion: 0.001
+    hysteresis: 2
+  sglang:
+    mem_fraction_static: 0.8
+actor_train:
+  mb_spec:
+    max_tokens_per_mb: 30720
+actor_gen:
+  mb_spec:
+    max_tokens_per_mb: 30720
+actor_inf:
+  mb_spec:
+    max_tokens_per_mb: 30720
+ppo:
+  gen:
+    max_new_tokens: 27648
+    min_new_tokens: 0
+    top_p: 1.0
+    top_k: 1000000
+    temperature: 1.0
+  ppo_n_minibatches: 4
+  kl_ctl: 0.0
+  discount: 1.0
+  value_eps_clip: 0.2
+  disable_value: true
+  reward_output_scaling: 5
+  reward_output_bias: 0.0
+  adv_norm: true
+  value_norm: true
+
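The three PPO configs above share the same length settings (`max_prompt_len: 1024`, `max_new_tokens: 27648`, `max_tokens_per_mb: 30720`). The sketch below checks that the longest possible sequence fits inside one micro-batch. It assumes that `max_tokens_per_mb` is a cap on the tokens packed into a single micro-batch and should therefore be at least prompt length plus generation length; the diff does not state this. `sequence_budget` is a hypothetical helper, not an AReaL function.

```python
from typing import Tuple


def sequence_budget(
    max_prompt_len: int, max_new_tokens: int, max_tokens_per_mb: int
) -> Tuple[int, int]:
    """Return (longest_sequence, headroom) in tokens.

    Assumption: a micro-batch must be able to hold one full
    prompt + response, so headroom should not be negative.
    """
    longest = max_prompt_len + max_new_tokens
    headroom = max_tokens_per_mb - longest
    return longest, headroom


# Values copied from the PPO and async-PPO configs in this commit.
longest, headroom = sequence_budget(
    max_prompt_len=1024, max_new_tokens=27648, max_tokens_per_mb=30720
)
print(f"longest sequence: {longest} tokens, headroom: {headroom} tokens")
assert headroom >= 0, "max_tokens_per_mb is smaller than the longest sequence"
```

Under that assumption, a change to `max_new_tokens` or `max_prompt_len` would call for rechecking `max_tokens_per_mb` in all three of `actor_train`, `actor_gen`, and `actor_inf`.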
--- a/training/configs/sft/sft-7b-gpu8.yaml
+++ b/training/configs/sft/sft-7b-gpu8.yaml
@@ -0,0 +1,40 @@
+experiment_name: sft-7b-gpu8
+trial_name: my-trial
+mode: ray
+wandb:
+  mode: disabled
+recover_mode: auto
+recover_retries: 10
+allocation_mode: d2p4m1
+n_nodes: 1
+n_gpus_per_node: 8
+exp_ctrl:
+  total_train_epochs: 200
+  save_freq_epochs: 1
+  ckpt_freq_secs: 600
+torch_cache_mysophobia: true
+dataset:
+  train_path: /storage/datasets/boba-sft_200_0319.jsonl
+  valid_path: /storage/datasets/boba-sft_200_0319.jsonl
+  max_seqlen: 32768
+  train_bs_n_seqs: 16
+  valid_bs_n_seqs: 16
+model:
+    type:
+      _class: qwen2
+    path: /storage/models/DeepSeek-R1-Distill-Qwen-7B
+    optimizer:
+      type: adam
+      lr_scheduler_type: constant
+      lr: 1e-5
+      warmup_steps_proportion: 0.03
+      initial_loss_scale: 262144.0
+      loss_scale_window: 10
+      hysteresis: 2
+      weight_decay: 0.1
+      eps: 1e-5
+    bf16: true
+allocation:
+  mb_spec:
+    max_tokens_per_mb: 32768
+
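Each config pairs an `allocation_mode` string with `n_nodes` and `n_gpus_per_node`. The sketch below is a quick sanity check that the two agree. It rests on two assumptions the diff does not state: that each `d<N>p<N>m<N>` component occupies `d * p * m` GPUs (read here as data-, pipeline-, and tensor-parallel degrees), and that in `sglang.<gen>+<train>` the generation and training components use disjoint GPUs. The helper names are hypothetical and are not part of AReaL.

```python
import re
from typing import List

# Matches one "d<N>p<N>m<N>" component, e.g. "d24p1m1".
_COMPONENT = re.compile(r"d(\d+)p(\d+)m(\d+)")


def gpus_in_allocation(allocation_mode: str) -> List[int]:
    """Return d * p * m for every component found in the string."""
    return [
        int(d) * int(p) * int(m)
        for d, p, m in _COMPONENT.findall(allocation_mode)
    ]


def check_allocation(allocation_mode: str, n_nodes: int, n_gpus_per_node: int) -> bool:
    """True if the components together use exactly the cluster's GPUs."""
    return sum(gpus_in_allocation(allocation_mode)) == n_nodes * n_gpus_per_node


# (allocation_mode, n_nodes, n_gpus_per_node), keyed by experiment_name,
# copied from the four configs added in this commit.
configs = {
    "async-ppo-1.7b-gpu32": ("sglang.d24p1m1+d4p2m1", 4, 8),
    "async-ppo-1.7b-gpu8": ("sglang.d4p1m1+d2p2m1", 1, 8),
    "ppo-1.7b-gpu32": ("sglang.d16p1m1+d8p2m1", 4, 8),
    "sft-7b-gpu8": ("d2p4m1", 1, 8),
}

for name, (mode, nodes, gpus) in configs.items():
    print(name, gpus_in_allocation(mode), check_allocation(mode, nodes, gpus))
```

Under these assumptions all four configs are consistent. For example, `sglang.d24p1m1+d4p2m1` gives 24 generation GPUs plus 8 training GPUs, which matches 4 nodes of 8 GPUs.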