xssstory
17ea7fe94d
fix math reward verifier ( #156 )
* PullRequest: 293 fix get_param_realloc_path
Merge branch xss/debug of git@code.alipay.com:inclusionAI/AReaL.git into gh
https://code.alipay.com/inclusionAI/AReaL/pull_requests/293
Reviewed-by: 博惟 <bowei.fw@antgroup.com>
* fix get_param_realloc_path
* PullRequest: 297 bugfix: reward is always -5
Merge branch xss/debug of git@code.alipay.com:inclusionAI/AReaL.git into gh
https://code.alipay.com/inclusionAI/AReaL/pull_requests/297
Reviewed-by: 博惟 <bowei.fw@antgroup.com>
* bugfix: reward is always -5
* PullRequest: 321 fix checkpoint save dir
Merge branch xss/debug of git@code.alipay.com:inclusionAI/AReaL.git into gh
https://code.alipay.com/inclusionAI/AReaL/pull_requests/321
Reviewed-by: 博惟 <bowei.fw@antgroup.com>
* fix checkpoint save dir
* PullRequest: 328 [Doc] update installation
Merge branch sxj/doc of git@code.alipay.com:inclusionAI/AReaL.git into gh
https://code.alipay.com/inclusionAI/AReaL/pull_requests/328
Reviewed-by: 博惟 <bowei.fw@antgroup.com>
* [Doc] update installation
* PullRequest: 329 bugfix: math verifier blocks the async training
Merge branch xss/debug of git@code.alipay.com:inclusionAI/AReaL.git into gh
https://code.alipay.com/inclusionAI/AReaL/pull_requests/329
Reviewed-by: 博惟 <bowei.fw@antgroup.com>
* bugfix: math verifier blocks the async training
* format
---------
Co-authored-by: 冰临 <shenxujie.sxj@antgroup.com>
Co-authored-by: garrett4wade <fuwth17@gmail.com>
2025-07-07 15:49:13 +08:00
Wei Fu
448bb05a3d
Fix formatting ( #90 )
2025-06-06 21:42:12 +08:00
Ligeng Zhu
de134b4a7a
[Feature] Switch dataset path / model path to HF location to ease community usage ( #82 )
* Update .gitignore and modify dataset paths in scripts for improved file management and compatibility with Hugging Face datasets. Additionally, refactor dataset loading functions to utilize load_hf_or_local_file for better flexibility.
* Remove sglang subproject and update dataset path format in load_hf_or_local_file function for compatibility with Hugging Face datasets.
* Refactor imports in grader.py and parser.py to include sympy for improved functionality.
2025-06-06 21:38:06 +08:00
Wei Fu
4fab3ac769
[Doc & Fix] Simplify the environment setup procedure ( #62 )
* PullRequest: 176 [FIX] clear sensitive info
Merge branch fw/fix-sensitive-info of git@code.alipay.com:inclusionAI/AReaL.git into main
https://code.alipay.com/inclusionAI/AReaL/pull_requests/176
Reviewed-by: 晓雷 <meizhiyu.mzy@antgroup.com>
* .
* .
* .
* .
* .
* test env setup
* fix
* allow cached model
* .
* revise docs
* change docs
* format docs
* update readme
2025-06-01 14:57:21 +08:00
Wei Fu
c60d128b14
Support asynchronous RL training, Qwen3, and the latest SGLang ( #47 )
* feat: one buffer for each task
* feat: support "one buffer for each task" for async
* make kv_cache_dtype configurable
Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
* style: use plural form
fix: use _seed_from_key to set different seeds for data loaders
fix: call load_data for one buffer each time
* PullRequest: 125 Support running async experiments in the 2407 image.
Merge branch fw/async2407 of git@code.alipay.com:inclusionAI/AReaL.git into main
https://code.alipay.com/inclusionAI/AReaL/pull_requests/125
Signed-off-by: 晓雷 <meizhiyu.mzy@antgroup.com>
* .
* fix: handle multiple datasets in recover indices
fix: `isinstance(self.__datasets, PullerStreamDataset)`
feat: use the "spec" request to obtain the number of datasets
fix: revert rollout worker
* fix: revert async_rl_exp.py
* fix flag for list (cuda_graph_bs)
* format
* [FIX] fix async task reward [sglang bf16 -> fp16]
* fix: define `self.__datasets` in advance
* PullRequest: 130 [Refactor] Remove deprecated search related code
Merge branch mzy/remove-search of git@code.alipay.com:inclusionAI/AReaL.git into main
https://code.alipay.com/inclusionAI/AReaL/pull_requests/130
Signed-off-by: 博惟 <bowei.fw@antgroup.com>
* remove search related
* PullRequest: 131 [Refactor] Change terminology "model parallel" into "tensor parallel" to align with megatron.
Merge branch mzy/mp-to-tp of git@code.alipay.com:inclusionAI/AReaL.git into main
https://code.alipay.com/inclusionAI/AReaL/pull_requests/131?tab=comment
Signed-off-by: 博惟 <bowei.fw@antgroup.com>
* change mp to tp
* .
* .
* PullRequest: 142 Fix an error for megatron backend destroy
Merge branch fw/fix-meagatron-destroy of git@code.alipay.com:inclusionAI/AReaL.git into main
https://code.alipay.com/inclusionAI/AReaL/pull_requests/142
Reviewed-by: 晓雷 <meizhiyu.mzy@antgroup.com>
* .
* PullRequest: 143 Fix the port conflict issue of generation servers
Merge branch fw/fix-gen-port of git@code.alipay.com:inclusionAI/AReaL.git into main
https://code.alipay.com/inclusionAI/AReaL/pull_requests/143?tab=comment
Reviewed-by: 晓雷 <meizhiyu.mzy@antgroup.com>
* somehow fix the port issue
* add clearance period
* .
* .
* PullRequest: 145 Add code environment
Merge branch fw/code-env of git@code.alipay.com:inclusionAI/AReaL.git into main
https://code.alipay.com/inclusionAI/AReaL/pull_requests/145?tab=comment
Reviewed-by: 晓雷 <meizhiyu.mzy@antgroup.com>
* add code env
* somehow fix the port issue
* fix
* PullRequest: 144 Add decoupled PPO loss
Merge branch fw/decoupled-ppo-loss of git@code.alipay.com:inclusionAI/AReaL.git into main
https://code.alipay.com/inclusionAI/AReaL/pull_requests/144?tab=comment
Reviewed-by: 晓雷 <meizhiyu.mzy@antgroup.com>
* fix ppo step logging, nan in stats tracker, and add decoupled loss
* .
* somehow fix the port issue
* fix typo
* PullRequest: 146 Merge SLURM logs and save experiment configs in yaml format.
Merge branch fw/better-logging of git@code.alipay.com:inclusionAI/AReaL.git into main
https://code.alipay.com/inclusionAI/AReaL/pull_requests/146
Reviewed-by: 晓雷 <meizhiyu.mzy@antgroup.com>
* merge all slurm logs into one
* write config to yaml
* PullRequest: 141 Merge changes during NeurIPS submission
Merge branch fw/async-dev of git@code.alipay.com:inclusionAI/AReaL.git into main
https://code.alipay.com/inclusionAI/AReaL/pull_requests/141
Reviewed-by: 晓雷 <meizhiyu.mzy@antgroup.com>
* .
* .
* .
* .
* .
* .
* .
* .
* .
* update script
* .
* .
* .
* .
* [ADD] add least req scheduling
* fix test genreq
* .
* .
* fix stats tracker nan
* .
* .
* .
* .
* .
* .
* .
* upper clip decoupled objective
* add throughput exp script
* .
* remove behav upper clip param
* .
* .
* .
* plot curve
* update thpt script
* .
* master worker raise error when exiting
* update script
* add gen throughput logging
* .
* .
* add decoupled wandb data
* .
* fix port issue and add no training option
* .
* enlarge ttl
* remove gserver manager await staled
* update weights in groups
* .
* .
* .
* add port clearance period
* .
* .
* .
* add plot script
* add sft throughput eval
* .
* log tokens in null interface
* ablation experiments and interruptible generation
* plotting scripts / run scripts / data results
* .
* remove scripts
* add port test
* remove force_sync_reward
* revert some changes
* .
* revert
* revert fix
* fix
* revert
* fix typo
* support qwen3 training
* PullRequest: 147 Support interruption in SGLang and fix a KeyError in gather-scatter communication
Merge branch fw/sglang046-with-abort-request of git@code.alipay.com:inclusionAI/AReaL.git into main
https://code.alipay.com/inclusionAI/AReaL/pull_requests/147?tab=diff
Reviewed-by: 晓雷 <meizhiyu.mzy@antgroup.com>
* fix ppo step logging, nan in stats tracker, and add decoupled loss
* .
* somehow fix the port issue
* initial commit
* add interrupt request
* fix data transfer issue
* max concurrent rollouts defaults to train batch size
* merge main
* add patch
* fix patch typo
* revert sglang
* fix typo
* fix minor typo
* .
* pip show editable sglang path
* PullRequest: 149 fix: code faas max_retries
Merge branch xss/fix_code_verifier of git@code.alipay.com:inclusionAI/AReaL.git into main
https://code.alipay.com/inclusionAI/AReaL/pull_requests/149
Reviewed-by: 博惟 <bowei.fw@antgroup.com>
* fix: code faas max_retries
* PullRequest: 150 [Bug Fix] Fix key errors in `_run_scatter` in data transfer
Merge branch mzy/fix-scatter-groups of git@code.alipay.com:inclusionAI/AReaL.git into main
https://code.alipay.com/inclusionAI/AReaL/pull_requests/150
Reviewed-by: 博惟 <bowei.fw@antgroup.com>
* fix scatter groups key error
* fix test
* .
* PullRequest: 151 Fix Qwen3 import error when using transformers with a lower version
Merge branch fw/fix-qwen3 of git@code.alipay.com:inclusionAI/AReaL.git into main
https://code.alipay.com/inclusionAI/AReaL/pull_requests/151
Reviewed-by: 温差 <xushusheng.xss@antgroup.com>
* merge all slurm logs into one
* write config to yaml
* .
* PullRequest: 152 Support sglang0.4.6 and fix master_worker import error
Merge branch adopt_sglang046 of git@code.alipay.com:inclusionAI/AReaL.git into main
https://code.alipay.com/inclusionAI/AReaL/pull_requests/152
Reviewed-by: 博惟 <bowei.fw@antgroup.com>
* Support sglang0.4.6 and fix master_worker import error
* remove disable_mla option
---------
Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Co-authored-by: wanghuaijie.whj <wanghuaijie.whj@antgroup.com>
Co-authored-by: Tiwei Bie <tiwei.btw@antgroup.com>
Co-authored-by: kira.gw <kira.gw@antgroup.com>
Co-authored-by: shenxujie.sxj <shenxujie.sxj@antgroup.com>
Co-authored-by: 晓雷 <meizhiyu.mzy@antgroup.com>
Co-authored-by: sam.gjx <sam.gjx@antgroup.com>
Co-authored-by: 温差 <xushusheng.xss@antgroup.com>
Co-authored-by: 履渊 <yuhong.gyh@antgroup.com>
2025-05-26 09:45:13 +08:00
nuzant
ffc52a1520
Merge updates from ant repository. ( #34 )
* Cherry-pick commit 90dfd575
"PullRequest: 84 [ADD..." to the current branch
* Cherry-pick commit 15e787b7
"PullRequest: 44 eval..." to the current branch
* Cherry-pick commit f255ef60
"PullRequest: 85 add ..." to the current branch
* Cherry-pick commit c2b4006a
"PullRequest: 86 Supp..." to the current branch
* Cherry-pick commit fa6c0f3d
"PullRequest: 87 upda..." to the current branch
* Cherry-pick commit a9ff4af0
"PullRequest: 88 Bump..." to the current branch
* Cherry-pick commit 763839aa
"PullRequest: 89 Add ..." to the current branch
* Cherry-pick commit 21e8064a
"PullRequest: 90 Merg..." to the current branch
* Cherry-pick commit 94e97670
"PullRequest: 92 Supp..." to the current branch
* Cherry-pick commit 92710522
"PullRequest: 91 Supp..." to the current branch
* Cherry-pick commit 95aa3f28
"PullRequest: 93 Supp..." to the current branch
* Cherry-pick commit 62191f8f
"PullRequest: 94 Add ..." to the current branch
* Cherry-pick commit baa0249a
"PullRequest: 95 Form..." to the current branch
* Cherry-pick commit e32945f2
"PullRequest: 96 Chan..." to the current branch
* Cherry-pick commit b59286e3
"PullRequest: 98 fix ..." to the current branch
* Cherry-pick commit ca2ba43e
"PullRequest: 97 Move..." to the current branch
* Cherry-pick commit f941700b
"PullRequest: 99 Refa..." to the current branch
* Cherry-pick commit 95439e70
"PullRequest: 100 Add..." to the current branch
* Cherry-pick commit f3ebd941
"PullRequest: 101 Add..." to the current branch
* Cherry-pick commit ee4779ea
"PullRequest: 103 [Fe..." to the current branch
* Cherry-pick commit ce5e24ec
"PullRequest: 104 [Fi..." to the current branch
* Cherry-pick commit b385761f
"PullRequest: 105 [Bu..." to the current branch
* Cherry-pick commit 4c21fbb5
"PullRequest: 106 [Bu..." to the current branch
* Cherry-pick commit 7f3f14e0
"PullRequest: 108 [Fi..." to the current branch
* Cherry-pick commit 8de62701
"PullRequest: 107 [Fe..." to the current branch
* Cherry-pick commit ea864b21
"PullRequest: 24 [Fea..." to the current branch
* Cherry-pick commit 4a658db3
"PullRequest: 109 [Bu..." to the current branch
* Cherry-pick commit aaa12bf1
"PullRequest: 110 [Bu..." to the current branch
* Cherry-pick commit 6adb6d9f
"PullRequest: 112 [Fi..." to the current branch
* Cherry-pick commit 55556bc5
"PullRequest: 111 [Fe..." to the current branch
* Cherry-pick commit bfe5ec94
"PullRequest: 114 pri..." to the current branch
* Cherry-pick commit 44529c9b
"PullRequest: 113 spl..." to the current branch
* Cherry-pick commit b1cc73df
"PullRequest: 116 [FI..." to the current branch
* Cherry-pick commit eff598ce
"PullRequest: 115 [Fi..." to the current branch
* Cherry-pick commit f7149475
"PullRequest: 119 [Fi..." to the current branch
* Cherry-pick commit f1017bfe
"PullRequest: 121 add..." to the current branch
* Cherry-pick commit 56f6de8d
"PullRequest: 120 set..." to the current branch
---------
Co-authored-by: 冰临 <shenxujie.sxj@antgroup.com>
Co-authored-by: 温差 <xushusheng.xss@antgroup.com>
Co-authored-by: 郭唯 <kira.gw@antgroup.com>
Co-authored-by: 博惟 <bowei.fw@antgroup.com>
Co-authored-by: 君末 <meijun.mei@antgroup.com>
2025-04-27 11:09:25 +08:00
bowei.fw
25c45c7e83
.
2025-03-22 12:22:34 +08:00
bowei.fw
9dcdb7a684
.
2025-03-21 22:38:21 +08:00
bowei.fw
d1554585a4
Merge branch 'main' of code.alipay.com:inclusionAI/AReaL into async-ref-rew
2025-03-21 21:25:35 +08:00
bowei.fw
de8243cc78
.
2025-03-21 21:22:50 +08:00
博惟
f90fe19e00
PullRequest: 53 Fix a potential reward hacking issue related to "emptyset"
Merge branch fw/fix-rwd-hacking of git@code.alipay.com:inclusionAI/AReaL.git into main
https://code.alipay.com/inclusionAI/AReaL/pull_requests/53
Signed-off-by: 温差 <xushusheng.xss@antgroup.com>
* .
* .
2025-03-21 16:42:24 +08:00
晓雷
4ac9595295
PullRequest: 43 Reduce GPU memory used by data transfer.
Merge branch mzy/fix-data-transfer-oom of git@code.alipay.com:inclusionAI/AReaL.git into main
https://code.alipay.com/inclusionAI/AReaL/pull_requests/43
Signed-off-by: 博惟 <bowei.fw@antgroup.com>
* add oom observe logs
* tested
* format and clear code
* .
* format
* remove logging
* .
* add comments
2025-03-18 15:20:57 +08:00
Jun Mo
8310d7beb7
code format
2025-03-18 09:00:13 +08:00
meijun.mei
92e0777178
optimize code/math functioncall param
2025-03-17 15:45:52 +08:00
博惟
fb23009e99
PullRequest: 27 support bf16 training
Merge branch fw/bf16 of git@code.alipay.com:inclusionAI/AReaL.git into main
https://code.alipay.com/inclusionAI/AReaL/pull_requests/27
Signed-off-by: 晓雷 <meizhiyu.mzy@antgroup.com>
* support bf16 training
2025-03-12 13:04:30 +08:00
meijun.mei
6fd0db01c7
add functioncall switch
2025-03-11 18:41:46 +08:00
博惟
ca42e43638
PullRequest: 9 Refactoring data transfer for v2 workers.
Merge branch fw/datatransfer-v2 of git@code.alipay.com:inclusionAI/AReaL.git into main
https://code.alipay.com/inclusionAI/AReaL/pull_requests/9
Signed-off-by: 晓雷 <meizhiyu.mzy@antgroup.com>
* .
* fw/fix-dataloading-not-shuffle
* .
* .
* .
* .
* .
* add v2 master worker
* cpu test pass
* ppo run
* .
* pass sft test
* pass ppo dp test
* format
* fix
* run
* .
* cleanup
* .
* format
* run
* merge and format
* refactor
* sft pass
* .
* format
* format
* format
* .
* .
2025-03-08 18:26:24 +08:00
君末
b3bedd7b9d
PullRequest: 17 Support functioncall for math and code verify
Merge branch functioncall-code of git@code.alipay.com:inclusionAI/AReaL.git into main
https://code.alipay.com/inclusionAI/AReaL/pull_requests/17
Signed-off-by: 晓雷 <meizhiyu.mzy@antgroup.com>
* test code evaluation with faas
* support functioncall for code
* fix code crash bug
* format
* .
2025-03-07 11:26:57 +08:00