Wei Fu
|
c0200f10d0
|
[Feature] Support behavior importance weight capping and update evaluation scripts (#59)
* PullRequest: 168 Add Codeforces tests and fix other test issues
Merge branch areal-eval-0.3 of git@code.alipay.com:inclusionAI/AReaL.git into main
https://code.alipay.com/inclusionAI/AReaL/pull_requests/168
Reviewed-by: 博惟 <bowei.fw@antgroup.com>
* fix eval and add codeforces elo calc
* fix codeforces test
* fix qwen3 prompt
* change annotations to English
* add code verify files
* PullRequest: 173 [Fix] format code and fix a recover error in rollout worker
Merge branch fw/fix-rollout-recover of git@code.alipay.com:inclusionAI/AReaL.git into main
https://code.alipay.com/inclusionAI/AReaL/pull_requests/173
Reviewed-by: 温差 <xushusheng.xss@antgroup.com>
* format code and fix a recover error in rollout worker
* PullRequest: 171 Update evaluation documentation
Merge branch eval-doc of git@code.alipay.com:inclusionAI/AReaL.git into main
https://code.alipay.com/inclusionAI/AReaL/pull_requests/171?tab=diff
Reviewed-by: 博惟 <bowei.fw@antgroup.com>
* update eval doc
* complete eval doc
* complete eval doc
* fix ood info
* add data obtaining guide
* fix supported datasets
* PullRequest: 174 decouple max_behav_imp_weight and c_clip & track entropy, positve_seq_len and negative_seq_len
Merge branch xss/max_behav_imp_weight of git@code.alipay.com:inclusionAI/AReaL.git into main
https://code.alipay.com/inclusionAI/AReaL/pull_requests/174
Reviewed-by: 博惟 <bowei.fw@antgroup.com>
* decouple max_behav_imp_weight and c_clip
* rename log: positve_* -> correct_*, negative_* -> incorrect_*
* rename hyper-parameter: max_behav_imp_weight -> behav_imp_weight_cap
* PullRequest: 175 [Fix] Fix the "event loop is already running" error in ray scripts
Merge branch fw/fix-ray-asyncio of git@code.alipay.com:inclusionAI/AReaL.git into main
https://code.alipay.com/inclusionAI/AReaL/pull_requests/175
Reviewed-by: 晓雷 <meizhiyu.mzy@antgroup.com>
* format code and fix a recover error in rollout worker
* .
---------
Co-authored-by: 乘鹭 <hechuyi.hcy@antgroup.com>
Co-authored-by: 温差 <xushusheng.xss@antgroup.com>
|
2025-05-30 10:29:21 +08:00 |