Commit Graph

160 Commits

Author SHA1 Message Date
朱晗 f5924b1851 0724_merge7 2025-07-24 19:34:22 +08:00
Wei Fu 311bcd7697
[lite] [feature] Bump to SGLang v0.4.9.post2 and use NCCL to update weights (#196)
* PullRequest: 353 [Lite] Add gradient checkpointing to FSDPEngine

Merge branch mzy/add-gradient-ckpt of git@code.alipay.com:inclusionAI/AReaL.git into lite
https://code.alipay.com/inclusionAI/AReaL/pull_requests/353

Reviewed-by: 博惟 <bowei.fw@antgroup.com>


* add gradient checkpointing

* PullRequest: 354 [lite] GRPO pre-commit: minor changes in FSDP  engine

Merge branch fw/lite-fix1 of git@code.alipay.com:inclusionAI/AReaL.git into lite
https://code.alipay.com/inclusionAI/AReaL/pull_requests/354

Reviewed-by: 晓雷 <meizhiyu.mzy@antgroup.com>


* .
* .
* .
* .

* PullRequest: 355 [Lite] GRPO pre-commit 2: Refactor RemoteSGLangEngine thread and SGLang configuration

Merge branch fw/lite-fix1 of git@code.alipay.com:inclusionAI/AReaL.git into lite
https://code.alipay.com/inclusionAI/AReaL/pull_requests/355?tab=commit

Reviewed-by: 晓雷 <meizhiyu.mzy@antgroup.com>


* .
* .
* .
* .
* .
* .
* fix
* .

* PullRequest: 357 [lite] GRPO pre-commit 3: Fix typos and experiment utilities

Merge branch fw/lite-fix2 of git@code.alipay.com:inclusionAI/AReaL.git into lite
https://code.alipay.com/inclusionAI/AReaL/pull_requests/357?tab=comment

Reviewed-by: 晓雷 <meizhiyu.mzy@antgroup.com>


* .
* .
* .
* .
* .
* fix destroy process group

* PullRequest: 358 [lite] Support GRPO training locally with the GSM8k dataset

Merge branch fw/lite-fix3 of git@code.alipay.com:inclusionAI/AReaL.git into lite
https://code.alipay.com/inclusionAI/AReaL/pull_requests/358

Reviewed-by: 晓雷 <meizhiyu.mzy@antgroup.com>


* .
* .
* .
* .
* fix loss mask
* fix
* .

* PullRequest: 368 [lite] Refactor train engine after merging contributions from GitHub

Merge branch fw/lite-train-engine of git@code.alipay.com:inclusionAI/AReaL.git into lite
https://code.alipay.com/inclusionAI/AReaL/pull_requests/368

Reviewed-by: 晓雷 <meizhiyu.mzy@antgroup.com>


* .
* .

* PullRequest: 371 [lite] [fix] fix misc bugs in GRPO implementation

Merge branch fw/lite-fix0716 of git@code.alipay.com:inclusionAI/AReaL.git into lite
https://code.alipay.com/inclusionAI/AReaL/pull_requests/371

Reviewed-by: 晓雷 <meizhiyu.mzy@antgroup.com>


* .

* PullRequest: 370 [lite] Add Slurm Launcher and Ray Launcher

Merge branch mzy/lite/launcher of git@code.alipay.com:inclusionAI/AReaL.git into lite
https://code.alipay.com/inclusionAI/AReaL/pull_requests/370

Reviewed-by: 博惟 <bowei.fw@antgroup.com>


* .
* .
* .
* fix

* PullRequest: 392 [lite] Fix several bugs regarding RL learning and add an example to reproduce boba-math results.

Merge branch fw/lite-boba of git@code.alipay.com:inclusionAI/AReaL.git into lite
https://code.alipay.com/inclusionAI/AReaL/pull_requests/392

Reviewed-by: 晓雷 <meizhiyu.mzy@antgroup.com>


* support fsdp engine and sglang remote engine
* minor fix
* .
* refactor trainer
* add close
* rm mb_spec
* .
* fix
* .
* qwen2 grpo works
* fix
* fix
* async works
* fix
* slurm launcher not tested
* fix arg parse
* .
* sglang server wrapper
* .
* .
* slurm run
* ready for boba
* debug
* 32k run
* .
* .
* fix
* .
* .
* .
* .
* .
* fix
* .
* fix
* .
* .
* .
* .
* fix
* .
* .
* .
* .
* .
* .
* .
* refactor train engine
* refactor train engine
* .
* fix update weight error
* .
* .
* match train
* format
* .
* fix
* seems to work
* .
* .
* .
* .

* .

* PullRequest: 408 [Feature] Bump SGLang version to v0.4.9.post2

Merge branch fw/sgl049 of git@code.alipay.com:inclusionAI/AReaL.git into lite
https://code.alipay.com/inclusionAI/AReaL/pull_requests/408

Reviewed-by: 晓雷 <meizhiyu.mzy@antgroup.com>


* .
* bump arealite to sglang 0.4.9.post2
* .
* PullRequest: 412 [lite] Minor refactor on `UpdateWeightMeta`

* PullRequest: 422 [lite] Fix tests and scripts after updating sgl to 0.4.9

Merge branch fw/sgl049 of git@code.alipay.com:inclusionAI/AReaL.git into lite
https://code.alipay.com/inclusionAI/AReaL/pull_requests/422

Reviewed-by: 晓雷 <meizhiyu.mzy@antgroup.com>


* .
* bump arealite to sglang 0.4.9.post2
* .
* PullRequest: 412 [lite] Minor refactor on `UpdateWeightMeta`
* .

* PullRequest: 423 [lite] Remove the boba example for github release.

Merge branch fw/remove-boba of git@code.alipay.com:inclusionAI/AReaL.git into lite
https://code.alipay.com/inclusionAI/AReaL/pull_requests/423

Reviewed-by: 晓雷 <meizhiyu.mzy@antgroup.com>


* .
* .

---------

Co-authored-by: 晓雷 <meizhiyu.mzy@antgroup.com>
2025-07-24 15:34:52 +08:00
朱晗 176ec4bb23 0724_merge3 2025-07-24 14:29:29 +08:00
朱晗 e97e33fca8 0724_merge2 2025-07-24 14:24:54 +08:00
朱晗 c816a3ce44 0724_merge1 2025-07-24 14:18:42 +08:00
朱晗 6bde86a934 reformatted 2025-07-23 13:58:25 +08:00
朱晗 c27a51bc1e 0722_5 2025-07-22 16:22:59 +08:00
lichangye.lcy ea12141f2b 0722_4 2025-07-22 16:03:29 +08:00
朱晗 f451dbd692 0721_merge7 2025-07-21 18:10:24 +08:00
lichangye.lcy 9fcc177237 0721_4 2025-07-21 13:28:23 +08:00
朱晗 a157510799 0721_3 2025-07-21 11:34:28 +08:00
朱晗 c0176b5bee 0718_1 2025-07-18 10:25:06 +08:00
Wei Fu 29e164a69d
[Fix] [lite] Merge from the internal repo to fix GRPO bugs and refactor the train engine (#181)
* PullRequest: 353 [Lite] Add gradient checkpointing to FSDPEngine

Merge branch mzy/add-gradient-ckpt of git@code.alipay.com:inclusionAI/AReaL.git into lite
https://code.alipay.com/inclusionAI/AReaL/pull_requests/353

Reviewed-by: 博惟 <bowei.fw@antgroup.com>


* add gradient checkpointing

* PullRequest: 354 [lite] GRPO pre-commit: minor changes in FSDP  engine

Merge branch fw/lite-fix1 of git@code.alipay.com:inclusionAI/AReaL.git into lite
https://code.alipay.com/inclusionAI/AReaL/pull_requests/354

Reviewed-by: 晓雷 <meizhiyu.mzy@antgroup.com>


* .
* .
* .
* .

* PullRequest: 355 [Lite] GRPO pre-commit 2: Refactor RemoteSGLangEngine thread and SGLang configuration

Merge branch fw/lite-fix1 of git@code.alipay.com:inclusionAI/AReaL.git into lite
https://code.alipay.com/inclusionAI/AReaL/pull_requests/355?tab=commit

Reviewed-by: 晓雷 <meizhiyu.mzy@antgroup.com>


* .
* .
* .
* .
* .
* .
* fix
* .

* PullRequest: 357 [lite] GRPO pre-commit 3: Fix typos and experiment utilities

Merge branch fw/lite-fix2 of git@code.alipay.com:inclusionAI/AReaL.git into lite
https://code.alipay.com/inclusionAI/AReaL/pull_requests/357?tab=comment

Reviewed-by: 晓雷 <meizhiyu.mzy@antgroup.com>


* .
* .
* .
* .
* .
* fix destroy process group

* PullRequest: 358 [lite] Support GRPO training locally with the GSM8k dataset

Merge branch fw/lite-fix3 of git@code.alipay.com:inclusionAI/AReaL.git into lite
https://code.alipay.com/inclusionAI/AReaL/pull_requests/358

Reviewed-by: 晓雷 <meizhiyu.mzy@antgroup.com>


* .
* .
* .
* .
* fix loss mask
* fix
* .

* PullRequest: 368 [lite] Refactor train engine after merging contributions from GitHub

Merge branch fw/lite-train-engine of git@code.alipay.com:inclusionAI/AReaL.git into lite
https://code.alipay.com/inclusionAI/AReaL/pull_requests/368

Reviewed-by: 晓雷 <meizhiyu.mzy@antgroup.com>


* .
* .

* PullRequest: 371 [lite] [fix] fix misc bugs in GRPO implementation

Merge branch fw/lite-fix0716 of git@code.alipay.com:inclusionAI/AReaL.git into lite
https://code.alipay.com/inclusionAI/AReaL/pull_requests/371

Reviewed-by: 晓雷 <meizhiyu.mzy@antgroup.com>


* .

---------

Co-authored-by: 晓雷 <meizhiyu.mzy@antgroup.com>
2025-07-16 17:26:49 +08:00
博惟 b56f5998ec PullRequest: 371 [lite] [fix] fix misc bugs in GRPO implementation
Merge branch fw/lite-fix0716 of git@code.alipay.com:inclusionAI/AReaL.git into lite
https://code.alipay.com/inclusionAI/AReaL/pull_requests/371

Reviewed-by: 晓雷 <meizhiyu.mzy@antgroup.com>


* .
2025-07-16 17:22:54 +08:00
Wei Fu e13db01f67
[lite] [refactor] Add GSM8k GRPO example. (#179)
* PullRequest: 353 [Lite] Add gradient checkpointing to FSDPEngine

Merge branch mzy/add-gradient-ckpt of git@code.alipay.com:inclusionAI/AReaL.git into lite
https://code.alipay.com/inclusionAI/AReaL/pull_requests/353

Reviewed-by: 博惟 <bowei.fw@antgroup.com>


* add gradient checkpointing

* PullRequest: 354 [lite] GRPO pre-commit: minor changes in FSDP  engine

Merge branch fw/lite-fix1 of git@code.alipay.com:inclusionAI/AReaL.git into lite
https://code.alipay.com/inclusionAI/AReaL/pull_requests/354

Reviewed-by: 晓雷 <meizhiyu.mzy@antgroup.com>


* .
* .
* .
* .

* PullRequest: 355 [Lite] GRPO pre-commit 2: Refactor RemoteSGLangEngine thread and SGLang configuration

Merge branch fw/lite-fix1 of git@code.alipay.com:inclusionAI/AReaL.git into lite
https://code.alipay.com/inclusionAI/AReaL/pull_requests/355?tab=commit

Reviewed-by: 晓雷 <meizhiyu.mzy@antgroup.com>


* .
* .
* .
* .
* .
* .
* fix
* .

* PullRequest: 357 [lite] GRPO pre-commit 3: Fix typos and experiment utilities

Merge branch fw/lite-fix2 of git@code.alipay.com:inclusionAI/AReaL.git into lite
https://code.alipay.com/inclusionAI/AReaL/pull_requests/357?tab=comment

Reviewed-by: 晓雷 <meizhiyu.mzy@antgroup.com>


* .
* .
* .
* .
* .
* fix destroy process group

* fix ci

* PullRequest: 358 [lite] Support GRPO training locally with the GSM8k dataset

Merge branch fw/lite-fix3 of git@code.alipay.com:inclusionAI/AReaL.git into lite
https://code.alipay.com/inclusionAI/AReaL/pull_requests/358

Reviewed-by: 晓雷 <meizhiyu.mzy@antgroup.com>


* .
* .
* .
* .
* fix loss mask
* fix
* .

---------

Co-authored-by: 晓雷 <meizhiyu.mzy@antgroup.com>
2025-07-16 13:10:26 +08:00
朱晗 0435aa5394 merge2 2025-07-15 17:44:21 +08:00
博惟 8a15551949 PullRequest: 357 [lite] GRPO pre-commit 3: Fix typos and experiment utilities
Merge branch fw/lite-fix2 of git@code.alipay.com:inclusionAI/AReaL.git into lite
https://code.alipay.com/inclusionAI/AReaL/pull_requests/357?tab=comment

Reviewed-by: 晓雷 <meizhiyu.mzy@antgroup.com>


* .
* .
* .
* .
* .
* fix destroy process group
2025-07-14 18:41:22 +08:00
lichangye.lcy cad44882a5 0711_6 2025-07-11 15:52:44 +08:00
朱晗 8c7affe108 merge_lite 2025-07-10 15:59:25 +08:00
朱晗 50cf951a1b 0710_3 2025-07-10 13:04:52 +08:00
Wei Fu 3bf9c85e40
[Fix] Merge previous contributions from fw/refactor to lite (#163)
* initial proposal

* add arealite

* .

* change api

* .

* remove LOG_ROOT

* remove MODEL_SAVE_PATH

* remove PARAM_REALLOC_PATH, DATASET_CACHE

* prepare for testing

* prepare for testing

* ready for run

* local run

* tests mainly pass

* format

* .

* amend cluster.py

* .

* .

* client test pass

* pass rollout test

* remove unused imports

* add arealite readme

* change api

* .

* .

* .

* .

* .

* .

* .

* .

* format

* .

* implement iteraptable generation (#112)

Co-authored-by: zhaochenyang <zhaochenyang20@gmail.com>

* .

* fix

* .

* .

* .

* pass controller generate batch test

* .

* refactor rollout controller into worker and controller

* .

* .

* .

* change to async rollout

* pass rollout controller test

* pass test

* .

* update readme

* .

* sft debug

* .

* add lisence

* remove unused files

* remove unsed args in ppo

* add hf engine wrapper  (#116)

* add hf engine

* fix issues

* fix ppo bugs and add test

* add hf client interface and modify cli args

* fix bugs

* fix issues

* Merge fw/refactor

* Finish hf wrapper test

* add test

---------

Co-authored-by: Wei Fu <36355462+garrett4wade@users.noreply.github.com>

* format

* format

* .

* refine hf engine

* .

* fix

* add fsdp engine and sft tests

* .

* .

* .

* pass ppo unittest

* pass ppo and rollout controller tests

* clear unused imports

* rename ppo to grpo

* change reward function organization

* reorganize code

* add dataset api

* .

* .

* .

* format

* chmod fix

* .

* rename workflow to collector

* refactor llm_client location

* .

* .

* fix llm server api

* refactor config structure

* .

* fix tests

* .

* .

* .

* Fix unresolved issue in SFTTrainer PR (#139)

* .

* .

* efficient loading

* format

* .

* .

* .

* .

* .

* .

* Add CI for testing AReaLite (#150)

* ci: add test-arealite

* ci: add checkout before running test-arealite

* ci: add USERNAME

* ci: add test script

* ci: add GitHub mirror

* ci: fix typo

* ci: clone one commit

* ci: fix condition

* ci: set command timeout to 60m

* ci: enable pip cache

* ci: optimize container lifecycle

* ci: split into many stages

* ci(test-arealite): fix typo

* ci: fix wrong env

* ci: fix pytest

* ci: uninstall transformer-engine

* ci: uninstall transformer-engine

* ci: fix model paths

* ci: show stdout/stderr

* ci: fix not clean up

* ci: backup sglang

* ci: remove tmp repo dir when run

* ci: fix docker run exit 1 condition

* ci(test-arealite): limit the concurrency and extend command timeout

* .

* merge fw/refactor

* revert some changes

* fix

---------

Co-authored-by: meizhiyu.mzy <meizhiyu.mzy@antgroup.com>
Co-authored-by: Chayenne <zhaochen20@outlook.com>
Co-authored-by: zhaochenyang <zhaochenyang20@gmail.com>
Co-authored-by: Jayon02 <qiujiangc@outlook.com>
Co-authored-by: root <meizhiyu.mzy>
Co-authored-by: Zijian Zhang <futrime@outlook.com>
2025-07-10 12:56:24 +08:00
博惟 d48bf007cf Merge branch 'main' of https://github.com/inclusionAI/AReaL into lite 2025-07-10 12:53:30 +08:00
博惟 c38cffc023 PullRequest: 340 [lite] Refactor trainer API into utilities and remove mb_spec in engine methods
Merge branch fw/lite-dev of git@code.alipay.com:inclusionAI/AReaL.git into lite
https://code.alipay.com/inclusionAI/AReaL/pull_requests/340

Reviewed-by: 晓雷 <meizhiyu.mzy@antgroup.com>


* support fsdp engine and sglang remote engine
* minor fix
* .
* refactor trainer
* add close
* rm mb_spec
* fix
2025-07-10 11:10:10 +08:00
博惟 15dfbe837c PullRequest: 332 [lite] Support FSDP engines
Merge branch mzy/lite/engines of git@code.alipay.com:inclusionAI/AReaL.git into lite
https://code.alipay.com/inclusionAI/AReaL/pull_requests/332

Reviewed-by: 晓雷 <meizhiyu.mzy@antgroup.com>


* fsdp2 engine
* fix utils
* add fsdp engine test
* .
* fsdp engine test passed
* unsqueeze immediately before model inputs and after model outputts
* add optimizer save/load, add position id calculation for input
* .
* format
* not to squeeze
* add train and eval api
* .
* .
* improve fsdp engine data preprocessing
* format
* PullRequest: 337 [lite] Add SFT trainer example.
* trainer log
* minor changes
* add update weights from disk
* fix type annotation
2025-07-09 16:24:25 +08:00
xssstory 17ea7fe94d
fix math reward verifier (#156)
* PullRequest: 293 fix get_param_realloc_path

Merge branch xss/debug of git@code.alipay.com:inclusionAI/AReaL.git into gh
https://code.alipay.com/inclusionAI/AReaL/pull_requests/293

Reviewed-by: 博惟 <bowei.fw@antgroup.com>


* fix get_param_realloc_path

* PullRequest: 297 bugfix: reward is always -5

Merge branch xss/debug of git@code.alipay.com:inclusionAI/AReaL.git into gh
https://code.alipay.com/inclusionAI/AReaL/pull_requests/297

Reviewed-by: 博惟 <bowei.fw@antgroup.com>


* bugfix: reward is always -5

* PullRequest: 321 fix checkpoint save dir

Merge branch xss/debug of git@code.alipay.com:inclusionAI/AReaL.git into gh
https://code.alipay.com/inclusionAI/AReaL/pull_requests/321

Reviewed-by: 博惟 <bowei.fw@antgroup.com>


* fix checkpoint save dir

* PullRequest: 328 [Doc] update installation

Merge branch sxj/doc of git@code.alipay.com:inclusionAI/AReaL.git into gh
https://code.alipay.com/inclusionAI/AReaL/pull_requests/328

Reviewed-by: 博惟 <bowei.fw@antgroup.com>


* [Doc] update installation

* PullRequest: 329 bugfix: math verifier blocks the async training

Merge branch xss/debug of git@code.alipay.com:inclusionAI/AReaL.git into gh
https://code.alipay.com/inclusionAI/AReaL/pull_requests/329

Reviewed-by: 博惟 <bowei.fw@antgroup.com>


* bugfix: math verifier block the async training

* format

---------

Co-authored-by: 冰临 <shenxujie.sxj@antgroup.com>
Co-authored-by: garrett4wade <fuwth17@gmail.com>
2025-07-07 15:49:13 +08:00
朱晗 17ed423746 Merge branch 'lcy/refactor' into fw/refactor 2025-07-07 12:31:30 +08:00
Wei Fu 0ff8c59435
[Fix] Merge error fixes. (#152)
* PullRequest: 293 fix get_param_realloc_path

Merge branch xss/debug of git@code.alipay.com:inclusionAI/AReaL.git into gh
https://code.alipay.com/inclusionAI/AReaL/pull_requests/293

Reviewed-by: 博惟 <bowei.fw@antgroup.com>


* fix get_param_realloc_path

* PullRequest: 297 bugfix: reward is always -5

Merge branch xss/debug of git@code.alipay.com:inclusionAI/AReaL.git into gh
https://code.alipay.com/inclusionAI/AReaL/pull_requests/297

Reviewed-by: 博惟 <bowei.fw@antgroup.com>


* bugfix: reward is always -5

* PullRequest: 321 fix checkpoint save dir

Merge branch xss/debug of git@code.alipay.com:inclusionAI/AReaL.git into gh
https://code.alipay.com/inclusionAI/AReaL/pull_requests/321

Reviewed-by: 博惟 <bowei.fw@antgroup.com>


* fix checkpoint save dir

* PullRequest: 328 [Doc] update installation

Merge branch sxj/doc of git@code.alipay.com:inclusionAI/AReaL.git into gh
https://code.alipay.com/inclusionAI/AReaL/pull_requests/328

Reviewed-by: 博惟 <bowei.fw@antgroup.com>


* [Doc] update installation

---------

Co-authored-by: 温差 <xushusheng.xss@antgroup.com>
Co-authored-by: 冰临 <shenxujie.sxj@antgroup.com>
2025-07-07 10:30:27 +08:00
bowei.fw 89a8d8c46a . 2025-07-04 16:28:32 +08:00
antoiegg1 685045fee2 image_process0703_1 2025-07-03 11:44:54 +08:00
bowei.fw a5299b1bed . 2025-07-02 11:28:03 +08:00
antoinegg1 3c36e4e266 image_process0701_3 2025-07-01 17:43:27 +08:00
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟 e0aee03109
Fix several syntax warning related to invalid escape sequence (#144)
by using raw strings or properly escaping the backslashes.

```log
AReaL/realhf/impl/dataset/math_parser.py:292: SyntaxWarning: invalid escape sequence '\%'
  string = string.replace("\%", "")
AReaL/realhf/impl/dataset/math_parser.py:402: SyntaxWarning: invalid escape sequence '\d'
  pattern = "-?\d*\.?\d+"
AReaL/realhf/impl/model/parallelism/tensor_parallel/modules.py:1125: SyntaxWarning: invalid escape sequence '\s'
```

Signed-off-by: Hollow Man <hollowman@opensuse.org>
2025-07-01 10:12:39 +08:00
antoiegg1 b13c3a68f1 test_vlm 2025-06-30 10:56:33 +08:00
bowei.fw d2a317d1fc refactor config structure 2025-06-28 11:23:22 +08:00
antoiegg1 d96f77a57b processor 2025-06-27 13:47:02 +08:00
bowei.fw 15537cb013 . 2025-06-26 21:50:03 +08:00
bowei.fw 1be260ea6a change reward function organization 2025-06-26 13:36:44 +08:00
bowei.fw 242243b6a6 pass ppo and rollout controller tests 2025-06-26 10:55:07 +08:00
garrett4wade 9b8306c28c . 2025-06-25 17:05:19 +08:00
Wei Fu adeb8eb13f
[Fix] Fix yaml configurations for v0.2 experiments. (#129)
* .

* fix
2025-06-24 13:48:02 +08:00
garrett4wade 7f1397e37b Merge branch 'main' of https://github.com/inclusionAI/AReaL into fw/refactor 2025-06-23 20:14:52 +08:00
Wei Fu 1ec1399f19
PullRequest: 252 [Feature] Fix constants initialization. (#122)
Merge branch fw/gh/fix-init-constants of git@code.alipay.com:inclusionAI/AReaL.git into gh
https://code.alipay.com/inclusionAI/AReaL/pull_requests/252?tab=comment

Reviewed-by: 晓雷 <meizhiyu.mzy@antgroup.com>


* remove LOG_ROOT
* remove MODEL_SAVE_PATH
* remove PARAM_REALLOC_PATH, DATASET_CACHE
* prepare for testing
* prepare for testing
* ready for run
* local run
* tests mainly pass
* format
* amend cluster.py
* fix
2025-06-23 12:52:49 +08:00
bowei.fw 9bf9b18a94 pass rollout controller test 2025-06-22 19:52:15 +08:00
bowei.fw 6ce5ec3de4 . 2025-06-20 14:59:46 +08:00
bowei.fw eeda029084 Merge branch 'fw/gh/fix-init-constants' of code.alipay.com:inclusionAI/AReaL into fw/refactor 2025-06-20 10:33:39 +08:00
bowei.fw 88bae72720 fix 2025-06-20 10:32:39 +08:00
xichengpro bb14f022dc
Support using SwanLab for experiment tracking (#98)
* Support using SwanLab for experiment tracking

* docs: improve WandB and SwanLab integration documentation
- Added official links for better user reference
- Used backticks to quote commands and parameters
- Unified mode settings to use "online" / "cloud" convention
- Merged WandB and SwanLab descriptions into a single concise statement
- Added note on using `swanlab.mode="local"` when server connection is unavailable

* refactor: update default value of api_key

* fix: correct help description from WandB to SwanLab in SwanLabConfig

* refactor: merge log_swanlab_tensorboard and log_wandb_tensorboard into log_swanlab_wandb_tensorboard

 - Unified logging logic for SwanLab, WandB, and TensorBoard to reduce code duplication

* chore: update swanlab version in dependency config files

 - Updated SwanLab version in pyproject.toml
 - Updated SwanLab version in requirements.txt

* refactor: enhance SwanLab config handling for logging purposes
- Config now uses provided arguments first
- Falls back to reading from config.yaml if no input is given

* docs: add note on using  when server connection is unavailable

* refactor: merge _LATEST_WANDB_STEP and _LATEST_SWANLAB_STEP into _LATEST_LOG_STEP

* Format code with black and isort

* chore: update swanlab version in dependency config files
- Updated SwanLab version in requirements.txt

* refactor: rename swanlab_wandb_data to log_data

---------

Co-authored-by: dubingnan <dubingnan@360.cn>
2025-06-16 19:51:31 +08:00
bowei.fw 29b3d50bd0 client test pass 2025-06-16 17:34:51 +08:00
garrett4wade cd771add95 . 2025-06-16 10:32:35 +08:00
bowei.fw 0d075661ea amend cluster.py 2025-06-16 10:10:46 +08:00