2034 Commits

Author SHA1 Message Date
CHEN Xinsheng 6edc1f74a3 add cub_cumsum & cumprod 2024-10-17 21:23:32 +08:00
CHEN Xinsheng c886f01b53 fix warp (class case) 2024-10-17 21:22:26 +08:00
lidongyang fc1fff8c0e update version to 1.3.9.11 2024-10-08 23:03:24 +08:00
lidongyang 72be1396d9 fix: jupyter restart error 2024-10-08 23:02:27 +08:00
zjp_shadow 8966ca4320 fix transpose 2024-10-06 21:20:12 +08:00
uyzhang a078268e18 polish 2024-10-01 19:34:51 +08:00
uyzhang 33898421e4 Merge branch 'main' of https://github.com/CSCG-Lab/JittorHW into main 2024-10-01 18:16:47 +08:00
uyzhang 4c6d726a4c Refactor transpose_acl function and fix bug in matmul_acl 2024-10-01 18:14:08 +08:00
张仪 cb75c8dedd format 2024-09-29 13:47:22 +08:00
uyzhang c268a0bfaf Refactor aclnn.h and acl_op.h to add support for FlashAttention and FlashAttentionBackward 2024-09-29 12:29:41 +08:00
uyzhang 146574d7d1 Refactor transpose_acl function and fix bug in matmul_acl 2024-09-27 19:47:13 +08:00
uyzhang 4329f3b287 Refactor transpose_acl function and fix bug in matmul_acl 2024-09-27 19:44:04 +08:00
zjp_shadow b48d8664a1 add transpose 2024-09-27 19:37:19 +08:00
uyzhang c7c7326456 fixed the bug in matmul 2024-09-27 16:54:43 +08:00
Yi Zhang 810530b3cc Merge pull request #1 from CSCG-Lab/concat (Update concat) 2024-09-25 12:19:26 +08:00
zjp_shadow d648713ec5 Update concat 2024-09-25 00:37:36 +08:00
uyzhang 934885c96e Merge branch 'main' of https://github.com/CSCG-Lab/JittorHW into main 2024-09-23 23:12:49 +08:00
uyzhang dc29fa69dc FEAT! opt transpose in matmul and bmm 2024-09-23 23:12:44 +08:00
uyzhang c3df41e77b Refactor acl_compiler.py to handle gradient accumulation in bmm_acl and matmul_acl functions 2024-09-23 23:11:04 +08:00
uyzhang d092b83d0b Merge branch 'main' of https://github.com/CSCG-Lab/JittorHW into main 2024-09-23 22:43:47 +08:00
uyzhang 74aa4e68c2 Refactor acl_compiler.py to handle gradient accumulation in bmm_acl and matmul_acl functions 2024-09-23 22:43:44 +08:00
uyzhang 7fa22e2e32 add Ellipsis 2024-09-23 22:27:44 +08:00
uyzhang 9578e30972 Refactor acl_compiler.py to handle gradient accumulation in bmm_acl and matmul_acl functions 2024-09-23 20:40:53 +08:00
uyzhang 37671ccec1 Refactor acl_compiler.py to handle gradient accumulation in bmm_acl and matmul_acl functions 2024-09-23 20:26:46 +08:00
uyzhang 657687e0c0 Refactor acl_compiler.py to handle gradient accumulation in bmm_acl and matmul_acl functions 2024-09-23 16:09:42 +08:00
uyzhang 2a142ae73d fix bug of setitem cpu when use acl 2024-09-23 15:34:45 +08:00
uyzhang 9907aad7de fix getitem&setitem slice bug 2024-09-23 13:58:37 +08:00
uyzhang 2c2e8abe59 fix slice setitem 2024-09-23 13:18:49 +08:00
uyzhang 0d5035443e fix setitem not in graph 2024-09-23 03:26:12 +08:00
uyzhang fa288cb4d9 Refactor acl_op.h to use __fp16 for alphaValue in the case of ACL_FLOAT16 dtype 2024-09-22 18:06:38 +08:00
uyzhang 9ff62acf7d Refactor acl_op.h to use __fp16 for alphaValue in the case of ACL_FLOAT16 dtype; Refactor grad method for improved performance and synchronization; Index indices to int32; Fix getitem bug; Add getitem&setitem mask 2024-09-22 16:41:43 +08:00
lidongyang 8888b25ea7 fix getitem bug 2024-09-22 02:30:16 +08:00
lidongyang 464009af42 add getitem&setitem mask 2024-09-21 22:57:54 +08:00
uyzhang a357a7913d Refactor acl_op.h to use __fp16 for alphaValue in the case of ACL_FLOAT16 dtype 2024-09-21 17:17:47 +08:00
uyzhang 631a9a3aaa Refactor grad method for improved performance and synchronization 2024-09-21 14:20:10 +08:00
lidongyang 0705ed9d8f index indices to int32 2024-09-20 22:10:15 +08:00
uyzhang 015bd10210 Refactor flip and squeeze operations for improved performance and synchronization 2024-09-20 21:54:49 +08:00
lidongyang 898ec600b4 polish getitem&setitem 2024-09-20 21:44:47 +08:00
lidongyang babd92a002 polish getitem&setitem -1 2024-09-20 20:01:45 +08:00
lidongyang cdad66c01d polish output dtype 2024-09-20 19:43:56 +08:00
张仪 18afb843ad Fix synchronization issue in acl_op.h 2024-09-19 19:52:25 +08:00
张仪 4006f242de fixed bugs 2024-09-18 17:33:23 +08:00
张仪 e47a74a497 Fix broadcasting issue in acl_compiler.py and add support for setting item in jt.Var 2024-09-14 16:00:15 +08:00
lidongyang 651b24e634 add sigmoid embedding silu 2024-09-13 03:19:25 +08:00
lidongyang 0641a50a5d change op file to acl_op.h 2024-09-12 22:29:20 +08:00
lidongyang e00e4f099c add getitem&setitem 2024-09-12 20:25:48 +08:00
张仪 c55d49a8de add new aclop 2024-09-12 20:14:22 +08:00
张仪 3beeec78b1 add new aclop & fixed some bugs 2024-09-12 17:11:23 +08:00
张仪 eb89ae19ed add new aclop 2024-09-07 22:11:39 +08:00
张仪 21580ce80e update aclnn 2024-09-07 18:18:00 +08:00