lzhengning
  • Joined on 2022-03-22
lzhengning synced commits to refs/pull/2469/merge at lzhengning/cutlass from mirror 2025-07-29 22:33:59 +08:00
39a1a34e93 Merge 3ba63687086e2d15d476a1c1cd64f4672b5d8446 into 664c4f7b3ed1959414905025728eef5568209479
664c4f7b3e Update CUTLASS version to 4.1
0e026982ce Example 77 add blackwell fmha bwd for MLA shape (#2466)
9a9a579714 Merge pull request #2489 from NVIDIA/update_workflow_script
51d730b8be Support "CuTe DSL" auto-labeling in workflow
Compare 11 commits »
lzhengning synced commits to refs/pull/2466/head at lzhengning/cutlass from mirror 2025-07-29 22:33:59 +08:00
ac13cf44c0 code sync
26ae62b962 bug fix in casual mask backward
b468b12e85 Fix casual mask cnt when IsQBegin==false
c7a1f9dd52 bug fix & use existing value rather than pass one more argument to support different dim in bwd_convert
c68903ecae Update examples/77_blackwell_fmha/device/fmha_device_bwd.hpp
Compare 10 commits »
lzhengning synced commits to refs/pull/2465/merge at lzhengning/cutlass from mirror 2025-07-29 22:33:59 +08:00
9a13f55f92 Merge d258ff6c1fc0e9ee7b55910ffdc296b65a7085c0 into 9baa06dd57804ce8fb5efe9e471b3451341522c6
9baa06dd57 Add Blackwell MLA forward (shape: d=192, dv=128) implementation in example_77 (#2472)
ebe98c549a cache procedural_name in GemmOperation (#2317)
Compare 3 commits »
lzhengning synced commits to refs/pull/2457/merge at lzhengning/cutlass from mirror 2025-07-29 22:33:58 +08:00
e8b5d1b872 Merge 43a500fbd4fe595dd3838b6135fd7bca0ef5acdd into 664c4f7b3ed1959414905025728eef5568209479
664c4f7b3e Update CUTLASS version to 4.1
0e026982ce Example 77 add blackwell fmha bwd for MLA shape (#2466)
9a9a579714 Merge pull request #2489 from NVIDIA/update_workflow_script
51d730b8be Support "CuTe DSL" auto-labeling in workflow
Compare 10 commits »
lzhengning synced commits to refs/pull/2448/merge at lzhengning/cutlass from mirror 2025-07-29 22:33:58 +08:00
d9f5a17708 Merge 97dbe100958b2f2ff313f58b6c3a2e3330b366bc into 664c4f7b3ed1959414905025728eef5568209479
664c4f7b3e Update CUTLASS version to 4.1
0e026982ce Example 77 add blackwell fmha bwd for MLA shape (#2466)
9a9a579714 Merge pull request #2489 from NVIDIA/update_workflow_script
51d730b8be Support "CuTe DSL" auto-labeling in workflow
Compare 11 commits »
lzhengning synced commits to refs/pull/2447/merge at lzhengning/cutlass from mirror 2025-07-29 22:33:58 +08:00
94c6622b2d Merge a3ceac9cfe4abdb86d17758add216b2444500ff1 into 664c4f7b3ed1959414905025728eef5568209479
664c4f7b3e Update CUTLASS version to 4.1
0e026982ce Example 77 add blackwell fmha bwd for MLA shape (#2466)
9a9a579714 Merge pull request #2489 from NVIDIA/update_workflow_script
51d730b8be Support "CuTe DSL" auto-labeling in workflow
Compare 10 commits »
lzhengning synced commits to refs/pull/2419/head at lzhengning/cutlass from mirror 2025-07-29 22:33:58 +08:00
27ab3c8fdd Update evt_store_sm80_90.py
10e765ff36 Update evt_store_sm80_90.py
c84bbf7aa0 Update evt_compute_sm80_90.py
a04e733513 Merge branch 'main' into fix/python_evt_tracer
9baa06dd57 Add Blackwell MLA forward (shape: d=192, dv=128) implementation in example_77 (#2472)
Compare 10 commits »
lzhengning synced commits to refs/pull/2416/merge at lzhengning/cutlass from mirror 2025-07-29 22:33:58 +08:00
72741ee7e3 Merge 0a670e346fa35cdb7e1c3588dba1ae3a5ad124d8 into 664c4f7b3ed1959414905025728eef5568209479
664c4f7b3e Update CUTLASS version to 4.1
0e026982ce Example 77 add blackwell fmha bwd for MLA shape (#2466)
9a9a579714 Merge pull request #2489 from NVIDIA/update_workflow_script
51d730b8be Support "CuTe DSL" auto-labeling in workflow
Compare 11 commits »
lzhengning synced commits to refs/pull/2402/merge at lzhengning/cutlass from mirror 2025-07-29 22:33:58 +08:00
ce05a7e188 Merge 960e79417b71c53e0a3dfdf84e8cbc4d6ad30e6b into 664c4f7b3ed1959414905025728eef5568209479
664c4f7b3e Update CUTLASS version to 4.1
0e026982ce Example 77 add blackwell fmha bwd for MLA shape (#2466)
9a9a579714 Merge pull request #2489 from NVIDIA/update_workflow_script
51d730b8be Support "CuTe DSL" auto-labeling in workflow
Compare 11 commits »
lzhengning synced commits to refs/pull/2385/merge at lzhengning/cutlass from mirror 2025-07-29 22:33:58 +08:00
4db5e83cc8 Merge ce0368f39d0cee4c3c4559dcacc329d669ac18d3 into 664c4f7b3ed1959414905025728eef5568209479
664c4f7b3e Update CUTLASS version to 4.1
0e026982ce Example 77 add blackwell fmha bwd for MLA shape (#2466)
9a9a579714 Merge pull request #2489 from NVIDIA/update_workflow_script
51d730b8be Support "CuTe DSL" auto-labeling in workflow
Compare 11 commits »
lzhengning synced commits to refs/pull/2378/merge at lzhengning/cutlass from mirror 2025-07-29 22:33:58 +08:00
e98806b720 Merge 954077ba9fa0e431e120b11e5e0c46b93c82cabf into 664c4f7b3ed1959414905025728eef5568209479
664c4f7b3e Update CUTLASS version to 4.1
0e026982ce Example 77 add blackwell fmha bwd for MLA shape (#2466)
9a9a579714 Merge pull request #2489 from NVIDIA/update_workflow_script
51d730b8be Support "CuTe DSL" auto-labeling in workflow
Compare 11 commits »
lzhengning synced commits to refs/pull/2351/merge at lzhengning/cutlass from mirror 2025-07-29 22:33:58 +08:00
ee3a7da3b1 Merge b9b4b40ed264e8a78f0d9ca354a5d920fd844521 into 664c4f7b3ed1959414905025728eef5568209479
664c4f7b3e Update CUTLASS version to 4.1
0e026982ce Example 77 add blackwell fmha bwd for MLA shape (#2466)
9a9a579714 Merge pull request #2489 from NVIDIA/update_workflow_script
51d730b8be Support "CuTe DSL" auto-labeling in workflow
Compare 11 commits »
lzhengning synced commits to refs/pull/2333/merge at lzhengning/cutlass from mirror 2025-07-29 22:33:58 +08:00
c0ef26c287 Merge 78ebb8f1bd7fee3fcc17a08e49027c6dc3aab8fe into 664c4f7b3ed1959414905025728eef5568209479
664c4f7b3e Update CUTLASS version to 4.1
0e026982ce Example 77 add blackwell fmha bwd for MLA shape (#2466)
9a9a579714 Merge pull request #2489 from NVIDIA/update_workflow_script
51d730b8be Support "CuTe DSL" auto-labeling in workflow
Compare 11 commits »
lzhengning synced commits to refs/pull/2328/merge at lzhengning/cutlass from mirror 2025-07-29 22:33:58 +08:00
ee0a908170 Merge ab3b26eb53a000493f49475355a7f76a6eb3d165 into 664c4f7b3ed1959414905025728eef5568209479
664c4f7b3e Update CUTLASS version to 4.1
0e026982ce Example 77 add blackwell fmha bwd for MLA shape (#2466)
9a9a579714 Merge pull request #2489 from NVIDIA/update_workflow_script
51d730b8be Support "CuTe DSL" auto-labeling in workflow
Compare 10 commits »
lzhengning synced commits to refs/pull/2324/merge at lzhengning/cutlass from mirror 2025-07-29 22:33:57 +08:00
089096e0b0 Merge 9321e8d4d669b154f1db21d9a99f93cb43d73102 into 0e026982ce2ed10b27ec569c6e42035cb9118f62
0e026982ce Example 77 add blackwell fmha bwd for MLA shape (#2466)
9a9a579714 Merge pull request #2489 from NVIDIA/update_workflow_script
51d730b8be Support "CuTe DSL" auto-labeling in workflow
6c0c8b7484 1. Update bug/feature report template to add component selection. (#2485)
Compare 10 commits »
lzhengning synced commits to refs/pull/2305/merge at lzhengning/cutlass from mirror 2025-07-29 22:33:57 +08:00
bbe76fc5d0 Merge 5c9a9b1d5b8488d70e4b061972e5ca2101ad44bb into 664c4f7b3ed1959414905025728eef5568209479
664c4f7b3e Update CUTLASS version to 4.1
0e026982ce Example 77 add blackwell fmha bwd for MLA shape (#2466)
9a9a579714 Merge pull request #2489 from NVIDIA/update_workflow_script
51d730b8be Support "CuTe DSL" auto-labeling in workflow
Compare 11 commits »
lzhengning synced commits to refs/pull/2305/head at lzhengning/cutlass from mirror 2025-07-29 22:33:57 +08:00
5c9a9b1d5b Merge branch 'NVIDIA:main' into main
9baa06dd57 Add Blackwell MLA forward (shape: d=192, dv=128) implementation in example_77 (#2472)
ebe98c549a cache procedural_name in GemmOperation (#2317)
9892624b66 Fix typos in the text (#2417)
a1aaf2300a v4.1 release
Compare 15 commits »
lzhengning synced commits to refs/pull/2269/merge at lzhengning/cutlass from mirror 2025-07-29 22:33:57 +08:00
25963c461b Merge 1823cf242b7edeb8becb204c2204ab12e549b800 into 9baa06dd57804ce8fb5efe9e471b3451341522c6
9baa06dd57 Add Blackwell MLA forward (shape: d=192, dv=128) implementation in example_77 (#2472)
ebe98c549a cache procedural_name in GemmOperation (#2317)
9892624b66 Fix typos in the text (#2417)
Compare 4 commits »
lzhengning synced commits to refs/pull/2221/merge at lzhengning/cutlass from mirror 2025-07-29 22:33:57 +08:00
5d32007626 Merge 8ac04a249093b2be36c2658c676723bcb44b89e1 into 9a9a579714a7075546be7aa20af89fb17d0cd56f
9a9a579714 Merge pull request #2489 from NVIDIA/update_workflow_script
51d730b8be Support "CuTe DSL" auto-labeling in workflow
6c0c8b7484 1. Update bug/feature report template to add component selection. (#2485)
e51efbfe18 Update CHANGELOG.md
Compare 9 commits »
lzhengning synced commits to refs/pull/2179/merge at lzhengning/cutlass from mirror 2025-07-29 22:33:57 +08:00
e29b1cf79f Merge f1f48ba441623a8cc43001400e9f736393326e41 into 0e026982ce2ed10b27ec569c6e42035cb9118f62
0e026982ce Example 77 add blackwell fmha bwd for MLA shape (#2466)
9a9a579714 Merge pull request #2489 from NVIDIA/update_workflow_script
51d730b8be Support "CuTe DSL" auto-labeling in workflow
6c0c8b7484 1. Update bug/feature report template to add component selection. (#2485)
Compare 10 commits »