Files
composable_kernel/include/ck/tensor_operation/gpu/device
Enrico Degregori 4ebc48a3cd WMMA gemm_add_relu_add_layernorm (#2989)
* Summary:

 - Refactor epilogue (with CShuffle) to support fused operations:
    - EpilogueCShuffleBase: holds the common parts
    - EpilogueCShuffle: runs CShuffle and writes out the result
    - EpilogueWelfordCShuffle: holds the Welford-specific arguments, runs CShuffle, writes out the result, then runs the first part of Welford and writes out the Welford results

 - Extend thread transfer v7r3:
    - Support an intermediate data type that differs from the src and dst types
    - Add the option to write to the dst buffer while keeping the data, so it can be reused for additional operations

* Address review comments
2025-10-31 11:19:26 -07:00