composable_kernel/include/ck/tensor_operation/gpu/device
Enrico Degregori 507d81c3af Fix splitk preshuffle (#3137)
* Fix splitK multiply_multiply_wp

* Add tests for gemm_multiply_multiply_wp

* Add tests for gemm_universal_preshuffle (KBatch = 1)

* Add tests for gemm_blockscale_wp

* Fix splitk gemm universal preshuffle

* Run new tests on arch supporting fp8

* Restore example

* Fix strides profiler

* Fix tests

* Fix clang format

* Finalize profiler preshuffle with tolerances

* Minor improvements to splitk related changes

* Address review comments: clang format and ckProfiler typo

* Remove b_k_split_offset from SplitKBatchOffset struct
2025-11-03 11:59:01 -08:00