Commit Graph

3 Commits

Author SHA1 Message Date
asleepzzz
de1a2544c6 Revert "[BlockScale GEMM] FP8 Blockscale GEMM optimization and ckProfiler (#1913)" (#1933)
This reverts commit 06e1eee9bbb737f0ee3b2374f1857838c2b8ef3f.

[ROCm/composable_kernel commit: ef16010273]
2025-03-03 07:17:39 -08:00
Haocong WANG
f43d87b09c [BlockScale GEMM] FP8 Blockscale GEMM optimization and ckProfiler (#1913)
* Added two kernel for M=32 problem

* Comment the first one

* Enable multiply_multiply for Scale_Block_M = 1 for deepseek

* Modify the a_thread offset since the A data load is different from B.

* edit fp8 ab scale for Scale_Block_M=1

* edit GemmSpec to MNKPadding

* enable blockwise pipelie v1 and v2. v1 is work for small K.

* add instance for gemm_ab_scale

* fix cmakelist of ckProfiler

* optimize blockscale gemm. todo: reduce vgpr usage

* fix a correctness bug

* sanity checked

* revert ckprofiler cmake changes

* clang format

* revert unnecessary changes.

* remove commented codes.

---------

Co-authored-by: mtgu0705 <mtgu@amd.com>
Co-authored-by: chenjun <junchen2@amd.com>

[ROCm/composable_kernel commit: 020148d0f7]
2025-02-25 15:42:20 +08:00
Haocong WANG
a0e0f3cdcc [GEMM] F8 GEMM, performance optimized. (#1384)
* add ab_scale init support

* enabled interwave

* add scale type; update isSupport

* adjust example

* clean

* enable f8 pure gemm rcr ckprofiler

* Add gemm_multiply_multiply instances

* clang format

* Optimize for ScaleBlockMNK=128

* enable abscale f8 gemm ck profiler

* Add pure f8 gemm test suite

* Reverting to the state of project at f60fd77

* update copyright

* clang format

* update copyright

---------

Co-authored-by: root <jizhan@amd.com>

[ROCm/composable_kernel commit: 8c90f25be3]
2024-07-19 22:06:52 +08:00