Commit Graph

589 Commits

Author SHA1 Message Date
letaoqin
66efcf9603 change g tile distribution 2024-11-27 03:37:08 +00:00
letaoqin
fe44e66e99 add gemm0 for tokens*G 2024-11-26 14:23:26 +00:00
letaoqin
f363ec7f3b add tag for gather index 2024-11-26 03:50:58 +00:00
letaoqin
c1d6f9ec42 clear code 2024-11-26 03:28:41 +00:00
letaoqin
ef8e3620cc gather and scatter right 2024-11-25 07:40:03 +00:00
letaoqin
eaf8e6165b write a data to lds 2024-11-22 08:17:20 +00:00
letaoqin
3b51749a76 remove fused_moegemm_pipeline_gl.hpp 2024-11-22 06:13:33 +00:00
letaoqin
f9ac2337af change file name 2024-11-22 06:10:42 +00:00
letaoqin
b5d6100bbc change file name 2024-11-22 04:24:07 +00:00
“letaoqin”
f912ca405c fix call indexing adaptor issue 2024-11-21 02:07:25 +00:00
“letaoqin”
1561fc22d6 change indexing adapter to gather matrix 2024-11-20 13:16:26 +00:00
“letaoqin”
1caa8198f7 write a, g,d and o tensor 2024-11-19 08:47:35 +00:00
“letaoqin”
84755f74ff format 2024-11-16 02:02:01 +00:00
letaoqin
eab497e87f format 2024-11-15 00:39:38 +00:00
letaoqin
1476d7bba4 add gl pipeline 2024-11-14 11:18:05 +00:00
root
16dc96ebbd remove print runing info 2024-11-14 07:27:53 +00:00
carlushuang
572865a667 update first gemm ok 2024-11-14 00:12:36 +08:00
carlushuang
7ccdbe1619 update 2024-11-13 15:34:54 +08:00
carlushuang
e2a318bcd8 Merge remote-tracking branch 'origin/develop' into ck_tile/moe_quant 2024-11-12 20:30:49 +08:00
Thomas Ning
2b6458ddf2 [CK Tile] Improve the Layout, Padding, and Alignment features of CK Tile GEMM (#1651)
* Finished the feature

* Modified the test file

* Test case update

* addresss comment

* Addressed the review comment

* Fixed the CI error
2024-11-11 18:08:25 -08:00
valarLip
8ef8a994e7 [CK_TILE] add more stride for layernorm to support un-continuous Tensor (#1650)
* [CK_TILE] add more stride for layernorm to support un-continuous Tensor

* align CK coding style

* extend strides to layernrom expample

* clang-format...
2024-11-11 16:02:28 +08:00
carlushuang
d0405504de update 2024-11-11 16:01:34 +08:00
carlushuang
9d3cdd21fc Merge remote-tracking branch 'origin/develop' into ck_tile/moe_quant 2024-11-11 12:03:38 +08:00
carlushuang
06914eedc3 block-asm 2024-11-11 11:57:08 +08:00
Po Yen Chen
13332998a4 Return nullptr when block index is invalid (#1649) 2024-11-11 09:28:32 +08:00
dummycoderfe
bec6fbc65f Ck tile/moe sorting (#1624)
* add moe_sorting & check ok

* fix comments & typo

* Run remod.py under include/ck_tile & example/ck_tile directories

* format codes

* fix output ci check bug

* fix moe sorting readme and error commit file

* use magiv div to accelerate compute

* add an loop unroll for moe lds ops

* add extblocksnel to set zeros for moebufs

* [Ck_tile] moe set zero run ok, add size check and fix ref check

* [Ck_tile]fix moe_sorting fuse set_zero remod

* [Ck_tile] change name style, fix zero buffer size err, change folder

* [Ck_tile] moe_sorting: fix name style

* [Ck_tile] moe_sorting, remove useless params in traits

* [Ck_tile] change outputtile cnt * unit_size; change output buf alloc

---------

Co-authored-by: dummycoderfe <noplydummmycoder@163.com>
Co-authored-by: Po Yen, Chen <PoYen.Chen@amd.com>
Co-authored-by: carlushuang <carlus.huang@amd.com>
2024-11-09 17:57:27 +08:00
dummycoderfe
686a58a912 [Ck tile] layernorm2d fwd optimize (#1637)
* optimze small N case using vec io and using rcp div

* [Ck_tile] layernorm, add param to control fastdiv; change generate codes and test pass

* [Ck_tile] fix blockSize compute in Generic2dBlockShape

* [Ck_tile]fix kfastfdiv template style

* [Ck_tile] layernorm, fix stype in review

---------

Co-authored-by: dummycoderfe <noplydummmycoder@163.com>
2024-11-08 12:28:23 +08:00
Illia Silin
75c5bfa364 enable compilation for generic navi targets (#1645) 2024-11-07 14:14:42 -08:00
carlushuang
b0dd570a7a rename to ex pipeline 2024-11-07 14:57:12 +08:00
carlushuang
7977f89db4 Merge remote-tracking branch 'origin/develop' into ck_tile/moe_quant 2024-11-07 14:47:21 +08:00
carlushuang
4513162988 update pipeline 2024-11-07 14:46:55 +08:00
carlushuang
f09dc1f341 compiler ok 2024-11-07 00:24:00 +08:00
valarLip
3bb718ad5a update pipeline_gemm0 2024-11-06 18:25:18 +08:00
carlushuang
c6c3c142a3 update cpu reference 2024-11-06 16:38:18 +08:00
valarLip
a288c57c71 update 2024-11-06 10:13:50 +08:00
darren-amd
d0e3a70a2e Statically Cast Pointer Offset (#1631)
* explicit cast ptr offset

* formating change
2024-11-05 09:59:08 -08:00
carlushuang
cf64618358 compile OK 2024-11-06 00:01:43 +08:00
carlushuang
70fa98adf8 update code 2024-11-05 16:06:52 +08:00
carlushuang
7c81aee830 Merge remote-tracking branch 'origin/develop' into ck_tile/moe_quant 2024-11-05 10:18:10 +08:00
carlushuang
49c39b5126 moe pipeline 2024-11-05 10:17:41 +08:00
carlushuang
cb6c5d39dc [CK_TILE] layernorm have more accurate residual (#1623)
* more accurate residual

* modify comment

* Fix literal case in README.md

---------

Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
2024-11-02 13:30:16 +08:00
rocking
fbd654545a [Ck_tile] smoothquant (#1617)
* fix compile error

* fix typo of padding

* Add smoothquant op

* Add smoothquant instance library

* refine type

* add test script

* Re-generate smoothquant.hpp

* Always use 'current year' in copyright

* use Generic2dBlockShape instead

* Add vector = 8 instance back

* Find exe path automatically

* Simplify the api condition

* Remove debugging code

* update year

* Add blank line between function declaration

* explicitly cast return value to dim3

* refine return value

* Fix default warmup and repeat value

* Add comment

* refactor sommthquant cmake

* Add README

* Fix typo

---------

Co-authored-by: Po Yen, Chen <PoYen.Chen@amd.com>
2024-11-01 13:51:56 +08:00
carlushuang
550248deec [layernorm] hot fix (#1620)
* hot fix ln

* some rename
2024-11-01 11:52:50 +08:00
carlushuang
c3a4800c5f [CK_TILE] layernorm support fused-quant/fused-add (#1604)
* add prenorm/postnorm support, refactor using generate.py

* update README

* update README

* fix format

* update some description and fix format

* update format

* format

* use non-raw for loading

* format and update n4096

* dynamic-quant ready

* update readme

* support fused dynamic-quant

* update fused-quant, with smooth

* update README

* update args

* update some based on comment
2024-10-31 14:54:53 +08:00
Bartłomiej Kocot
9a8a52130d Remove virtual destructors from unary ops (#1610)
* Remove virtual destructors from unary ops

* Fixes

* Fixes

* clang format fixes
2024-10-30 17:42:50 +01:00
rocking
7d9111545f clang-format (#1612) 2024-10-30 08:13:30 -07:00
Adam Osewski
24d996aae1 [CK-Tile] Universal gemm memory bound pipeline (#1558)
* CK-Tile GEMM with memory bound pipeline.

* Memory bound gemm pipeline.

* Fix not closed namespace.

* Block gemm mem pipeline draft.

* Do not use ck_tile:: within ck_tile namespace.

* Refactoring & Move Layout info to pipeline problem.

* Get hot loop and TailNum information before lunching kernel.

* Fixes in pipeline.

* Add comment to load_tile_raw and change variable naming style.

* Few small changes & formatting.

* Do not use macro.

* Add gtests.

* Use AccDataType for Output of MFMA instruction.

* Formatting.

* Refactor gemm examples.

* Switch over to current block gemm.

* Use currently available pipeline policy.

* Refactoring and review comment.s

* Fixes after merge.

* Add missing include.

* Add load tile overload which accepts output tensor as parameter.

* This give 8% perf boost at the cost of using more registers.

* Rename example.

* Small changes.

* Fix compilation err and lower K.

* Support different layouts for A/B

* Fix vector size for different layouts.

* Rename Alignment into VectorSize

* Unblock tests.
2024-10-30 10:05:15 +01:00
rocking
3d60953477 [Ck tile] support rmsnorm and related fusion (#1605)
* Add reduce2d new api

* Prevent user use cross warp reduction

* Fix bug of std caculation

* Add rmsnorm2d

* Add rmsnorm small example

* Remove static assert to prevent compile fail

* Add script to test performance and correctness

* Add missing cmake change

* refine naming

* refine example of rmsnorm

* Fix bug of rmsnorm

* Refine naming

* Fix cmake

* clang format

* Refine pipeline name

* Add add_rmsnorm2d_rdquant kernel

* Add reduce op

* host verification

* Fix bug of one pass pipeline

* Refine tile size

* Add two pass pipeline

* Rename two pass to three pass

* Fix bug of kSaveX == false

* Add instance library

* Add test script

* Fix bug of x verification

* Add save_x to trait

* Add README

* Move reduce2d into reduce folder

* Fix bug of welford when number of m warp > 1

* remove reduncant comment

* 1. move 06_rmsnorm2d to 10_rmsnorm2d
2. move 07_add_rmsnorm2d_rdquant to 11_add_rmsnorm2d_rdquant

* clang format and add missing header

* Add host validation of add + layernorm2d + rsquant

* Revert "Add host validation of add + layernorm2d + rsquant"

This reverts commit 936cb45797.

* Remove deprecated flag
2024-10-30 15:22:56 +08:00
Qianfeng
8632221814 [CK_TILE] Add fmha fwd headdim96 support (#1608)
* Add ceil_to_qualified_tile_length()

* Rename kK0BlockLength to kQKHeaddim

* Add kSubQKHeaddim concept to support headdim96

* Fix in math.hpp to avoid using __half interfaces

* Add LdsBufferSequence instance for headdim96

* Update in fmha_fwd/fmha_fwd_splitkv codegen to support hd96 testing

* Disable hd96 instance generation in codegen fmha_fwd and fmha_fwd_splitkv to save compiling time

* Reformat one file

* Fix text alignment in fmha_fwd_splitkv.py

---------

Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
2024-10-30 14:03:16 +08:00
valarLip
4d7e063a0a [CK_TILE] add scatter_gather (#1609) 2024-10-29 18:19:29 +08:00