Commit Graph

601 Commits

Author SHA1 Message Date
letaoqin
593dd7ad64 clear some code 2024-12-04 10:41:31 +00:00
letaoqin
6cb910352b add fp16 to test 2024-12-04 08:53:49 +00:00
letaoqin
4dd771958e add gelu to kernel 2024-12-03 14:18:29 +00:00
letaoqin
072dfbfe32 gemm0 debugged 2024-12-03 09:29:47 +00:00
letaoqin
69114f254c output sacc 2024-11-29 13:26:13 +00:00
letaoqin
bb7c411224 debugging 2024-11-29 09:10:30 +00:00
letaoqin
7881eff9d1 gemm down 2024-11-28 11:27:41 +00:00
letaoqin
6a03c66f89 start gemm down 2024-11-28 09:50:05 +00:00
letaoqin
b2030e3427 s_acc data to lds to shuffle 2024-11-28 05:30:26 +00:00
letaoqin
1d89463c00 add gemm0 code 2024-11-28 04:59:36 +00:00
letaoqin
7018dfb21d start gemm0 2024-11-27 13:27:34 +00:00
letaoqin
9ec586fc68 change a matrix lds desc 2024-11-27 09:43:04 +00:00
letaoqin
66efcf9603 change g tile distribution 2024-11-27 03:37:08 +00:00
letaoqin
fe44e66e99 add gemm0 for tokens*G 2024-11-26 14:23:26 +00:00
letaoqin
f363ec7f3b add tag for gather index 2024-11-26 03:50:58 +00:00
letaoqin
c1d6f9ec42 clear code 2024-11-26 03:28:41 +00:00
letaoqin
ef8e3620cc gather and scatter right 2024-11-25 07:40:03 +00:00
letaoqin
eaf8e6165b write a data to lds 2024-11-22 08:17:20 +00:00
letaoqin
3b51749a76 remove fused_moegemm_pipeline_gl.hpp 2024-11-22 06:13:33 +00:00
letaoqin
f9ac2337af change file name 2024-11-22 06:10:42 +00:00
letaoqin
b5d6100bbc change file name 2024-11-22 04:24:07 +00:00
“letaoqin”
f912ca405c fix call indexing adaptor issue 2024-11-21 02:07:25 +00:00
“letaoqin”
1561fc22d6 change indexing adapter to gather matrix 2024-11-20 13:16:26 +00:00
“letaoqin”
1caa8198f7 write a, g,d and o tensor 2024-11-19 08:47:35 +00:00
“letaoqin”
84755f74ff format 2024-11-16 02:02:01 +00:00
letaoqin
eab497e87f format 2024-11-15 00:39:38 +00:00
letaoqin
1476d7bba4 add gl pipeline 2024-11-14 11:18:05 +00:00
root
16dc96ebbd remove print running info 2024-11-14 07:27:53 +00:00
carlushuang
572865a667 update first gemm ok 2024-11-14 00:12:36 +08:00
carlushuang
7ccdbe1619 update 2024-11-13 15:34:54 +08:00
carlushuang
e2a318bcd8 Merge remote-tracking branch 'origin/develop' into ck_tile/moe_quant 2024-11-12 20:30:49 +08:00
Thomas Ning
2b6458ddf2 [CK Tile] Improve the Layout, Padding, and Alignment features of CK Tile GEMM (#1651)
* Finished the feature

* Modified the test file

* Test case update

* address comment

* Addressed the review comment

* Fixed the CI error
2024-11-11 18:08:25 -08:00
valarLip
8ef8a994e7 [CK_TILE] add more stride for layernorm to support un-continuous Tensor (#1650)
* [CK_TILE] add more stride for layernorm to support un-continuous Tensor

* align CK coding style

* extend strides to layernorm example

* clang-format...
2024-11-11 16:02:28 +08:00
carlushuang
d0405504de update 2024-11-11 16:01:34 +08:00
carlushuang
9d3cdd21fc Merge remote-tracking branch 'origin/develop' into ck_tile/moe_quant 2024-11-11 12:03:38 +08:00
carlushuang
06914eedc3 block-asm 2024-11-11 11:57:08 +08:00
Po Yen Chen
13332998a4 Return nullptr when block index is invalid (#1649) 2024-11-11 09:28:32 +08:00
dummycoderfe
bec6fbc65f Ck tile/moe sorting (#1624)
* add moe_sorting & check ok

* fix comments & typo

* Run remod.py under include/ck_tile & example/ck_tile directories

* format codes

* fix output ci check bug

* fix moe sorting readme and error commit file

* use magic div to accelerate compute

* add a loop unroll for moe lds ops

* add extblocksnel to set zeros for moebufs

* [Ck_tile] moe set zero run ok, add size check and fix ref check

* [Ck_tile]fix moe_sorting fuse set_zero remod

* [Ck_tile] change name style, fix zero buffer size err, change folder

* [Ck_tile] moe_sorting: fix name style

* [Ck_tile] moe_sorting, remove useless params in traits

* [Ck_tile] change outputtile cnt * unit_size; change output buf alloc

---------

Co-authored-by: dummycoderfe <noplydummmycoder@163.com>
Co-authored-by: Po Yen, Chen <PoYen.Chen@amd.com>
Co-authored-by: carlushuang <carlus.huang@amd.com>
2024-11-09 17:57:27 +08:00
dummycoderfe
686a58a912 [Ck tile] layernorm2d fwd optimize (#1637)
* optimize small N case using vec io and using rcp div

* [Ck_tile] layernorm, add param to control fastdiv; change generate codes and test pass

* [Ck_tile] fix blockSize compute in Generic2dBlockShape

* [Ck_tile]fix kfastfdiv template style

* [Ck_tile] layernorm, fix style in review

---------

Co-authored-by: dummycoderfe <noplydummmycoder@163.com>
2024-11-08 12:28:23 +08:00
Illia Silin
75c5bfa364 enable compilation for generic navi targets (#1645) 2024-11-07 14:14:42 -08:00
carlushuang
b0dd570a7a rename to ex pipeline 2024-11-07 14:57:12 +08:00
carlushuang
7977f89db4 Merge remote-tracking branch 'origin/develop' into ck_tile/moe_quant 2024-11-07 14:47:21 +08:00
carlushuang
4513162988 update pipeline 2024-11-07 14:46:55 +08:00
carlushuang
f09dc1f341 compiler ok 2024-11-07 00:24:00 +08:00
valarLip
3bb718ad5a update pipeline_gemm0 2024-11-06 18:25:18 +08:00
carlushuang
c6c3c142a3 update cpu reference 2024-11-06 16:38:18 +08:00
valarLip
a288c57c71 update 2024-11-06 10:13:50 +08:00
darren-amd
d0e3a70a2e Statically Cast Pointer Offset (#1631)
* explicit cast ptr offset

* formatting change
2024-11-05 09:59:08 -08:00
carlushuang
cf64618358 compile OK 2024-11-06 00:01:43 +08:00
carlushuang
70fa98adf8 update code 2024-11-05 16:06:52 +08:00