Commit Graph

1553 Commits

Author SHA1 Message Date
letaoqin
66efcf9603 change g tile distribution 2024-11-27 03:37:08 +00:00
letaoqin
fe44e66e99 add gemm0 for tokens*G 2024-11-26 14:23:26 +00:00
letaoqin
f363ec7f3b add tag for gather index 2024-11-26 03:50:58 +00:00
letaoqin
c1d6f9ec42 clear code 2024-11-26 03:28:41 +00:00
letaoqin
ef8e3620cc gather and scatter right 2024-11-25 07:40:03 +00:00
letaoqin
eaf8e6165b write a data to lds 2024-11-22 08:17:20 +00:00
letaoqin
3b51749a76 remove fused_moegemm_pipeline_gl.hpp 2024-11-22 06:13:33 +00:00
letaoqin
f9ac2337af change file name 2024-11-22 06:10:42 +00:00
letaoqin
b5d6100bbc change file name 2024-11-22 04:24:07 +00:00
“letaoqin”
f912ca405c fix call indexing adaptor issue 2024-11-21 02:07:25 +00:00
“letaoqin”
1561fc22d6 change indexing adapter to gather matrix 2024-11-20 13:16:26 +00:00
“letaoqin”
1caa8198f7 write a, g,d and o tensor 2024-11-19 08:47:35 +00:00
“letaoqin”
84755f74ff format 2024-11-16 02:02:01 +00:00
letaoqin
eab497e87f format 2024-11-15 00:39:38 +00:00
letaoqin
1476d7bba4 add gl pipeline 2024-11-14 11:18:05 +00:00
letaoqin
6401c4cb53 change input parameters 2024-11-14 07:46:25 +00:00
root
16dc96ebbd remove print running info 2024-11-14 07:27:53 +00:00
letaoqin
c8e91d41a4 comments 2024-11-14 14:42:05 +08:00
letaoqin
049cacff76 start 2024-11-14 12:12:23 +08:00
carlushuang
572865a667 update first gemm ok 2024-11-14 00:12:36 +08:00
carlushuang
9ec4e3f76e Merge remote-tracking branch 'origin/develop' into ck_tile/moe_quant 2024-11-13 15:35:04 +08:00
carlushuang
7ccdbe1619 update 2024-11-13 15:34:54 +08:00
Illia Silin
489c78d073 test rocm6.3 rc1 build 20 (#1659) 2024-11-12 09:35:33 -08:00
carlushuang
e2a318bcd8 Merge remote-tracking branch 'origin/develop' into ck_tile/moe_quant 2024-11-12 20:30:49 +08:00
Thomas Ning
2b6458ddf2 [CK Tile] Improve the Layout, Padding, and Alignment features of CK Tile GEMM (#1651)
* Finished the feature

* Modified the test file

* Test case update

* address comment

* Addressed the review comment

* Fixed the CI error
2024-11-11 18:08:25 -08:00
Illia Silin
5fb150dbe7 restore collecting performance of mixed prec gemms (#1648) 2024-11-11 09:25:08 -08:00
valarLip
8ef8a994e7 [CK_TILE] add more stride for layernorm to support un-continuous Tensor (#1650)
* [CK_TILE] add more stride for layernorm to support un-continuous Tensor

* align CK coding style

* extend strides to layernorm example

* clang-format...
2024-11-11 16:02:28 +08:00
carlushuang
d0405504de update 2024-11-11 16:01:34 +08:00
carlushuang
9d3cdd21fc Merge remote-tracking branch 'origin/develop' into ck_tile/moe_quant 2024-11-11 12:03:38 +08:00
carlushuang
06914eedc3 block-asm 2024-11-11 11:57:08 +08:00
Po Yen Chen
13332998a4 Return nullptr when block index is invalid (#1649) 2024-11-11 09:28:32 +08:00
dummycoderfe
bec6fbc65f Ck tile/moe sorting (#1624)
* add moe_sorting & check ok

* fix comments & typo

* Run remod.py under include/ck_tile & example/ck_tile directories

* format codes

* fix output ci check bug

* fix moe sorting readme and error commit file

* use magic div to accelerate compute

* add a loop unroll for moe lds ops

* add extblocksnel to set zeros for moebufs

* [Ck_tile] moe set zero run ok, add size check and fix ref check

* [Ck_tile]fix moe_sorting fuse set_zero remod

* [Ck_tile] change name style, fix zero buffer size err, change folder

* [Ck_tile] moe_sorting: fix name style

* [Ck_tile] moe_sorting, remove useless params in traits

* [Ck_tile] change outputtile cnt * unit_size; change output buf alloc

---------

Co-authored-by: dummycoderfe <noplydummmycoder@163.com>
Co-authored-by: Po Yen, Chen <PoYen.Chen@amd.com>
Co-authored-by: carlushuang <carlus.huang@amd.com>
2024-11-09 17:57:27 +08:00
Po Yen Chen
af9546d9f4 Fix 'sh' command compatibility of smoke_test_fwd.sh (#1553) 2024-11-09 09:55:14 +08:00
Bartłomiej Kocot
ea3640fdea Add generic instances for two stage conv bwd wei (#1643)
* Add generic instances for two stage conv bwd wei

* Update layout prefix
2024-11-08 10:04:33 +01:00
dummycoderfe
686a58a912 [Ck tile] layernorm2d fwd optimize (#1637)
* optimize small N case using vec io and using rcp div

* [Ck_tile] layernorm, add param to control fastdiv; change generate codes and test pass

* [Ck_tile] fix blockSize compute in Generic2dBlockShape

* [Ck_tile]fix kfastfdiv template style

* [Ck_tile] layernorm, fix stype in review

---------

Co-authored-by: dummycoderfe <noplydummmycoder@163.com>
2024-11-08 12:28:23 +08:00
Illia Silin
75c5bfa364 enable compilation for generic navi targets (#1645) 2024-11-07 14:14:42 -08:00
carlushuang
b0dd570a7a rename to ex pipeline 2024-11-07 14:57:12 +08:00
carlushuang
7977f89db4 Merge remote-tracking branch 'origin/develop' into ck_tile/moe_quant 2024-11-07 14:47:21 +08:00
carlushuang
4513162988 update pipeline 2024-11-07 14:46:55 +08:00
rocking
3599418aa8 Fix F16 type (#1583) 2024-11-06 11:32:44 -08:00
carlushuang
f09dc1f341 compiler ok 2024-11-07 00:24:00 +08:00
valarLip
3bb718ad5a update pipeline_gemm0 2024-11-06 18:25:18 +08:00
aledudek
dcafb1de15 Generic threshold calculation after merge fixes (#1618)
* Generic threshold calculation add passing num of accums

* Generic threshold - after merge fixes

* Fix cmakelists

---------

Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
2024-11-06 10:44:58 +01:00
carlushuang
c6c3c142a3 update cpu reference 2024-11-06 16:38:18 +08:00
valarLip
a288c57c71 update 2024-11-06 10:13:50 +08:00
Andriy Roshchenko
365f39aed0 Prevent instantiation of undefined FP8 operators. (#1639) 2024-11-05 13:58:29 -08:00
Illia Silin
54440cf562 remove gfx940;gfx941 from default target lists (#1640) 2024-11-05 13:56:20 -08:00
darren-amd
d0e3a70a2e Statically Cast Pointer Offset (#1631)
* explicit cast ptr offset

* formatting change
2024-11-05 09:59:08 -08:00
Illia Silin
b6e74be1aa Make sure cmake can handle the xnack+/xnack- targets. (#1633)
* make sure cmake can handle xnack targets

* dont build xdl instances for gfx906:xnack-

* dont build xdl tests for gfx906:xnack-
2024-11-05 08:53:10 -08:00
carlushuang
cf64618358 compile OK 2024-11-06 00:01:43 +08:00