letaoqin
66efcf9603
change g tile distribution
2024-11-27 03:37:08 +00:00
letaoqin
fe44e66e99
add gemm0 for tokens*G
2024-11-26 14:23:26 +00:00
letaoqin
f363ec7f3b
add tag for gather index
2024-11-26 03:50:58 +00:00
letaoqin
c1d6f9ec42
clear code
2024-11-26 03:28:41 +00:00
letaoqin
ef8e3620cc
gather and scatter right
2024-11-25 07:40:03 +00:00
letaoqin
eaf8e6165b
write a data to lds
2024-11-22 08:17:20 +00:00
letaoqin
3b51749a76
remove fused_moegemm_pipeline_gl.hpp
2024-11-22 06:13:33 +00:00
letaoqin
f9ac2337af
change file name
2024-11-22 06:10:42 +00:00
letaoqin
b5d6100bbc
change file name
2024-11-22 04:24:07 +00:00
“letaoqin”
f912ca405c
fix call indexing adaptor issue
2024-11-21 02:07:25 +00:00
“letaoqin”
1561fc22d6
change indexing adapter to gather matrix
2024-11-20 13:16:26 +00:00
“letaoqin”
1caa8198f7
write a, g,d and o tensor
2024-11-19 08:47:35 +00:00
“letaoqin”
84755f74ff
format
2024-11-16 02:02:01 +00:00
letaoqin
eab497e87f
format
2024-11-15 00:39:38 +00:00
letaoqin
1476d7bba4
add gl pipeline
2024-11-14 11:18:05 +00:00
letaoqin
6401c4cb53
change input parameters
2024-11-14 07:46:25 +00:00
root
16dc96ebbd
remove print running info
2024-11-14 07:27:53 +00:00
letaoqin
c8e91d41a4
comments
2024-11-14 14:42:05 +08:00
letaoqin
049cacff76
start
2024-11-14 12:12:23 +08:00
carlushuang
572865a667
update first gemm ok
2024-11-14 00:12:36 +08:00
carlushuang
9ec4e3f76e
Merge remote-tracking branch 'origin/develop' into ck_tile/moe_quant
2024-11-13 15:35:04 +08:00
carlushuang
7ccdbe1619
update
2024-11-13 15:34:54 +08:00
Illia Silin
489c78d073
test rocm6.3 rc1 build 20 (#1659)
2024-11-12 09:35:33 -08:00
carlushuang
e2a318bcd8
Merge remote-tracking branch 'origin/develop' into ck_tile/moe_quant
2024-11-12 20:30:49 +08:00
Thomas Ning
2b6458ddf2
[CK Tile] Improve the Layout, Padding, and Alignment features of CK Tile GEMM (#1651)
...
* Finished the feature
* Modified the test file
* Test case update
* address comment
* Addressed the review comment
* Fixed the CI error
2024-11-11 18:08:25 -08:00
Illia Silin
5fb150dbe7
restore collecting performance of mixed prec gemms (#1648)
2024-11-11 09:25:08 -08:00
valarLip
8ef8a994e7
[CK_TILE] add more stride for layernorm to support un-continuous Tensor (#1650)
...
* [CK_TILE] add more stride for layernorm to support un-continuous Tensor
* align CK coding style
* extend strides to layernorm example
* clang-format...
2024-11-11 16:02:28 +08:00
carlushuang
d0405504de
update
2024-11-11 16:01:34 +08:00
carlushuang
9d3cdd21fc
Merge remote-tracking branch 'origin/develop' into ck_tile/moe_quant
2024-11-11 12:03:38 +08:00
carlushuang
06914eedc3
block-asm
2024-11-11 11:57:08 +08:00
Po Yen Chen
13332998a4
Return nullptr when block index is invalid (#1649)
2024-11-11 09:28:32 +08:00
dummycoderfe
bec6fbc65f
Ck tile/moe sorting (#1624)
...
* add moe_sorting & check ok
* fix comments & typo
* Run remod.py under include/ck_tile & example/ck_tile directories
* format codes
* fix output ci check bug
* fix moe sorting readme and error commit file
* use magic div to accelerate compute
* add a loop unroll for moe lds ops
* add extblocksnel to set zeros for moebufs
* [Ck_tile] moe set zero run ok, add size check and fix ref check
* [Ck_tile]fix moe_sorting fuse set_zero remod
* [Ck_tile] change name style, fix zero buffer size err, change folder
* [Ck_tile] moe_sorting: fix name style
* [Ck_tile] moe_sorting, remove useless params in traits
* [Ck_tile] change outputtile cnt * unit_size; change output buf alloc
---------
Co-authored-by: dummycoderfe <noplydummmycoder@163.com>
Co-authored-by: Po Yen, Chen <PoYen.Chen@amd.com>
Co-authored-by: carlushuang <carlus.huang@amd.com>
2024-11-09 17:57:27 +08:00
Po Yen Chen
af9546d9f4
Fix 'sh' command compatibility of smoke_test_fwd.sh (#1553)
2024-11-09 09:55:14 +08:00
Bartłomiej Kocot
ea3640fdea
Add generic instances for two stage conv bwd wei (#1643)
...
* Add generic instances for two stage conv bwd wei
* Update layout prefix
2024-11-08 10:04:33 +01:00
dummycoderfe
686a58a912
[Ck tile] layernorm2d fwd optimize (#1637)
...
* optimize small N case using vec io and using rcp div
* [Ck_tile] layernorm, add param to control fastdiv; change generate codes and test pass
* [Ck_tile] fix blockSize compute in Generic2dBlockShape
* [Ck_tile]fix kfastfdiv template style
* [Ck_tile] layernorm, fix style in review
---------
Co-authored-by: dummycoderfe <noplydummmycoder@163.com>
2024-11-08 12:28:23 +08:00
Illia Silin
75c5bfa364
enable compilation for generic navi targets (#1645)
2024-11-07 14:14:42 -08:00
carlushuang
b0dd570a7a
rename to ex pipeline
2024-11-07 14:57:12 +08:00
carlushuang
7977f89db4
Merge remote-tracking branch 'origin/develop' into ck_tile/moe_quant
2024-11-07 14:47:21 +08:00
carlushuang
4513162988
update pipeline
2024-11-07 14:46:55 +08:00
rocking
3599418aa8
Fix F16 type (#1583)
2024-11-06 11:32:44 -08:00
carlushuang
f09dc1f341
compiler ok
2024-11-07 00:24:00 +08:00
valarLip
3bb718ad5a
update pipeline_gemm0
2024-11-06 18:25:18 +08:00
aledudek
dcafb1de15
Generic threshold calculation after merge fixes (#1618)
...
* Generic threshold calculation add passing num of accums
* Generic threshold - after merge fixes
* Fix cmakelists
---------
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
2024-11-06 10:44:58 +01:00
carlushuang
c6c3c142a3
update cpu reference
2024-11-06 16:38:18 +08:00
valarLip
a288c57c71
update
2024-11-06 10:13:50 +08:00
Andriy Roshchenko
365f39aed0
Prevent instantiation of undefined FP8 operators. (#1639)
2024-11-05 13:58:29 -08:00
Illia Silin
54440cf562
remove gfx940;gfx941 from default target lists (#1640)
2024-11-05 13:56:20 -08:00
darren-amd
d0e3a70a2e
Statically Cast Pointer Offset (#1631)
...
* explicit cast ptr offset
* formatting change
2024-11-05 09:59:08 -08:00
Illia Silin
b6e74be1aa
Make sure cmake can handle the xnack+/xnack- targets. (#1633)
...
* make sure cmake can handle xnack targets
* don't build xdl instances for gfx906:xnack-
* don't build xdl tests for gfx906:xnack-
2024-11-05 08:53:10 -08:00
carlushuang
cf64618358
compile OK
2024-11-06 00:01:43 +08:00