letaoqin
593dd7ad64
clear some code
2024-12-04 10:41:31 +00:00
letaoqin
6cb910352b
add fp16 to test
2024-12-04 08:53:49 +00:00
letaoqin
4dd771958e
add gelu to kernel
2024-12-03 14:18:29 +00:00
letaoqin
072dfbfe32
gemm0 debugged
2024-12-03 09:29:47 +00:00
letaoqin
69114f254c
output sacc
2024-11-29 13:26:13 +00:00
letaoqin
bb7c411224
debugging
2024-11-29 09:10:30 +00:00
letaoqin
7881eff9d1
gemm down
2024-11-28 11:27:41 +00:00
letaoqin
6a03c66f89
start gemm down
2024-11-28 09:50:05 +00:00
letaoqin
b2030e3427
s_acc data to lds to shuffle
2024-11-28 05:30:26 +00:00
letaoqin
1d89463c00
add gemm0 code
2024-11-28 04:59:36 +00:00
letaoqin
7018dfb21d
start gemm0
2024-11-27 13:27:34 +00:00
letaoqin
9ec586fc68
change a matrix lds desc
2024-11-27 09:43:04 +00:00
letaoqin
66efcf9603
change g tile distribution
2024-11-27 03:37:08 +00:00
letaoqin
fe44e66e99
add gemm0 for tokens*G
2024-11-26 14:23:26 +00:00
letaoqin
f363ec7f3b
add tag for gather index
2024-11-26 03:50:58 +00:00
letaoqin
c1d6f9ec42
clear code
2024-11-26 03:28:41 +00:00
letaoqin
ef8e3620cc
gather and scatter right
2024-11-25 07:40:03 +00:00
letaoqin
eaf8e6165b
write a data to lds
2024-11-22 08:17:20 +00:00
letaoqin
3b51749a76
remove fused_moegemm_pipeline_gl.hpp
2024-11-22 06:13:33 +00:00
letaoqin
f9ac2337af
change file name
2024-11-22 06:10:42 +00:00
letaoqin
b5d6100bbc
change file name
2024-11-22 04:24:07 +00:00
“letaoqin”
f912ca405c
fix call indexing adaptor issue
2024-11-21 02:07:25 +00:00
“letaoqin”
1561fc22d6
change indexing adapter to gather matrix
2024-11-20 13:16:26 +00:00
“letaoqin”
1caa8198f7
write a, g,d and o tensor
2024-11-19 08:47:35 +00:00
“letaoqin”
84755f74ff
format
2024-11-16 02:02:01 +00:00
letaoqin
eab497e87f
format
2024-11-15 00:39:38 +00:00
letaoqin
1476d7bba4
add gl pipeline
2024-11-14 11:18:05 +00:00
root
16dc96ebbd
remove print running info
2024-11-14 07:27:53 +00:00
carlushuang
572865a667
update first gemm ok
2024-11-14 00:12:36 +08:00
carlushuang
7ccdbe1619
update
2024-11-13 15:34:54 +08:00
carlushuang
e2a318bcd8
Merge remote-tracking branch 'origin/develop' into ck_tile/moe_quant
2024-11-12 20:30:49 +08:00
Thomas Ning
2b6458ddf2
[CK Tile] Improve the Layout, Padding, and Alignment features of CK Tile GEMM (#1651)
* Finished the feature
* Modified the test file
* Test case update
* address comment
* Addressed the review comment
* Fixed the CI error
2024-11-11 18:08:25 -08:00
valarLip
8ef8a994e7
[CK_TILE] add more stride for layernorm to support un-continuous Tensor (#1650)
* [CK_TILE] add more stride for layernorm to support un-continuous Tensor
* align CK coding style
* extend strides to layernorm example
* clang-format...
2024-11-11 16:02:28 +08:00
carlushuang
d0405504de
update
2024-11-11 16:01:34 +08:00
carlushuang
9d3cdd21fc
Merge remote-tracking branch 'origin/develop' into ck_tile/moe_quant
2024-11-11 12:03:38 +08:00
carlushuang
06914eedc3
block-asm
2024-11-11 11:57:08 +08:00
Po Yen Chen
13332998a4
Return nullptr when block index is invalid (#1649)
2024-11-11 09:28:32 +08:00
dummycoderfe
bec6fbc65f
Ck tile/moe sorting (#1624)
* add moe_sorting & check ok
* fix comments & typo
* Run remod.py under include/ck_tile & example/ck_tile directories
* format codes
* fix output ci check bug
* fix moe sorting readme and error commit file
* use magic div to accelerate compute
* add a loop unroll for moe lds ops
* add extblocksnel to set zeros for moebufs
* [Ck_tile] moe set zero run ok, add size check and fix ref check
* [Ck_tile]fix moe_sorting fuse set_zero remod
* [Ck_tile] change name style, fix zero buffer size err, change folder
* [Ck_tile] moe_sorting: fix name style
* [Ck_tile] moe_sorting, remove useless params in traits
* [Ck_tile] change outputtile cnt * unit_size; change output buf alloc
---------
Co-authored-by: dummycoderfe <noplydummmycoder@163.com>
Co-authored-by: Po Yen, Chen <PoYen.Chen@amd.com>
Co-authored-by: carlushuang <carlus.huang@amd.com>
2024-11-09 17:57:27 +08:00
dummycoderfe
686a58a912
[Ck tile] layernorm2d fwd optimize (#1637)
* optimize small N case using vec io and using rcp div
* [Ck_tile] layernorm, add param to control fastdiv; change generate codes and test pass
* [Ck_tile] fix blockSize compute in Generic2dBlockShape
* [Ck_tile]fix kfastfdiv template style
* [Ck_tile] layernorm, fix style in review
---------
Co-authored-by: dummycoderfe <noplydummmycoder@163.com>
2024-11-08 12:28:23 +08:00
Illia Silin
75c5bfa364
enable compilation for generic navi targets (#1645)
2024-11-07 14:14:42 -08:00
carlushuang
b0dd570a7a
rename to ex pipeline
2024-11-07 14:57:12 +08:00
carlushuang
7977f89db4
Merge remote-tracking branch 'origin/develop' into ck_tile/moe_quant
2024-11-07 14:47:21 +08:00
carlushuang
4513162988
update pipeline
2024-11-07 14:46:55 +08:00
carlushuang
f09dc1f341
compiler ok
2024-11-07 00:24:00 +08:00
valarLip
3bb718ad5a
update pipeline_gemm0
2024-11-06 18:25:18 +08:00
carlushuang
c6c3c142a3
update cpu reference
2024-11-06 16:38:18 +08:00
valarLip
a288c57c71
update
2024-11-06 10:13:50 +08:00
darren-amd
d0e3a70a2e
Statically Cast Pointer Offset (#1631)
* explicit cast ptr offset
* formatting change
2024-11-05 09:59:08 -08:00
carlushuang
cf64618358
compile OK
2024-11-06 00:01:43 +08:00
carlushuang
70fa98adf8
update code
2024-11-05 16:06:52 +08:00