Commit Graph

1523 Commits

Author SHA1 Message Date
rocking
936cb45797 Add host validation of add + layernorm2d + rsquant 2024-10-29 17:33:06 +00:00
rocking
a93e5cc6e8 clang format and add missing header 2024-10-29 11:09:35 +00:00
rocking
68d7a4d100 1. move 06_rmsnorm2d to 10_rmsnorm2d
2. move 07_add_rmsnorm2d_rdquant to 11_add_rmsnorm2d_rdquant
2024-10-29 10:58:12 +00:00
rocking
0d7c30f534 Merge branch 'develop' into ck_tile/rmsnorm 2024-10-29 18:53:00 +08:00
rocking
7df4c49603 remove redundant comment 2024-10-29 10:51:43 +00:00
rocking
9fa8b4c170 Fix bug of welford when number of m warp > 1 2024-10-29 10:51:33 +00:00
valarLip
4d7e063a0a [CK_TILE] add scatter_gather (#1609) 2024-10-29 18:19:29 +08:00
valarLip
9fbd72e97e [CK_TILE] add generic_permute (#1607) 2024-10-29 18:05:53 +08:00
rocking
8beda9d98d Move reduce2d into reduce folder 2024-10-29 10:02:09 +00:00
rocking
1654e6cd97 Merge branch 'develop' into ck_tile/rmsnorm 2024-10-29 14:42:48 +08:00
Illia Silin
922e42a039 fix compilation errors for gfx12 with clang20 (#1606) 2024-10-28 19:02:48 -07:00
rocking
356b045fd7 Add README 2024-10-28 19:55:57 +00:00
rocking
6a54faae25 Add save_x to trait 2024-10-28 19:55:44 +00:00
rocking
b683de6b32 Fix bug of x verification 2024-10-28 19:49:08 +00:00
rocking
88d3079065 Add test script 2024-10-28 19:39:37 +00:00
rocking
b83f8d242a Add instance library 2024-10-28 19:34:51 +00:00
rocking
9a22805e92 Fix bug of kSaveX == false 2024-10-27 11:42:58 +00:00
rocking
0f9969a894 Rename two pass to three pass 2024-10-26 20:29:55 +00:00
rocking
697558d856 Add two pass pipeline 2024-10-26 20:21:18 +00:00
carlushuang
b098b71b05 topk_softmax (#1592)
* topk_softmax

* remove some file

* fix atomix linear_offset

* address various comment, and change sfc get_index api to static(tuple)
2024-10-26 23:52:49 +08:00
Bartłomiej Kocot
31bf253aeb Add dynamic elementwise op (#1426)
* Add dynamic elementwise op

Co-authored-by: ThruptiRajLakshmanaGowda <thruptiraj.lakshmanagowda@amd.com>

* CI issues fix

* Custom parameter value for dynamic functions - Comments addressed

---------

Co-authored-by: ThruptiRajLakshmanaGowda <thruptiraj.lakshmanagowda@amd.com>
Co-authored-by: ThruptiRajLakshmanaGowda <tlakshma@amd.com>
2024-10-26 15:22:37 +02:00
Po Yen Chen
54f0e6f4bb [CK_TILE] More fmha splitkv optimizations (#1588)
* Use pre-defined constants for readability

* Use vector write for o_acc tensor

* Remove no-longer used policy method

* Deprecate no-longer used policy/pipeline

* Specify gemm0/gemm1 block warps separately in codegen

* Fix wrong ps_idx creation logic

* Add single-warp block gemm

* Support single-warp gemm0

* Make MakeCBlockTile() as static method

* Use MakeCBlockTile() to get underlying tile distribution

* Use kNumGemm1Warps to compute # threads for gemm1

* Put normal case in the if clause

* Refine fmha splitkv block mapping

* Refine & fix the lse_acc/o_acc layout

* Fix wrong LDS size for K tile

* Use kK0=64 for hdim=128,256 fmha splitkv kernels

* Use kK1=64 for hdim=32,64,128 fmha splitkv kernels

* Undo kK0/kK1 changes

* Use more reasonable GetAlignmentV() computation

* Using store_tile() in fmha splitkv kernel epilogue
2024-10-26 18:35:45 +08:00
rocking
2d4480a123 Refine tile size 2024-10-26 10:23:20 +00:00
rocking
1c1f1e35b5 Fix bug of one pass pipeline 2024-10-26 10:22:50 +00:00
rocking
27d96b4031 host verification 2024-10-26 10:22:09 +00:00
valarLip
37f7afed1e add int8 gemm multiply multiply a8w8 (#1591)
* add int8 gemm multiply multiply a8w8

* uncomment

* clang-format-12

* Add example_gemm_multiply_multiply_xdl_int8

* Remove shell scripts

* update preprocess number for mi308; bring back printout in ckprofiler

* format

---------

Co-authored-by: chenjun <junchen2@amd.com>
Co-authored-by: Haocong WANG <haocwang@amd.com>
Co-authored-by: carlushuang <carlus.huang@amd.com>
2024-10-26 16:39:34 +08:00
rocking
826ee18a11 Add reduce op 2024-10-25 22:51:15 +00:00
rocking
1e0c9fde51 Add add_rmsnorm2d_rdquant kernel 2024-10-25 20:50:48 +00:00
Max Podkorytov
eda5938386 add parsing grouped conv fwd instances 2024-10-25 08:25:53 -07:00
Rostyslav Geyyer
7d576f1748 Update GPU verification (#1596)
* Update inits

* Update static_cast to type_convert

* Add verification option selection
2024-10-25 08:13:46 -07:00
aledudek
9385caa306 Generic threshold calculation (#1546)
* Calculate generic relative threshold pool3dfwd

* Calculate absolute error threshold pool3d fwd

* Generic threshold calculation take max input for relative error pool3dfwd

* Remove max possible value for error calculation at runtime

* Remove debug print in pool3dfwd

* Pool3d fwd adjusted types in generic threshold calculation

* Generic threshold calculation take into account number of accumulations and accdatatype

* Generic threshold fix final error formula

* Generic threshold calculation - num of accs fix

* Generic threshold calculation - adjust absolute error

* Generic threshold calculation - OutDataType in absolute error
2024-10-25 12:46:24 +02:00
dummycoderfe
9183ce69ca hot_fix epsilon pos (#1597)
Co-authored-by: dummycoderfe <noplydummmycoder@163.com>
2024-10-25 11:17:45 +08:00
rocking
871af334d1 Refine pipeline name 2024-10-24 20:42:40 +00:00
rocking
c89d8ca95f clang format 2024-10-24 17:05:36 +00:00
rocking
1684d71a3f Fix cmake 2024-10-24 11:44:55 +00:00
rocking
1e6814a6bd Refine naming 2024-10-24 11:44:40 +00:00
rocking
d79715ba53 Fix bug of rmsnorm 2024-10-24 11:43:45 +00:00
rocking
e4a169dd47 refine example of rmsnorm 2024-10-24 11:43:15 +00:00
rocking
a50ec83d03 refine naming 2024-10-24 08:48:34 +00:00
rocking
df976ff6a1 Add missing cmake change 2024-10-24 06:13:03 +00:00
rocking
3d2e3be652 Add script to test performance and correctness 2024-10-24 06:12:42 +00:00
rocking
5b3108a62f Remove static assert to prevent compile fail 2024-10-24 06:09:23 +00:00
Illia Silin
8e22e1ae31 fix the logic of enabling XDL and WMMA instances (#1595) 2024-10-23 15:55:39 -07:00
rocking
a5986c70dc Add rmsnorm small example 2024-10-23 19:31:05 +00:00
rocking
382a2af212 Add rmsnorm2d 2024-10-23 19:23:51 +00:00
Bartłomiej Kocot
cedccd59c9 [POST MERGE PR] Enable grouped conv bwd wei bf16 NGCHW (#1594) 2024-10-23 12:02:33 +02:00
rocking
dfb4bf9488 Fix bug of std calculation 2024-10-22 20:36:25 +00:00
rocking
26f16dd20b Prevent user use cross warp reduction 2024-10-22 19:29:46 +00:00
Jatin Chaudhary
4d5248e2d1 Explicit cast values to half (#1593)
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
2024-10-22 11:17:32 -07:00
rocking
9e7fcc0b37 Add reduce2d new api 2024-10-22 14:52:10 +00:00