rocking
936cb45797
Add host validation of add + layernorm2d + rsquant
2024-10-29 17:33:06 +00:00
rocking
a93e5cc6e8
clang format and add missing header
2024-10-29 11:09:35 +00:00
rocking
68d7a4d100
1. move 06_rmsnorm2d to 10_rmsnorm2d
...
2. move 07_add_rmsnorm2d_rdquant to 11_add_rmsnorm2d_rdquant
2024-10-29 10:58:12 +00:00
rocking
0d7c30f534
Merge branch 'develop' into ck_tile/rmsnorm
2024-10-29 18:53:00 +08:00
rocking
7df4c49603
remove redundant comment
2024-10-29 10:51:43 +00:00
rocking
9fa8b4c170
Fix bug of welford when number of m warp > 1
2024-10-29 10:51:33 +00:00
valarLip
4d7e063a0a
[CK_TILE] add scatter_gather (#1609)
2024-10-29 18:19:29 +08:00
valarLip
9fbd72e97e
[CK_TILE] add generic_permute (#1607)
2024-10-29 18:05:53 +08:00
rocking
8beda9d98d
Move reduce2d into reduce folder
2024-10-29 10:02:09 +00:00
rocking
1654e6cd97
Merge branch 'develop' into ck_tile/rmsnorm
2024-10-29 14:42:48 +08:00
Illia Silin
922e42a039
fix compilation errors for gfx12 with clang20 (#1606)
2024-10-28 19:02:48 -07:00
rocking
356b045fd7
Add README
2024-10-28 19:55:57 +00:00
rocking
6a54faae25
Add save_x to trait
2024-10-28 19:55:44 +00:00
rocking
b683de6b32
Fix bug of x verification
2024-10-28 19:49:08 +00:00
rocking
88d3079065
Add test script
2024-10-28 19:39:37 +00:00
rocking
b83f8d242a
Add instance library
2024-10-28 19:34:51 +00:00
rocking
9a22805e92
Fix bug of kSaveX == false
2024-10-27 11:42:58 +00:00
rocking
0f9969a894
Rename two pass to three pass
2024-10-26 20:29:55 +00:00
rocking
697558d856
Add two pass pipeline
2024-10-26 20:21:18 +00:00
carlushuang
b098b71b05
topk_softmax (#1592)
...
* topk_softmax
* remove some file
* fix atomic linear_offset
* address various comments, and change sfc get_index api to static(tuple)
2024-10-26 23:52:49 +08:00
Bartłomiej Kocot
31bf253aeb
Add dynamic elementwise op (#1426)
...
* Add dynamic elementwise op
Co-authored-by: ThruptiRajLakshmanaGowda <thruptiraj.lakshmanagowda@amd.com>
* CI issues fix
* Custom parameter value for dynamic functions - Comments addressed
---------
Co-authored-by: ThruptiRajLakshmanaGowda <thruptiraj.lakshmanagowda@amd.com>
Co-authored-by: ThruptiRajLakshmanaGowda <tlakshma@amd.com>
2024-10-26 15:22:37 +02:00
Po Yen Chen
54f0e6f4bb
[CK_TILE] More fmha splitkv optimizations (#1588)
...
* Use pre-defined constants for readability
* Use vector write for o_acc tensor
* Remove no-longer used policy method
* Deprecate no-longer used policy/pipeline
* Specify gemm0/gemm1 block warps separately in codegen
* Fix wrong ps_idx creation logic
* Add single-warp block gemm
* Support single-warp gemm0
* Make MakeCBlockTile() a static method
* Use MakeCBlockTile() to get underlying tile distribution
* Use kNumGemm1Warps to compute # threads for gemm1
* Put normal case in the if clause
* Refine fmha splitkv block mapping
* Refine & fix the lse_acc/o_acc layout
* Fix wrong LDS size for K tile
* Use kK0=64 for hdim=128,256 fmha splitkv kernels
* Use kK1=64 for hdim=32,64,128 fmha splitkv kernels
* Undo kK0/kK1 changes
* Use more reasonable GetAlignmentV() computation
* Using store_tile() in fmha splitkv kernel epilogue
2024-10-26 18:35:45 +08:00
rocking
2d4480a123
Refine tile size
2024-10-26 10:23:20 +00:00
rocking
1c1f1e35b5
Fix bug of one pass pipeline
2024-10-26 10:22:50 +00:00
rocking
27d96b4031
host verification
2024-10-26 10:22:09 +00:00
valarLip
37f7afed1e
add int8 gemm multiply multiply a8w8 (#1591)
...
* add int8 gemm multiply multiply a8w8
* uncomment
* clang-format-12
* Add example_gemm_multiply_multiply_xdl_int8
* Remove shell scripts
* update preprocess number for mi308; bring back printout in ckprofiler
* format
---------
Co-authored-by: chenjun <junchen2@amd.com>
Co-authored-by: Haocong WANG <haocwang@amd.com>
Co-authored-by: carlushuang <carlus.huang@amd.com>
2024-10-26 16:39:34 +08:00
rocking
826ee18a11
Add reduce op
2024-10-25 22:51:15 +00:00
rocking
1e0c9fde51
Add add_rmsnorm2d_rdquant kernel
2024-10-25 20:50:48 +00:00
Max Podkorytov
eda5938386
add parsing grouped conv fwd instances
2024-10-25 08:25:53 -07:00
Rostyslav Geyyer
7d576f1748
Update GPU verification (#1596)
...
* Update inits
* Update static_cast to type_convert
* Add verification option selection
2024-10-25 08:13:46 -07:00
aledudek
9385caa306
Generic threshold calculation (#1546)
...
* Calculate generic relative threshold pool3dfwd
* Calculate absolute error threshold pool3d fwd
* Generic threshold calculation take max input for relative error pool3dfwd
* Remove max possible value for error calculation at runtime
* Remove debug print in pool3dfwd
* Pool3d fwd adjusted types in generic threshold calculation
* Generic threshold calculation take into account number of accumulations and accdatatype
* Generic threshold fix final error formula
* Generic threshold calculation - num of accs fix
* Generic threshold calculation - adjust absolute error
* Generic threshold calculation - OutDataType in absolute error
2024-10-25 12:46:24 +02:00
dummycoderfe
9183ce69ca
hot_fix epsilon pos (#1597)
...
Co-authored-by: dummycoderfe <noplydummmycoder@163.com>
2024-10-25 11:17:45 +08:00
rocking
871af334d1
Refine pipeline name
2024-10-24 20:42:40 +00:00
rocking
c89d8ca95f
clang format
2024-10-24 17:05:36 +00:00
rocking
1684d71a3f
Fix cmake
2024-10-24 11:44:55 +00:00
rocking
1e6814a6bd
Refine naming
2024-10-24 11:44:40 +00:00
rocking
d79715ba53
Fix bug of rmsnorm
2024-10-24 11:43:45 +00:00
rocking
e4a169dd47
refine example of rmsnorm
2024-10-24 11:43:15 +00:00
rocking
a50ec83d03
refine naming
2024-10-24 08:48:34 +00:00
rocking
df976ff6a1
Add missing cmake change
2024-10-24 06:13:03 +00:00
rocking
3d2e3be652
Add script to test performance and correctness
2024-10-24 06:12:42 +00:00
rocking
5b3108a62f
Remove static assert to prevent compile fail
2024-10-24 06:09:23 +00:00
Illia Silin
8e22e1ae31
fix the logic of enabling XDL and WMMA instances (#1595)
2024-10-23 15:55:39 -07:00
rocking
a5986c70dc
Add rmsnorm small example
2024-10-23 19:31:05 +00:00
rocking
382a2af212
Add rmsnorm2d
2024-10-23 19:23:51 +00:00
Bartłomiej Kocot
cedccd59c9
[POST MERGE PR] Enable grouped conv bwd wei bf16 NGCHW (#1594)
2024-10-23 12:02:33 +02:00
rocking
dfb4bf9488
Fix bug of std calculation
2024-10-22 20:36:25 +00:00
rocking
26f16dd20b
Prevent user use cross warp reduction
2024-10-22 19:29:46 +00:00
Jatin Chaudhary
4d5248e2d1
Explicit cast values to half (#1593)
...
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
2024-10-22 11:17:32 -07:00
rocking
9e7fcc0b37
Add reduce2d new api
2024-10-22 14:52:10 +00:00