composable_kernel

mirror of https://github.com/ROCm/composable_kernel.git synced 2026-05-14 10:09:41 +00:00

Author	SHA1	Message	Date
valarLip	c67379f54b	[CK_TILE] add scatter_gather (#1609 ) [ROCm/composable_kernel commit: `4d7e063a0a`]	2024-10-29 18:19:29 +08:00
valarLip	a712223d4d	[CK_TILE] add generic_permute (#1607 ) [ROCm/composable_kernel commit: `9fbd72e97e`]	2024-10-29 18:05:53 +08:00
Illia Silin	968f0ffd6b	fix compilation errors for gfx12 with clang20 (#1606 ) [ROCm/composable_kernel commit: `922e42a039`]	2024-10-28 19:02:48 -07:00
carlushuang	ea3af1dfbc	topk_softmax (#1592 ) * topk_softmax * remove some file * fix atomix linear_offset * address various comment, and change sfc get_index api to static(tuple) [ROCm/composable_kernel commit: `b098b71b05`]	2024-10-26 23:52:49 +08:00
Bartłomiej Kocot	1ade932aed	Add dynamic elementwise op (#1426 ) * Add dynamic elementwise op Co-authored-by: ThruptiRajLakshmanaGowda <thruptiraj.lakshmanagowda@amd.com> * CI issues fix * Custom parameter value for dynamic functions - Comments addressed --------- Co-authored-by: ThruptiRajLakshmanaGowda <thruptiraj.lakshmanagowda@amd.com> Co-authored-by: ThruptiRajLakshmanaGowda <tlakshma@amd.com> [ROCm/composable_kernel commit: `31bf253aeb`]	2024-10-26 15:22:37 +02:00
Po Yen Chen	919046f86c	[CK_TILE] More fmha splitkv optimizations (#1588 ) * Use pre-defined constants for readability * Use vector write for o_acc tensor * Remove no-longer used policy method * Deprecate no-longer used policy/pipeline * Specify gemm0/gemm1 block warps separately in codegen * Fix wrong ps_idx creation logic * Add single-warp block gemm * Supoprt single-warp gemm0 * Make MakeCBlockTile() as static method * Use MakeCBlockTile() to get underlying tile distribution * Use kNumGemm1Warps to compute # threads for gemm1 * Put normal case in the if clause * Refine fmha splitkv block mapping * Refine & fix the lse_acc/o_acc layout * Fix wrong LDS size for K tile * Use kK0=64 for hdim=128,256 fmha splitkv kernels * Use kK1=64 for hdim=32,64,128 fmha splitkv kernels * Undo kK0/kK1 changes * Use more reasonable GetAlignmentV() computation * Using store_tile() in fmha splitkv kernel epilogue [ROCm/composable_kernel commit: `54f0e6f4bb`]	2024-10-26 18:35:45 +08:00
valarLip	59e7fe3ac8	add int8 gemm multiply multiply a8w8 (#1591 ) * add int8 gemm multiply multiply a8w8 * uncomment * clang-format-12 * Add example_gemm_multiply_multiply_xdl_int8 * Remove shell scripts * update preprocess number for mi308; bring back printout in ckprofiler * format --------- Co-authored-by: chenjun <junchen2@amd.com> Co-authored-by: Haocong WANG <haocwang@amd.com> Co-authored-by: carlushuang <carlus.huang@amd.com> [ROCm/composable_kernel commit: `37f7afed1e`]	2024-10-26 16:39:34 +08:00
aledudek	c534ed750d	Generic threshold calculation (#1546 ) * Calculate generic relative threshold pool3dfwd * Calculate absolute error threshold pool3d fwd * Generic threshold calculation take max input for relative error pool3dfwd * Remove max possible value for error calculation at runtime * Remove debug print in pool3dfwd * Pool3d fwd adjusted types in generic threshold calculation * Generic threshold calculation take into account number of accumulations and accdatatype * Generic threshold fix final error formula * Generic threshold calculation - num of accs fix * Generic threshold calculation - adjust absolute error * Generic threshold calculation - OutDataType in absolute error [ROCm/composable_kernel commit: `9385caa306`]	2024-10-25 12:46:24 +02:00
dummycoderfe	6cd6bf04fb	hot_fix epsilon pos (#1597 ) Co-authored-by: dummycoderfe <noplydummmycoder@163.com> [ROCm/composable_kernel commit: `9183ce69ca`]	2024-10-25 11:17:45 +08:00
Jatin Chaudhary	6c9232e5bc	Explicit cast values to half (#1593 ) Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com> [ROCm/composable_kernel commit: `4d5248e2d1`]	2024-10-22 11:17:32 -07:00
ltqin	b887c7b709	update layernorm (#1570 ) * port layernorm * change warp_welford.hpp * Update warpshuffle * 1. Add save mean and save std back 2. Move construction of tensor_view and tile_window to operator() * refine welford max count calculation * unify layernorm api * Rename file * Remove save mean and inv std * Revert "refine welford max count calculation" This reverts commit `022365802b`. * Fix order of parameter * refine welford max count calculation again * Remove fp32 instances * Fix bug of padding * refactor api * Support bf16 * Extract common function * Refine arg of operator() * Add kMThreadPerBlock to template parameter * clang format * Refine variable name * Refine file name * remove redundant line * refactor layernorm2d pipeline and add block-per-block utility * fix name * rename more * add more block-per-tile instance * remove duplicated define * update instance for 2048, 1024 case * support up to 2048 now * opt loading * add n1536 * Add two pass pipeline * format * Fix incorrect type * parallel compilation * Use smaller N * fix 2p pass * Support Repeat_M in distribution * Refine nameing * Add reduce example --------- Co-authored-by: letaoqin <letaoqin@amd.com> Co-authored-by: aska-0096 <haocwang@amd.com> Co-authored-by: rocking <ChunYu.Lai@amd.com> Co-authored-by: carlushuang <carlus.huang@amd.com> [ROCm/composable_kernel commit: `0394f8a713`]	2024-10-22 09:26:18 +08:00
Po Yen Chen	267bf490e5	[CK_TILE] Optimize fmha splitkv & splitkv combine kernels (#1577 ) * Use smaller width for lse_accum dist tensor * Update pipeline comment * Fix wrong distribution for lse_accum * Remove duplicate dim in lse_accum dist encoding * Decide fmha splitkv combine kernel kBlockSize by kM0 * Remove assumption of MPerThread=1 * Add log<4> & log<8> specialization * Enlarge occupancy array * Fix vector size for small tile * Add support for kMaxSplits=8 * Re-format gemm.hpp * Use 16x16x16 warp gemm for fwd_splitkv * Centralize policy code changes * Leave fp8/bf8 tile settings unchanged [ROCm/composable_kernel commit: `95e722a3b3`]	2024-10-21 10:52:11 +08:00
Qianfeng	3e0b77670e	[CK_TILE] Improve headdim96 performance for fmha-bwd (#1573 ) * Add kQKHeaddimForGemmN and kVHeaddimForGemmN in order to support headdim 96 * Remove the using of MakeKRegBlockDescriptor and MakeVRegBlockDescriptor * Fix in bwd_piple_default_policy * Remove kQKHeaddim and rename kQKHeaddimForGemmN to kQKHeaddim in the bwd kernel and pipelines * Replace kVHeaddimForGemmN by kVHeaddim and kDoDvHeaddim * Update to hd96 tile settings * Add smoke test scripts for fmha-bwd hd96 * Revert "Add smoke test scripts for fmha-bwd hd96" This reverts commit `7ca7e1a93d`. * Remove hd96 tile settings in fmha_bwd codegen to save compiling * Fix lost code line in bwd_pipeline_default_policy * Merge kDoDvHeaddim/kPadHeadDimDoDv to kVHeaddim/kPadHeadDimV and remove TileFmhaBwdTraits * Rename KRegSliceBlockDescriptor/VRegSliceBlockDescriptor to KRegBlockDescriptor/VRegBlockDescriptor * tiny adjustments --------- Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com> Co-authored-by: danyao12 <Dan.Yao@amd.com> [ROCm/composable_kernel commit: `14c3cfb1c6`]	2024-10-16 18:14:32 +08:00
Bartłomiej Kocot	acb8a72aad	[CK_TILE] Add block universal gemm pipeline policy (#1557 ) * [CK_TILE] Add block universal gemm pipeline policy * Fixes * fixes2 * Fixes3 * fixeS [ROCm/composable_kernel commit: `d02a92cc0d`]	2024-10-15 13:53:41 +02:00
Po Yen Chen	739f5210b0	Apply ROCm 6.2 WA to ROCm 6.3 and later (#1563 ) [ROCm/composable_kernel commit: `9868fd0245`]	2024-10-15 18:02:41 +08:00
Rostyslav Geyyer	0d8b3d36b2	Add custom type vector support (#1333 ) * Add non_native_vector_type * Add a test * Add non-native vector type * Fix CTOR * Fix non-native vector type of 1 * Fix CTORs * Use vector_type to cover non-native implementation as well * Update the test * Format * Format * Fix copyright years * Remove BoolVecT so far * Add AsType test cases * Update assert error message * Remove redundant type * Update naming * Add complex half type with tests * Add tests for vector reshaping * Add missing alignas * Update test/data_type/test_custom_type.cpp Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com> * Compare custom types to built-in types * Add default constructor test * Add an alignment test --------- Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com> Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com> Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com> [ROCm/composable_kernel commit: `4cf70b36c1`]	2024-10-14 11:56:45 -05:00
Bartłomiej Kocot	87e4507543	Add transpose scale amax example (#1547 ) * Add transpose scale amax example * fixes * Tune reduce instance [ROCm/composable_kernel commit: `f21cda2536`]	2024-10-14 17:39:38 +02:00
Thomas Ning	2117e76277	decouple the calling from gemm_pipeline (#1571 ) * decouple the calling from gemm_pipeline * clang format [ROCm/composable_kernel commit: `35c1777d59`]	2024-10-14 13:59:26 +08:00
Adam Osewski	b05ec1b096	Implement GetWorkSpaceSize from BaseOperator. (#1564 ) [ROCm/composable_kernel commit: `29d384d0b2`]	2024-10-12 14:05:11 +08:00
Thomas Ning	0d711b3edf	Ck tile gemm cshuffle & CK Tile GEMM restructure (#1535 ) * ake the cshuffle compilable * modify Mhe reference on gpu and cpu. Correaccess of cshuffle * fix the cpu reference code * Complete the in tile shuffle logic * restructure the kernel template input * change the naming pattern of ck_tile gemm pipeline * Re-format files using remod.py * Solve the fmha conflict with gemm * Comment Addressed from Carlus --------- Co-authored-by: Po Yen, Chen <PoYen.Chen@amd.com> [ROCm/composable_kernel commit: `6f27bc9872`]	2024-10-10 18:02:22 +08:00
Christopher Millette	d6eae63f60	Fixes small memory leak from missing hipEventDestroy (#1554 ) [ROCm/composable_kernel commit: `ceaed8e097`]	2024-10-09 09:41:35 +02:00
Po Yen Chen	50f0f55fbc	[CK_TILE] Update example README files & fix script compatibility issue (#1548 ) * Fix text alignment of ArgParser::print() * Update example README files * Clarify make-ck-dev.sh <arch> usage * Only keep some of the argument from '-?' output * Undo command line output changes in README * Only keep existing argument on doc and update description * Fix text alignment * Make cmake-ck-*.sh compatible with 'sh' command [ROCm/composable_kernel commit: `0c094daa7e`]	2024-10-08 10:45:12 +08:00
Qianfeng	1ca2b3d76c	[CK_TILE] Simplify the codes in splitkv_combine pipeline (#1549 ) * Simplify the codes in splitkv_combine pipeline * Always set kPadSeqLenK=true for fmha splitkv kernels * Change in Oacc Alignment and TileDistribution to be more adaptable to tile sizes --------- Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com> [ROCm/composable_kernel commit: `74d68e3b99`]	2024-10-08 10:44:34 +08:00
Illia Silin	881bc2c930	Fix build logic using GPU_ARCHS. (#1536 ) * update build logic with GPU_ARCHS * fix the GPU_ARCHS build for codegen * unset GPU_TARGETS when GPU_ARCHS are set [ROCm/composable_kernel commit: `7d8ea5f08b`]	2024-10-07 08:18:23 -07:00
Bartłomiej Kocot	4aaf6ad633	[CK_TILE] Fix conv param multiple definition (#1550 ) Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com> [ROCm/composable_kernel commit: `cc8f466a7e`]	2024-10-07 15:21:21 +02:00
rocking	36b2a932b0	[Ck tile] Support layernorm one pass (#1512 ) * Fix compile error * Add one pass pipeline * Extract creating tile_window to operator() * clang format * reduce duplicated code * do not hardcode * Support padding in layernorm --------- Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com> [ROCm/composable_kernel commit: `0023f01ab0`]	2024-10-07 14:25:53 +08:00
kylasa	6f048f54dc	Adding seed and offset pointer support to the philox random number generator. (#1523 ) * Adding seed and offset pointer support to the philox random number generator. * Separating seed and offset pointer checks with different condition statements. * Changes include, adding support for device seed and offset pointers, union is used to store seed/offset values and device pointers to minimize device SGPRs. * Correcting a typo in the readme file * Re-format files using remod.py * Use STL type for API parameters * Use simpler struct design for drop_seed & drop_offset * Undo unnecessary changes * Sync kargs style for fmha_fwd.hpp/.cpp * Use templated union to reduce code * Use structured binding to make code more readable --------- Co-authored-by: Sudhir Kylasa <sukylasa@amd.com> Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com> [ROCm/composable_kernel commit: `c24fae2346`]	2024-10-05 02:48:47 +08:00
Bartłomiej Kocot	47a2eb1cce	Fix grouped gemm check to avoid overflow (#1545 ) [ROCm/composable_kernel commit: `6b54d2faf8`]	2024-10-04 17:32:43 +02:00
macurtis-amd	164963bf83	Fix compilation errors generated by forthcoming Clang changes (#1544 ) Without this change, the following diagnostic is generated: a template argument list is expected after a name prefixed by the template keyword [-Wmissing-template-arg-list-after-template-kw] See C++17 spec [temp.names] p5. [ROCm/composable_kernel commit: `aeb7c91f48`]	2024-10-02 13:56:22 -07:00
Illia Silin	ef193e048a	[CK_TILE] add missing vector header (#1537 ) * add missing vector header * Re-format header using remod.py --------- Co-authored-by: Po Yen, Chen <PoYen.Chen@amd.com> [ROCm/composable_kernel commit: `8e4c3fb1bc`]	2024-10-01 07:58:20 -07:00
Po Yen Chen	1c4c07c669	[CK_TILE] Change output accum tensor layout of fmha fwd split-kv & combine kernels (#1527 ) * Use same layout for o_acc and o tensor * Use better param names in partitioner * Remove redundant kargs 'max_seqlen_q' * Use better param names in splitkv kernel * Add comment for additional kernel arguments * Sync empty loop early return logics between pipelines * Pass more arguments to cmake in scripts * Align backslashes * Fix wrong o_acc tensor view strides * Change o_acc layout if o_perm=0 * Handle whole row masked via attn_bias * Use use vector width = 1 for o_acc * Use more even split sizes [ROCm/composable_kernel commit: `a1c07e8d91`]	2024-10-01 22:13:52 +08:00
Bartłomiej Kocot	da3172955b	[CK_TILE] Image to Column kernel (#1532 ) * [CK_TILE] Image to Column kernel * Fixes * Vector loads and stores * Fixes * Fixes * change test dir name [ROCm/composable_kernel commit: `de3e3b6424`]	2024-09-27 22:57:38 +02:00
Dan Yao	7460d19460	[CK_TILE] Fix compiler related FA bwd issues (#1530 ) * add barriers * tail bias barriers * adjust bf16/hd256 tol * continue adjust bf16/hd256 tol [ROCm/composable_kernel commit: `9d69a099a4`]	2024-09-26 12:18:39 -07:00
Illia Silin	04c756ea93	Fix compilation errors with Clang20.0. (#1533 ) * fix clang20 compilation errors for gfx90a * fix clang20 compilation errors for gfx11 targets [ROCm/composable_kernel commit: `42e6dceacc`]	2024-09-25 13:45:38 -07:00
Po Yen Chen	53b581e122	Early return if seqlen_k=0 on group mode (#1524 ) [ROCm/composable_kernel commit: `770d2b7725`]	2024-09-22 20:05:58 +08:00
Bartłomiej Kocot	e4f4e04add	Add support for NGCHW in grouped conv fwd (#1499 ) * Support NGCHW in grouped conv fwd * Remove not needed variable * Fixes [ROCm/composable_kernel commit: `4ba52b35dc`]	2024-09-20 10:45:46 +02:00
Adam Osewski	f6c6c375db	Remove unsupported (fp8) type from Add memory operation. (#1521 ) The dynamic buffer doesn't have support for fp8 in `Update` operation thus fp8 is not supporting `InMemoryDataOperation::Add` [ROCm/composable_kernel commit: `0c39954da9`]	2024-09-20 09:40:45 +02:00
Thomas Ning	2ded318de8	Ck tile gemm padding dim (#1516 ) * Support the N dimension padding * Finished the padding feature for different dimension of K [ROCm/composable_kernel commit: `694c300145`]	2024-09-18 11:32:29 -07:00
Thomas Ning	84f3413bb2	Ck tile GPU verification sample develop & Add the CK TILE GEMM to the CI/CD test (#1505 ) * Finished the feature of gpu verification * Add the ck_tile_gemm test in the CI CD * add the include of tensor_layou in reference_gemm * Comment Addressed * split ck_tile fhma and gemm tests into separate stages * restructure the reference gemm * restructure a new reference_gemm api that could read the device mem --------- Co-authored-by: carlushuang <carlus.huang@amd.com> Co-authored-by: illsilin <Illia.Silin@amd.com> [ROCm/composable_kernel commit: `844f5a1712`]	2024-09-14 21:08:40 +08:00
Jun Liu	04a8584b87	Customize filesystem in CK for legacy systems (#1509 ) * Legacy support: customized filesystem * Update cmakefile for python alternative path * fix build issues * CK has no boost dependency * More fixes to issues found on legay systems * fix clang format issue * Check if blob is correctly generated in cmake * fix the python issues * add a compiler flag for codegen when using alternative python * use target_link_options instead of target_compile_options --------- Co-authored-by: illsilin <Illia.Silin@amd.com> [ROCm/composable_kernel commit: `81bc1496b2`]	2024-09-13 07:51:07 -07:00
Mateusz Ozga	9c0316d853	Pool2d max/avg kernel in the BWD version (#1494 ) * Add pool2d instance BWD AVG * Add pool2d instance BWD MAX * Fix: avg review * Fix review: part2 * Fix - enable test when type is compiled * Fix review part3 [ROCm/composable_kernel commit: `448c0f56d8`]	2024-09-12 11:47:52 +02:00
jakpiase	8aeb2afbe2	Rewrite pool2d fwd (#1462 ) * added pool2d fwd * add tests * add reviewers changes * Revert "Merge remote-tracking branch 'origin/develop' into jakpiase/pool2d_fwd_new" This reverts commit `6b2ba7ff89`, reversing changes made to `22c82bea0c`. * Revert "add reviewers changes" This reverts commit `22c82bea0c`. * added reviewers comments * revert some old files * add reviewers requests --------- Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com> [ROCm/composable_kernel commit: `e8d2887cb2`]	2024-09-11 15:21:00 +02:00
jakpiase	681d36db5f	Added structural sparsity blockwise gemm (#1435 ) * Implemented smfmac xdlops * Added smfmac blockwise xdlops * fixes * add reviewers suggestions --------- Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com> [ROCm/composable_kernel commit: `2a261afcdf`]	2024-09-11 15:19:42 +02:00
Dan Yao	df8769d3c8	[CK_TILE] FA bwd repair (#1502 ) * fix fa bwd * revert kernelBlockSize in gemm_kernel.hpp [ROCm/composable_kernel commit: `d09572e8c2`]	2024-09-10 10:45:32 -07:00
Thomas Ning	b736c9b51a	Ck tile gemm example (#1488 ) * Checkpoint: Finished with the tile example & kernel verification, working on the different matrix layout * Finished the Matrix Layout feature set up. Note: Need to modify the inner block to solve the shuffle problem in the future. * Fix: Clang Format, API fixed from fmha * fix with better naming convention * revert back the pipeline code of fmha * Fixed: Addressed the comments and merge the GEMM shape of GEMM Operator and FMHA Operator to one. * clang format with the reference_gemm file * convert the clang format with the remod.py * Changed the format and variable name of the kernel gemm_shape and partitioner --------- Co-authored-by: thomasning <thomasning@banff-cyxtera-s70-4.ctr.dcgpu> [ROCm/composable_kernel commit: `caacd38830`]	2024-09-07 16:23:32 +08:00
M.Emin Ozturk	483f33772d	Modification to fix this issue "threadwise_tensor_slice_transfer_v5r1 issue #1279 " (#1492 ) * issue fix, one line changed for tmp * clang --------- Co-authored-by: Emin Ozturk <emin.ozturk@utah.edu> Co-authored-by: Harisankar Sadasivan <135730918+hsadasiv@users.noreply.github.com> [ROCm/composable_kernel commit: `8378855361`]	2024-09-04 21:52:55 -07:00
Haocong WANG	505351b016	Add gemm universal bf16 instances (#1484 ) * revert ckprofiler change * temp save * Add test and test pass * test pass * Fix bug inside rotating buffer when tensor is not packed * bug fix * clang format --------- Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com> [ROCm/composable_kernel commit: `5b10dae6a4`]	2024-09-04 20:58:54 -07:00
Bartłomiej Kocot	691144def1	Add support for NGCHW in grouped conv bwd wei (#1491 ) * Add support for NGCHW in grouped conv bwd wei * Comments fixes * navi fixes * Update function names [ROCm/composable_kernel commit: `73b67f290f`]	2024-09-03 10:52:03 +02:00
Bartłomiej Kocot	ebb827260e	Revert "Revert "Revert Revert Support access per groups and filter2x3 in grouped conv fwd (#1382 ) (#1406 ) (#1415 )" (#1455 )" (#1490 ) This reverts commit `a05bad520a`. [ROCm/composable_kernel commit: `a9b170b541`]	2024-09-02 10:39:49 +02:00
Dan Yao	34c2080c73	[CK_TILE] float -> bf16 inline asm rtn (#1482 ) * asm rtn * add asm rtn macro * reorder macro --------- Co-authored-by: carlushuang <carlus.huang@amd.com> [ROCm/composable_kernel commit: `b8addae293`]	2024-08-30 15:38:09 +08:00

540 Commits