composable_kernel

mirror of https://github.com/ROCm/composable_kernel.git synced 2026-03-23 16:47:40 +00:00

Author	SHA1	Message	Date
Ville Pietilä	1fc5a3f3ac	Build CK on Windows (#3458 ) * CMakeLists.txt hack for Windows. * Add Windows build instructions. * Fix type issue with variadic min function. * Use std::common_type to fix the variadic min/max functions. * Enable CPU guard compilation on Windows. * Suppress warnings related to std::getenv on Windows platform. * Git ignore the output directory on Windows platform. * Powershell script for running tests and generating reports. * Improve test logging. * Disable non-conv tests. * Fix Debug build on Windows. * More debug build changes. * Update Windows build instructions. * Enable all tests. * Test fixes. * Suppress not found linker options warning. * Update unsigned long literals and format specifiers to work correctly in Windows * Fix conv 3D bwd weight bilinear tests on Windows. * Revert changes on .gitignore. * Clean-up CMake project file for Windows builds. * clang-format * Fix definition of CMAKE_PREFIX_PATH on both Linux and Windows platforms. * Fix building examples on Windows. * Update Readme. * Remove the suppression of the deprecated warnings. * Remove Windows specific min/max implementations from CK Tile math core. * Remove unnecessary no-op on Windows. --------- Co-authored-by: User <user@example.com> Co-authored-by: Ville Pietilä <none> Co-authored-by: John Afaganis <john.afaganis@amd.com> Co-authored-by: Ville Pietilä <>	2026-01-14 07:31:45 -08:00
yinglu	8fec8054b2	ck: add tf32 in `DTYPES` to control instances build(#3317 )	2025-12-08 16:24:20 +08:00
Vidyasagar Ananthan	c7ded76cc7	Adding note on CMake convenience script (#3139 ) * Adding note on convenience script * Addressing feedback * Update README.md reword --------- Co-authored-by: Max Podkorytov <4273004+tenpercent@users.noreply.github.com>	2025-11-03 12:21:57 -08:00
Anton Gorenko	ec006bb8e0	[CK_TILE] Add gtests for FMHA (#2744 ) * Improve random number generation * use different seed for each input (Q, K, V...); * use deterministic generation of: * seqstart_q/k (for group mode); * block_table (for paged-kvcache); * cache_batch_idx (for kvcache); * Extract arg_parser-related code from run functions to use them as tests * Split examples into main programs and fmha runners, build instances separately * Add dummy tests that use instances and runners * Fix a missed corner case of f32->f8 conversion When value is < min f8 denormal but > min f8 denormal / 2, it must be rounded to min f8 denormal (i.e. 0b1), not to 0. * Fix incorrect fp8 scales for P and O in validation code DataTypeConfig was incorrectly compared with fp8_t. * Add host generation of dropout random values and use it for validation Previously host validation (reference_batched_dropout) used random numbers generated by BlockDropout of the kernel, meaning that incorrect generation on device (bad distribution, repeated numbers, too many zeros, etc.) would not trigger any validation errors. * Implement tests from smoke_test_bwd.sh * Return result as enum to distinguish failure and missing instance * Add tests for bwd features: bias, alibi, dropout * Implement tests from smoke_test_fwd.sh * Pass seqlen_q/k as vectors to fwd and bwd runners * Add tests for fwd features: bias, alibi, dropout * Add tests for pagedkv and splitkv * Fix conditions when to use splitkv and pagedkv kernels splitkv was executed only when use_kvcache which == (need_append_kvcache || use_cache_batch_idx || 0 < page_block_size). In the SplitKV tests: the regular fwd kernel was executed if use_cache_batch_idx was not requested even when num_splitkv > 1. In the AppendKV tests: the pagedkv kernel was executed but it often failed to find an instance.
* Add tests for appendkv * Use is_v_rowmajor = true because there are no instances with column layout anymore * Split public and private compile options for instances Tests and examples need to know only about CK_TILE_FMHA_FWD__API. Improve parsing validation in bias and mask * Pass bias as string for consistency with mask * Catch parsing and other exceptions * Add bwd test for deterministic flag * Initialize fp8 tensors (-init=ufq) similarly to uf * Fix splitkv/pagedkv invocation: use padded sk when seqlen_k_ptr is not null seqlen_k cannot be used to determine padding when seqlen_k_ptr is provided. The actual seqlen_k is taken from seqlen_k_ptr[b]. Even seqlen_k values (% bn0 == 0) use padded seqlen_k while seqlen_k_ptr may contain arbitrary values. In the example or tests this produces incorrect results with appendkv (for example, -d=32 -s=1 -s_k=64 -s_knew=7 -vlayout=c -b=8). * Fix use_pagedkv value when kvcache = true but page_block_size = 0 In this case block_table_ptr is nullptr which is accessed in the kernel. * Clean up bwd tests * Unify fwd tests for f16/bf16 and fp8 * Use better explicit instantiation declaration for fmha_bwd<2> * Use the same seed for all tests, allow to override it with env variable * Undo clang-format of one irrelevant file For some reason my local clang-format-18 and the one in CI work differently. * Do not build instances and tests on unsupported archs * Build instance libraries as OBJECT library * CI: Enable sccache for HIP There are source files with LANGUAGE HIP, they need -DCMAKE_HIP_COMPILER_LAUNCHER=sccache * Add tests to REGRESSION_TESTS * Fix OOB accesses in deterministic bwd due to incorrectly assumed kN0 The runner assumes kN0 = (hdim_q <= 128) ? 128 : 64 but there are smaller tiles (for tr_load or fp32). This can create too small dq_acc_buf. * Pass CK_TILE_FMHA_FWD__API as INTERFACE compile options The instances don't actually depend on them, only examples and tests do. 
Passing these definitions as INTERFACE allows to change FMHA_FWD_ENABLE_APIS without recompiling instances that are already in ccache. Fix formatting and names	2025-09-10 08:06:14 +05:00
geozhai	1e1ee758fa	update CK build instruction step 4 (#2563 ) Co-authored-by: Aviral Goel <aviral.goel@amd.com>	2025-08-11 00:26:13 -04:00
spolifroni-amd	a426f67301	creation of install doc and refactor of doc in general (#1908 ) * creation of install doc and refactor of doc in general * updates based on review comments * updated based on review comments * updated readme and contributors markdown * added extra note to not use -j on its own * added note about smoke tests and regression tests * made changes as per Illia's feedback --------- Co-authored-by: Aviral Goel <aviral.goel@amd.com>	2025-03-27 15:13:18 -06:00
Illia Silin	43c90b5234	RE-enable DL and DPP instances by default. (#1954 ) * enable DL and DPP instances by default * fix cmake logic	2025-03-06 21:45:31 -08:00
Illia Silin	9b51c08bf7	remove support for gfx940 and gfx941 targets (#1944 ) * remove support for gfx940 and gfx941 targets * update changelog	2025-03-05 11:07:33 -08:00
Aviral Goel	54de3e55e1	Implementing Test Filters for Smoke and Regression Tests (#1819 ) * smoke and regression targets working with tests * test filters work for both examples and test * removed unnecessary comments * added a missing comment * added a missing comment * fixed typo in the comments * updated README * Update PULL_REQUEST_TEMPLATE.md updating the template for future addition of test cases * Update PULL_REQUEST_TEMPLATE.md	2025-01-16 16:40:08 -08:00
darren-amd	26b3829c02	Disable building DPP kernels by default (#1804 ) * Disable building DPP kernels by default * Disable building dpp instances, examples, or tests if DPP_KERNELS is not set * Add new DPP_KERNELS flag to readme	2025-01-08 13:50:42 -05:00
Bartłomiej Kocot	5affda819d	Add basic documentation structure (#1715 ) * Add basic documentation structure * Add terminology placeholder * Add codegen placeholder * Create template for each page	2024-12-04 00:46:47 +01:00
Harisankar Sadasivan	d6d4c2788b	universal streamk fp8 changes (#1665 ) * universal streamk fp8 changes & ckprofiler instances * revert strides to -1 and verification options * fp8 exclusion on pre-gfx94 for universal_streamk * PR review based revisions: permissions reverted, removed hip err checks --------- Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>	2024-11-21 08:21:37 -08:00
Illia Silin	03c6448ba3	Reduce build time. (#1621 ) * disable fp8 gemm_universal on gfx90a and gfx908 by default * fix cmake syntax * fix clang format * add ifdefs in amd_xdlops * disable fp8 gemm instances on gfx90a by default * update readme	2024-11-01 13:52:23 +08:00
spolifroni-amd	794f2d64a8	added link to documentation (#1578 )	2024-10-21 08:35:57 -07:00
Illia Silin	f46a9eee9d	only build tests and examples if user sets GPU_TARGETS (#1565 )	2024-10-10 15:31:56 -07:00
Illia Silin	7d8ea5f08b	Fix build logic using GPU_ARCHS. (#1536 ) * update build logic with GPU_ARCHS * fix the GPU_ARCHS build for codegen * unset GPU_TARGETS when GPU_ARCHS are set	2024-10-07 08:18:23 -07:00
Lisa	281f836903	fix typo (#1067 ) Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>	2023-12-14 14:21:18 -08:00
Illia Silin	d939411dae	Switch from ROCmSoftwarePlatform to ROCm org (#1091 ) * switch from ROCmSoftwarePlatform to ROCm org * replace ROCmSoftwarePlatform with ROCm in few more places	2023-12-07 15:59:34 -08:00
Illia Silin	4e44a9e8da	Enable sccache in the default docker and CI. (#1009 ) * replace ccache with sccache, pin package versions * put ccache back temporarily to avoid breaking other CI jobs * add sccashe_wrapper.sh script * fix the package version syntax * fix the pymysql package issue * run sccache_wrapper before build if ccache server found * set the paths before calling the sccache_wrapper * use /tmp instead of /usr/local for cache * try using sccache --start-server instead of wrapper * try using redis server with sccache * define SCCACHE_REDIS * add redis and ping packages, and redis port * use the new sccache redis server * do not use sccache with staging compiler * fix the condition syntax * add stunnel to redis * add tunnel verification * separate caches for different architectures * fix syntax for the cache tag * use double brackets for conditions * add bash line to the script * add a switch for sccache and only use it in build stage * run check_host function when enabling sccache * fix the invocation tags for sccache * fix groovy syntax * set the invocation tag in groovy * disable sccache in clang-format stage * try another syntax for invocation tags * use local sccache server if can't connect to redis * fix script syntax * update README * refresh readme * readme updates * remove the timing and verification caveat from readme --------- Co-authored-by: Lisa Delaney <lisa.delaney@amd.com>	2023-10-30 13:16:29 -07:00
Illia Silin	9195435c77	Disable DL kernels by default. (#816 )	2023-07-26 11:06:45 -05:00
Adam Osewski	237f9cd3aa	Add basic setup for precommit (#749 ) (#764 ) * Add basic setup for precommit * Update README.md with instructions on installing precommit hooks --------- Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com> Co-authored-by: Bartlomiej Wroblewski <bwroblewski10@gmail.com>	2023-07-06 11:01:06 -05:00
Sam Wu	3cff340423	Documentation Updates (#710 ) * update documentation dependencies add version number to docs rename doc config directories enable more doc formats on rtd add license section in docs	2023-05-18 11:08:38 -06:00
Sam Wu	f80776d937	standardize docs (#655 )	2023-03-23 20:58:59 -07:00
Po Yen Chen	337642a48c	Add quotes for string option values (#472 )	2022-10-27 15:33:14 -06:00
Chao Liu	6de749e29c	Update doc (#464 ) * update cmake script * update readme * Update README.md * add citation * add images * Update README.md * update * Update README.md * Update CONTRIBUTORS.md * Update README.md * Update CITATION.cff * Update README.md * Update CITATION.cff * update doc * Update CONTRIBUTORS.md * Update LICENSE	2022-10-03 14:34:40 -05:00
Chao Liu	473ba5bc4a	update document: Readme, contributors, citation, (#463 ) * update cmake script * update readme * Update README.md * add citation * add images * Update README.md * update * Update README.md * Update CONTRIBUTORS.md * Update README.md * Update CITATION.cff * Update README.md * Update CITATION.cff	2022-10-03 00:48:24 -05:00
Chao Liu	500fa99512	Clean up conv example, Instances, profiler and test (#324 ) * convnd_fwd fp16 example * update example * update example * update instance * updating reference conv * update reference conv * update conv fwd profiler * update conv 1d and 3d instance * update include path * clean * update profiler for conv bwd data and weight * update conv bwd weight * clean * update conv example * update profiler for conv bwd weight * update ckprofiler for conv bwd data * fix reference conv bwd data bug; update conv bwd data test * update examples * fix initialization issue * update test for conv fwd * clean * clean * remove test case too sensitive to error threshold * fix test * clean * fix build * adding conv multiple d * adding conv multiple D * add matrix padder * add gemm padding to convnd * adding group conv * update gemm multi-d * refactor * refactor * refactor * clean * clean * refactor * refactor * reorg * add ds * add bias * clean * add G * adding group * adding group * adding group * update Tensor * clean * update example * update DeviceGemmMultipleD_Xdl_CShuffle * update conv bwd-data and bwd-weight * update contraction example * update gemm and batch gemm with e permute * fix example build * instance for grouped conv1d * update example * adding group conv instance * update gemm bilinear instance * update gemm+add+add+fastgelu instance * update profiler * update profiler * update test * update test and client example * clean * add grouped conv into profiler * update profiler * clean * add test grouped conv, update all conv test to gtest * update test	2022-07-29 18:19:25 -05:00
Chao Liu	0dcb3496cf	Improve external interface for GEMM and GEMM+add+add+fastgelu (#311 ) * interface for GEMM and GEMM+add+add+fastgelu * rename namespace * instance factory * fix build * fix build; add GEMM client example * clean	2022-06-30 22:11:00 -05:00
Liam Wrubleski	b653c5eb2e	Switch to standard ROCm packaging (#301 ) * Switch to standard ROCm packaging * Revert .gitignore changes * install new rocm-cmake version * update readme Co-authored-by: illsilin <Illia.Silin@amd.com> Co-authored-by: Chao Liu <chao.liu2@amd.com>	2022-06-25 09:35:16 -05:00
Chao Liu	ccbd8d907b	update readme and script (#290 )	2022-06-20 23:34:32 -05:00
JD	cec69bc3bc	Add host API (#220 ) * Add host API * manually rebase on develop * clean * manually rebase on develop * exclude tests from all target * address review comments * update client app name * fix missing lib name * clang-format update * refactor * refactor * refactor * refactor * refactor * fix test issue * refactor * refactor * refactor * update cmake and readme Co-authored-by: Chao Liu <chao.liu2@amd.com>	2022-05-12 09:21:01 -05:00
Wen-Heng (Jack) Chung	968bd93285	Update README.md (#228 )	2022-05-09 15:00:04 -05:00
Chao Liu	cd167e492a	Compile for gfx908 and gfx90a (#130 ) * adding compilation for multiple targets * fix build * clean * update Jenkinsfile * update readme * update Jenkins * use ck::half_t instead of ushort for bf16 * rename enum classes * clean * rename * clean	2022-03-31 12:33:34 -05:00
Chao Liu	e823d518cb	ckProfiler and device-level XDL GEMM operator (#48 ) * add DeviceGemmXdl * update script * fix naming issue * fix comment * output HostTensorDescriptor * rename * padded GEMM for fwd v4r4r4 nhwc * refactor * refactor * refactor * adding ckProfiler * adding ckProfiler * refactor * fix tuning parameter bug * add more gemm instances * add more fp16 GEMM instances * fix profiler driver * fix bug in tuning parameter * add fp32 gemm instances * small fix * refactor * rename * refactor gemm profiler; adding DeviceConv and conv profiler * refactor * fix * add conv profiler * refactor * adding more GEMM and Conv instance * Create README.md Add build instruction for ckProfiler * Create README.md Add Readme for gemm_xdl example * Update README.md Remove build instruction from top most folder * Update README.md * clean up	2021-11-14 11:28:32 -06:00
Chao Liu	c03045ce2d	rename	2021-08-10 23:45:36 +00:00
Chao Liu	85a1429301	Update README.md	2021-07-28 09:41:38 -05:00
Chao Liu	56f93c6f33	Update README.md	2021-07-28 09:40:44 -05:00
Chao Liu	4682d070a6	Create README.md (#45 ) * Create README.md	2021-07-08 13:32:29 -05:00

38 Commits