composable_kernel

mirror of https://github.com/ROCm/composable_kernel.git synced 2026-05-03 05:01:25 +00:00

Author	SHA1	Message	Date
linqunAMD	b0ee317d83	[CK_TILE] Enable ck_tile tests on gfx11 and gfx12 (#2821) * [CK_TILE] Enable ck_tile test on gfx11 & gfx12 * revert an unnecessary change * enable pk_int4 on gfx11 & gfx12 * revert .pre-commit-config.yaml	2025-09-12 12:45:14 -07:00
linqunAMD	9fcc1ee9fd	Support Wave32 in CK_TILE - Part 1 (#2594) * Support wave32/wave64 in CK_TILE - Part 1 * remove blocksize in kernel launch * fix build error * fix clang format * fix clang format 2 * fix clang format 3 * fix fmha build error * fix fmha build 2 * fix fmha build 3 * fix build error 4 * address review comment * update change log * replace KernelBlockSize with kBlockSize * fix CI fail * fix clang format * address review comment and rebase code. * fix universal test fail --------- Co-authored-by: Lin, Qun <Quentin.Lin+amdeng@amd.com> Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>	2025-08-18 10:08:31 -07:00
Max Podkorytov	ab26026835	[CK-tile] add more tests for batched transpose testing the rectangular block tile sizes (#2634) * add failing tests * swap out and reference * add constraint assert to transpose input distribution * test both pipelines with rectangular block tile * print mismatched indices * add a smaller failing test for old pipeline * print grid and block * fill output before operating on it * swap m/n tile sizes and make one test pass * add device syncs * add one more flipped test case * flip block tile at host arg init * fix tiles for lds pipeline * clang-format * rename tests * roll back error check * remove device syncs * reduce large test case's size	2025-08-07 16:51:53 -07:00
Max Podkorytov	0d9439760f	remove std::format (#2604)	2025-08-01 19:22:07 -07:00
Max Podkorytov	f36cb5b2aa	[CK-tile] remove old ck-tile transpose test (#2591) * remove old ck-tile transpose test * rename test exe for consistency * replace batched transpose regression test	2025-08-01 14:50:09 -07:00
Max Podkorytov	de0cdb4c31	[CK-tile] add gtest for ck-tile batched transpose kernels (#2585) * add a dummy test file * add kernel launch logic to the test * transfer all test cases into gtest params * factor kernel out into test config * add load transpose pipeline tests * add padded tests and skip invalid kernels at runtime * enum class for pipeline type * add multiwarp test cases * fix type * try to solve the problem --------- Co-authored-by: ThomasNing <thomas.ning@amd.com>	2025-07-30 07:31:05 -07:00
Max Podkorytov	821cd26c13	[CK-Tile] Merge transpose examples (#2450) * unify pipeline signature with existing example * iwyu * move stuff around in load-tile-transpose * cleanups in batched transpose pipeline * comments * use same inputs size * cleaner printf * print host args * use 64 block sides in the 37_transpose example * roll back grid dimension size adjustment for 37_transpose example * transpose grid for 37_transpose to unify with 35_batched_transpose * unify grid computation logic * make policy methods device only (since they are used only on device from the pipeline) * more host/device attribute cleanups * copy over problem * move over pipeline and policy * add switch to batched transpose api * make the lds problem more similar to original problem * factor out logic into traits * factor out conditional compilation into trait parameter * propagate pipeline to args * unhardcode pipeline dispatch parameter * refactor vector size * put warp tile out of dispatch * rename template parameter for trait * rewrite vector size in terms of problem * mark policy-internal struct variable as device * factor out input distribution and thread access pattern from policies * reword vector size * use datatype across batched transpose pipelines, problems and kernel * remove transpose traits from lds pipeline * add padding to the lds pipeline interface * add comment * remove ck_tile example #37 * update cmakelists * add test for new pipeline * update batched transpose test * roll back load_tile_transpose changes * remove comments * pack dispatch parameters into a config * padM can be enabled * adjust lds vector size to enable padding along N * update test * clean up logic * swap m/n input vector size * adjust perf test script * sweep over C/W in perf test * count both read and written bytes into bandwidth (x2 the number) * clang-format * widen size range for perf test * remove 64k x 64k case; it's too large for index * remove thread tile from dispatch * Solve merge conflict * fix compile * modify the transpose * solve the test error and clang format * Add v3 support for Groupd fwd conv+bias+clamp & ckProfiler (#2463) * Add logging to IsSupported. * Less casting in AddClamp * Conv+bias+clamp instances & profiler BF16 * Fix 3D instances & run just 1x for verification. * :Run just once for verification conv fwd. * ckProfiler conv fwd clampwq * Remove exec bit & formatting * Add support for MultiD for grouped conv fwd v3. * Enable 2Lds. * clean * align instances * align instances * profiler fixes * Fixes * fix * fix --------- Co-authored-by: Adam Osewski <root@quanta-ccs-aus-f01-19.cs-aus.dcgpu> Co-authored-by: Bartłomiej Kocot <barkocot@amd.com> * Fixing 0ms and inf GB/s issue in img2col (#2565) issue : ==== ``` sh $ bin/tile_example_img2col Perf: 0 ms, inf GB/s ``` solution : ====== Problem occured because config.time_kernel is false by default. if false, then no need to calculate perf, just print proper message `image_to_coloumn: pass, No Perf generated due to config.time_kernel=0` * merge with develop * solve clang format --------- Co-authored-by: ThomasNing <thomas.ning@amd.com> Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com> Co-authored-by: Adam Osewski <root@quanta-ccs-aus-f01-19.cs-aus.dcgpu> Co-authored-by: Bartłomiej Kocot <barkocot@amd.com> Co-authored-by: rahjain-amd <Rahul.Jain@amd.com>	2025-07-26 21:51:54 -07:00
Emily Martins	c08986b026	Tests for CK Tile Batched Transpose and Smoothquant (#2453) * Create tests for ck tile batched transpose using example * Create ck tile tests for smoothquant using examples * fix precision input strings and convert batched transpose to regression tests * Code cleanup and fix asserts * add missing licenses * update copyright and licensing in files * Update smoothquant tests to use example's smoothquant.cpp * Add custom target for batched transpose tests * Add missing new lines at end of files for CMakelists * fix typo in batched transpose CMakeList target_compile_options --------- Co-authored-by: root <root@ctr-ubbsmc16.amd.com>	2025-07-17 09:53:34 -06:00

8 Commits