composable_kernel/example/ck_tile at 77123600ee4b6fae077a2145b68b00a8b2ce9460 - composable_kernel - Public git mirror

ROCm/composable_kernel

mirror of https://github.com/ROCm/composable_kernel.git synced 2026-06-29 03:07:02 +00:00

Anton Gorenko 77123600ee Improve fmha_bwd tests performance (#2376)

* Avoid passing indices (std::vector) by value to host tensor's operator()

Each access requires 2 allocations and copies of the vector.

* Remove 1 unneeded vector copy from the slowest part of fmha_bwd's verification

* Compute ds_hp_host_ref in parallel

This sequential ForEach is the slowest part of validation and it benefits
from parallel computation.

* Do not use ForEach for simple copy and conversion of large tensors

These tensors all have the same shape {nhead, real_seqlen_q, real_seqlen_k} and
can be copied/converted without complex computations of linear indices.
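
The first and last points above can be illustrated with a minimal sketch. It is not the code from the commit: `SimpleTensor`, `offset` and `convert_flat` are hypothetical names invented for this illustration, and the tensor is assumed to be packed in row-major order. The sketch shows an index vector taken by const reference, so an access does not copy it, and a same-shape conversion that walks the flat storage instead of computing a linear index per element.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a host-side tensor; NOT the library's own type.
template <typename T>
struct SimpleTensor
{
    std::vector<std::size_t> lengths; // e.g. {nhead, real_seqlen_q, real_seqlen_k}
    std::vector<T> data;              // packed, row-major storage (assumption)

    explicit SimpleTensor(std::vector<std::size_t> lens) : lengths(std::move(lens))
    {
        std::size_t count = 1;
        for(std::size_t len : lengths)
            count *= len;
        data.resize(count);
    }

    // The index vector is taken by const reference, so calling this
    // function does not allocate or copy a vector.
    std::size_t offset(const std::vector<std::size_t>& idx) const
    {
        assert(idx.size() == lengths.size());
        std::size_t off = 0;
        for(std::size_t d = 0; d < lengths.size(); ++d)
            off = off * lengths[d] + idx[d];
        return off;
    }

    T& operator()(const std::vector<std::size_t>& idx) { return data[offset(idx)]; }
    const T& operator()(const std::vector<std::size_t>& idx) const { return data[offset(idx)]; }
};

// Tensors of identical shape and layout can be converted over the flat
// storage, with no per-element multi-dimensional index computation.
template <typename Dst, typename Src>
void convert_flat(SimpleTensor<Dst>& dst, const SimpleTensor<Src>& src)
{
    assert(dst.lengths == src.lengths);
    std::transform(src.data.begin(), src.data.end(), dst.data.begin(),
                   [](const Src& v) { return static_cast<Dst>(v); });
}
```

The parallel computation mentioned in the third point is left out of the sketch to keep it short.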

2025-06-24 07:45:24 -07:00

Improve fmha_bwd tests performance (#2376)

2025-06-24 07:45:24 -07:00

[CK_TILE] Fix compilation errors introduced in #2320, #2219 and #2214 (#2388)

2025-06-23 12:29:15 +08:00

[CK_TILE] Support multi-config in tile_example_gemm_universal (#2240)

2025-06-17 17:27:46 -07:00

Revert "Add ck tile examples to package (#1880)" (#2150)

2025-04-30 10:20:16 -07:00

[CK_TILE] Fix compilation errors introduced in #2320, #2219 and #2214 (#2388)

2025-06-23 12:29:15 +08:00

Revert "Add ck tile examples to package (#1880)" (#2150)

2025-04-30 10:20:16 -07:00

09_topk_softmax

Revert "Add ck tile examples to package (#1880)" (#2150)

2025-04-30 10:20:16 -07:00

[CK_TILE] Fix compilation errors introduced in #2320, #2219 and #2214 (#2388)

2025-06-23 12:29:15 +08:00

11_add_rmsnorm2d_rdquant

[CK_TILE] fix build error in tile_add_rmsnorm2d_rdquant_fwd (#2243)

2025-06-17 21:37:59 -07:00

[CK_TILE] Fix compilation errors introduced in #2320, #2219 and #2214 (#2388)

2025-06-23 12:29:15 +08:00

[CK_TILE] moe_sorting support "local_tokens" feature for EP case (#2335)

2025-06-18 10:49:43 +08:00

14_moe_smoothquant

[CK_TILE] Fix compilation errors introduced in #2320, #2219 and #2214 (#2388)

2025-06-23 12:29:15 +08:00

[CK_TILE] moe_sorting support "local_tokens" feature for EP case (#2335)

2025-06-18 10:49:43 +08:00

16_batched_gemm

[CK_TILE] Multiple-D GEMM example (#2219)

2025-06-13 19:39:11 +02:00

17_grouped_gemm

[CK_TILE] Fix compilation errors introduced in #2320, #2219 and #2214 (#2388)

2025-06-23 12:29:15 +08:00

Add missing copyright headers (#2359)

2025-06-17 14:29:45 -07:00

19_gemm_multi_d

fix the mi350 error (#2378)

2025-06-20 12:50:13 -07:00

20_grouped_convolution

[CK TILE] Grouped Convolution Forward Kernel (#2188)

2025-06-20 15:44:36 -07:00

35_batched_transpose

Add missing copyright headers (#2359)

2025-06-17 14:29:45 -07:00

Simple copy kernel, which can be a tool to experiment with CK_Tile API with minimal code. (#2156)

2025-05-07 00:02:59 -07:00

transpose load api development (#2177)

2025-06-18 01:28:34 -07:00

CMakeLists.txt

[CK TILE] Grouped Convolution Forward Kernel (#2188)

2025-06-20 15:44:36 -07:00

remod.py

introducing ck_tile! (#1216)

2024-04-15 19:27:12 -05:00