mirror of https://github.com/ROCm/composable_kernel.git synced 2026-05-14 02:02:46 +00:00

Wojciech Laskowski 0ebeb88ba9 [CK Tile] Adding WMMA wrappers for dense builtins (#5801)

## Motivation

This PR is part of the [WMMA/MFMA] unification work. It is the first in
a series of PRs that add all the necessary MMA builtins as
`amdgcn_mma` structs.

## Technical Details

This change adds new specializations for WMMA dense builtins. In total,
we now have 9 RDNA4 builtins and 3 RDNA3 builtins.

## Test Plan

All the new wrappers were added to the test suite in
`test_amdgcn_mma_layout.inc`.

## Test Result

Tests pass locally; waiting for CI.

## Submission Checklist

- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

---------

Co-authored-by: Yung-sheng Tu <yung-sheng@streamhpc.com>

2026-04-27 11:57:51 +00:00

algorithm

[CK][CK Tile] Improve access for merged groups and remove modulo from xor (#5334)

2026-03-20 15:45:45 +00:00

arch

[CK Tile] Adding WMMA wrappers for dense builtins (#5801)

2026-04-27 11:57:51 +00:00

container

[CK] Fix/suppress clang lifetimebound warnings with staging compiler. (#6550)

2026-04-22 15:47:47 +00:00

numeric

[CK_TILE] Enable canonical-NaN BF16 conversion for FMHA on RDNA (#6253)

2026-04-20 14:52:24 -04:00

tensor

[CK_TILE] fix(fmha): support >2GB KV cache in batch prefill via template dispatch (#6653)

2026-04-24 07:08:41 +08:00

utility

[CK_TILE] fix(fmha): support >2GB KV cache in batch prefill via template dispatch (#6653)

2026-04-24 07:08:41 +08:00

config.hpp

[CK_TILE] Enable canonical-NaN BF16 conversion for FMHA on RDNA (#6253)

2026-04-20 14:52:24 -04:00

README.md

introducing ck_tile! (#1216)

2024-04-15 19:27:12 -05:00

README.md

ck_tile/core

ck_tile/core contains all the basic functions and structures needed to create a GPU kernel using ck_tile. Users should only need to include the single header ck_tile/core.hpp to use all the functionality. Everything is under the ck_tile namespace. The coding style under this folder should be similar to std (snake_case for structures/functions, CamelCase for template types...)

algorithm/
    coordinate transforms and other reusable algorithms
arch/
    basic device building blocks such as mma, buffer addressing, etc.
container/
    basic container data structures: array, sequence, tuple, ...
numeric/
    data types and data-type-related math
tensor/
    tensor descriptors and tile-level API
utility/
    other utility functions for both host and device
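
The naming convention described above (snake_case for structures and functions, CamelCase for template types) can be illustrated with a small standalone sketch. Note that nothing here is taken from ck_tile itself: the `demo` namespace, `static_array`, and `reduce_sum` are hypothetical names chosen only to show the style, and the code deliberately does not include ck_tile/core.hpp so that it compiles on its own.

```cpp
#include <cstddef>

namespace demo { // hypothetical namespace for illustration, not ck_tile

// Structure name in snake_case; template parameters in CamelCase.
template <typename DataType, std::size_t NumElements>
struct static_array
{
    DataType data[NumElements];

    // Member functions follow the same snake_case style as std containers.
    constexpr std::size_t size() const { return NumElements; }
    constexpr const DataType& at(std::size_t index) const { return data[index]; }
};

// Free functions are snake_case as well.
template <typename DataType, std::size_t NumElements>
constexpr DataType reduce_sum(const static_array<DataType, NumElements>& values)
{
    DataType total{};
    for(std::size_t i = 0; i < NumElements; ++i)
        total += values.at(i);
    return total;
}

} // namespace demo

// Compile-time check that the sketch behaves as described.
static_assert(demo::reduce_sum(demo::static_array<int, 3>{{1, 2, 3}}) == 6,
              "sum of 1, 2, 3");
```

In real code under this folder, the equivalent types would live in the ck_tile namespace and be reached through the single ck_tile/core.hpp include.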