mirror of https://github.com/ROCm/composable_kernel.git synced 2026-06-11 08:48:45 +00:00

Files

JiaLuo-CAN 5ff7497fa7 [rocm-libraries] ROCm/rocm-libraries#7537 (commit 07123f4)

[CK Tile] Fix Grouped Gemm quant mixed precision (#7537)

<Migrate from Internal repo PR>
test_ck_tile_grouped_gemm_quant_tensor would fail for mixed FP8/BF8
cases:
std::tuple<Row, Col, Row, FP8, F32, BF8, F32, F32, F16, TensorQuant,
False, True, False>,
std::tuple<Row, Col, Row, BF8, F32, FP8, F32, F32, F16, TensorQuant,
False, True, False>

GFX1250 would fail with incorrect results; GFX950 would fail to
compile for BF8+FP8 and give incorrect results for FP8+BF8.
The issue is caused by the wrong ComputeDataType selection.
The fix is to consider the original ADataType and BDataType even when
ComputeDataType is not void. To fix the compile error on gfx950, a bf8,
fp8, 16x16x32 warp Gemm is added.

2026-05-21 08:36:23 -07:00

algorithm

[rocm-libraries] ROCm/rocm-libraries#7528 (commit b4cae6f)

2026-05-20 17:25:22 +03:00

arch

[rocm-libraries] ROCm/rocm-libraries#6014 (commit 2f8259d)

2026-05-21 09:05:19 +02:00

container

[rocm-libraries] ROCm/rocm-libraries#6978 (commit e58096d)

2026-05-15 06:46:51 -07:00

numeric

[rocm-libraries] ROCm/rocm-libraries#6088 (commit 6ac353c)

2026-05-20 12:36:13 +00:00

tensor

[rocm-libraries] ROCm/rocm-libraries#7528 (commit b4cae6f)

2026-05-20 17:25:22 +03:00

utility

[rocm-libraries] ROCm/rocm-libraries#7537 (commit 07123f4)

2026-05-21 08:36:23 -07:00

config.hpp

[rocm-libraries] ROCm/rocm-libraries#6978 (commit e58096d)

2026-05-15 06:46:51 -07:00

README.md

introducing ck_tile! (#1216)

2024-04-15 19:27:12 -05:00

README.md

ck_tile/core

ck_tile/core contains the basic functions and structures needed to create a GPU kernel using ck_tile. Users should only need to include the single header ck_tile/core.hpp to use all of its functionality. Everything is in the ck_tile namespace. The coding style in this folder should be similar to std (snake_case for structures and functions, CamelCase for template types, ...).
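As a rough illustration of the naming convention described above, here is a minimal sketch. The namespace style_demo and the names static_buffer and reduce_sum are hypothetical and chosen only for this example; they are not taken from ck_tile. Code that uses the library itself would instead include ck_tile/core.hpp and refer to names in the ck_tile namespace.

```cpp
#include <cstddef>

// Hypothetical namespace, for illustration only (not part of ck_tile).
namespace style_demo {

// Structure name in snake_case; template parameters in CamelCase.
template <typename DataType, std::size_t NumElements>
struct static_buffer
{
    DataType data_[NumElements];

    // Member functions in snake_case as well.
    constexpr std::size_t size() const { return NumElements; }
    constexpr const DataType& at(std::size_t i) const { return data_[i]; }
};

// Free function in snake_case; template parameters in CamelCase.
template <typename DataType, std::size_t NumElements>
constexpr DataType reduce_sum(const static_buffer<DataType, NumElements>& buf)
{
    DataType acc = DataType{0};
    for(std::size_t i = 0; i < NumElements; ++i)
        acc += buf.at(i);
    return acc;
}

} // namespace style_demo
```

The sketch only demonstrates the std-like naming style; it says nothing about how the actual containers or algorithms in this folder are implemented.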

algorithm/
    coordinate transforms and other reusable algorithms
arch/
    basic device building blocks such as mma, buffer addressing, etc.
container/
    basic container data structures: array, sequence, tuple, ...
numeric/
    data types and data-type-related math
tensor/
    tensor descriptors and the tile-level API
utility/
    other utility functions for both host and device