mirror of https://github.com/ROCm/composable_kernel.git synced 2026-06-06 07:51:52 +00:00

Files

SamiAario-AMD 0f10e6d921 [CK_TILE] Fixing Type Conversions in PassThroughPack8 (#2769)

* Change the return type of run_gemm_combinations in the basic tests

* Change the return type of run_gemm_combinations in the universal tests

* Add universal GEMM tests for bf16 x pk_i4 and fp16 x pk_i4

* Add universal GEMM test for fp8 x pk_i4

* Add basic GEMM tests for bf16 x pk_i4, fp16 x pk_i4 and fp8 x pk_i4.

* Add missing GemmTypeConfig<ck_tile::fp8_t, ck_tile::pk_int4_t, ck_tile::half_t>

* Add missing GemmTypeConfig<ck_tile::bf16_t, ck_tile::pk_int4_t, ck_tile::bf16_t>

* No need for utility in test_ck_tile_elementwise_1d

* Fix conversion from pk_int4x4_t to bf16x8_t in PassThroughPack8

* Avoid union-based type punning in float_to_bf16_truc_raw to make it constexpr compliant

* For consistency also make float_to_bf16_truc_nan_raw constexpr compliant by removing the union

* Use a static_cast to bfloat16_t only when CK_TILE_USE_LLVM_BUILTIN_BF16 is enforced

* Convert from float to bf16 during compilation rather than using magic values

* Fix conversion from pk_int4x4_t to fp8x8_t in PassThroughPack8

* Comment out the basic test for fp16 x pk_i4 as it does not pass

* Add missing GemmTypeConfig<ck_tile::bf8_t, ck_tile::pk_int4_t, ck_tile::half_t>

* Fix conversion from pk_int4x4_t to bf8x8_t in PassThroughPack8

* Add basic and universal GEMM tests for bf8 x pk_i4

* Switch back to amd_assembly_i4_to_fp8x8 in PassThroughPack8 as it works now

* Switch back to amd_assembly_i4_to_bf8x8 in PassThroughPack8 as it works now

* Remove the inefficient fallbacks for fp8 and bf8 in elementwise/unary_element_wise_operation.hpp

* Use explicit macros for enabling and disabling the constexpr lookup based converters

* Fix two failing tests

* Avoid union-based type punning in float_to_bf16_rtn_raw to make it constexpr compliant

* Use float_to_bf16_rtn_raw instead of float_to_bf16 to create the bf16 lookup table for use in conversions from pk_int4 to bf16

* On ROCm 7.0.1 we need an explicit cast from uint16_t to bf16_t

2025-09-29 13:34:47 +03:00

algorithm

refactor: use snake_case naming in ck_tile/core components (#2766)

2025-09-03 09:34:11 +02:00

arch

[CK_TILE] Support f32 in FMHA (fwd and bwd) (#2836)

2025-09-27 18:03:48 +05:00

container

[CK_TILE] Add permuteN optimization to remove lds operation in c_shuffle (#2764)

2025-09-08 22:02:48 -07:00

numeric

[CK_TILE] Fixing Type Conversions in PassThroughPack8 (#2769)

2025-09-29 13:34:47 +03:00

tensor

Fix for Add the API to load SGPR (#2913)

2025-09-25 10:32:42 -07:00

utility

[CK_TILE] Support f32 in FMHA (fwd and bwd) (#2836)

2025-09-27 18:03:48 +05:00

config.hpp

Fix formatting problem (#2768)

2025-09-02 14:14:10 +03:00

README.md

introducing ck_tile! (#1216)

2024-04-15 19:27:12 -05:00

ck_tile/core

ck_tile/core contains the basic functions and structures needed to create a GPU kernel with ck_tile. Users should include only the single header ck_tile/core.hpp to get all of this functionality. Everything lives in the ck_tile namespace. The coding style in this folder should resemble that of std (snake_case for structures and functions, CamelCase for template types, ...).
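The naming convention described above can be illustrated with a minimal, self-contained sketch. It does not use ck_tile itself: the namespace demo_tile, the struct static_array and the function reduce_sum are hypothetical names chosen only to show snake_case for structures and functions alongside CamelCase for template parameters. In real code the only include would be ck_tile/core.hpp, and the names would live in the ck_tile namespace.

```cpp
#include <cstddef>

// Hypothetical namespace for illustration only; NOT part of ck_tile.
namespace demo_tile {

// Structure name in snake_case, template parameters in CamelCase.
template <typename DataType, std::size_t NumElements>
struct static_array
{
    DataType data[NumElements];

    // Number of elements, known at compile time.
    constexpr std::size_t size() const { return NumElements; }

    // Read-only element access.
    constexpr const DataType& operator[](std::size_t i) const { return data[i]; }
};

// Function name in snake_case, template parameters in CamelCase.
// Sums all elements of the array (requires C++14 for the constexpr loop).
template <typename DataType, std::size_t NumElements>
constexpr DataType reduce_sum(const static_array<DataType, NumElements>& a)
{
    DataType acc = DataType{};
    for(std::size_t i = 0; i < NumElements; ++i)
        acc += a[i];
    return acc;
}

} // namespace demo_tile
```

With this sketch, `demo_tile::static_array<int, 4> a{{1, 2, 3, 4}};` followed by `demo_tile::reduce_sum(a)` sums the four elements. The point is only the naming style, which mirrors how std spells `std::array<T, N>` and `std::accumulate`.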

algorithm/
    coordinate transforms and other reusable algorithms
arch/
    basic device building blocks such as mma, buffer addressing, etc.
container/
    basic container data structures: array, sequence, tuple, ...
numeric/
    data types and data-type-related math
tensor/
    tensor descriptors and the tile-level API
utility/
    other utility functions for both host and device