mirror of https://github.com/ROCm/composable_kernel.git synced 2026-05-11 08:50:17 +00:00

Files

Anton Gorenko 7b074249f4 [CK_TILE] Fix UB and corner cases in f32/f16 to/from f8 conversion (#2571)

* Add tests for host conversion of f32/f16 to f8

* Add tests for host conversion from f8 to f32/f16

* Fix UB and corner cases in f32/f16 to/from f8 conversion

* There is undefined behavior (UB) when very small values are converted
  to f8: shift amounts can be larger than the type width. Using unsigned
  long long does not help because exponent_diff >= 64 in such cases. As a
  result, values like 2.117582368e-22 are converted to non-zero f8 in host
  validation of FMHA tests, and test_f8 crashes with a segfault in
  unrelated code such as GTest internals, or produces non-deterministic
  results.
* Fix FNUZ conversion to return NaN for NaN inputs.
* Fix a compilation error (caused by uint8_t << 8) in the OCP e5m2 to f16
  conversion.
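The shift problem described above can be illustrated with a small host-side sketch. The helper names `f32_exponent`, `shift_right_clamped`, and `exponent_diff_e4m3_fnuz` are hypothetical, and the definition of the exponent difference used here (how far the f32 exponent lies below the smallest normal exponent of e4m3 FNUZ, whose bias is 8) is an assumption for illustration; this is not the code from the commit.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical sketch, not CK's actual conversion code.

// Unbiased exponent of a normal (finite, non-zero, non-subnormal) f32 value,
// read directly from its bit pattern.
inline int f32_exponent(float x)
{
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    return static_cast<int>((bits >> 23) & 0xFFu) - 127;
}

// Right shift that stays well-defined for any shift amount: `x >> n` is UB
// when n >= 64 for a 64-bit x, so such shifts return 0 (all bits shifted out).
constexpr std::uint64_t shift_right_clamped(std::uint64_t x, unsigned n)
{
    return n >= 64u ? 0u : (x >> n);
}

// Assumed definition for illustration: distance of an f32 exponent below the
// smallest normal exponent of e4m3 FNUZ (bias 8, so 1 - 8 = -7).
inline int exponent_diff_e4m3_fnuz(float x)
{
    constexpr int f8_min_normal_exp = 1 - 8;
    return f8_min_normal_exp - f32_exponent(x);
}
```

As an f32, 2.117582368e-22 is exactly 2^-72, so the assumed exponent difference is 65. A plain `>>` by that amount on a 64-bit significand is undefined, whereas the clamped shift yields 0, a well-defined input for any subsequent rounding step.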

* Replace some magic numbers with values from numeric_traits

* Build tests only on devices supporting the type

2025-07-31 09:54:17 +05:00

algorithm

upgrade from clang-format-12 to clang-format-18 (#2568)

2025-07-28 11:34:07 -07:00

arch

Expand the bandwidth of direct_global_to_lds for gfx950 (#2576)

2025-07-28 23:56:53 -07:00

container

upgrade from clang-format-12 to clang-format-18 (#2568)

2025-07-28 11:34:07 -07:00

numeric

[CK_TILE] Fix UB and corner cases in f32/f16 to/from f8 conversion (#2571)

2025-07-31 09:54:17 +05:00

tensor

[CK_TILE] FMHA bwd Support hdim as a Multiple of 32 (#2130)

2025-07-29 09:31:14 +08:00

utility

upgrade from clang-format-12 to clang-format-18 (#2568)

2025-07-28 11:34:07 -07:00

config.hpp

default skip y point to r (#2457)

2025-07-06 23:54:34 -07:00

README.md

introducing ck_tile! (#1216)

2024-04-15 19:27:12 -05:00

README.md

ck_tile/core

ck_tile/core contains all the basic functions and structures needed to create a GPU kernel with ck_tile. Users only need to include the single header ck_tile/core.hpp to access all of this functionality. Everything lives in the ck_tile namespace. The coding style in this folder should be similar to that of std: snake_case for structures and functions, and CamelCase for template type parameters.
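A minimal sketch of the naming convention described above follows. The namespace `example_tile` and the names `static_buffer`, `get_size`, and `clamp_to_range` are hypothetical and do not correspond to real ck_tile APIs; actual ck_tile code lives in namespace `ck_tile` and is reached through `ck_tile/core.hpp`.

```cpp
#include <cstdint>

// Hypothetical names, used only to illustrate the naming convention.
namespace example_tile {

// snake_case for structures; CamelCase for template type parameters.
template <typename DataType, std::int32_t Size>
struct static_buffer
{
    DataType data[Size];

    // snake_case for member functions
    constexpr std::int32_t get_size() const { return Size; }
};

// snake_case for free functions; DataType is a CamelCase template parameter.
template <typename DataType>
constexpr DataType clamp_to_range(DataType x, DataType lo, DataType hi)
{
    return x < lo ? lo : (x > hi ? hi : x);
}

} // namespace example_tile
```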

algorithm/
    coordinate transforms and other reusable algorithms
arch/
    basic device building blocks such as MMA and buffer addressing
container/
    basic container data structures: array, sequence, tuple, etc.
numeric/
    data types and data-type-related math
tensor/
    tensor descriptors and the tile-level API
utility/
    other utility functions for both host and device
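To give a rough idea of what a coordinate transform does, the sketch below maps a multi-dimensional index to a linear offset and back (row-major, last dimension fastest). It is a hypothetical, runtime-only illustration: the function names `to_linear_offset` and `from_linear_offset` are assumptions, and ck_tile's own coordinate transforms are compile-time constructs with different interfaces.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Combine an N-d index into one linear offset (row-major, last dim fastest).
template <std::size_t N>
std::int64_t to_linear_offset(const std::array<std::int64_t, N>& lengths,
                              const std::array<std::int64_t, N>& idx)
{
    std::int64_t offset = 0;
    for(std::size_t d = 0; d < N; ++d)
        offset = offset * lengths[d] + idx[d];
    return offset;
}

// Inverse of to_linear_offset: split a linear offset back into an N-d index.
template <std::size_t N>
std::array<std::int64_t, N> from_linear_offset(const std::array<std::int64_t, N>& lengths,
                                               std::int64_t offset)
{
    std::array<std::int64_t, N> idx{};
    for(std::size_t d = N; d-- > 0;)
    {
        idx[d] = offset % lengths[d];
        offset /= lengths[d];
    }
    return idx;
}
```

For lengths {2, 3, 4}, the index {1, 2, 3} maps to offset (1 * 3 + 2) * 4 + 3 = 23, and offset 23 maps back to {1, 2, 3}.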