ck_tile/core
ck_tile/core contains all the basic functions and structures needed to build a GPU kernel with ck_tile. Users only need to include the single header ck_tile/core.hpp to access all of this functionality. Everything lives in the ck_tile namespace. The coding style in this folder follows that of std: snake_case for structures and functions, CamelCase for template type parameters, and so on.
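Here is a minimal sketch of that convention. Everything in it is hypothetical and self-contained, not taken from ck_tile. The struct static_buffer and the function sum_elements use snake_case, and their template parameters DataType and NumElements use CamelCase. The sketch uses its own namespace ck_tile_style and its own index_t alias so it compiles without ck_tile. Real code would include ck_tile/core.hpp and use names from the ck_tile namespace instead.

```cpp
#include <cstdint>

// Hypothetical sketch of the ck_tile/core naming convention; not ck_tile code.
namespace ck_tile_style {

using index_t = std::int32_t; // hypothetical index type for this sketch

// Structure name: snake_case. Template parameters: CamelCase.
template <typename DataType, index_t NumElements>
struct static_buffer
{
    DataType data[NumElements];

    // Member functions are snake_case too.
    static constexpr index_t size() { return NumElements; }

    constexpr DataType& operator[](index_t i) { return data[i]; }
    constexpr const DataType& operator[](index_t i) const { return data[i]; }
};

// Free function: snake_case; its template parameters stay CamelCase.
template <typename DataType, index_t NumElements>
constexpr DataType sum_elements(const static_buffer<DataType, NumElements>& buf)
{
    DataType acc{};
    for(index_t i = 0; i < NumElements; ++i)
        acc += buf[i];
    return acc;
}

} // namespace ck_tile_style
```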
algorithm/
coordinate transforms and other reusable algorithms
arch/
basic device building blocks such as mma, buffer addressing, etc.
container/
basic container data structures: array, sequence, tuple, ...
numeric/
data types and data-type-related math
tensor/
tensor descriptors and the tile-level API
utility/
other utility functions for both host and device
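A coordinate transform maps an index in one coordinate space to an index in another. The sketch below is a simplified, hypothetical illustration of that idea, not ck_tile's transform API. split_flat_index turns a flat index into a multi-dimensional index, with the last dimension varying fastest. offset_from_strides maps a multi-dimensional index to a linear memory offset through per-dimension strides. The names multi_index, split_flat_index, and offset_from_strides are all invented for this example.

```cpp
#include <array>
#include <cstddef>

// Hypothetical, simplified coordinate transforms; not ck_tile's transform API.
namespace transform_sketch {

template <std::size_t N>
using multi_index = std::array<long, N>;

// Split a flat (1-D) index into an N-D index, given each dimension's length.
// The last dimension varies fastest (row-major order).
template <std::size_t N>
constexpr multi_index<N> split_flat_index(long flat, const multi_index<N>& lengths)
{
    multi_index<N> idx{};
    for(std::size_t d = N; d-- > 0;)
    {
        idx[d] = flat % lengths[d];
        flat /= lengths[d];
    }
    return idx;
}

// Map an N-D index to a linear memory offset using per-dimension strides.
template <std::size_t N>
constexpr long offset_from_strides(const multi_index<N>& idx, const multi_index<N>& strides)
{
    long offset = 0;
    for(std::size_t d = 0; d < N; ++d)
        offset += idx[d] * strides[d];
    return offset;
}

} // namespace transform_sketch
```

For lengths {2, 3, 4}, flat index 23 splits into {1, 2, 3}. With packed row-major strides {12, 4, 1}, that index maps back to offset 23. Padded strides such as {16, 4, 1} map the same index to 27 instead. In this way one logical index space can take different memory layouts.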
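container/ lists array, sequence, and tuple. The name suggests that sequence is a list of integers carried in the type. Assuming that, a stripped-down version can be written in plain C++17. static_sequence and its size(), at(), and product() members are hypothetical names for this sketch and do not reproduce ck_tile's interface. All three queries resolve at compile time, so their results can be used as template arguments, for example as tile lengths.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical compile-time integer sequence; not ck_tile's sequence.
namespace sequence_sketch {

using index_t = std::int32_t; // hypothetical index type for this sketch

// The integers live in the type itself, so every query below is constexpr.
template <index_t... Is>
struct static_sequence
{
    static constexpr std::size_t size() { return sizeof...(Is); }

    // Element at position i (i must be < size()).
    static constexpr index_t at(std::size_t i)
    {
        constexpr index_t values[] = {Is...};
        return values[i];
    }

    // Product of all elements (1 for an empty sequence), e.g. a tile's element count.
    static constexpr index_t product() { return (index_t{1} * ... * Is); }
};

} // namespace sequence_sketch
```

For instance, static_sequence<2, 3, 4>::product() is 24 and can size a std::array directly.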
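numeric/ covers data types and the math around them. The next example converts float to bfloat16 by applying round-to-nearest-even to the raw bit pattern. ck_tile defines its own low-precision types. The functions below (fp32_bits_to_bf16_bits, float_to_bf16_bits, bf16_bits_to_float) are hypothetical and are not its implementation.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical float <-> bfloat16 helpers; not ck_tile's bf16 implementation.
namespace numeric_sketch {

// Convert a binary32 bit pattern to bfloat16 bits, rounding to nearest, ties to even.
constexpr std::uint16_t fp32_bits_to_bf16_bits(std::uint32_t bits)
{
    // NaN: keep sign and upper mantissa, force the quiet bit so dropping the
    // low mantissa bits cannot turn the NaN into infinity.
    if((bits & 0x7F800000u) == 0x7F800000u && (bits & 0x007FFFFFu) != 0u)
        return static_cast<std::uint16_t>((bits >> 16) | 0x0040u);

    // Adding 0x7FFF plus the LSB of the kept half rounds ties toward the even value.
    const std::uint32_t lsb = (bits >> 16) & 1u;
    return static_cast<std::uint16_t>((bits + 0x7FFFu + lsb) >> 16);
}

// Runtime wrapper: read the float's bits with memcpy (no type punning UB).
inline std::uint16_t float_to_bf16_bits(float f)
{
    static_assert(sizeof(float) == sizeof(std::uint32_t), "expects 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return fp32_bits_to_bf16_bits(bits);
}

// Widening back is exact: bf16 bits are the upper half of a binary32 pattern.
inline float bf16_bits_to_float(std::uint16_t h)
{
    const std::uint32_t bits = static_cast<std::uint32_t>(h) << 16;
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

} // namespace numeric_sketch
```

For example, 1.0f (bits 0x3F800000) becomes 0x3F80. The exact tie 0x3F818000 rounds up to the even value 0x3F82, while 0x3F808000 stays at 0x3F80.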