[CK_TILE] Share partition index across threads and specify offset in load_tile()/async_load_tile()/load_tile_transpose() (#2905)

* Allow sharing partition index across threads

* Fix typo PartitoinIndex -> PartitionIndex

* Remove C++20 'requires' usages

* Add missing template arguments

* Fix load_tile() overload ambiguity issue

* Use SFINAE to exclude invalid arguments

* Add additional offset parameter to async_load_tile()

* Remove async_load_tile() default argument to avoid ambiguity

* Extract tile_window coordinate compute logic as method

* Use warp-shared LDS base address in tile_window::async_load()

* Add constraint to tile_window::load() templates

* Fix incorrect usages of the is_class_v<> type trait

* Add missing constraint to async_load_tile()

* Add missing tile_window::load() overload

* Add more constraints to avoid load_tile() call ambiguity

* Rename PartitionIndex to ReplacementPartitionIndex

* Update pre_computed_warp_coords_ in move_extended()

* Fix inconsistency between template parameters and documentation

* Allow specifying pre-computed partition index

* Add type traits is_sequence<> & is_tile_distribution<>

* Add type trait is_tensor_view<>

* Add type constraints to make_tile_window() templates

* Allow passing partition_index to set_tile_if()

* Allow specifying partition_index to store_tile()

* Add missing template parameter of replace_bottom_tensor_view()

* Allow passing partition_index to Default2DEpilogue

* Make get_partition_index() public

* Add _with_offset() postfix to avoid resolution error

* Remove ReplacementPartitionIndex template param

* Add missing comments

* Add load_tile_transpose_with_offset() overload
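
Several of the items above (removing C++20 'requires', using SFINAE to exclude invalid arguments, adding the is_sequence<> trait) follow one pattern, which can be sketched roughly as below. This is a minimal standalone illustration, not the library's code: index_t and sequence are simplified stand-ins for the CK_TILE types, and describe() is a hypothetical function invented for the example. The trait has the same shape as the one added in this commit.

```cpp
#include <type_traits>

// Simplified stand-ins for the CK_TILE types (assumptions for this sketch).
using index_t = int;

template <index_t... Is>
struct sequence
{
};

// Primary template: an arbitrary type is not a sequence.
template <typename T>
struct is_sequence : std::false_type
{
};

// Partial specialization: matches only sequence<Is...>.
template <index_t... Is>
struct is_sequence<sequence<Is...>> : std::true_type
{
};

template <typename T>
inline constexpr bool is_sequence_v = is_sequence<T>::value;

// Hypothetical overload pair; describe() is not a CK_TILE function.
// This overload participates only when Arg is a sequence<...>.
template <typename Arg, std::enable_if_t<is_sequence_v<Arg>, int> = 0>
constexpr int describe(const Arg&)
{
    return 1;
}

// This overload participates for every other type, so the two
// candidates are mutually exclusive and a call is never ambiguous.
template <typename Arg, std::enable_if_t<!is_sequence_v<Arg>, int> = 0>
constexpr int describe(const Arg&)
{
    return 0;
}

// Compile-time checks of the trait and of overload selection.
static_assert(is_sequence_v<sequence<0, 1, 2>>);
static_assert(!is_sequence_v<int>);
static_assert(describe(sequence<4, 8>{}) == 1);
static_assert(describe(42) == 0);
```

The sketch needs C++17 (inline variables, message-less static_assert) but no C++20 features, consistent with the removal of 'requires' noted above. Mutually exclusive enable_if conditions are one common way to keep overloads from colliding. Whether the library's load_tile() overloads use exactly this form is not shown in this excerpt.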
Po Yen Chen
2025-11-12 10:26:14 +08:00
committed by GitHub
parent 92c1f4981a
commit 40d2ed0f2a
11 changed files with 441 additions and 58 deletions

@@ -214,6 +214,17 @@ CK_TILE_HOST_DEVICE static void print(const sequence<Is...>&)
    printf(">");
}
template <typename T>
struct is_sequence : std::false_type
{
};
template <index_t... Is>
struct is_sequence<sequence<Is...>> : std::true_type
{
};
template <typename T>
inline constexpr bool is_sequence_v = is_sequence<T>::value;
namespace impl {
template <typename T, T... Ints>
struct __integer_sequence;