[CK_TILE] Split-K autodeduction (#3351)

* First version of split-K autodeduction.

* Fix circular dependency and kernel construction.

* Fix tolerance calculation for bwd weight example.

* Simplify kernel construction.

* Fix kernel launching bug for split-K autodeduce.

* Add split-K autodeduction support for the two stage example.

* Fix a corner case.

* Fix clang-format.

* Fix clang-format for inc files.

* Add missing header.

* Prevent too large split-K values.

* Fix formatting.

* Add unit tests for IsSupportedArgument in grouped bwd conv.

* clang-format.

* Fix merge conflicts.

* Address feedback from code review.

* clang-format

* Fix new tests after merge.

---------

Co-authored-by: Ville Pietilä <>
This commit is contained in:
Ville Pietilä
2025-12-10 09:30:30 +02:00
committed by GitHub
parent 1aa93ef551
commit fc22320d78
11 changed files with 485 additions and 51 deletions

View File

@@ -70,6 +70,24 @@ inline bool is_load_tr_supported()
// Check if load transpose is supported.
return get_device_name() == "gfx950";
}
inline size_t get_num_cus()
{
hipDeviceProp_t props{};
int device;
auto status = hipGetDevice(&device);
if(status != hipSuccess)
{
return 0;
}
status = hipGetDeviceProperties(&props, device);
if(status != hipSuccess)
{
return 0;
}
return static_cast<size_t>(props.multiProcessorCount);
}
} // namespace ck_tile
#endif