mirror of https://github.com/ROCm/composable_kernel.git (synced 2026-05-14 02:02:46 +00:00)
* Squashed 'src/composable_kernel/' content from commit a4b211238
git-subtree-dir: src/composable_kernel
git-subtree-split: a4b21123849265d90a6b8fa86905a9a8ab253787
* add solver ConvIgemmFwdV6r1DlopsNchwKcyxNkhw; rename static ck source files
* Squashed 'src/composable_kernel/' changes from a4b211238..5805b5dc4
5805b5dc4 Update develop (#5) (#6)
ede23b251 Merge pull request #4 from ROCmSoftwarePlatform/separate_online_compile
8b079b5c6 refactor
c3d788bfa refactor
fcf913481 rename
git-subtree-dir: src/composable_kernel
git-subtree-split: 5805b5dc442dd8d71295954c4a755a6ef30593bb
* fix
* refactor
* remove online compilation from CK
* refactor
* fix
* add ctest
* tidy
* add tidy
* tidy
* tidy
* tidy
* tidy
* tidy
* tidy
* tidy
* tidy
* tidy
* add c-style pointer cast
* vector/scalar pointer cast use c-style pointer cast instead of reinterpret_cast
* fix clang warning suppression
* tidy
* suppress cppcheck
* fix enum issue
* revert changes to hip build
* fix kernel filename
* update CK build script
* rename
* rename
* make inner product compatible on gfx900
* Update src/include/miopen/solver/ck_utility_common.hpp
Co-authored-by: JD <Jehandad.Khan@amd.com>
* compiler parameter use stream
* use int instead of index_t in kernel wrapper
* DynamicBuffer, StaticBuffer, amd_buffer_load support customized value for invalid element
* refactor
* refactor
* change cmakelist
* change ck common utility
* fix
* Squashed 'src/composable_kernel/' changes from 5805b5dc4..dd3d4444e
dd3d4444e Merge pull request #16 from ROCmSoftwarePlatform/develop
cb6b2dc63 Merge pull request #14 from ROCmSoftwarePlatform/miopen_downstream_init_integration
d9b2fcab4 Merge pull request #8 from ROCmSoftwarePlatform/miopen_downstream_init_integration
57b74196a refactor
431c47bea refactor
9a0d05870 DynamicBuffer, StaticBuffer, amd_buffer_load support customized value for invalid element
bc4146402 use int instead of index_t in kernel wrapper
87a2fc094 compiler parameter use stream
24743c85e make inner product compatible on gfx900
7ad33d8e1 rename
5a3bace8d rename
12405c12a update CK build script
3c2effd43 fix kernel filename
12ff8d1ca fix enum issue
f0f97fd79 tidy
26f311aa9 fix clang warning suppression
c4f47ed09 vector/scalar pointer cast use c-style pointer cast instead of reinterpret_cast
35fd7bf79 add c-style pointer cast
9c31642f0 tidy
1a2efac60 tidy
ddd3b4e94 tidy
7daa0cfbf tidy
cab6e58d3 tidy
d9a8aebd8 tidy
533e356ce tidy
42639836b tidy
efe2836a2 add tidy
d53d7c666 fix
cf4ea1145 remove online compilation from CK
e63b17bdf refactor
5a2e56f78 Merge commit '437cc595c6e206dfebb118985b5171bbc1e29eab' into composable_kernel_init_integration_v3
702078bfd Merge pull request #7 from ROCmSoftwarePlatform/master
9ce85357a Update develop (#5)
10a172710 add solver ConvIgemmFwdV6r1DlopsNchwKcyxNkhw; rename static ck source files
git-subtree-dir: src/composable_kernel
git-subtree-split: dd3d4444e9b9ed07a54f82d91d969770aa8d5074
* Tiny fix in using data type template parameters in blockwise and direct_threadwise kernel
* Fix with regard to implementing GetZeroVal() in both kernel and host
* Avoid converting to compType from dstDataType before writing the output value
* Add half_t support to NumericLimits and make constexpr GetZeroVal() of binary operator
* Add CONSTANT decorator for descriptor read buffer
* Use get_thread_local_1d_id() for thread local Id
* Rename GetZeroVal() to GetReductionZeroVal() in the kernels
* Remove constexpr from initialized zeroVal and tiny fix in reduction_operator.hpp
* Occasional tiny simplification and update in the kernel files
* Update in src/reducetensor.cpp for consistent IDs passing to the kernel
* Update to re-order tensor dimensions on the host, split second_call kernel wrapper files and simplify reduce_all kernel wrappers
* Update to remove OpenCL tidy checking failures
* Small updates in src/reducetensor.cpp
* Update for better readability
* Remove unused code and unneeded template parameters in the kernel wrappers
Co-authored-by: Chao Liu <chao.liu2@amd.com>
Co-authored-by: JD <Jehandad.Khan@amd.com>
[ROCm/composable_kernel commit: dfb80c4e39]