composable_kernel

mirror of https://github.com/ROCm/composable_kernel.git synced 2026-05-04 13:41:24 +00:00

Author	SHA1	Message	Date
Illia Silin	1b66f3f4a3	Add declarations for atomic add for fp16 and unsigned short. (#2483) * add template for fp16 atomic add * add template for unsigned short atomic add * use atomicCAS in atomic add for fp16 and unsigned short * revert back to atomic add using casting	2025-07-10 07:18:56 -07:00
Illia Silin	93420ecf89	Revert "Add templates for fp16 and unsigned short atomic add to fix FBGEMM bu…" (#2474) This reverts commit `112b47e885`.	2025-07-08 19:01:26 -07:00
Illia Silin	112b47e885	Add templates for fp16 and unsigned short atomic add to fix FBGEMM builds. (#2471) * add template for fp16 atomic add * add template for unsigned short atomic add * use atomicCAS in atomic add for fp16 and unsigned short	2025-07-08 18:09:30 -04:00
Illia Silin	d2eab23958	Split up data_type header. (#1996) * split fp64 vector data type * add missing header * move e8m0 structs * split off numeric_utils header * fix typo * split off numeric limits header * update data_type header * fix clang format * split off vector type header * fix clang format * fix typo for binary_inf	2025-03-24 15:08:54 -07:00
Illia Silin	b94fd0b227	update copyright headers (#726)	2023-05-31 18:46:57 -05:00
Chao Liu	d3051d7517	add license in file (#303)	2022-06-24 23:32:43 -05:00
Qianfeng	63eee2d999	Overhaul to Reduction and its dependants (#237) * Tiny fix in dynamic_buffer.hpp to support vectorized AtomicAdd for double type * Update to host layer and host reduction * Merge and remove reduction kernels * Merge and remove reduction device interfaces and update pooling device interface * Merge and remove useless reduction device instances * Update to reduction profiler and reduction ctests * Update to reduction and pooling examples and add one reduction example * Change to reduction examples to let them testable by ctest * Add explicit pass checking for reduction and pooling examples * Explicit assignment of tensor shapes in example reduce_blockwise_two_call * Use atomic_add to replace atomicAdd and add atomic_add for double type * Add reduce ctest support for double data type * Replace to_int_vector() by using c++ std::vector::assign() * Keep DeviceReduceThreadWise separated from DeviceReduceBlockWise * Merge DeviceReduceBlockWise and DeviceReduceMultiBlockAtomicAdd into DeviceReduceMultiBlock * Add GetAtomicOperationZeroValue() support for AtomicMax * Tiny change to reduce example README.md * Fix some tiny issues due to branch merging * Revoke previous change in dynamic_buffer.hpp and add atomic_add for double2_t * Add reduce multiblock_atomic_add instances for fp64 to verify vectorized atomic_add on fp64 * Renaming * Clean the header includes in device_reduce instances header files	2022-05-24 12:19:12 -05:00
rocking5566	0ffe956ab1	Gemm reduce max (#209) * [What] Rename the example [Why] Prepare to add unary reduction * Add global operation to the parameter * Add atomicmax * Fix compile error * Support atomicMax (hip library) * Rename the reduction example * Fix target name * use p_d1_grid as the indicator directly * Prevent performance issue. Let passthrough handle it. * Implement the function template the specialize the float2 * No need to separate into two lines * Remove empty line * add comment * Fix compile error due to merge from develop * make the implementation of atomic_max / atomic_add explicit for each datatype * Refine typo * For future CI test * Fix compiler error in ckProfiler * Merge commit 'de2769e3a6695b38a20529261273ddc5cdaab2fe' * simply use remove_pointer * Rename type and var * Refine example * Modify reducemax example * Fix bug in reduction * Change initialize range * Implement F64 version of atomicMax * Move reduction code together * Add buffer atomic_max * Fix coding style by clang-format * Integrate new api of DeviceGemmReduce_Xdl_CShuffle * Integrate Batch gemm reduction * Fix example * fix example * clean up * Fix batch gemm tensor operation * Fix coding style * Fix template argument * Fix clang format * Keep flexible of different stride for each D tensor * Fix compile error for ckProfiler * Fix typo * [What] Fix naming [Why] Prepare to add out elementop * Add DoutElementOp Co-authored-by: Chao Liu <chao.liu2@amd.com> Co-authored-by: rocking <chunylai@amd.com>	2022-05-19 21:56:56 -05:00

8 Commits