mirror of
https://github.com/ROCm/composable_kernel.git
synced 2026-05-15 02:27:57 +00:00
* [What] Rename the example
[Why] Prepare to add unary reduction
* Add global operation to the parameter
* Add atomicmax
* Fix compile error
* Support atomicMax (hip library)
* Rename the reduction example
* Fix target name
* use p_d1_grid as the indicator directly
* Prevent performance issue. Let passthrough handle it.
* Implement the function template and specialize it for float2
* No need to separate into two lines
* Remove empty line
* add comment
* Fix compile error due to merge from develop
* make the implementation of atomic_max / atomic_add explicit for each datatype
* Refine typo
* For future CI test
* Fix compiler error in ckProfiler
* Merge commit 'de2769e3a6695b38a20529261273ddc5cdaab2fe'
* simply use remove_pointer
* Rename type and var
* Refine example
* Modify reducemax example
* Fix bug in reduction
* Change initialize range
* Implement F64 version of atomicMax
* Move reduction code together
* Add buffer atomic_max
* Fix coding style by clang-format
* Integrate new api of DeviceGemmReduce_Xdl_CShuffle
* Integrate Batch gemm reduction
* Fix example
* fix example
* clean up
* Fix batch gemm tensor operation
* Fix coding style
* Fix template argument
* Fix clang format
* Keep the flexibility of a different stride for each D tensor
* Fix compile error for ckProfiler
* Fix typo
* [What] Fix naming
[Why] Prepare to add out elementop
* Add DoutElementOp
Co-authored-by: Chao Liu <chao.liu2@amd.com>
Co-authored-by: rocking <chunylai@amd.com>
[ROCm/composable_kernel commit: 0ffe956ab1]
50 lines
1.2 KiB
C++
#pragma once
#include "config.hpp"
#include "array.hpp"
#include "container_helper.hpp"
#include "statically_indexed_array.hpp"
#include "container_element_picker.hpp"
#include "multi_index.hpp"
#include "data_type.hpp"
#include "data_type_enum.hpp"
#include "data_type_enum_helper.hpp"
#include "functional.hpp"
#include "functional2.hpp"
#include "functional3.hpp"
#include "functional4.hpp"
#include "enable_if.hpp"
#include "ignore.hpp"
#include "integral_constant.hpp"
#include "math.hpp"
#include "number.hpp"
#include "sequence.hpp"
#include "sequence_helper.hpp"
#include "tuple.hpp"
#include "tuple_helper.hpp"
#include "type.hpp"
#include "magic_division.hpp"
#include "c_style_pointer_cast.hpp"
#include "is_known_at_compile_time.hpp"
#include "transpose_vectors.hpp"
#include "inner_product.hpp"
#include "element_wise_operation.hpp"
#include "thread_group.hpp"
#include "debug.hpp"

#include "amd_buffer_addressing.hpp"
#include "generic_memory_space_atomic.hpp"
#include "get_id.hpp"
#include "synchronization.hpp"
#include "amd_address_space.hpp"
#include "static_buffer.hpp"
#include "dynamic_buffer.hpp"

// TODO: remove this
#if CK_USE_AMD_INLINE_ASM
#include "amd_inline_asm.hpp"
#endif

#ifdef CK_USE_AMD_MFMA
#include "amd_xdlops.hpp"
#endif