linqunAMD
07def6b13d
Extend XDL kernel to Support RDNA3/4 - Part 4 ( #2724 )
...
* Fix example
* fix build error
* update pk_i4 & moe test case
* fix all instance build (examples)
* fix batched_gemm_gemm (example)
* disable example_gemm_bias_softmax_gemm_permute on gfx11
* remove unnecessary disable gfx11
* update tests
* update tests2
[ROCm/composable_kernel commit: 321627aec5 ]
2025-09-12 08:17:07 -07:00
Tianyuan Wu
abb90422b4
[CK_TILE] CK_TILE GEMM WMMA Support for GFX11/GFX12 ( #2466 )
...
* WMMA GEMM F16 Implementation
Signed-off-by: root <tianyuwu@amd.com >
* Self-review
Signed-off-by: root <tianyuwu@amd.com >
* ASIC check minor tweak
Signed-off-by: root <tianyuwu@amd.com >
* add missing include file
* Set GPU_TARGETS to gfx11/12 generic
Signed-off-by: root <tianyuwu@amd.com >
* INT8 GFX12
Signed-off-by: root <tianyuwu@amd.com >
* add int8x16 branch
* Fix CI script
Signed-off-by: root <tianyuwu@amd.com >
* Fix typo
Signed-off-by: root <tianyuwu@amd.com >
* Add CK_Tile WMMA example
Signed-off-by: Tianyuan Wu <tianyuwu@amd.com >
* Fix CI
Signed-off-by: Tianyuan Wu <tianyuwu@amd.com >
* fix clang format
* Set M/N_Warp Back to Constant
Signed-off-by: Tianyuan Wu <tianyuwu@amd.com >
* Use GemmConfigComputeV3 by default
Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com >
* Enable CK_TILE_USE_AMD_BUFFER_ATOMIC_ADD_FLOAT for gfx12
Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com >
* Remove CK_Tile wmma gemm examples from the CI list
Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com >
* Add atomic add fallback method for gfx11
Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com >
* Fix typo
Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com >
* Omit copyright year
Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com >
* Support non-square cases
Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com >
* Fix CI
Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com >
* Add get_device_ip()
Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com >
* Revert "Add atomic add fallback method for gfx11"
This reverts commit 4f664969c01b37976c8518c19833d9f1574cd746.
Signed-off-by: Tianyuan Wu <Tianyuan.Wu@amd.com >
* Revert "Enable CK_TILE_USE_AMD_BUFFER_ATOMIC_ADD_FLOAT for gfx12"
This reverts commit 949129a3858a825b2a2c4d3ec01663df18a165a5.
* Revise method name and typos
Signed-off-by: Tianyuan Wu <Tianyuan.Wu@amd.com >
* clang-format
Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com >
* Try fix CI
Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com >
* Revert "Try fix CI"
This reverts commit 084c683227e64ab6a8137db00c8165fb05bdc902.
* clang-format
Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com >
* Fix typo caused by merge
Signed-off-by: Tianyuan Wu <Tianyuan.Wu@amd.com >
* Fix typo caused by merging
Signed-off-by: Tianyuan Wu <Tianyuan.Wu@amd.com >
---------
Signed-off-by: root <tianyuwu@amd.com >
Signed-off-by: Tianyuan Wu <tianyuwu@amd.com >
Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com >
Signed-off-by: Tianyuan Wu <Tianyuan.Wu@amd.com >
Co-authored-by: joye <joye@amd.com >
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com >
Co-authored-by: illsilin_amdeng <Illia.Silin@amd.com >
[ROCm/composable_kernel commit: 68134b60e4 ]
2025-08-15 16:22:27 -07:00
Illia Silin
24f228df3b
upgrade from clang-format-12 to clang-format-18 ( #2568 )
...
* upgrade to clang-format-18
* update to clang-format-18 in pre-commit-config
[ROCm/composable_kernel commit: 504b101da3 ]
2025-07-28 11:34:07 -07:00
zjing14
fcb0e225db
add gemm_bias_add example ( #1361 )
...
* add gemm_bias_add example
* changed strideD
* clang-format
---------
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com >
[ROCm/composable_kernel commit: 13c1e64daa ]
2024-07-11 18:08:07 -07:00
zjing14
4ddb546fe5
Added Multi_ABD support into Gemm and GroupedGemmFixedNK ( #978 )
...
* added an example grouped_gemm_multi_abd
* fixed ci
* add setElementwiseOp
* changed API
* clean code: add multiA into example
* fixed v7r2 copy
* add transpose
* clean
* fixed vector_load check
* Update example/15_grouped_gemm/grouped_gemm_multi_abd_xdl_fixed_nk_bias_fp16.cpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com >
* Update example/15_grouped_gemm/grouped_gemm_multi_abd_xdl_fixed_nk_bias_fp16.cpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com >
* Update example/15_grouped_gemm/grouped_gemm_multi_abd_xdl_fixed_nk_bias_fp16.cpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com >
* Update include/ck/tensor_operation/gpu/grid/gridwise_gemm_multiple_abd_xdl_cshuffle.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com >
* Update include/ck/tensor_operation/gpu/grid/gridwise_gemm_multiple_abd_xdl_cshuffle.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com >
* Update include/ck/tensor_operation/gpu/device/device_grouped_gemm_multi_abd.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com >
* Update include/ck/tensor_operation/gpu/device/device_grouped_gemm_multi_abd.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com >
* Update include/ck/tensor_operation/gpu/device/device_grouped_gemm_multi_abd.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com >
* Update include/ck/tensor_operation/gpu/device/device_grouped_gemm_multi_abd.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com >
* Update include/ck/tensor_operation/gpu/device/device_grouped_gemm_multi_abd.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com >
* Update include/ck/tensor_operation/gpu/device/device_grouped_gemm_multi_abd.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com >
* Update include/ck/tensor_operation/gpu/device/device_grouped_gemm_multi_abd_fixed_nk.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com >
* Update include/ck/tensor_operation/gpu/device/device_grouped_gemm_multi_abd_fixed_nk.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com >
* add reduce
* testing
* add example_b16_i8
* refactor example
* clean
* add mpading
* disable reduce for kbatch = 1
* seperate reduce device op
* add reduce op
* add guard for workspace_size
* add instances
* format
* fixed
* add client example
* add a colmajor
* add instances
* Update cmake-ck-dev.sh
* Update profile_gemm_splitk.cpp
* Update gridwise_gemm_xdlops_v2r4r2.hpp
* format
* Update profile_gemm_splitk.cpp
* fixed
* fixed
* adjust test
* adjust precision loss
* adjust test
* fixed
* add bf16_i8 scale bias
* fixed scale
* fixed scale elementwise_op
* revert contraction deviceop changes
* fixed
* Add AddFastGelu
* Revert "Merge branch 'jizhan/gemm_splitk_reduce' into grouped_gemm_multi_abd_fixed_nk_example"
This reverts commit 3b5d001efd , reversing
changes made to 943199a991 .
* add Scales into elementwise
* add gemm_multi_abd client example
* add client examples
* add rcr and crr
* add grouped gemm client example
* add grouped gemm client example
* add instance for rcr crr
* format
* fixed
* fixed cmake
* fixed
* fixed client_example
* format
* fixed contraction isSupport
* Update include/ck/tensor_operation/gpu/device/device_grouped_gemm_multi_abd_fixed_nk.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com >
* Update device_reduce_threadwise.hpp
* clean
* Fixes
* Fix example
---------
Co-authored-by: Jing Zhang <jizha@amd.com >
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com >
[ROCm/composable_kernel commit: 12865fbf28 ]
2024-04-15 21:09:45 -05:00
Illia Silin
1f4d13b2b5
Split the instances by architecture. ( #1223 )
...
* parse examples inside the add_example_executable function
* fix the example 64 cmake file
* add xdl flag to the gemm_bias_softmax_gemm_permute example
* add filtering of tests based on architecture type
* enable test_grouped_gemm for gfx9 only
* enable test_transpose only for gfx9
* only linnk test_transpose if it gets built
* split the gemm instances by architectures
* split gemm_bilinear,grouped_conv_bwd_weight instances by targets
* split instances by architecture
* split grouped_conv instances by architecture
* fix clang format
* fix the if-else logic in group_conv headers
* small fix for grouped convolution instances
* fix the grouped conv bwd weight dl instances
* fix client examples
* only enable client examples 3 and 4 on gfx9
* set the gfx9 macro
* make sure the architecture macros are set by cmake
* use separate set of xdl/wmma flags for host code
* sinmplify the main cmake file
* add conv_fwd_bf8 instance declaration
[ROCm/composable_kernel commit: ae57e5938e ]
2024-04-02 09:42:17 -07:00
Bartlomiej Wroblewski
787296620c
Add missing ComputeDatatype in contraction_multi_ABD_xdl_fp16 ( #1024 )
...
[ROCm/composable_kernel commit: 16eb824c90 ]
2023-11-03 08:22:11 -07:00
zjing14
70f866ad92
Add contraction_multi_abd ( #972 )
...
* add gridwise_multi_abd
* move element_op into RunRead
* merge element_wise op with data read
* add multiABD example
* allow packed elementwise_op
* changed example
* clean
* clean
* add is_detected
* fix
* minor fix
* add scaleAdd_vec4 example
* init commit for contraction_multi_ABD
* add examples
* add examples of multiA and broadcast
* update example
* fixed comments
* Update cmake-ck-dev.sh
* Update cmake-ck-dev.sh
* Add comments into the example
* Update CMakeLists.txt
---------
Co-authored-by: Jing Zhang <jizha@amd.com >
[ROCm/composable_kernel commit: 1cc36ba5fb ]
2023-10-17 20:17:58 -05:00
Illia Silin
876e9540e4
Revert "Add support for mixed precision in contraction scale and bilinear" ( #967 )
...
* Revert "Add support for mixed precision in contraction scale and bilinear (#936 )"
This reverts commit ba712aee9a .
* revert commits #957 and #960
[ROCm/composable_kernel commit: 4daedf8ca5 ]
2023-10-05 14:58:23 -07:00
zjing14
c8e83bab45
Fixed contraction issues ( #960 )
...
* add missing ComputeType
* fixed
* Update cmake-ck-dev.sh
---------
Co-authored-by: Jing Zhang <jizha@amd.com >
[ROCm/composable_kernel commit: aa46039f2d ]
2023-10-03 09:32:44 -05:00
zjing14
1939f1820f
Contraction multi abd ( #957 )
...
* add gridwise_multi_abd
* move element_op into RunRead
* merge element_wise op with data read
* add multiABD example
* allow packed elementwise_op
* changed example
* clean
* clean
* add is_detected
* fix
* minor fix
* add scaleAdd_vec4 example
* init commit for contraction_multi_ABD
* add examples
* add examples of multiA and broadcast
* update example
* fixed comments
* Update cmake-ck-dev.sh
* Update cmake-ck-dev.sh
* Add comments into the example
---------
Co-authored-by: Jing Zhang <jizha@amd.com >
[ROCm/composable_kernel commit: 9d58c42103 ]
2023-10-02 09:18:36 -05:00