zjing14
13c1e64daa
add gemm_bias_add example (#1361)
* add gemm_bias_add example
* changed strideD
* clang-format
---------
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
2024-07-11 18:08:07 -07:00
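The commit above adds a `gemm_bias_add` example. As a rough reference for the math such a fused kernel exercises, here is a minimal sketch: one plausible reading is E = A·B + bias (broadcast along M) + D (elementwise residual). Shapes and names are assumptions for illustration, not CK's actual API.

```python
def gemm_bias_add(A, B, bias, D):
    """Reference: E[m][n] = sum_k A[m][k]*B[k][n] + bias[n] + D[m][n]."""
    M, K = len(A), len(A[0])
    N = len(B[0])
    E = [[0.0] * N for _ in range(M)]
    for m in range(M):
        for n in range(N):
            acc = 0.0
            for k in range(K):
                acc += A[m][k] * B[k][n]
            # bias is broadcast along M; D is an elementwise residual input
            E[m][n] = acc + bias[n] + D[m][n]
    return E

A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]
bias = [0.5, -0.5]
D = [[1.0, 1.0], [1.0, 1.0]]
print(gemm_bias_add(A, B, bias, D))
```

A device kernel fuses the bias and residual add into the GEMM epilogue instead of running them as separate passes; the plain loops above only pin down the expected result.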
zjing14
12865fbf28
Added Multi_ABD support into Gemm and GroupedGemmFixedNK (#978)
* added an example grouped_gemm_multi_abd
* fixed ci
* add setElementwiseOp
* changed API
* clean code: add multiA into example
* fixed v7r2 copy
* add transpose
* clean
* fixed vector_load check
* Update example/15_grouped_gemm/grouped_gemm_multi_abd_xdl_fixed_nk_bias_fp16.cpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>
* Update example/15_grouped_gemm/grouped_gemm_multi_abd_xdl_fixed_nk_bias_fp16.cpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>
* Update example/15_grouped_gemm/grouped_gemm_multi_abd_xdl_fixed_nk_bias_fp16.cpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>
* Update include/ck/tensor_operation/gpu/grid/gridwise_gemm_multiple_abd_xdl_cshuffle.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>
* Update include/ck/tensor_operation/gpu/grid/gridwise_gemm_multiple_abd_xdl_cshuffle.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>
* Update include/ck/tensor_operation/gpu/device/device_grouped_gemm_multi_abd.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>
* Update include/ck/tensor_operation/gpu/device/device_grouped_gemm_multi_abd.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>
* Update include/ck/tensor_operation/gpu/device/device_grouped_gemm_multi_abd.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>
* Update include/ck/tensor_operation/gpu/device/device_grouped_gemm_multi_abd.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>
* Update include/ck/tensor_operation/gpu/device/device_grouped_gemm_multi_abd.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>
* Update include/ck/tensor_operation/gpu/device/device_grouped_gemm_multi_abd.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>
* Update include/ck/tensor_operation/gpu/device/device_grouped_gemm_multi_abd_fixed_nk.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>
* Update include/ck/tensor_operation/gpu/device/device_grouped_gemm_multi_abd_fixed_nk.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>
* add reduce
* testing
* add example_b16_i8
* refactor example
* clean
* add m-padding
* disable reduce for kbatch = 1
* separate reduce device op
* add reduce op
* add guard for workspace_size
* add instances
* format
* fixed
* add client example
* add a colmajor
* add instances
* Update cmake-ck-dev.sh
* Update profile_gemm_splitk.cpp
* Update gridwise_gemm_xdlops_v2r4r2.hpp
* format
* Update profile_gemm_splitk.cpp
* fixed
* fixed
* adjust test
* adjust precision loss
* adjust test
* fixed
* add bf16_i8 scale bias
* fixed scale
* fixed scale elementwise_op
* revert contraction deviceop changes
* fixed
* Add AddFastGelu
* Revert "Merge branch 'jizhan/gemm_splitk_reduce' into grouped_gemm_multi_abd_fixed_nk_example"
This reverts commit 3b5d001efd, reversing
changes made to 943199a991.
* add Scales into elementwise
* add gemm_multi_abd client example
* add client examples
* add rcr and crr
* add grouped gemm client example
* add grouped gemm client example
* add instance for rcr crr
* format
* fixed
* fixed cmake
* fixed
* fixed client_example
* format
* fixed contraction isSupport
* Update include/ck/tensor_operation/gpu/device/device_grouped_gemm_multi_abd_fixed_nk.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>
* Update device_reduce_threadwise.hpp
* clean
* Fixes
* Fix example
---------
Co-authored-by: Jing Zhang <jizha@amd.com>
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>
2024-04-15 21:09:45 -05:00
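The multi-ABD commit above fuses several A (or B) tensors through an elementwise op before the GEMM, and several D tensors into the epilogue after it. A minimal sketch of that idea, with hypothetical stand-in ops (`a_op`, `b_op`, `cde_op` are illustrative names, not CK's actual element-wise operator set):

```python
def multi_abd_gemm(As, Bs, Ds, a_op, b_op, cde_op):
    """E[m][n] = cde_op(sum_k a_op(A0..)[m][k] * b_op(B0..)[k][n], D0[m][n], ...)."""
    M, K = len(As[0]), len(As[0][0])
    N = len(Bs[0][0])
    E = [[0.0] * N for _ in range(M)]
    for m in range(M):
        for n in range(N):
            acc = 0.0
            for k in range(K):
                a = a_op(*[A[m][k] for A in As])  # fuse the A tensors elementwise
                b = b_op(*[B[k][n] for B in Bs])  # fuse the B tensors elementwise
                acc += a * b
            E[m][n] = cde_op(acc, *[Dt[m][n] for Dt in Ds])
    return E

# Example: A = A0 * A1 (a dequant-scale-like fusion), E = gemm + D0 (a bias-like residual)
A0 = [[1.0, 2.0]]
A1 = [[0.5, 0.5]]
B0 = [[2.0], [4.0]]
D0 = [[1.0]]
out = multi_abd_gemm([A0, A1], [B0], [D0],
                     a_op=lambda x, y: x * y,
                     b_op=lambda x: x,
                     cde_op=lambda c, d: c + d)
print(out)
```

This is why the commit also touches bf16/i8 scale-bias paths: a per-element scale applied to a quantized A is exactly an `a_op`-style fusion, done during the read rather than in a separate kernel.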
Illia Silin
ae57e5938e
Split the instances by architecture. (#1223)
* parse examples inside the add_example_executable function
* fix the example 64 cmake file
* add xdl flag to the gemm_bias_softmax_gemm_permute example
* add filtering of tests based on architecture type
* enable test_grouped_gemm for gfx9 only
* enable test_transpose only for gfx9
* only link test_transpose if it gets built
* split the gemm instances by architectures
* split gemm_bilinear,grouped_conv_bwd_weight instances by targets
* split instances by architecture
* split grouped_conv instances by architecture
* fix clang format
* fix the if-else logic in group_conv headers
* small fix for grouped convolution instances
* fix the grouped conv bwd weight dl instances
* fix client examples
* only enable client examples 3 and 4 on gfx9
* set the gfx9 macro
* make sure the architecture macros are set by cmake
* use separate set of xdl/wmma flags for host code
* simplify the main cmake file
* add conv_fwd_bf8 instance declaration
2024-04-02 09:42:17 -07:00
Bartlomiej Wroblewski
16eb824c90
Add missing ComputeDatatype in contraction_multi_ABD_xdl_fp16 (#1024)
2023-11-03 08:22:11 -07:00
zjing14
1cc36ba5fb
Add contraction_multi_abd (#972)
* add gridwise_multi_abd
* move element_op into RunRead
* merge element_wise op with data read
* add multiABD example
* allow packed elementwise_op
* changed example
* clean
* clean
* add is_detected
* fix
* minor fix
* add scaleAdd_vec4 example
* init commit for contraction_multi_ABD
* add examples
* add examples of multiA and broadcast
* update example
* fixed comments
* Update cmake-ck-dev.sh
* Update cmake-ck-dev.sh
* Add comments into the example
* Update CMakeLists.txt
---------
Co-authored-by: Jing Zhang <jizha@amd.com>
2023-10-17 20:17:58 -05:00
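The contraction_multi_abd commit carries the same multi-input idea over to tensor contraction, where the reduction runs over more than one index. A minimal sketch of a bilinear-style contraction with an extra D input; the index layout and the scale-add epilogue are assumptions for illustration only:

```python
def contraction_multi_abd(A, B, D, cde_op):
    """E[m][n] = cde_op(sum_{k0,k1} A[m][k0][k1] * B[n][k0][k1], D[m][n])."""
    M = len(A)
    N = len(B)
    K0, K1 = len(A[0]), len(A[0][0])
    E = [[0.0] * N for _ in range(M)]
    for m in range(M):
        for n in range(N):
            acc = 0.0
            for k0 in range(K0):       # contraction runs over two indices,
                for k1 in range(K1):   # not the single K of a plain GEMM
                    acc += A[m][k0][k1] * B[n][k0][k1]
            E[m][n] = cde_op(acc, D[m][n])
    return E

# A ScaleAdd-style epilogue: E = alpha * acc + beta * D (hypothetical parameterization)
alpha, beta = 2.0, 1.0
A = [[[1.0, 2.0], [3.0, 4.0]]]   # M=1, K0=2, K1=2
B = [[[1.0, 1.0], [1.0, 1.0]]]   # N=1, K0=2, K1=2
D = [[10.0]]
out = contraction_multi_abd(A, B, D, lambda c, d: alpha * c + beta * d)
print(out)
```

Moving the elementwise op into the read path (the "move element_op into RunRead" item above) means each A/B element is transformed once as it is loaded, instead of materializing the fused operands in memory first.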
Illia Silin
4daedf8ca5
Revert "Add support for mixed precision in contraction scale and bilinear" (#967)
* Revert "Add support for mixed precision in contraction scale and bilinear (#936)"
This reverts commit f07485060e.
* revert commits #957 and #960
2023-10-05 14:58:23 -07:00
zjing14
aa46039f2d
Fixed contraction issues (#960)
* add missing ComputeType
* fixed
* Update cmake-ck-dev.sh
---------
Co-authored-by: Jing Zhang <jizha@amd.com>
2023-10-03 09:32:44 -05:00
zjing14
9d58c42103
Contraction multi abd (#957)
* add gridwise_multi_abd
* move element_op into RunRead
* merge element_wise op with data read
* add multiABD example
* allow packed elementwise_op
* changed example
* clean
* clean
* add is_detected
* fix
* minor fix
* add scaleAdd_vec4 example
* init commit for contraction_multi_ABD
* add examples
* add examples of multiA and broadcast
* update example
* fixed comments
* Update cmake-ck-dev.sh
* Update cmake-ck-dev.sh
* Add comments into the example
---------
Co-authored-by: Jing Zhang <jizha@amd.com>
2023-10-02 09:18:36 -05:00