zjing14
12865fbf28
Added Multi_ABD support into Gemm and GroupedGemmFixedNK (#978)
* added an example grouped_gemm_multi_abd
* fixed ci
* add setElementwiseOp
* changed API
* clean code: add multiA into example
* fixed v7r2 copy
* add transpose
* clean
* fixed vector_load check
* Update example/15_grouped_gemm/grouped_gemm_multi_abd_xdl_fixed_nk_bias_fp16.cpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>
* Update example/15_grouped_gemm/grouped_gemm_multi_abd_xdl_fixed_nk_bias_fp16.cpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>
* Update example/15_grouped_gemm/grouped_gemm_multi_abd_xdl_fixed_nk_bias_fp16.cpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>
* Update include/ck/tensor_operation/gpu/grid/gridwise_gemm_multiple_abd_xdl_cshuffle.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>
* Update include/ck/tensor_operation/gpu/grid/gridwise_gemm_multiple_abd_xdl_cshuffle.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>
* Update include/ck/tensor_operation/gpu/device/device_grouped_gemm_multi_abd.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>
* Update include/ck/tensor_operation/gpu/device/device_grouped_gemm_multi_abd.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>
* Update include/ck/tensor_operation/gpu/device/device_grouped_gemm_multi_abd.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>
* Update include/ck/tensor_operation/gpu/device/device_grouped_gemm_multi_abd.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>
* Update include/ck/tensor_operation/gpu/device/device_grouped_gemm_multi_abd.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>
* Update include/ck/tensor_operation/gpu/device/device_grouped_gemm_multi_abd.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>
* Update include/ck/tensor_operation/gpu/device/device_grouped_gemm_multi_abd_fixed_nk.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>
* Update include/ck/tensor_operation/gpu/device/device_grouped_gemm_multi_abd_fixed_nk.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>
* add reduce
* testing
* add example_b16_i8
* refactor example
* clean
* add MPadding
* disable reduce for kbatch = 1
* separate reduce device op
* add reduce op
* add guard for workspace_size
* add instances
* format
* fixed
* add client example
* add a colmajor
* add instances
* Update cmake-ck-dev.sh
* Update profile_gemm_splitk.cpp
* Update gridwise_gemm_xdlops_v2r4r2.hpp
* format
* Update profile_gemm_splitk.cpp
* fixed
* fixed
* adjust test
* adjust precision loss
* adjust test
* fixed
* add bf16_i8 scale bias
* fixed scale
* fixed scale elementwise_op
* revert contraction deviceop changes
* fixed
* Add AddFastGelu
* Revert "Merge branch 'jizhan/gemm_splitk_reduce' into grouped_gemm_multi_abd_fixed_nk_example"
This reverts commit 3b5d001efd, reversing
changes made to 943199a991.
* add Scales into elementwise
* add gemm_multi_abd client example
* add client examples
* add rcr and crr
* add grouped gemm client example
* add grouped gemm client example
* add instance for rcr crr
* format
* fixed
* fixed cmake
* fixed
* fixed client_example
* format
* fixed contraction isSupport
* Update include/ck/tensor_operation/gpu/device/device_grouped_gemm_multi_abd_fixed_nk.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>
* Update device_reduce_threadwise.hpp
* clean
* Fixes
* Fix example
---------
Co-authored-by: Jing Zhang <jizha@amd.com>
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>
2024-04-15 21:09:45 -05:00