composable_kernel

mirror of https://github.com/ROCm/composable_kernel.git synced 2026-06-07 16:26:10 +00:00

Files

zjing14 602c4cc0d9 Optimizing fp8_fp16 mixedprec gemm (#1150)

* add delayed cvt

* extend fp16 gemm_splitk instances for fp8_fp16 gemm

* add f8 example

* add 128 kperblk instances for fp8

* add kpb128 instance

* added more instances into kpb128

* clean code

* clean code

* fix

* fix

* fixed

* Update example/35_splitK_gemm/splitK_gemm_xdl_fp16_fp8.cpp

Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>

* Update include/ck/tensor_operation/gpu/thread/threadwise_tensor_slice_transfer.hpp

Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>

* Update library/src/tensor_operation_instance/gpu/gemm_splitk/device_gemm_xdl_splitk_f16_fp8_f16_mk_nk_mn_kpb128_instance.cpp

Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>

---------

Co-authored-by: Jing Zhang <jizha@amd.com>
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>

2024-02-12 09:45:42 -08:00

reduction_functions_threadwise.hpp

update copyright headers (#726)

2023-05-31 18:46:57 -05:00

threadwise_contraction_dl.hpp

update copyright headers (#726)

2023-05-31 18:46:57 -05:00

threadwise_gemm_dlops_v3.hpp

update copyright headers (#726)

2023-05-31 18:46:57 -05:00

threadwise_tensor_slice_set.hpp

update copyright headers (#726)