composable_kernel/include/ck/tensor_operation/gpu/thread at d4a8c6c2eddd53fa48e46f0b35db256a8c1297d4 - composable_kernel - Public git mirror

ROCm/composable_kernel

Mirror of https://github.com/ROCm/composable_kernel.git, synced 2026-07-17 09:08:35 +00:00

Mingtao Gu d4a8c6c2ed Implement the fp16xint4 scale weight only kernel for Ali (#1786)

* enable int4 scale (weight only) kernel

* format some files

* Add unit test for int4 weight only

* fixed and formatted code

* fixed

* formatted

* formatted

* fixed

* fixed a bug in the ckProfiler, and formatted the code

---------

Co-authored-by: mtgu0705 <mtgu@amd.com>

[ROCm/composable_kernel commit: 4f62f6e9b7]

2025-01-03 18:35:21 +08:00

reduction_functions_threadwise.hpp

update copyright headers (#726)

2023-05-31 18:46:57 -05:00

threadwise_contraction_dl.hpp

update copyright headers (#726)

2023-05-31 18:46:57 -05:00

threadwise_gemm_dlops_v3.hpp

update copyright headers (#726)

2023-05-31 18:46:57 -05:00

threadwise_tensor_slice_set.hpp

update copyright headers (#726)

2023-05-31 18:46:57 -05:00

threadwise_tensor_slice_transfer_util.hpp

Added Multi_ABD support into Gemm and GroupedGemmFixedNK (#978)

2024-04-15 21:09:45 -05:00

threadwise_tensor_slice_transfer_v3r1_dequant.hpp

Navi3 rel (#1176)

2024-03-08 17:11:51 -08:00

threadwise_tensor_slice_transfer_v3r1.hpp

Jing's contribution: prototype of mixed precision gemm FP16/BF16xint4 GEMM (#1762)

2025-01-02 11:48:06 +08:00

threadwise_tensor_slice_transfer_v3r2.hpp

Add elementwise with dynamic vector dim (#1198)

2024-03-22 10:40:43 +01:00

threadwise_tensor_slice_transfer_v4r1.hpp

update copyright headers (#726)

2023-05-31 18:46:57 -05:00

threadwise_tensor_slice_transfer_v5r1.hpp

Modification to fix this issue "threadwise_tensor_slice_transfer_v5r1 issue #1279" (#1492)

2024-09-04 21:52:55 -07:00

threadwise_tensor_slice_transfer_v6r1.hpp

add an example of customized type convert - bfp16_rtn (#869)

2023-08-29 12:31:24 -05:00

threadwise_tensor_slice_transfer_v6r1r2.hpp

initial stream-k implementation with example (#699)

2023-07-26 14:18:15 -05:00

threadwise_tensor_slice_transfer_v6r2.hpp

update copyright headers (#726)

2023-05-31 18:46:57 -05:00

threadwise_tensor_slice_transfer_v6r3.hpp

update copyright headers (#726)

2023-05-31 18:46:57 -05:00

threadwise_tensor_slice_transfer_v7.hpp

update copyright headers (#726)

2023-05-31 18:46:57 -05:00

threadwise_tensor_slice_transfer_v7r2.hpp

bf16A_Int8B with fastgelu/bias (#1264)

2024-04-26 07:26:30 -05:00

threadwise_tensor_slice_transfer_v7r3.hpp

add f8 gemm multiD with both row/col wise scale (#1300)

2024-05-28 12:04:22 -05:00

threadwise_tensor_slice_transfer.hpp

Implement the fp16xint4 scale weight only kernel for Ali (#1786)

2025-01-03 18:35:21 +08:00

threadwise_welford.hpp

update copyright headers (#726)

2023-05-31 18:46:57 -05:00