mirror of https://github.com/ROCm/composable_kernel.git synced 2026-06-07 16:26:10 +00:00

Files

Adam Osewski 1d8e4ec2ce Jing's contribution: prototype of mixed precision gemm FP16/BF16xint4 GEMM (#1762)

* add a prototype of int4

* clean

* debug

* clean

* clean

* move packed into dynamic_buffer

* fixed coord reset

* add fast pki4 to half conversion

* fix

* fixed reference and host_tensor

* fixed tensor init

* format

* debug i4_to_f16_convert

* format

* fixed splitk

* weight permute

* add b tile permute

* clean

* weight permute with splitki

* format

* improve weight layout

* add and_or_b32

* fixed splitk crash

* add permute switch as a template

* recover v3r1

* clean

* failure with intrawave v2

* fixed

* fixed

* add ckProfiler

* add bfp16 support

* add bf16 example

* fixed int4 to bhalf_t conversion

* format

* fixed int4 to bf16 conversion

* clean

* add instances for mem

* clean

* fixed host tensor size

* fixed

* debug

* fixed

* add pk_i4_t as a struct

* fix

* Update example/01_gemm/gemm_xdl_bf16_pk_i4_v3.cpp

Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>

* Update example/01_gemm/gemm_xdl_bf16_pk_i4_v3.cpp

Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>

* Update example/01_gemm/gemm_xdl_bf16_pk_i4_v3.cpp

Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>

* revert

* Update example/01_gemm/gemm_xdl_bf16_pk_i4_v3.cpp

Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>

* Update example/01_gemm/gemm_xdl_fp16_pk_i4_v3.cpp

Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>

* Update example/01_gemm/gemm_xdl_fp16_pk_i4_v3.cpp

Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>

* Update example/01_gemm/gemm_xdl_fp16_pk_i4_v3.cpp

Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>

* Update example/01_gemm/gemm_xdl_fp16_pk_i4_v3.cpp

Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>

* fixed comments

* revert

* clean

* revert

* revert

* fixed

* Update CMakeLists.txt

* Update script/cmake-ck-dev.sh

Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>

* Update include/ck/tensor_operation/gpu/element/unary_element_wise_operation.hpp

Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>

* Update CMakeLists.txt

Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>

* fixed

* fixed

* fixed

* revert

* revert

* add comments

* format

* fixed assert

* fixed

* Fix I4 define in ckProfiler

* Fixed example_gemm_xdl_bf16_pk_i4_v3 test failed issue

---------

Co-authored-by: Jing Zhang <jizhan@fb.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>
Co-authored-by: mtgu0705 <mtgu@amd.com>

2025-01-02 11:48:06 +08:00

| File | Last commit | Date |
| --- | --- | --- |
| CMakeLists.txt | Jing's contribution: prototype of mixed precision gemm FP16/BF16xint4 GEMM (#1762) | 2025-01-02 11:48:06 +08:00 |
| common.hpp | Jing's contribution: prototype of mixed precision gemm FP16/BF16xint4 GEMM (#1762) | 2025-01-02 11:48:06 +08:00 |
| gemm_dl_fp16.cpp | Add a gpu gemm reference kernel (#1528) | 2024-10-08 11:05:28 -05:00 |
| gemm_dl_fp32.cpp | Add a gpu gemm reference kernel (#1528) | 2024-10-08 11:05:28 -05:00 |
| gemm_dl_int4.cpp | Fixing most of the cppcheck errors. (#1142) | 2024-01-24 13:47:48 -08:00 |
| gemm_dl_int8.cpp | Add a gpu gemm reference kernel (#1528) | 2024-10-08 11:05:28 -05:00 |
| gemm_dpp_fp16.cpp | Add a gpu gemm reference kernel (#1528) | 2024-10-08 11:05:28 -05:00 |
| gemm_wmma_bf16.cpp | Add bf16 and int8 wmma gemms for Navi3x and Navi4x. (#1671) | 2024-11-18 14:07:04 -08:00 |
| gemm_wmma_fp16.cpp | Add a gpu gemm reference kernel (#1528) | 2024-10-08 11:05:28 -05:00 |
| gemm_wmma_int8.cpp | Add bf16 and int8 wmma gemms for Navi3x and Navi4x. (#1671) | 2024-11-18 14:07:04 -08:00 |
| gemm_xdl_bf16_pk_i4_v3.cpp | Jing's contribution: prototype of mixed precision gemm FP16/BF16xint4 GEMM (#1762) | 2025-01-02 11:48:06 +08:00 |
| gemm_xdl_bf16_rtn.cpp | Add a gpu gemm reference kernel (#1528) | 2024-10-08 11:05:28 -05:00 |
| gemm_xdl_bf16_v3.cpp | [GEMM] UniversalGemm update (#1262) | 2024-04-26 12:56:07 -05:00 |
| gemm_xdl_bf16.cpp | Add a gpu gemm reference kernel (#1528) | 2024-10-08 11:05:28 -05:00 |
| gemm_xdl_fp8_bf8.cpp | Add a gpu gemm reference kernel (#1528) | 2024-10-08 11:05:28 -05:00 |
| gemm_xdl_fp8_streamk_v3.cpp | universal streamk fp8 changes (#1665) | 2024-11-21 08:21:37 -08:00 |
| gemm_xdl_fp8_v3.cpp | [GEMM] F8 GEMM, performance optimized. (#1384) | 2024-07-19 22:06:52 +08:00 |
| gemm_xdl_fp8.cpp | Add a gpu gemm reference kernel (#1528) | 2024-10-08 11:05:28 -05:00 |
| gemm_xdl_fp16_fp8_v3.cpp | Jing's contribution: prototype of mixed precision gemm FP16/BF16xint4 GEMM (#1762) | 2025-01-02 11:48:06 +08:00 |
| gemm_xdl_fp16_fp8.cpp | Add a gpu gemm reference kernel (#1528) | 2024-10-08 11:05:28 -05:00 |
| gemm_xdl_fp16_pk_i4_v3.cpp | Jing's contribution: prototype of mixed precision gemm FP16/BF16xint4 GEMM (#1762) | 2025-01-02 11:48:06 +08:00 |
| gemm_xdl_fp16_streamk_v3.cpp | universal streamk fp8 changes (#1665) | 2024-11-21 08:21:37 -08:00 |
| gemm_xdl_fp16_v2.cpp | Add a gpu gemm reference kernel (#1528) | 2024-10-08 11:05:28 -05:00 |
| gemm_xdl_fp16_v3.cpp | Jing's contribution: prototype of mixed precision gemm FP16/BF16xint4 GEMM (#1762) | 2025-01-02 11:48:06 +08:00 |
| gemm_xdl_fp16.cpp | Add a gpu gemm reference kernel (#1528) | 2024-10-08 11:05:28 -05:00 |
| gemm_xdl_fp64.cpp | Add a gpu gemm reference kernel (#1528) | 2024-10-08 11:05:28 -05:00 |
| gemm_xdl_int4.cpp | Fixing most of the cppcheck errors. (#1142) | 2024-01-24 13:47:48 -08:00 |
| gemm_xdl_int8.cpp | Add a gpu gemm reference kernel (#1528) | 2024-10-08 11:05:28 -05:00 |
| gemm_xdl_lds_direct_load_fp16.cpp | Add a gpu gemm reference kernel (#1528) | 2024-10-08 11:05:28 -05:00 |
| gemm_xdl_lds_direct_load_fp32.cpp | Add a gpu gemm reference kernel (#1528) | 2024-10-08 11:05:28 -05:00 |
| gemm_xdl_skip_b_lds_fp16.cpp | Disable XDL kernels on unsupported HW Add ck::is_xdl_supported (#768) | 2023-07-26 07:19:55 -07:00 |
| gemm_xdl_streamk.cpp | Add a gpu gemm reference kernel (#1528) | 2024-10-08 11:05:28 -05:00 |
| gemm_xdl_wavelet_fp16.cpp | Add a gpu gemm reference kernel (#1528) | 2024-10-08 11:05:28 -05:00 |
| README.md | Universal streamk with atomics (#1360) | 2024-07-05 21:40:30 -07:00 |
| run_gemm_example_streamk_v2.inc | Jing's contribution: prototype of mixed precision gemm FP16/BF16xint4 GEMM (#1762) | 2025-01-02 11:48:06 +08:00 |
| run_gemm_example_v2.inc | Jing's contribution: prototype of mixed precision gemm FP16/BF16xint4 GEMM (#1762) | 2025-01-02 11:48:06 +08:00 |
| run_gemm_example.inc | Jing's contribution: prototype of mixed precision gemm FP16/BF16xint4 GEMM (#1762) | 2025-01-02 11:48:06 +08:00 |

README.md

Instructions for `example_gemm_xdl`

Run `example_gemm_xdl`

#arg1: verification (0=no, 1=yes)
#arg2: initialization (0=no init, 1=integer value, 2=decimal value)
#arg3: number of times to run the kernel (>1)
./bin/example_gemm_xdl 0 1 5
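
When the example is launched from a script, the three positional arguments above can be assembled and range-checked before the binary is invoked. The following is a minimal sketch only: `build_gemm_xdl_command` is a hypothetical helper that is not part of the repository, and the checks simply mirror the argument comments above (including the `>1` note on arg3).

```python
import shlex


def build_gemm_xdl_command(verify, init_method, runs,
                           binary="./bin/example_gemm_xdl"):
    """Return the argv list for the three-argument invocation (hypothetical helper)."""
    # arg2: 0 = no init, 1 = integer value, 2 = decimal value
    if init_method not in (0, 1, 2):
        raise ValueError("init_method must be 0, 1, or 2")
    # arg3: the usage comment asks for more than one run
    if runs <= 1:
        raise ValueError("runs must be greater than 1")
    # arg1: verification flag, passed as 0 or 1
    return [binary, str(int(bool(verify))), str(init_method), str(runs)]


if __name__ == "__main__":
    # Prints the same command line as the README example.
    print(shlex.join(build_gemm_xdl_command(False, 1, 5)))
```

The returned list can be passed directly to `subprocess.run` if the binary has been built.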

Instructions for `example_gemm_xdl_fp16_streamk_v3`

Run `example_gemm_xdl_fp16_streamk_v3`

arg1: verification (0=no, 1=yes)
arg2: initialization (0=no init, 1=integer value, 2=decimal value)
arg3: time kernel (0=no, 1=yes)
arg4 to arg9: M (multiple of 256), N (multiple of 128), K (multiple of 32), StrideA, StrideB, StrideC
arg10: stream-k select (-1: default config, 0: all data-parallel (DP), 1: 1-tile stream-k (SK), 2: 2-tile SK)
arg11: grid size (-1 for max occupancy)
bin/example_gemm_xdl_fp16_streamk_v3 1 2 1 3840 4096 4096 4096 4096 4096 1 -1
a_m_k: dim 2, lengths {3840, 4096}, strides {4096, 1}
b_k_n: dim 2, lengths {4096, 4096}, strides {4096, 1}
c_m_n: dim 2, lengths {3840, 4096}, strides {4096, 1}
problem {M:3840, N:4096, K:4096, SA:4096, SB:4096, SC:4096, MP:4032, NP:4096, KRead:4096, KP:4096, AK0:512, BK0:2048, MBlock: 18, NBlock: 16, Stream-K Selection:1, Grid size:-1}
Perf: 0.292022 ms, 441.23 TFlops, 330.348 GB/s, DeviceGemmXdlUniversal<MNPadding, RRR> BlkSize: 256, BlkTile: 224x256x64, WaveTile: 16x16, WaveMap: 7x8, VmemReadVec: 8x8, BlkGemmPipelineScheduler: Intrawave, BlkGemmPipelineVersion: v3, BlkGemmPipelinePrefetchStages: 2
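
The numbers in the sample output can be cross-checked by hand. The sketch below rests on assumptions that the README does not state: the padded sizes `MP`/`NP` and the block counts come from rounding M and N up to the block tile (`BlkTile: 224x256x64`), the FLOP count is `2*M*N*K`, and the bandwidth figure counts A, B and C once each at 2 bytes per fp16 element. Under those assumptions the logged values are reproduced; the function names are illustrative only.

```python
def ceil_div(a, b):
    """Integer ceiling division."""
    return -(-a // b)


def tile_counts(m, n, tile_m, tile_n):
    """Return (MBlock, NBlock, MP, NP) for a given block tile (assumed padding rule)."""
    m_blocks = ceil_div(m, tile_m)
    n_blocks = ceil_div(n, tile_n)
    return m_blocks, n_blocks, m_blocks * tile_m, n_blocks * tile_n


def throughput(m, n, k, time_ms, element_bytes=2):
    """Return (TFlops, GB/s), assuming 2*M*N*K flops and one pass over A, B and C."""
    flops = 2 * m * n * k
    bytes_moved = element_bytes * (m * k + k * n + m * n)
    tflops = flops / 1e9 / time_ms       # flops per ms, scaled to 1e12 flops per s
    gb_per_s = bytes_moved / 1e6 / time_ms  # bytes per ms, scaled to 1e9 bytes per s
    return tflops, gb_per_s


if __name__ == "__main__":
    M, N, K = 3840, 4096, 4096  # from the sample run above
    print(tile_counts(M, N, 224, 256))   # compare with MBlock, NBlock, MP, NP
    print(throughput(M, N, K, 0.292022))  # compare with the "Perf:" line
```

With M = 3840 and a 224-row tile, M is rounded up to 18 tiles (4032 rows), which is why `MP` differs from `M` while `NP` equals `N`.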
