composable_kernel/test at 6a6177a246d6c81932fbb1061ad6a62e90b788a1 - composable_kernel - Public git mirror

ROCm/composable_kernel

mirror of https://github.com/ROCm/composable_kernel.git synced 2026-04-19 22:39:03 +00:00

Erwin Terpstra 6a6177a246 [CK_Tile] Support for a4w4 (fp4) in block scale gemm AB quant (#3603)

* chore: split block scale example instances in more separate files to speed up compile times

* wip: fp4 scaffolding for abquant

* feat: add fp4 decoding-while-loading to abquant pipeline

* feat: add support for fp4 CPU verification in abquant

* chore: add time tracking to reference calculation

* feat: add a4w4 test for blockscale gemm

* feat: optimize reference calculation by preconverting values to AccType

* feat: add fp4 to fp8 look-up table

* fix: reference to wrong ComputeDataType field in QuantProblem

* feat: type utilities for determining MFMA compute types

* feat: packed fp4 for abquant weight preshuffle

* feat: add separate tests for a4w4 base case, padding and preshuffleB

* fix: fp4 conversion on gfx950 attempting to use non-supported method

* fix: test case was using quant group sizes which don't work on gfx950 due to larger mfma tile size

* chore: add fp4 preshuffleb mode to block scale example

* chore: sanity check for packed types being 1 byte

* chore: clarify tensor dimension indices with constants

* chore: replace traits check with specialized check for packed types

* style: some minor refactoring and cleanup

* fix: correct conversion table for FNUZ fp8

* chore: add fp4 instances to main abquant instances again

* chore: use same initialization branch for int4 and fp4

* chore: add missing initialization for fp4 in block scale gemm example

---------

Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>

2026-01-30 04:40:50 -07:00

batched_contraction

Implement batched gemm bias permute for RDNA4 (#3534)

2026-01-17 08:30:27 +01:00

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

batched_gemm_b_scale

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

batched_gemm_gemm

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

batched_gemm_multi_d

[CI, CK examples] Disable time_kernel for CI tests and examples (#3464)

2026-01-07 16:30:57 +01:00

batched_gemm_multiple_d_gemm_multiple_d

Implement batched gemm add relu gemm add for rdna4 (#3391)

2026-01-20 13:06:59 -08:00

batched_gemm_reduce

WMMA support for batched_gemm_reduce (#3332)

2026-01-20 10:50:46 +01:00

batched_gemm_softmax_gemm

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

batched_gemm_softmax_gemm_permute

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

block_swizzle_test

chore(copyright): update copyright header for test directory

2025-11-19 17:43:28 -07:00

block_to_ctile_map

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

[CK_Tile] Support for a4w4 (fp4) in block scale gemm AB quant (#3603)

2026-01-30 04:40:50 -07:00

Add grouped convnd dataset tests for bwd_data, bwd_weight and make them parallel (#3380)

2025-12-15 13:38:25 +01:00

Add support to fp16 + compute fp16 and bf16 + compute bf16 contractions (#3598)

2026-01-20 09:39:57 -08:00

conv_tensor_rearrange

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

convnd_bwd_data

[CK tests] Extend conv GPU reference (#3539)

2026-01-27 09:49:42 +01:00

[CK tests] Extend conv GPU reference (#3539)

2026-01-27 09:49:42 +01:00

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

[CK_BUILDER] Integrate CKB validation with CK verification (#3649)

2026-01-28 17:41:02 +01:00

elementwise_normalization

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

[CI, CK examples] Disable time_kernel for CI tests and examples (#3464)

2026-01-07 16:30:57 +01:00

Wmma support for gemm_ab_scale (#3314)

2025-12-11 09:06:20 +01:00

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

gemm_bias_add_reduce

Wmma support for gemm_bias_add_reduce (#3316)

2026-01-07 10:27:16 -08:00

gemm_blockscale_wp

Wmma support for gemm_ab_scale (#3314)

2025-12-11 09:06:20 +01:00

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

gemm_multiply_multiply_wp

Wmma support for gemm_multiply_multiply_wp (#3278)

2025-12-03 07:38:23 -08:00

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

Padding support for wave transfer (#3537)

2026-01-26 12:57:09 -08:00

gemm_universal_preshuffle

Implement device_gemm_universal_preshuffle_instance for RDNA4 (#3429)

2026-01-15 07:19:31 -08:00

gemm_universal_reduce

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

gemm_universal_streamk

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

[CK tests] Extend conv GPU reference (#3539)

2026-01-27 09:49:42 +01:00

gpu_verification

[CK_BUILDER] Integrate CKB validation with CK verification (#3649)

2026-01-28 17:41:02 +01:00

grouped_convnd_bwd_data

[CK tests] Extend conv GPU reference (#3539)

2026-01-27 09:49:42 +01:00

grouped_convnd_bwd_weight

[Conv] Enable bwd weight splitk autodeduction with cap (#3656)

2026-01-29 17:40:28 +00:00

grouped_convnd_fwd

[CK TILE] Enable CK TILE Conv Fwd tests in CI and fix check_err (#3624)

2026-01-27 11:04:11 +02:00

grouped_convnd_fwd_activation

[CK tests] Extend conv GPU reference (#3539)

2026-01-27 09:49:42 +01:00

Implement grouped gemm tile loop for RDNA4 (#3304)

2026-01-13 07:14:23 +01:00

grouped_gemm_tile_loop

Implement grouped gemm tile loop for RDNA4 (#3304)

2026-01-13 07:14:23 +01:00

magic_number_division

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

normalization_bwd_data

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

normalization_bwd_gamma_beta

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

normalization_fwd

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

position_embedding

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

fix some minor error (#3409)

2025-12-16 19:50:49 -08:00

reference_conv_fwd

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

space_filling_curve

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

Optimize sequence metaprogramming utilities to reduce template instantiation depth (#3585)

2026-01-26 10:08:55 -08:00

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

[CI, CK examples] Disable time_kernel for CI tests and examples (#3464)

2026-01-07 16:30:57 +01:00

CMakeLists.txt

Grouped conv_fwd_bias_bnorm_clamp instances and tests (#3525)

2026-01-22 09:53:59 +01:00