composable_kernel/test at e548a3f28026033a0bb19476d4aec2eebfcdb1d3 - composable_kernel - Public git mirror

ROCm/composable_kernel

mirror of https://github.com/ROCm/composable_kernel.git synced 2026-06-30 11:47:48 +00:00

jeongkim d121abc9f5 Fix grouped_conv_fwd_bias_bnorm_clamp tolerance for RDNA3 (gfx11)

This commit addresses the numerical tolerance issue for grouped convolution
forward with fused bias, batch normalization, and clamp operations on RDNA3
(gfx11) GPUs.

Problem:
- Test test_grouped_convnd_fwd_bias_bnorm_clamp was disabled for gfx11 because its
  numerical error exceeded the default FP16 tolerance
- The operation performs: clamp(scale * ((x + bias - mean) / sqrt(var + ε)) + shift)
  involving 7 type conversions and 6 arithmetic operations per element
- RDNA3 has different FP16 characteristics compared to CDNA architectures
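
For reference, the per-element operation above can be sketched in C++ as a single-precision function. This is a minimal sketch, not CK's element-wise functor. The name `bias_bnorm_clamp_ref` and the clamp bounds `lo`/`hi` are hypothetical, because the message does not state the clamp limits. The FP16 conversions that the fused kernel performs between stages are deliberately omitted so the result can serve as a higher-precision baseline.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical float reference for the fused epilogue:
//   y = clamp(scale * ((x + bias - mean) / sqrt(var + eps)) + shift, lo, hi)
// The real kernel stores operands in FP16 and converts between stages;
// this sketch stays in float throughout.
float bias_bnorm_clamp_ref(float x, float bias, float mean, float var,
                           float eps, float scale, float shift,
                           float lo, float hi)
{
    assert(lo <= hi); // std::clamp has undefined behavior if hi < lo

    const float centered   = x + bias - mean;              // add bias, subtract batch mean
    const float normalized = centered / std::sqrt(var + eps); // divide by standard deviation
    const float affine     = scale * normalized + shift;   // batch-norm scale and shift
    return std::clamp(affine, lo, hi);                     // clamp to [lo, hi]
}
```

Each stage of the real kernel that rounds back to FP16 adds its own rounding error, which is why a float baseline like this is a stricter reference than an FP16 re-implementation would be.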

Solution:
1. Re-enabled test for gfx11 in CMakeLists.txt
2. Added adaptive tolerance in profiler for gfx11 + FP16:
   - rtol = 5e-3 (0.5% relative error)
   - atol = 5.0 (about 0.24% of the largest output magnitude, for outputs in [0, 2048])
3. All other architecture/data-type combinations keep the default tolerance (rtol=1e-3, atol=1e-3)
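
The gating rule in the Solution list can be sketched as follows. This is an illustrative C++ sketch, not the code in profile_grouped_conv_fwd_bias_bnorm_clamp_impl.hpp. `Tolerance`, `DataType`, `select_tolerance`, and `within_tolerance` are hypothetical names. The architecture is passed in as a string such as `gfx1100` rather than queried from the device. The comparison assumes the common mixed criterion |out - ref| <= atol + rtol * |ref|, which the profiler's checker may implement differently.

```cpp
#include <cassert>
#include <cmath>
#include <string>

// Hypothetical tolerance pair used by the comparison below.
struct Tolerance
{
    double rtol;
    double atol;
};

// Hypothetical subset of data types, for illustration only.
enum class DataType { FP16, BF16, FP32 };

// Relaxed tolerance only for gfx11 (RDNA3) with FP16; defaults otherwise.
Tolerance select_tolerance(const std::string& arch, DataType dtype)
{
    // Prefix match so gfx1100, gfx1101, gfx1102, ... all count as gfx11.
    const bool is_gfx11 = arch.rfind("gfx11", 0) == 0;
    if(is_gfx11 && dtype == DataType::FP16)
        return {5e-3, 5.0};
    return {1e-3, 1e-3};
}

// Assumed mixed absolute/relative check: |out - ref| <= atol + rtol * |ref|.
bool within_tolerance(double out, double ref, Tolerance tol)
{
    assert(tol.rtol >= 0.0 && tol.atol >= 0.0);
    return std::abs(out - ref) <= tol.atol + tol.rtol * std::abs(ref);
}
```

Keying on both architecture and data type keeps the relaxation narrow: gfx11 with FP32, or any non-gfx11 target with FP16, still goes through the default rtol=1e-3, atol=1e-3 check.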

Rationale:
- The relaxed tolerance accounts for accumulated error in complex fused operations
- Values are mathematically justified based on operation complexity and FP16 precision
- Architecture-specific to avoid affecting other GPU targets
- Similar tolerance adjustments exist for other fused operations (e.g., conv bwd weight)
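
The commit does not spell out its derivation, but the stated numbers can be sanity-checked against the FP16 format (10 explicit mantissa bits, unit roundoff u = 2^-11). The arithmetic below is such a check, not the author's justification. The last line additionally assumes the mixed criterion |y - y_ref| <= atol + rtol * |y_ref|.

```latex
\begin{aligned}
u_{\mathrm{FP16}} &= 2^{-11} \approx 4.88\times10^{-4}, \qquad
  \frac{\mathrm{rtol}}{u_{\mathrm{FP16}}} = \frac{5\times10^{-3}}{4.88\times10^{-4}} \approx 10.2 \\
\frac{\mathrm{atol}}{\max|y|} &= \frac{5.0}{2048} \approx 2.44\times10^{-3} \approx 0.24\% \\
\operatorname{ulp}(y) &= 1 \ \text{for } y \in [1024, 2048), \qquad \operatorname{ulp}(2048) = 2 \\
|y - y_{\mathrm{ref}}| &\le 5.0 + 5\times10^{-3}\cdot 2048 = 15.24 \quad \text{at } |y_{\mathrm{ref}}| = 2048
\end{aligned}
```

Read this way, the relaxed rtol allows roughly ten units of FP16 roundoff, and atol = 5.0 corresponds to a few ulps near the top of the output range.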

Files changed:
- test/grouped_convnd_fwd_activation/CMakeLists.txt: Enable test for gfx11
- profiler/include/profiler/profile_grouped_conv_fwd_bias_bnorm_clamp_impl.hpp:
  Add adaptive tolerance logic

2026-01-20 19:43:48 +00:00

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313 )

2025-11-28 13:49:54 -08:00

batched_gemm_b_scale

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313 )

2025-11-28 13:49:54 -08:00

batched_gemm_gemm

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313 )

2025-11-28 13:49:54 -08:00

batched_gemm_multi_d

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313 )

2025-11-28 13:49:54 -08:00

batched_gemm_reduce

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313 )

2025-11-28 13:49:54 -08:00

batched_gemm_softmax_gemm

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313 )

2025-11-28 13:49:54 -08:00

batched_gemm_softmax_gemm_permute

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313 )

2025-11-28 13:49:54 -08:00

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313 )

2025-11-28 13:49:54 -08:00

block_swizzle_test

chore(copyright): update copyright header for test directory

2025-11-19 17:43:28 -07:00

block_to_ctile_map

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313 )

2025-11-28 13:49:54 -08:00

[CKTILE] Support A/B Quantization in Blockscale Grouped Gemm (#3452 )

2026-01-06 12:36:04 -08:00

Add grouped convnd dataset tests for bwd_data, bwd_weight and make them parallel (#3380 )

2025-12-15 13:38:25 +01:00

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313 )

2025-11-28 13:49:54 -08:00

conv_tensor_rearrange

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313 )

2025-11-28 13:49:54 -08:00

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313 )

2025-11-28 13:49:54 -08:00

convnd_bwd_data

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313 )

2025-11-28 13:49:54 -08:00

[CK] Integrate GPU reference into ckProfiler for convolutions (#3379 )

2025-12-18 07:59:45 +01:00

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313 )

2025-11-28 13:49:54 -08:00

[CK_Builder] [testing] Integrate device random generators (#3427 )

2025-12-30 10:03:05 -08:00

elementwise_normalization

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313 )

2025-11-28 13:49:54 -08:00

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313 )

2025-11-28 13:49:54 -08:00

Wmma support for gemm_ab_scale (#3314 )

2025-12-11 09:06:20 +01:00

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313 )

2025-11-28 13:49:54 -08:00

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313 )

2025-11-28 13:49:54 -08:00

gemm_blockscale_wp

Wmma support for gemm_ab_scale (#3314 )

2025-12-11 09:06:20 +01:00

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313 )

2025-11-28 13:49:54 -08:00

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313 )

2025-11-28 13:49:54 -08:00

gemm_multiply_multiply_wp

Wmma support for gemm_multiply_multiply_wp (#3278 )

2025-12-03 07:38:23 -08:00

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313 )

2025-11-28 13:49:54 -08:00

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313 )

2025-11-28 13:49:54 -08:00

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313 )

2025-11-28 13:49:54 -08:00

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313 )

2025-11-28 13:49:54 -08:00

gemm_universal_preshuffle

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313 )

2025-11-28 13:49:54 -08:00

gemm_universal_reduce

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313 )

2025-11-28 13:49:54 -08:00

gemm_universal_streamk

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313 )

2025-11-28 13:49:54 -08:00

[CK] Integrate GPU reference into ckProfiler for convolutions (#3379 )

2025-12-18 07:59:45 +01:00

grouped_convnd_bwd_data

Grouped convolution backward data WMMA v3 implementation (#3460 )

2025-12-30 16:25:08 +01:00

grouped_convnd_bwd_weight

Replace grouped conv bwd wei wmmaV3 bilin/scale bf16f32bf16 support with bf16bf16bf16 (#3470 )

2025-12-29 12:58:29 +01:00

grouped_convnd_fwd

Post-merge cleanup for WMMA grouped conv fwd (#3468 )

2025-12-22 15:57:45 +01:00

grouped_convnd_fwd_activation

Fix grouped_conv_fwd_bias_bnorm_clamp tolerance for RDNA3 (gfx11)

2026-01-20 19:43:48 +00:00

Add grouped gemm instances for RDNA4 (#3237 )

2025-12-01 15:32:10 -08:00

magic_number_division

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313 )

2025-11-28 13:49:54 -08:00

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313 )

2025-11-28 13:49:54 -08:00

normalization_bwd_data

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313 )

2025-11-28 13:49:54 -08:00

normalization_bwd_gamma_beta

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313 )

2025-11-28 13:49:54 -08:00

normalization_fwd

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313 )

2025-11-28 13:49:54 -08:00

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313 )

2025-11-28 13:49:54 -08:00

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313 )

2025-11-28 13:49:54 -08:00

position_embedding

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313 )

2025-11-28 13:49:54 -08:00

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313 )

2025-11-28 13:49:54 -08:00

fix some minor error (#3409 )

2025-12-16 19:50:49 -08:00

reference_conv_fwd

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313 )

2025-11-28 13:49:54 -08:00

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313 )

2025-11-28 13:49:54 -08:00

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313 )

2025-11-28 13:49:54 -08:00

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313 )

2025-11-28 13:49:54 -08:00

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313 )

2025-11-28 13:49:54 -08:00

space_filling_curve

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313 )

2025-11-28 13:49:54 -08:00

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313 )

2025-11-28 13:49:54 -08:00

Improve sequence sorting and add unit tests (#3376 )

2025-12-10 12:25:23 -08:00

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313 )

2025-11-28 13:49:54 -08:00

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313 )

2025-11-28 13:49:54 -08:00

CMakeLists.txt

[CK_Builder] [testing] Integrate device random generators (#3427 )

2025-12-30 10:03:05 -08:00