ROCm/composable_kernel

mirror of https://github.com/ROCm/composable_kernel.git synced 2026-06-10 16:28:38 +00:00


Zoltán Lakatos 58e2ab1fc7 [rocm-libraries] ROCm/rocm-libraries#6761 (commit d19f6f1)

[CK] Large tensor gemm workaround (#6761)

## Motivation

A customer requested large-tensor GEMM support for 8-bit and 4-bit data
types. Currently CK reports a “This GEMM not supported” error. The root
cause appears to be the 2 GB limit on the input/output matrix, triggered
by buffer offset constraints when testing a larger shape such as M =
699,904 (which is an exact multiple of MPerBlock = 256).

## Technical Details

Quick workaround to provide support as soon as possible: split the
tensors into inputs / outputs smaller than the 2 GB limit, then iterate
on the host and launch every subproblem, with no device code changes.
Support is restricted to row-wise layout in A, Ds and E.
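
The host-side split described above can be sketched roughly as follows. This is only an illustration of the idea, not the code in this change: `RowChunk`, `split_rows` and the `row_bytes` parameter are hypothetical names, and the sketch assumes row-major tensors where a contiguous range of rows of A, Ds and E forms an independent subproblem that shares the same B.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper type (not part of CK): one host-side subproblem,
// covering rows [row_begin, row_begin + rows) of A, Ds and E.
struct RowChunk
{
    std::int64_t row_begin;
    std::int64_t rows;
};

// Split M rows into chunks whose byte size stays strictly below limit_bytes.
// row_bytes is assumed to be the size in bytes of the widest row among the
// row-major tensors being split. Chunk sizes are rounded down to a multiple
// of m_per_block so every chunk except possibly the last is block-aligned.
std::vector<RowChunk> split_rows(std::int64_t M,
                                 std::int64_t row_bytes,
                                 std::int64_t m_per_block,
                                 std::int64_t limit_bytes = (std::int64_t{1} << 31))
{
    assert(row_bytes > 0 && m_per_block > 0);

    // Largest row count that keeps a chunk below the limit.
    std::int64_t max_rows = (limit_bytes - 1) / row_bytes;
    // Round down to a whole number of M blocks.
    max_rows = (max_rows / m_per_block) * m_per_block;

    std::vector<RowChunk> chunks;
    if(max_rows <= 0 || M <= 0)
        return chunks; // not even one block of rows fits, or nothing to do

    for(std::int64_t begin = 0; begin < M; begin += max_rows)
        chunks.push_back({begin, std::min(max_rows, M - begin)});

    return chunks;
}
```

With M = 699,904, MPerBlock = 256 and an illustrative row size of 8192 bytes (an assumed value, not one taken from the tested shapes), this yields three chunks of 261,888, 261,888 and 176,128 rows. A caller would then offset the A, Ds and E pointers by `row_begin` rows and launch one GEMM per chunk.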

All changes were implemented in the DeviceGemm structures to avoid
secondary effects on grouped convolutions.

Received many AI-generated review comments; addressed the ones that
seemed relevant to functionality.

## Test Plan

Within CK the following examples can be used with modified input sizes:

- example_gemm_multiply_multiply_xdl_fp8
- example_gemm_mx_fp4

Tested with Aiter tuning on the provided shapes.

## Test Result

All GEMMs ran and produced correct results.

## Submission Checklist

- [ ] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

---------

Co-authored-by: Zoltán Lakatos <zoltan.lakatos@streamhpc.com>
Co-authored-by: Márton Bidlek <marton.bidlek@streamhpc.com>
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>

2026-05-27 18:55:15 +00:00

..

[rocm-libraries] ROCm/rocm-libraries#7612 (commit 5427d24)

2026-05-22 02:43:50 +00:00

02_gemm_bilinear

[rocm-libraries] ROCm/rocm-libraries#6978 (commit e58096d)

2026-05-15 06:46:51 -07:00

03_gemm_bias_relu

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

04_gemm_add_add_fastgelu

[rocm-libraries] ROCm/rocm-libraries#6978 (commit e58096d)

2026-05-15 06:46:51 -07:00

[rocm-libraries] ROCm/rocm-libraries#6978 (commit e58096d)

2026-05-15 06:46:51 -07:00

10_convnd_fwd_multiple_d_multiple_reduce

[rocm-libraries] ROCm/rocm-libraries#6978 (commit e58096d)

2026-05-15 06:46:51 -07:00

11_convnd_fwd_bias

[DOCS] Documentation Addition (Readme updates) (#2495)

2025-10-16 03:10:57 -07:00

[rocm-libraries] ROCm/rocm-libraries#5030 (commit 8e02a26)

2026-03-06 09:27:27 -07:00

[CI, CK examples] Disable time_kernel for CI tests and examples (#3464)

2026-01-07 16:30:57 +01:00

14_gemm_quantization

[rocm-libraries] direct push (commit 7b18234)

2026-03-12 09:47:41 +01:00

15_grouped_gemm

[rocm-libraries] ROCm/rocm-libraries#7760 (commit a61bc76)

2026-05-27 06:56:58 -07:00

16_gemm_multi_d_multi_reduces

[rocm-libraries] ROCm/rocm-libraries#6978 (commit e58096d)

2026-05-15 06:46:51 -07:00

17_convnd_bwd_data

[CK] Integrate GPU reference into ckProfiler for convolutions (#3379)

2025-12-18 07:59:45 +01:00

18_batched_gemm_reduce

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

19_binary_elementwise

[rocm-libraries] ROCm/rocm-libraries#6978 (commit e58096d)

2026-05-15 06:46:51 -07:00

20_grouped_conv_bwd_weight

[rocm-libraries] ROCm/rocm-libraries#5652 (commit 7dc7d1d)

2026-05-18 17:46:01 +02:00

21_gemm_layernorm

[rocm-libraries] ROCm/rocm-libraries#6978 (commit e58096d)

2026-05-15 06:46:51 -07:00

[rocm-libraries] ROCm/rocm-libraries#6978 (commit e58096d)

2026-05-15 06:46:51 -07:00

[CI, CK examples] Disable time_kernel for CI tests and examples (#3464)

2026-01-07 16:30:57 +01:00

24_batched_gemm

[rocm-libraries] ROCm/rocm-libraries#6978 (commit e58096d)

2026-05-15 06:46:51 -07:00

25_gemm_bias_e_permute

Implement batched gemm bias permute for RDNA4 (#3534)

2026-01-17 08:30:27 +01:00

[rocm-libraries] ROCm/rocm-libraries#7612 (commit 5427d24)

2026-05-22 02:43:50 +00:00

27_layernorm2d_fwd

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

28_grouped_gemm_bias_e_permute

[rocm-libraries] ROCm/rocm-libraries#7760 (commit a61bc76)

2026-05-27 06:56:58 -07:00

29_batched_gemm_bias_e_permute

[rocm-libraries] ROCm/rocm-libraries#7760 (commit a61bc76)

2026-05-27 06:56:58 -07:00

30_grouped_conv_fwd_multiple_d

[rocm-libraries] ROCm/rocm-libraries#6978 (commit e58096d)

2026-05-15 06:46:51 -07:00

31_batched_gemm_gemm

[rocm-libraries] ROCm/rocm-libraries#6978 (commit e58096d)

2026-05-15 06:46:51 -07:00

32_batched_gemm_scale_softmax_gemm

[rocm-libraries] ROCm/rocm-libraries#7612 (commit 5427d24)

2026-05-22 02:43:50 +00:00

33_multiple_reduce

[CI, CK examples] Disable time_kernel for CI tests and examples (#3464)

2026-01-07 16:30:57 +01:00

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

[rocm-libraries] ROCm/rocm-libraries#6978 (commit e58096d)

2026-05-15 06:46:51 -07:00

36_sparse_embedding

[rocm-libraries] ROCm/rocm-libraries#7760 (commit a61bc76)

2026-05-27 06:56:58 -07:00

37_batched_gemm_add_add_relu_gemm_add

Implement batched gemm add relu gemm add for rdna4 (#3391)

2026-01-20 13:06:59 -08:00

38_grouped_conv_bwd_data_multiple_d

[rocm-libraries] ROCm/rocm-libraries#7732 (commit b0e29d9)

2026-05-27 09:59:14 +03:00

[rocm-libraries] ROCm/rocm-libraries#7760 (commit a61bc76)

2026-05-27 06:56:58 -07:00

40_conv2d_fwd_quantization

[rocm-libraries] ROCm/rocm-libraries#7111 (commit 651947f)

2026-05-08 07:14:14 -07:00

41_grouped_conv_conv_fwd

[rocm-libraries] ROCm/rocm-libraries#6978 (commit e58096d)

2026-05-15 06:46:51 -07:00

42_groupnorm_fwd

[CI, CK examples] Disable time_kernel for CI tests and examples (#3464)

2026-01-07 16:30:57 +01:00

43_splitk_gemm_bias_e_permute

[rocm-libraries] ROCm/rocm-libraries#7760 (commit a61bc76)

2026-05-27 06:56:58 -07:00

44_elementwise_permute

[CI, CK examples] Disable time_kernel for CI tests and examples (#3464)

2026-01-07 16:30:57 +01:00

45_elementwise_normalization

[CI, CK examples] Disable time_kernel for CI tests and examples (#3464)

2026-01-07 16:30:57 +01:00

46_gemm_add_multiply

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

47_gemm_bias_softmax_gemm_permute

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

49_maxpool2d_bwd

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

51_avgpool3d_bwd

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

52_im2col_col2im

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

53_layernorm2d_bwd

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

54_groupnorm_bwd

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

59_grouped_gemm_multi_ABD

[rocm-libraries] ROCm/rocm-libraries#4425 (commit 513cf9f)

2026-02-25 05:16:07 +00:00

60_gemm_multi_ABD

[rocm-libraries] ROCm/rocm-libraries#6978 (commit e58096d)

2026-05-15 06:46:51 -07:00

61_contraction_multi_ABD

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

62_convnd_activ

[rocm-libraries] ROCm/rocm-libraries#7612 (commit 5427d24)

2026-05-22 02:43:50 +00:00

63_layernorm4d_fwd

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

64_fpAintB_gemm

[rocm-libraries] ROCm/rocm-libraries#7612 (commit 5427d24)

2026-05-22 02:43:50 +00:00

65_gemm_multiply_multiply

[rocm-libraries] ROCm/rocm-libraries#6761 (commit d19f6f1)

2026-05-27 18:55:15 +00:00

66_complex_contraction_bilinear

[rocm-libraries] ROCm/rocm-libraries#7612 (commit 5427d24)

2026-05-22 02:43:50 +00:00

67_gemm_microscaling

[rocm-libraries] ROCm/rocm-libraries#6978 (commit e58096d)

2026-05-15 06:46:51 -07:00

[CK][Examples] Fixing stride issues in ck examples 14/65/68/69 by workaround - Bypassing hostTensor validation

2026-01-15 16:43:02 +01:00

69_gemm_add_relu

[CK][Examples] Fixing stride issues in ck examples 14/65/68/69 by workaround - Bypassing hostTensor validation

2026-01-15 16:43:02 +01:00

[rocm-libraries] ROCm/rocm-libraries#7760 (commit a61bc76)

2026-05-27 06:56:58 -07:00

CMakeLists.txt

[rocm-libraries] direct push (commit 49b73ad)

2026-05-25 11:26:26 +02:00

README.md

…

README.md

Back to the main page

Composable Kernel examples