composable_kernel/example at 0056e0bf4b270d1eb78807f64f94f3a761048a90 - composable_kernel - Public git mirror

ROCm/composable_kernel

mirror of https://github.com/ROCm/composable_kernel.git synced 2026-07-17 00:58:44 +00:00

Andriy Roshchenko c3515f277c Adding Instances and Examples for FP8-based Scaled Convolution and AMAX Reduction. (#1473)

* Enable CMakePresets build

* Verify Convolution, Scaling and ReLU algorithms.

* Add tensor element-wise scale and type cast operation.

* Reduction implemented but does not work.

* Exploration of Reduction functionality.

* Completed example for Convolution scaled with ReLU activation and AMAX reduction.

* WIP: Add required instances for convolution.

* WIP: Create client example. Implement convolution stage.

* Add elementwise instances.

* Add elementwise scale + convert example.

* Add reduction instances.

* WIP: Client example for AMAX reduction.

* WIP: Add instances for multistage reduction.

* WIP: Implementation of multistage reduction.

* Refactoring.

* Clean up.

* Add CMakePresets.json

* Guard off FP8 instances when the data type is not available.

* Add example for Scaled FP8 Convolution with AMAX reduction.

* Refactor CombConvScaleRelu instances.

* Add CombConvScale instances.

* Add client example for Scaled FP8 Convolution with AMAX reduction.

* Cleanup.

2024-08-21 15:22:41 -07:00

Set RNE fp8 conversion as a default (#1458)

2024-08-21 09:09:48 -07:00

02_gemm_bilinear

Fix issue with multiple targets and remove smfmac tests from unsupported test targets (#1372)

2024-07-03 23:34:38 -07:00

03_gemm_bias_relu

Split the instances by architecture. (#1223)

2024-04-02 09:42:17 -07:00

04_gemm_add_add_fastgelu

Merging the gfx12 code into public repo. (#1362)

2024-06-27 00:33:34 -07:00

re-enable convnd_fwd_xdl_fp64 testing (#1289)

2024-05-10 22:48:28 -07:00

10_convnd_fwd_multiple_d_multiple_reduce

Add Grouped Conv Fwd Large Tensor kernel (#1432)

2024-08-06 10:06:10 +02:00

Support large: 12d tensor size for reduction kernel (#1465)

2024-08-13 16:15:47 +02:00

Refactoring cmake files to build data types separately. (#932)

2023-09-20 22:15:56 -07:00

14_gemm_quantization

Split the instances by architecture. (#1223)

2024-04-02 09:42:17 -07:00

15_grouped_gemm

Switch to universal gemm in grouped gemm tile loop (#1335)

2024-06-18 09:01:49 -05:00

16_gemm_multi_d_multi_reduces

Split the instances by architecture. (#1223)

2024-04-02 09:42:17 -07:00

17_convnd_bwd_data

Add Grouped Conv Fwd Large Tensor kernel (#1432)

2024-08-06 10:06:10 +02:00

18_batched_gemm_reduce

Fixing most of the cppcheck errors. (#1142)

2024-01-24 13:47:48 -08:00

19_binary_elementwise

Refactor elementwise kernels (#1222)

2024-04-19 13:31:17 +02:00

20_grouped_conv_bwd_weight

[GEMM] gemm_universal related optimization (#1453)

2024-08-14 10:42:30 +08:00

21_gemm_layernorm

Refactor elementwise kernels (#1222)

2024-04-19 13:31:17 +02:00

Clean DTYPES conditions in CMake (#974)

2023-10-18 11:14:14 -05:00

update copyright headers (#726)

2023-05-31 18:46:57 -05:00

24_batched_gemm

Clean DTYPES conditions in CMake (#974)

2023-10-18 11:14:14 -05:00

25_gemm_bias_e_permute

Refactoring cmake files to build data types separately. (#932)

2023-09-20 22:15:56 -07:00

Code clean-up (#1285)

2024-05-10 09:41:39 -07:00

27_layernorm2d_fwd

Split the static library into several files. (#1044)

2023-11-28 11:17:37 -08:00

28_grouped_gemm_bias_e_permute

Refactoring cmake files to build data types separately. (#932)

2023-09-20 22:15:56 -07:00

29_batched_gemm_bias_e_permute

Merging the gfx12 code into public repo. (#1362)

2024-06-27 00:33:34 -07:00

30_grouped_conv_fwd_multiple_d

Fix issue with multiple targets and remove smfmac tests from unsupported test targets (#1372)

2024-07-03 23:34:38 -07:00

31_batched_gemm_gemm

Split the instances by architecture. (#1223)

2024-04-02 09:42:17 -07:00

32_batched_gemm_scale_softmax_gemm

Fix issue with multiple targets and remove smfmac tests from unsupported test targets (#1372)

2024-07-03 23:34:38 -07:00

33_multiple_reduce

update copyright headers (#726)

2023-05-31 18:46:57 -05:00

Refactor elementwise kernels (#1222)

2024-04-19 13:31:17 +02:00

Universal gemm splitk using reduce (with multi-d) (#1341)

2024-07-19 22:01:22 +08:00

36_sparse_embedding

update copyright headers (#726)

2023-05-31 18:46:57 -05:00

37_batched_gemm_add_add_relu_gemm_add

Refactoring cmake files to build data types separately. (#932)

2023-09-20 22:15:56 -07:00

38_grouped_conv_bwd_data_multiple_d

Fix issue with multiple targets and remove smfmac tests from unsupported test targets (#1372)

2024-07-03 23:34:38 -07:00

Clean DTYPES conditions in CMake (#974)

2023-10-18 11:14:14 -05:00

40_conv2d_fwd_quantization

Split the instances by architecture. (#1223)

2024-04-02 09:42:17 -07:00

41_grouped_conv_conv_fwd

Split the instances by architecture. (#1223)

2024-04-02 09:42:17 -07:00

42_groupnorm_fwd

Split the static library into several files. (#1044)

2023-11-28 11:17:37 -08:00

43_splitk_gemm_bias_e_permute

Refactoring cmake files to build data types separately. (#932)

2023-09-20 22:15:56 -07:00

44_elementwise_permute

Refactor elementwise kernels (#1222)

2024-04-19 13:31:17 +02:00

45_elementwise_normalization

Layernorm and groupnorm support to save mean and inverse std in forward (#929)

2023-10-19 07:36:29 +08:00

46_gemm_add_multiply

Code clean-up (#1285)

2024-05-10 09:41:39 -07:00

47_gemm_bias_softmax_gemm_permute

Split the instances by architecture. (#1223)

2024-04-02 09:42:17 -07:00

Fixing most of the cppcheck errors. (#1142)

2024-01-24 13:47:48 -08:00

49_maxpool2d_bwd

Refactoring cmake files to build data types separately. (#932)

2023-09-20 22:15:56 -07:00

Refactoring cmake files to build data types separately. (#932)

2023-09-20 22:15:56 -07:00

51_avgpool3d_bwd

Fixing most of the cppcheck errors. (#1142)

2024-01-24 13:47:48 -08:00

52_im2col_col2im

Split the instances by architecture. (#1223)

2024-04-02 09:42:17 -07:00

53_layernorm2d_bwd

layernorm and groupnorm backward data (#1083)

2023-12-19 04:23:11 +08:00

54_groupnorm_bwd

layernorm and groupnorm backward data (#1083)

2023-12-19 04:23:11 +08:00

59_grouped_gemm_multi_ABD

Fix example CMakeLists.txt (#1267)

2024-04-30 08:28:19 -07:00

60_gemm_multi_ABD

bf16A_Int8B with fastgelu/bias (#1264)

2024-04-26 07:26:30 -05:00

61_contraction_multi_ABD

add gemm_bias_add example (#1361)

2024-07-11 18:08:07 -07:00

62_convnd_activ

Adding Instances and Examples for FP8-based Scaled Convolution and AMAX Reduction. (#1473)

2024-08-21 15:22:41 -07:00

63_layernorm4d_fwd

Split the static library into several files. (#1044)

2023-11-28 11:17:37 -08:00

64_fpAintB_gemm

Split the instances by architecture. (#1223)

2024-04-02 09:42:17 -07:00

65_gemm_multiply_multiply

[GEMM] gemm_universal related optimization (#1453)

2024-08-14 10:42:30 +08:00

[CK_TILE] FA bwd kernels optimization (#1397)

2024-08-16 13:40:10 -07:00

CMakeLists.txt

[GEMM] gemm_universal related optimization (#1453)

2024-08-14 10:42:30 +08:00