4 Commits

Author SHA1 Message Date
Michal Kulikowski
9da8a056df [CK][Examples] Extending support for rdna3/4 part 3:
-example_gemm_xdl_int8
-example_gemm_xdl_fp8
-example_gemm_xdl_fp8_bf8
-example_gemm_xdl_fp16_fp8
-example_gemm_add_add_fastgelu_xdl_int8
-example_grouped_gemm_xdl_int8
-example_grouped_conv_bwd_weight_xdl_bf16
-example_cgemm_xdl_fp32
-example_cgemm_xdl_int8

Fixing cmdline parameters in:
-example_22_cgemm
-example_24_batched_gemm
-example_batched_gemm_xdl_fp16int4_b_scale_v3

Signed-off-by: Michal Kulikowski <Michal.Kulikowski@amd.com>


[ROCm/composable_kernel commit: 2444c44895]
2025-10-08 18:14:38 +02:00
Michal Kulikowski
f85778eab4 [CK][Examples] Extending support for rdna3/4 part 2:
-example_batched_gemm_xdl_int8
-example_batched_gemm_xdl_fp8_rowwise_v3
-example_batched_gemm_xdl_fp32
-example_batched_gemm_xdl_bf16
-example_batched_gemm_xdl_bf16_v3
-example_batched_gemm_xdl_fp16
-example_splitk_gemm_bias_e_permute_xdl_fp32
* Fixing the return value so that the above examples return 0 on success.

Fixing cmdline parameters in:
-example_sparse_embedding3_forward_layernorm
-example_elementwise_binary_4D_fp16
-elementwise_scale_permute_amax_2D_fp16_fp8

Signed-off-by: Michal Kulikowski <Michal.Kulikowski@amd.com>


[ROCm/composable_kernel commit: 7259b9c4db]
2025-10-08 18:14:38 +02:00
Adam Osewski
8a8f8521f9 Refactor device op implementations into impl subdirectory. (#420)
* Move kernel implementation files under impl directory.

* Update examples paths.

* Update device kernel impl include paths.

* Update tensor operation instances include paths.

* Update profiler and tests include paths.

* Clang-format

* Update include paths for batched gemm reduce

* Refactor UnitTest ConvNDBwdWeight.

* Refactor fwd and bwd data convND UT.

* Fix used test macro.

* Fix include path.

* Fix include paths.

* Fix include paths in profiler and tests.

* Fix include paths.

Co-authored-by: Adam Osewski <aosewski@amd.com>

[ROCm/composable_kernel commit: 3048028897]
2022-10-13 09:05:08 -05:00
zjing14
7566771bdd Add examples of batched/grouped/SplitK Gemm for int8/bfp16/fp16/fp32 (#361)
* add examples into grouped/batched_gemm

* adding splitK examples

* fixed splitK

* add bfp16 int8 example into splitK

* formatting

* use static_cast

* added common for batched_gemm

* add commons for examples of splitK/batched/grouped_gemm

* return true

* adjust splitK check tol

* update example

Co-authored-by: Chao Liu <lc.roy86@gmail.com>

[ROCm/composable_kernel commit: 6091458300]
2022-08-23 14:41:56 -05:00