8 Commits

Author SHA1 Message Date
Aviral Goel
91ffc9dd1e chore(copyright): update copyright header for example directory (#3273)
* chore(copyright): update copyright header for codegen directory

* chore(copyright): update copyright header for example directory

[ROCm/composable_kernel commit: d85f065b15]
2025-11-24 18:02:41 -08:00
Michal Kulikowski
573df0d546 [CK][Examples] Extending support for rdna3/4 part 3:
-example_gemm_xdl_int8
-example_gemm_xdl_fp8
-example_gemm_xdl_fp8_bf8
-example_gemm_xdl_fp16_fp8
-example_gemm_add_add_fastgelu_xdl_int8
-example_grouped_gemm_xdl_int8
-example_grouped_conv_bwd_weight_xdl_bf16
-example_cgemm_xdl_fp32
-example_cgemm_xdl_int8

fixing cmdlines for:
-example_22_cgemm
-example_24_batched_gemm
-example_batched_gemm_xdl_fp16int4_b_scale_v3

Signed-off-by: Michal Kulikowski <Michal.Kulikowski@amd.com>


[ROCm/composable_kernel commit: 2444c44895]
2025-10-08 18:14:38 +02:00
jefyang1
23a8bed9af Use new mfma instructions for FP8 on gfx950 (#2202)
* Add logic to use new mfma instructions for fp8 bf8

* Fix example_gemm_xdl_fp8_pk_i4_bpreshuffle_v3 on gfx950 and run clang format

* Update include/ck/tensor_operation/gpu/warp/xdlops_gemm.hpp

Co-authored-by: Andriy Roshchenko <107577548+andriy-ca@users.noreply.github.com>

* Fix intrin_mfma f8 calls due to merge mistake

---------

Co-authored-by: Andriy Roshchenko <107577548+andriy-ca@users.noreply.github.com>

[ROCm/composable_kernel commit: f18170064d]
2025-05-19 17:29:51 -07:00
Rostyslav Geyyer
6dfbf61cf7 Add a gpu gemm reference kernel (#1528)
* Add a gpu gemm reference kernel

* Switch to gpu reference in gemm examples

* Remove redundant arguments

* Update all related examples

* Update more examples

* Try fewer threads per block

* Try even fewer threads per block

* Add support for all matrix layouts

* Increase block size

* Clean up

* Remove hardcoded strides

* Clean up

* Try a column-major case

* Revert back to row-major

* Run both CPU and GPU verification

---------

Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>

[ROCm/composable_kernel commit: aa932445ea]
2024-10-08 11:05:28 -05:00
Rostyslav Geyyer
94954e9fe4 Set RNE fp8 conversion as a default (#1458)
* Set RNE fp8 conversion as a default

* Update f8 tests

* Disable failing test on gfx11

* Update bf8 tests

* Add a flag

* Fix the flag

* Raise flag for gfx10 as well

* Temp commit for tolerance testing

* Update tolerances

[ROCm/composable_kernel commit: e20f20efbf]
2024-08-21 09:09:48 -07:00
Rostyslav Geyyer
19366f13cd Fix example_gemm_xdl_fp8 (#1183)
[ROCm/composable_kernel commit: 9ce18b045d]
2024-03-01 16:42:15 -08:00
zjing14
5441fb5316 Fixed f8_gemm NaN (#975)
* workaround nan problem by changing output to fp16

* enable f8/bf8 gemm tests on MI200

* workaround f16 to f8 conversion

---------

Co-authored-by: Jing Zhang <jizha@amd.com>

[ROCm/composable_kernel commit: ac9595a9f1]
2023-10-10 10:30:26 -05:00
Rostyslav Geyyer
e80e4bedba Add fp8 @ bf8 gemm support and example (#933)
* Add f8 bf8 gemm example

* Add element-wise ops

* Add intrinsics

* Update reference calculation

* Add an additional type option for xdlops gemm

* Fix build process

* Add bf8 to buffer addressing

* Update blockwise op, split typeA and typeB

* Update for compatibility

* Update naming to f8->fp8

* Update naming

* Format

[ROCm/composable_kernel commit: bd09b5c538]
2023-10-02 16:39:03 -05:00