deepsek
dde428cdf9
Added bf16 instances grouped gemm fixed nk (#1825)
* Feat: Add bf16 input instances
* feat: Add BF16 profiler code
* fix: reorder enum types
* fix: CI fail due to clang-format
* fix: clang script format issue
* fix: clang format broke cmakelist file
[ROCm/composable_kernel commit: e7dce4d247]
2025-01-20 09:13:09 -08:00
deepsek
7290b1a8dd
fix: preprocessor directives logic error if/else (#1764)
* fix: preprocessors logic error if/else
* fix: added macros as preferred by CK team
[ROCm/composable_kernel commit: 0fcbb25f70]
2025-01-16 20:31:15 -08:00
Adam Osewski
974524f2c1
Polished Grouped GEMM APIs and new BF16 instances (#1600)
* Few small fixes.
* New GroupedGemm instances (BF16)
* Unify and refactor GroupedGEMM device API.
* Adapt changes to new API.
* Adapt grouped gemm profiler.
* Accept multiple kbatches for grouped gemm profiler.
- delete obsolete two stage as it is now covered by grouped gemm
* Update unit test for grouped gemm.
* Fix thresholds for BF16 and F8. Unblock tests.
* Fix few instances.
* Multiple small fixes.
* Adapt to new API, check dynamic casting.
* Uncomment few data types in grouped gemm profiler.
* Fix call to SetDeviceArgs.
* Fix profile grouped gemm multiply tile loop.
* Fix grouped gemm tile loop kernel args in client examples.
* Review comments.
[ROCm/composable_kernel commit: 061ac0649c]
2024-11-27 13:02:44 +01:00
Haocong WANG
68d3fce998
[GEMM] gemm_universal related optimization (#1453)
* replace buffer_atomic with global_atomic
* fixed global_atomic_add
* added bf16 atomic_add
* format
* clang-format-12
* clean
* clean
* add guards
* Update gtest.cmake
* enabled splitk_gemm_multi_d
* format
* add ckProfiler
* format
* fixed naming
* format
* clean
* clean
* add guards
* fix clang format
* format
* add kbatch printout
* clean
* Add rocm6.2 related gemm optimization
* Limit bf16 atomic usage
* remove redundant RCR gemm_universal instance
* Add RRR fp8 gemm universal instance
* Bug fix
* Add GPU_TARGET guard to FP8/BF8 target
* bug fix
* update cmake
* remove all fp8/bf8 example if arch not support
* Enable fp8 RRR support in ckProfiler
* limit greedy-reverse flag to gemm_universal in ckProfiler
---------
Co-authored-by: Jing Zhang <jizhan@fb.com>
Co-authored-by: Jing Zhang <jizhan@meta.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
Co-authored-by: illsilin <Illia.Silin@amd.com>
[ROCm/composable_kernel commit: 3049b5467c]
2024-08-14 10:42:30 +08:00
jakpiase
b3a942c03c
Add support for mixed precision bf16&int8 grouped gemm (#1166)
* add support for mixed precision bf16&int8 grouped gemm
* fix gfx versions and add bf16 kbatch condition
* added reviewers comments
[ROCm/composable_kernel commit: 32d4be3d09]
2024-02-21 10:35:35 +01:00