rahjain-amd
7674eb6416
Add json dump support to output details from CK/CKTile Examples. (#2551)
* Adding RapidJson Library
* Adding Json Dumps in all CK_Tile Examples
Not verified yet
* Adding json to cktile Batched Transpose
* adding json dumps to layernorm2d_fwd
* Adding json dump to flatmm_basic
* Adding json in 03_gemm
* Add json dump to 16_batched_gemm
* Add json dump to gemm_multi_d_fp16
* Add json dump to grouped_gemm
* fix fmha_bwd/fwd
* Fix clang-format errors
exclude include/rapidjson in jenkins as it's a third-party library
* Separating function and definition.
* Update Documentation of 03_gemm
* Refactoring as per code review
* Disable fp8 instances on unsupported targets (#2592)
* Restrict building of gemm_universal_preshuffle_f8 instances to specific targets in CMakeLists.txt
* Add condition to skip gemm_xdl_universal_preshuffle_f8 instances for unsupported targets in CMakeLists.txt
* Add conditions to skip unsupported targets for gemm_universal_preshuffle_f8 and gemm_xdl_universal_preshuffle_f8 instances in CMakeLists.txt
* Refine conditions to exclude gemm_universal_preshuffle_f8 instances for unsupported targets in CMakeLists.txt
---------
Co-authored-by: AviralGoelAMD <aviralgoel@amd.com>
* fix clang format
* remove duplicate lines of code from library/src/tensor_operation_instance/gpu/CMakeLists.txt
* Fixing Readme and unifying jsondumps
* adding moe_smoothquant
* adding fused_moe
* Fixing Readme for batched_gemm
* Fixing Readme for grouped_gemm
* adding flatmm
* adding gemm_multi_d_fp16
* adding elementwise
* adding File name when json is dumped
* Fixing Reduce after merge
* adding batched_transpose
* Adding Warptile in Gemm
* Fixing Clang Format
---------
Co-authored-by: Aviral Goel <aviral.goel@amd.com>
Co-authored-by: AviralGoelAMD <aviralgoel@amd.com>
Co-authored-by: illsilin_amdeng <Illia.Silin@amd.com>
[ROCm/composable_kernel commit: 4d041837ad]
2025-09-02 23:31:29 -07:00
linqunAMD
807f7510b5
Support Wave32 in CK_TILE - Part 1 (#2594)
* Support wave32/wave64 in CK_TILE - Part 1
* remove blocksize in kernel launch
* fix build error
* fix clang format
* fix clang format 2
* fix clang format 3
* fix fmha build error
* fix fmha build 2
* fix fmha build 3
* fix build error 4
* address review comment
* update change log
* replace KernelBlockSize with kBlockSize
* fix CI fail
* fix clang format
* address review comment and rebase code.
* fix universal test fail
---------
Co-authored-by: Lin, Qun <Quentin.Lin+amdeng@amd.com>
Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>
[ROCm/composable_kernel commit: 9fcc1ee9fd]
2025-08-18 10:08:31 -07:00
Emily Martins
70f25296e6
[CK_Tile] Refactor MOE Sorting and Smoothquant ctests to gtests (#2596)
* refactor moe_sorting ctests to use gtest framework
* Refactor ctests for smoothquant to gtests
* fix clang format to use version 18
* Print local_eid in MOE sorting gtests
* Remove extra space in smoothquant output
[ROCm/composable_kernel commit: 70dce4e0c6]
2025-08-14 10:54:57 -07:00
Illia Silin
3345f5f417
upgrade from clang-format-12 to clang-format-18 (#2568)
* upgrade to clang-format-18
* update to clang-format-18 in pre-commit-config
[ROCm/composable_kernel commit: 504b101da3]
2025-07-28 11:34:07 -07:00
Emily Martins
356180d6a3
Tests for CK Tile Batched Transpose and Smoothquant (#2453)
* Create tests for ck tile batched transpose using example
* Create ck tile tests for smoothquant using examples
* fix precision input strings and convert batched transpose to regression tests
* Code cleanup and fix asserts
* add missing licenses
* update copyright and licensing in files
* Update smoothquant tests to use example's smoothquant.cpp
* Add custom target for batched transpose tests
* Add missing newlines at end of files for CMakeLists
* fix typo in batched transpose CMakeLists target_compile_options
---------
Co-authored-by: root <root@ctr-ubbsmc16.amd.com>
[ROCm/composable_kernel commit: c08986b026]
2025-07-17 09:53:34 -06:00