* chore(copyright): update copyright header for test directory
* chore(copyright): update copyright header for test directory
* chore(copyright): update copyright header for client_example directory
* chore(copyright): update copyright header for test directory
* HasHotLoop is a constexpr
* Remove an unused function
* Remove some unused include statements
* Add implementation and tests for fp8 x bf8 weight preshuffle GEMM
* Add implementation and tests for fp8 x bf8 in CK Tile basic and universal GEMMs
* Remove two barrier calls that HotLoopScheduler already calls
* No need to suppress a variable that hasn't been declared
* Replace six arg_parser arguments with constexpr literals
* Simplify run_gemm_test_prec_type
* The strides don't need to be passed via arg_parser as we use their default values
* The layouts don't need to be passed as arguments twice
* Pass M N and K as regular arguments, not using the argument parser
* We can now remove the argument parser
* Add a common file for precision types to be used in testing
* Convert basic and universal GEMM tests to use gtest
* Make GemmConfig a test parameter, and form test cases as the cartesian product GemmConfigs x PrecTypes
* Add GemmConfigComputeV4 to the GEMM configs to run the universal tests on
* Added a changelog entry
* Add missing copyright statements
* ifndef-define-endif is not needed with pragma once
* Fix a comment
* Add F8 x BF8 tests for CompV4 in test_gemm_pipeline_kernel_types.hpp
* Disable the unreliable test MoeSortingCase4
---------
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
* Adding RapidJson Library
* Adding Json Dumps in all CK_Tile Examples
Not verified yet
* Adding json to cktile Batched Transpose
* adding json dumps to layernorm2d_fwd
* Adding json dump to flatmm_basic
* Adding RapidJson Library
* Adding Json Dumps in all CK_Tile Examples
Not verified yet
* Adding json to cktile Batched Transpose
* adding json dumps to layernorm2d_fwd
* Adding json dump to flatmm_basic
* Adding json in 03_gemm
* Add json dump to 16_batched_gemm
* Add json dump to gemm_multi_d_fp16
* Add json dump to grouped_gemm
* fix fmha_bwd/fwd
* Fix clang-format errors
exclude include/rapidjson in jenkins as its a third-party library
* Saparating function and defination.
* Update Documentation of 03_gemm
* Refactoring as per code review
* Disable fp8 instances on unsupported targets (#2592)
* Restrict building of gemm_universal_preshuffle_f8 instances to specific targets in CMakeLists.txt
* Add condition to skip gemm_xdl_universal_preshuffle_f8 instances for unsupported targets in CMakeLists.txt
* Add conditions to skip unsupported targets for gemm_universal_preshuffle_f8 and gemm_xdl_universal_preshuffle_f8 instances in CMakeLists.txt
* Refine conditions to exclude gemm_universal_preshuffle_f8 instances for unsupported targets in CMakeLists.txt
---------
Co-authored-by: AviralGoelAMD <aviralgoel@amd.com>
* fix clang format
* remove duplicate lines of code from library/src/tensor_operation_instance/gpu/CMakeLists.txt
* Fixing Readme and unifying jsondumps
* adding moe_smoothquant
* adding fused_moe
* Fixing Readme for batched_gemm
* Fixing Readme for grouped_gemm
* adding flatmm
* adding gemm_multi_d_fp16
* adding elementwise
* adding File name when json is dumped
* Fixing Reduce after merge
* adding batched_transpose
* Adding Warptile in Gemm
* Fixing Clang Format
---------
Co-authored-by: Aviral Goel <aviral.goel@amd.com>
Co-authored-by: AviralGoelAMD <aviralgoel@amd.com>
Co-authored-by: illsilin_amdeng <Illia.Silin@amd.com>
* refactor moe_sorting ctests to use gtest framework
* Refactor ctests for smoothquant to gtests
* fix clang format to use version 18
* Print local_eid in MOE sorting gtests
* Remove extra space in smoothquant output
* [CK_TILE] Disable moe_sorting unit test on gfx908
- gfx908 does not support instruction used in moe_sorting
* Update CMakeLists.txt
---------
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
* Convert ck-tile 06_permute smoke test to unit tests for fp16, fp8, and fp32
* Apply clang format and update copy right year
* Convert ck tile moe sorting example smoke tests to unit tests
* fix CMakelists to ensure that permute and moe_sorting are built for gfx9 only.
* Remove number prefix from permute and moe_sorting directory names
* code cleanup
* add missing test cases for fp16 permute
* remove unecessary parentheses
* Cleanup
* Remove uneccessary final nullptr
* update copyright and licensing statement in files
* Add custom target for permute tests
* Add missing new line at end of file for moe sorting CMakelist.
* Update MOE sorting tests to account for MOE sorting example updates
The ck_tile/13_moe_sorting example was updated to include different
cases dependending on whether MOE_SORTING_FMOE_2D_BUF is set. So,
the ck_tile tests for MOE sorting were updated to account for these
changes.
---------
Co-authored-by: Andriy Roshchenko <107577548+andriy-ca@users.noreply.github.com>