* Finished adding the performance benchmark for ck tile gemm
* Fix the executable rename problem
* fix the executable name error
* delete the unsupported layout combinations
* Update run_full_test.sh
* Update benchmark_mem_pipeline.sh
* Update benchmark_basic.sh
* change the executable of gemm_universal
* change ck_tile_gemm script permissions
* Addressed the comment
* Addressed the comment
* Fixed the comments
* Fixed Comment
* roll back the malfunctioned change
* Fix the Typo
* finalize the tile_gemm_fp16 performance monitoring
* fix the stash names for ck_tile gemm logs
* change the stashing logic
* change stashing syntax
---------
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
Co-authored-by: illsilin <Illia.Silin@amd.com>
* Fix text alignment of ArgParser::print()
* Update example README files
* Clarify make-ck-dev.sh <arch> usage
* Only keep some of the argument from '-?' output
* Undo command line output changes in README
* Only keep existing argument on doc and update description
* Fix text alignment
* Make cmake-ck-*.sh compatible with 'sh' command
* Checkpoint: Finished with the tile example & kernel verification, working on the different matrix layout
* Finished the Matrix Layout feature set up. Note: Need to modify the inner block to solve the shuffle problem in the future.
* Fix: Clang Format, API fixed from fmha
* fix with better naming convention
* revert back the pipeline code of fmha
* Fixed: Addressed the comments and merge the GEMM shape of GEMM Operator and FMHA Operator to one.
* clang format with the reference_gemm file
* convert the clang format with the remod.py
* Changed the format and variable name of the kernel gemm_shape and partitioner
---------
Co-authored-by: thomasning <thomasning@banff-cyxtera-s70-4.ctr.dcgpu>