* Adding validation for tile sizes
* Add architecture in config, and shuffle lines of code in warp_gemm.hpp
* Enable MFMA for gfx950, and invalid tile handling
* make the work compiled
* Solved the example code, but still have the profiler error
* Finished the feature
* Clang format and update the CHANGELOG
* solve the preshuffle v1 & v2 problem
* Comment Addressed
* Comment Addressed
* Shared Memory for single data point
* CKTile Transpose vectorize CP1
* CKTile Transpose vectorize CP2
* CKTile Transpose vectorize CP2.1
* fixed the compile error of the transpose tile 2d
* Have the correct result for the current test sample
* Changes to printing tensor
* fp8 support added
* Debugging for transpose
* solving the corner issue
* Changed padding flag
* Intermideate Debugging
* Intermidiate Debugging
* Intermediate Debugging
* Finished debugging of the transpose op
* Code Cleanup
* Adding edge case smoke tests
* Adding Transpose test to CI/CD
* Adding Transpose test to CI/CD
* Adding Transpose test to CI/CD
* Addressing Review Comment
* Addressing Comments
* Addressing Comments
* Measuring Perf Tests
* Code Cleanup
* Changlog
* Added the running iterations
* clang format
* Fix the changelog
* Fix the compilation error
* change the printing factor
---------
Co-authored-by: ThruptiRajLakshmanaGowda <tlakshma@amd.com>
* Changes for updating tile distribution for shuffle and transpose
* Fixed swizzle and transpose, removed comments
* clang formatted
* Adding support for bf16 type
* Addressing review comments
* build CI for gfx942 exclusively
* run the last stage in a docker with user jenkins
* update the image for the last stage
* ignore perf_log if not found
* archive and store all packages
* use ccache for building packages
* sync with function interface of cshuffleepiloge,fix flatmm build fail
* move code from solin/flatmm which add mfma16*16*32fp8 and optimize flatmm
---------
Co-authored-by: solin <bingzhou@amd.com>
* Test Copy kernel code for testing tile distribution logic
* Fix the error
* Solved the problem
* Updated comments and document formatting
* Removed unused tile distribution and code cleanup
* Added README.md and formatting for CI/CD.
---------
Co-authored-by: ThomasNing <thomas.ning@amd.com>
* added WindowType enum to tile_window_structs and static assert checks in computev4 pipeline
* added type traits instead of enum to tile_window() and tile_window_linear() with debug comments
* removed comments, added documentation and clang format