* solve compiler issue
* solve the gfx950 mfma shuffle regression
* refactor jenkinsfile to handle arch name better
* [CK TILE] set divisor to count of thread along k dimension
* fix the compiler error
* solve degradation
* Finish the multiplies fix
* fix the scales
* solve compilation error
* solve the composes
* solve the error of tile sweeper
* fix the test and example
* fix for gfx950
---------
Co-authored-by: Max Podkorytov <4273004+tenpercent@users.noreply.github.com>
Co-authored-by: illsilin_amdeng <Illia.Silin@amd.com>
Co-authored-by: Cong Ma <congma13@amd.com>
* Code drop for gtest to test atomic_add for a tensor
* Adding additional test cases
* Fix clang errors in CI pipeline
* Updated test cases
* Fix the Navi card atomic add problem
* solved the define problem
* add more print out traces
* Fix the float4 missing case
* solved the gfx9 errors
* Address the comment
---------
Co-authored-by: Khushbu <khuagarw@amd.com>
Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>