* remove EXCLUDE_FROM_ALL from ck-tile examples
-> +15 min build time w/ 64 threads for a single arch
* fix cpp17 compile error in the ck-tile examples
---------
Co-authored-by: khuagarw <khuagarw@amd.com>
Co-authored-by: Ding, Yi <yi.ding@amd.com>
[ROCm/composable_kernel commit: 79aae7c7f7]
* GH-2368 Adding a basic glossary
GH-2368 Minor edits
GH-2368 Adding missing READMEs and standardization.
resolving readme updates
GH-2368 Minor improvements to documentation.
Improving some readmes.
Further improvement for readmes.
Cleaned up the documentation in 'client_example' (#2468)
Update for PR
Update ACRONYMS.md to remove trivial terms
Update ACRONYMS.md to provide detailed explanations for BF16 and BF8 formats
Apply suggestion from @spolifroni-amd
Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>
Apply suggestion from @spolifroni-amd
Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>
Update README.md to clarify CK Tile API description and remove outdated references to the Tile Engine.
revise 37_transpose readme
revise 36_copy readme
Remove references to the Tile Engine in README files for 19_gemm_multi_d and 35_batched_transpose, and update distribution links for clarity.
Remove references to the Tile Engine in multiple README files and update distribution links for consistency and clarity.
Remove references to the Tile Engine in README files across multiple examples
Refine README files by removing outdated references to the Tile Engine
* Updates based on PR feedback 1
* Updates based on PR feedback 2
* Updates based on PR feedback 3
* Updates based on PR feedback 4
* Updates based on PR feedback 5
* Updates based on PR feedback 6
* Updates based on PR feedback 7
* Updates based on PR feedback 8
* Content Modification of CK Tile Example
* Modify the ck_tile gemm config
---------
Co-authored-by: AviralGoelAMD <aviral.goel@amd.com>
Co-authored-by: ThomasNing <thomas.ning@amd.com>
[ROCm/composable_kernel commit: 92c67a824f]
* [CK_TILE] Correct BlockWarps calculation and fix smoke-test in rmsnorm
* Update rmsnorm host reference
* Update tree reduction of rmsnorm for reference host
* Fix cross warp for m > 1 cases
* Add RMSNorm model selectable option for host reference
* Fix save_unquant cases
* Update reference rmsnorm forward function to use enum for model sensitivity
* Update reference rmsnorm calculation for model sensitivity
* Fix m warp for layernorm
* Adjust parameter of reference for twoPass
* Fix clang format
* Run clang-format-overwrite.sh to fix formatting issue
* fix clang format
---------
Co-authored-by: MHYang <mengyang@amd.com>
Co-authored-by: illsilin_amdeng <Illia.Silin@amd.com>
Co-authored-by: ThomasNing <thomas.ning@amd.com>
[ROCm/composable_kernel commit: e1b0bdfbfa]
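Note on the rmsnorm host reference above: a minimal sketch of what a host-side RMSNorm forward reference with a tree-ordered reduction might look like. The names `tree_sum_sq` and `reference_rmsnorm_row`, the float accumulator, and the epsilon handling are assumptions for illustration only; this is not the CK Tile reference code, and it leaves out the model-sensitivity option mentioned in the commit.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of CK Tile.
// Sums x[i]^2 over [lo, hi) pairwise, so the additions happen in the
// same kind of order a tree reduction would use.
inline float tree_sum_sq(const std::vector<float>& x, std::size_t lo, std::size_t hi)
{
    if(hi == lo)
        return 0.0f;
    if(hi - lo == 1)
        return x[lo] * x[lo];
    const std::size_t mid = lo + (hi - lo) / 2;
    return tree_sum_sq(x, lo, mid) + tree_sum_sq(x, mid, hi);
}

// Hypothetical reference for one row:
//   y[i] = x[i] * gamma[i] / sqrt(mean(x^2) + epsilon)
inline std::vector<float> reference_rmsnorm_row(const std::vector<float>& x,
                                                const std::vector<float>& gamma,
                                                float epsilon)
{
    assert(!x.empty() && x.size() == gamma.size());
    const float mean_sq = tree_sum_sq(x, 0, x.size()) / static_cast<float>(x.size());
    const float inv_rms = 1.0f / std::sqrt(mean_sq + epsilon);

    std::vector<float> y(x.size());
    for(std::size_t i = 0; i < x.size(); ++i)
        y[i] = x[i] * inv_rms * gamma[i];
    return y;
}
```

A device result would typically be compared against such a reference with a tolerance, since the summation order on the device can still differ from the host order.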
BlockWarps and WarpTile in Generic2dBlockShape are wave-size dependent, which causes a mangled-name mismatch between the host and device sides.
Solution: Replace them with ThreadPerBlock and move the BlockWarps and WarpTile calculation into Generic2dBlockShape
[ROCm/composable_kernel commit: c254f3d7b4]
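Note on the change above: a simplified sketch of the idea, assuming a shape type whose template arguments carry only wave-size-independent values. `BlockShapeSketch` and all of its members are hypothetical and are not the real Generic2dBlockShape interface; the point is only that the type, and therefore its mangled name, no longer changes with the wave size.

```cpp
#include <cassert>

// Hypothetical stand-in, not ck_tile::Generic2dBlockShape.
// Only block and thread counts appear in the template argument list, so
// host and device passes see the same type whatever the wave size is.
template <int BlockM, int BlockN, int ThreadPerBlockM, int ThreadPerBlockN>
struct BlockShapeSketch
{
    static constexpr int kThreadPerBlock = ThreadPerBlockM * ThreadPerBlockN;

    // Elements each thread covers along each dimension (assumes exact division).
    static constexpr int kRepeatM = BlockM / ThreadPerBlockM;
    static constexpr int kRepeatN = BlockN / ThreadPerBlockN;

    // The wave-size-dependent quantity is derived inside the shape
    // instead of being passed in as a template argument.
    static int block_warps(int wave_size)
    {
        assert(wave_size > 0 && kThreadPerBlock % wave_size == 0);
        return kThreadPerBlock / wave_size;
    }
};
```

With `BlockShapeSketch<4, 256, 4, 64>`, the same type yields 4 warps for a wave size of 64 and 8 warps for a wave size of 32.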
* add aiter tests and move json_dump header
* remove example/include path from cmake
* extend time for aiter and pytorch stages
[ROCm/composable_kernel commit: ef6c28e989]
* Do not use warpSize as compile time constant as it is removed
* Update tile_image_to_column_shape.hpp
update warpSize usage.
* clean-up all use of warpSize, make sure code builds
* fix
---------
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
Co-authored-by: illsilin <Illia.Silin@amd.com>
Co-authored-by: Bartlomiej Kocot <barkocot@amd.com>
[ROCm/composable_kernel commit: 4c57157d50]
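Note on the warpSize clean-up above: one common way to cope with a wave size that is no longer a compile-time constant is to make it an explicit template argument and pick the instantiation from a runtime value. The sketch below is plain C++ under that assumption; `ReduceConfigSketch`, `warps_per_block`, and the block size of 256 are hypothetical and are not taken from the commit.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical config type: the wave size is an explicit template
// argument rather than a global compile-time constant.
template <int WaveSize>
struct ReduceConfigSketch
{
    static_assert(WaveSize == 32 || WaveSize == 64, "unsupported wave size");
    static constexpr int kBlockSize     = 256; // assumed block size
    static constexpr int kWarpsPerBlock = kBlockSize / WaveSize;
};

// A runtime wave size (for example, one queried from the device)
// selects which instantiation to use.
inline int warps_per_block(int runtime_wave_size)
{
    assert(runtime_wave_size > 0);
    switch(runtime_wave_size)
    {
    case 32: return ReduceConfigSketch<32>::kWarpsPerBlock;
    case 64: return ReduceConfigSketch<64>::kWarpsPerBlock;
    default: throw std::invalid_argument("unsupported wave size");
    }
}
```

The dispatch keeps per-wave-size constants available at compile time inside each instantiation while the choice between them is made at run time.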
* - elevate important build messages to log level STATUS
- comment out the rest (temporarily)
* - marked all low importance build messages as log_level=DEBUG
[ROCm/composable_kernel commit: aed0f5880c]
* add ck tile examples to package
* Update jenkinsfile
* fix for jenkinsfile
* fix for building ck tile code on non gfx9
* compile ck tile examples only for gfx94
* include ck tile examples in all target
* fix for basic gemm UseStructuredSparsity
* Update CMakeLists.txt
* Update gemm_pipeline_problem.hpp
* add targets to rocm install
---------
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
[ROCm/composable_kernel commit: 434d19f696]
* 50ms -> 28ms
* Fix bug in non fuse_add_store cases
* Fine tuned setting for 2 pass pipeline
* adjust workload
* remove unnecessary change
* add layernorm
* Adding output quant and unquant results at the same time.
* fix test
* fix format
* tune for cases 128x640 and 128x1024
* bug fix
[ROCm/composable_kernel commit: d49abdaa87]
* Add shortcut to RMSNorm
* Modify test for adding shortcut for RMSNorm
* Add fused parameter into tests
* 1. Add YDataType. 2. Move rmsnorm2d_fwd_traits_ from rmsnorm2d_fwd.hpp to rmsnorm2d_fwd_api.cpp and rmsnorm2d_fwd_instance_common.hpp
* 1. Support various strides and precisions.
* Add support of Epilogue
* Add fuse and epilogue support to rmsnorm ref
* Modify rmsnorm example
* Refactor tests/examples
* Bug fix for newly added tests/examples
* Bug fix for new tests 2
* Modify smoke test scripts
remove dbg code
* Supports non-smooth dynamic quant
* Update Rmsnorm2dFwd::GetName()
* rename xscale and prec_sx to smoothscale and prec_sm
Bug fix after rename
Remove files
* change example_rmsnorm2d_fwd.cpp
* update performance calculator
* Fix issue in two-pass when fuse add is enabled
* Remove comment of beta
---------
Co-authored-by: rocking <ChunYu.Lai@amd.com>
[ROCm/composable_kernel commit: 04dd314883]
* Add reduce2d new api
* Prevent users from using cross-warp reduction
* Fix bug of std calculation
* Add rmsnorm2d
* Add rmsnorm small example
* Remove static assert to prevent compile fail
* Add script to test performance and correctness
* Add missing cmake change
* refine naming
* refine example of rmsnorm
* Fix bug of rmsnorm
* Refine naming
* Fix cmake
* clang format
* Refine pipeline name
* Add add_rmsnorm2d_rdquant kernel
* Add reduce op
* host verification
* Fix bug of one pass pipeline
* Refine tile size
* Add two pass pipeline
* Rename two pass to three pass
* Fix bug of kSaveX == false
* Add instance library
* Add test script
* Fix bug of x verification
* Add save_x to trait
* Add README
* Move reduce2d into reduce folder
* Fix bug of welford when number of m warp > 1
* remove redundant comment
* 1. move 06_rmsnorm2d to 10_rmsnorm2d
2. move 07_add_rmsnorm2d_rdquant to 11_add_rmsnorm2d_rdquant
* clang format and add missing header
* Add host validation of add + layernorm2d + rsquant
* Revert "Add host validation of add + layernorm2d + rsquant"
This reverts commit 936cb45797.
* Remove deprecated flag
[ROCm/composable_kernel commit: 3d60953477]
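Note on the Welford fix for more than one M warp above: when several warps each hold partial statistics, the partials have to be merged rather than simply added. The sketch below shows the standard pairwise merge of Welford states on the host. `WelfordState`, `welford_update`, and `welford_merge` are hypothetical names; this is an illustration of the general technique, not the CK Tile implementation.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical running-statistics state: count, mean, and the sum of
// squared deviations from the mean (m2).
struct WelfordState
{
    float mean = 0.0f;
    float m2   = 0.0f;
    int count  = 0;
};

// Standard single-element Welford update.
inline void welford_update(WelfordState& s, float x)
{
    s.count += 1;
    const float delta = x - s.mean;
    s.mean += delta / static_cast<float>(s.count);
    s.m2 += delta * (x - s.mean);
}

// Merge two partial states, e.g. the partials held by two warps.
inline WelfordState welford_merge(const WelfordState& a, const WelfordState& b)
{
    if(a.count == 0)
        return b;
    if(b.count == 0)
        return a;

    WelfordState out;
    out.count         = a.count + b.count;
    const float n     = static_cast<float>(out.count);
    const float delta = b.mean - a.mean;
    out.mean          = a.mean + delta * static_cast<float>(b.count) / n;
    out.m2 = a.m2 + b.m2 +
             delta * delta * static_cast<float>(a.count) * static_cast<float>(b.count) / n;
    return out;
}
```

The population variance of the merged data is `m2 / count`; the cross term in `welford_merge` is what a plain sum of per-warp `m2` values would miss.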