* initial poc
* factor out common parts in operator()
* cv4
* rest of the universal gemm pipelines
* fix test
* remove boilerplate from tile engine
* fix example
* fix example
* format
* fix tests build for gemm
* remove base pipeline codegen from gemm instance builder
* unify v3 logic with the rest of universal gemm pipelines
* fix build for multi abd test
* fix test gemm multi d
* fix build for weight preshuffle
* fix grouped gemm test
* fix grouped gemm multi d test
* fix grouped gemm preshuffle
* fix grouped gemm example except for quant
* fix gemm preshuffle
* fix splitk 2 stage example
* fix batched gemm example
* fix multid example
* fix multiabd example
* fix batched gemm test
* fixup
* fix examples build
* fix grouped gemm test build
* fix smoke builder
This renames the typeToStr struct in the common utilities to DataTypeTraits and removes all duplication of DataTypeTraits across files in CK Tile.
Co-authored-by: Christopher Millette <63608002+cgmillette@users.noreply.github.com>
* remove EXCLUDE_FROM_ALL from ck-tile examples
-> +15 min build time w/ 64 threads for a single arch
* fix cpp17 compile error in the ck-tile examples
---------
Co-authored-by: khuagarw <khuagarw@amd.com>
Co-authored-by: Ding, Yi <yi.ding@amd.com>
* Renaming old code
* Adding GEMM code with new Architecture
* Partial Progress : Errors
* Partial Progress : Working code
* Changes to element wise function
* Removing Debugging statements
* Working GEMM Multi D code
* Removing Stale Code
* Address Copilot review comments
* Address Copilot review comments
* Changes to validation file
* Changes to common code snippets
* Creating common folder
* Removing duplicate files
* Pointing to right common file
* Pointing to right common file
* Pointing to right common file
* Changing to VERBOSE
* Changing CMAKE messages to verbose
* Updating Cmake with right layout datatype configs
* Working code for GEMM Multi D
* Partial Progress : CK Tile Engine GEMM
* Partial Progress : CK Tile Engine GEMM
* Partial Progress : Working GEMM Code
* Partial Progress : Working GEMM Code
* Changinf jenkins to remove preshuffle
* Partial Progress : CK TILE ENGINE GEMM Debugging
* Partial Progress : Removing changes that are not GEMM
* Partial Progress : Validation of full block size in GEMM
* Changes in Jenkins to run only fp16 and bf16
* Addressing Review Comments
* Partial Progress : Addressing CI issues
* Partial Progress - Runing GEMM for fp16,bf16 and rcr
* Clang
* Adding fp8 and bf8
* Adding fp8 and bf8
* Adding additional architrcture
* Limited datatypes and layouts
* Adding k_block_per_cu in test config
* Changes to faling CI errors
* Changes to faling CI errors
* Validation for GEMM
* Adding Layout support
* Adding Validations
* Adding layout in jenkins
* Update on Jenkins
* Distribution validation for GEMM
* Resolving merge conflicts
* Solving merge conflicts
* Reading gpuname from target for gemm in ck tile engine
* Reading gpuname from target for gemm preshuffle in ck tile engine
* Reading gpuname from target for gemm preshuffle in ck tile engine
* Get GPU changes for GEMM Muti D in TILE ENGINE
* Addressing errors for gpu name in cktileengine
* initial work on adding support of gfx12 in tile_engine for GEMM benchmarking
* add stage("Run TILE_ENGINE_GEMM Tests on gfx1201") to Jenkins config
* make tile_[m/n/k] validation arch dependent
* Making edits to identify individual compilation issues.
* Minor fix for blob txt files not being created.
* Fixing compilation issues.
* Fixing ordering bug.
* Adding python profiling functionality.
* Setting individual build as default.
* Setting gpu target filtering for tile engine to gfx90a, gfx942 and gfx950.
* update the default running parameters and settings
* Fixing bug with benchmarking, shifting file generation to build instead of config.
* Updating fixes.
* Fixing json output and parsing.
* Disable ccache for tile engine gemm ops because we dont need it.
* Removing duplicate type definition.
* Improving json printing.
* Add the flexibility of different layout and more warp tile support
* Fix extra flag in name of individual kernels.
* Fixing bug with booleans.
* Solve the first patch of the post merge conflict
* Compilation fixes, and cosmetic improvements.
* Yet again compilation fixes after latest changes from develop.
* Fixing python benchmarking script.
---------
Co-authored-by: Vidyasagar Ananthan <vidyasagar.ananthan@amd.com>
Co-authored-by: Vidyasagar Ananthan <vanantha@amd.com>
* Enable persistent kernel in tile_engine and use tail handler
* Fix formatting
* Add persistent to default_config.json
* Remove extra newlines and add persistent also to user config
* Reduce instances from default_config.json
* add persistent to benchmark.json and custom_ci_config.json
* changed the config file to have few instances
---------
Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>
Co-authored-by: ThomasNing <thomasning@amd.com>
* Modify CMakeLists to allow for splitting.
* Modify CMakeLists for data and layout logic.
* Run tests and get build artifact.
* Test new Cmakelists for speedup.
* Further improvements for speedup.
* turn off the FMHA
* turn off the automatic tile engine gemm
* minor fix
* disable the transpose test first
* Address the comment
* Jenkinsfile
* change the make thread to 64
* change the compile thread to 32
* Try to use with less OS memory space
* Have the Unity build batch size to 2
* reduce the chunk size
---------
Co-authored-by: Vidyasagar Ananthan <vidyasagar.ananthan@amd.com>
* update gpu_timer for rotating buffer as hipblasLt's implementation
* timing fix
* Updating gpu timer for old ck as well
* Revert "Updating gpu timer for old ck as well"
This reverts commit 958cd1bc99.
* code clean up with runtime argument; function rename
* code cleanup
* general timer fixes
* bug fix
* clang formatted
* addressing reveiew comments
* clang formatted
* Addressing review comments
* CI fix
---------
Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
* Updating runtime log message for CK TILE ENGINE
* CKTile layout from config
* CKTile custom config for CI
* Documentation for Layout Changes
* CKTile Layout changes to Jenkins
* Fixing Clang Format
* Changes to Jenkins file to fix error
* fix(cmake-ck-dev): no longer sets invalid values as gpu arch
* style(py files): ruff formatting
* fix(cmake-ck-release): no longer sets invalid values as gpu arch
* chore(cmake-tile_engine): add reminder to uncomment user config json
* Changes to jenkin file to address more cases
* Changes to Jenkins to fix Error
* Changes to Jenkins file for fixing an error
* Update Jenkinsfile (#2517)
* Update Jenkinsfile
---------
Co-authored-by: ThruptiRajLakshmanaGowda <tlakshma@amd.com>
Co-authored-by: AviralGoelAMD <aviral.goel@amd.com>
Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>
* updates to support int8 in 03_gemm example
* added comments, using aliases, helper functions
* test(gemm_universal): add test cases for int8 gemm pipeline
* fix(test_gemm): fix for failing test unit test for int8
* test(ck_tile): add int8 unit test for gemm universal
* refactor(gemm_universal): GPU reference verification for GEMM code improved
* style(gemm_universal): removed extra comments and did clang format
* merging recent changes to universal gemm to tile_engine
* ck tile engine integration work
* feat(tile_engine): add int8 support to tile engine ops/gemm
* feat(tile_engine): added 32 32 16 mfma instances to tile engine for int8
* style: Format code with clang-format-12
* refactor(tile_engine): address review comments
* style: removed unhelpful comments & unused variables.
* build: tile engine uses default config
* feat: add int8 support for CK_TILE GEMM
* style: added trailing commas to codegen_utils.py
* refactor: tile engine
* refactor: formatting and code review
* refactor: code formatting for python files
* fix: suppress build warning
* add support for gfx950
* refactor:KWarpTile size in gemms util
* Fix the branch and wrap up the k warp tile
* Add bf8 integration
* refactor: clang format and rebase
---------
Co-authored-by: zjli2013 <leezhengjiang@gmail.com>
Co-authored-by: AviralGoelAMD <aviral.goel@amd.com>
Co-authored-by: Khushbu Agarwal <khuagarw@amd.com>
* [CK_TILE] Refine fp8 in flatmm
1. Replace USING_MFMA_16x16x32 & USING_MFMA_16x16x32 with constexpr
2. Add an additional const check to avoid build error in HotLoopScheduler
3. Refine shuffleb to support both tile 32x32 and 16x16
4. Support command option -init
5. Move Gemm warp defintion to a separate struct
* fix clang format
* fix clang format
* keep default bhavior unchanged (warp tile = 16x16)
* fix tile engine build error
* fix a typo in codegen_utils.py
* address review comments
* address review comments
---------
Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>
* - elevate important build messages to log level STATUS
- comment out the rest (temporarily)
* - marked all low importance build messages as log_level=DEBUG
* Adding validation for tile sizes
* Add architecture in config, and shuffle lines of code in warp_gemm.hpp
* Enable MFMA for gfx950, and invalid tile handling