Khushbu Agarwal
7a6efa0194
[CK_Tile] Updating gpu timer when doing flush cache (#2593)
...
* Missed updating function names in example
* updating timer
* code cleanup
* addressing review comments
* updating tile_engine code
* addressing review comments
[ROCm/composable_kernel commit: 88d72178d6]
2025-07-31 16:43:33 -07:00
Khushbu Agarwal
3bc1bdff9a
Update to gpu_timer for rotating_buffer (#2524)
...
* update gpu_timer for rotating buffer as hipblasLt's implementation
* timing fix
* Updating gpu timer for old ck as well
* Revert "Updating gpu timer for old ck as well"
This reverts commit 958cd1bc99.
* code clean up with runtime argument; function rename
* code cleanup
* general timer fixes
* bug fix
* clang formatted
* addressing review comments
* clang formatted
* Addressing review comments
* CI fix
---------
Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
[ROCm/composable_kernel commit: 61e21f5567]
2025-07-29 15:21:05 -07:00
Khushbu Agarwal
2b6621fba8
Rotating buffer PR CI fix (#2257)
...
* Revert "Revert "[CK_tile] Add rotating buffer feature for universal gemm (#2200)" (#2256)"
This reverts commit 7baac527a1.
* fix regression
[ROCm/composable_kernel commit: 2e38eb4f1c]
2025-06-02 10:25:01 -07:00
Illia Silin
7baac527a1
Revert "[CK_tile] Add rotating buffer feature for universal gemm (#2200)" (#2256)
...
This reverts commit 0f77aa335d.
[ROCm/composable_kernel commit: bbdaf79a52]
2025-05-28 09:46:52 -06:00
Khushbu Agarwal
0f77aa335d
[CK_tile] Add rotating buffer feature for universal gemm (#2200)
...
* Add rotating buffer feature for universal gemm
* adding changes in tile_engine
* Updated code to merge kernel_launch
* removing comments
* Enable rotating buffer changes to flatmm
* Created different launch_kernel function for rotating buffer
* Simplified calculation using macros
* merge code with new changes in tile_engine
* clang formatted
* Redefine macros
[ROCm/composable_kernel commit: 99857e10e6]
2025-05-27 23:00:58 -07:00
Casey-Shi
64b17847fa
[Tile Engine] Add benchmark for tile engine gemm. (#2193)
...
* initial commit -m benchmark
* only support profile
* fix
* fix doc
* add default config
* add ci
* fix cmake
* tmp save for gen blobs
* fix bug
* merge
* range config
* test success
* fix
* fix
* move struct
* remove config property
* fix config
* remove comment
* add cmake option & modify
* add changelog
* fix
* format
* add pydantic module to the docker image
* fix
* add benchmark for cold and warm up
* python format
* add asm cache control
* fix README
* remove pydantic module
* modify changelog
* fix config
* recover benchmark_gemm and fix
* format python
* refactor profiler
* fix csv bug
* fix codegen bug
* add kernel instance object
* add benchmark gemm executable
* fix jenkins & delete extra header
* disable warning output & enable default config
* Disable sparsity for invalid warp tile combinations
* fix gemm host template func
* refactor gemm profiler
* filter out some instances
* default config test & fix codegen bug
* add sparse flag to gen more instances
---------
Co-authored-by: illsilin <Illia.Silin@amd.com>
Co-authored-by: khuagarw <khuagarw@amd.com>
Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>
[ROCm/composable_kernel commit: 128f5a1eab]
2025-05-26 22:32:36 -07:00