Yanxing-Shi
|
926bd2b985
|
fix conflict
|
2025-05-28 09:08:42 +00:00 |
|
Khushbu Agarwal
|
99857e10e6
|
[CK_tile] Add rotating buffer feature for universal gemm (#2200)
* Add rotating buffer feature for universal gemm
* adding changes in tile_engine
* Updated code to merge kernel_launch
* removing comments
* Enable rotating buffer changes to flatmm
* Created diff launch_kernel function for rotating buffer
* Simplified calculation using macros
* merge code with new changes in tile_engine
* clang formatted
* Redefine macros
|
2025-05-27 23:00:58 -07:00 |
|
Yanxing-Shi
|
b88be7fff3
|
merge upstream
|
2025-05-27 09:31:20 +00:00 |
|
Casey-Shi
|
128f5a1eab
|
[Tile Engine] Add benchmark for tile engine gemm. (#2193)
* initial commit -m benchmark
* only support profile
* fix
* fix doc
* add default config
* add ci
* fix cmake
* tmp save for gen blobs
* fix bug
* merge
* range config
* test success
* fix
* fix
* move struct
* remove config property
* fix config
* remove comment
* add cmake option & modify
* add changelog
* fix
* format
* add pydantic module to the docker image
* fix
* add benchmark for cold and warm up
* python format
* add asm cache control
* fix README
* remove pydantic module
* modify changelog
* fix config
* recover benchmark_gemm and fix
* format python
* refactor profiler
* fix csv bug
* fix codegen bug
* add kernel instance object
* add benchmark gemm executable
* fix jenkins & delete extra header
* disable warning output & enable default config
* Disable sparsity for invalid warp tile combinations
* fix gemm host template func
* refactor gemm profiler
* filter out some instances
* default config test & fix codegen bug
* add sparse flag to gen more instances
---------
Co-authored-by: illsilin <Illia.Silin@amd.com>
Co-authored-by: khuagarw <khuagarw@amd.com>
Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>
|
2025-05-26 22:32:36 -07:00 |
|
Yanxing-Shi
|
7549e2b2e6
|
fix readme
|
2025-05-26 06:45:36 +00:00 |
|
Yanxing-Shi
|
3506722e6a
|
add benchmark gemm executable
|
2025-05-20 15:41:19 +00:00 |
|
Yanxing-Shi
|
9897410acf
|
refactor profiler
|
2025-05-19 10:42:57 +00:00 |
|
Yanxing-Shi
|
012c77125a
|
recover benchmark_gemm and fix
|
2025-05-16 10:37:59 +00:00 |
|
Yanxing-Shi
|
fc092038f7
|
fix README
|
2025-05-15 12:37:00 +00:00 |
|
Yanxing-Shi
|
3140659357
|
fix
|
2025-05-13 14:18:16 +00:00 |
|
Yanxing-Shi
|
a8a19be1b0
|
merge
|
2025-05-13 07:39:51 +00:00 |
|
Yanxing-Shi
|
2d3dc763f8
|
merge
|
2025-05-13 06:27:16 +00:00 |
|
Yanxing-Shi
|
267eb410cc
|
tmp save for gen blobs
|
2025-05-12 07:06:15 +00:00 |
|
Yanxing-Shi
|
1ccecf9a11
|
add default config
|
2025-05-07 10:59:36 +00:00 |
|
Yanxing-Shi
|
bc72ec4cfb
|
fix doc
|
2025-05-06 08:29:11 +00:00 |
|
Yanxing-Shi
|
d3d32843b5
|
only support profile
|
2025-05-01 11:05:27 +00:00 |
|
Aviral Goel
|
1aea51d34e
|
[Tile Engine] Improved README.md (#2134)
* improved tile_engine readme
* changed ck tile explanation and json
* further improved readme
* fixed typo
|
2025-04-29 17:37:07 -07:00 |
|
Khushbu Agarwal
|
768c99eca9
|
[TileEngine] Support for sparsity in codegen (#2128)
* Added sparsity flag in codegen
* remove comments
* clang formatted
* added sparsity as runtime argument
* updated README
* updated stream config variable
* fix typo for tail_num in hot loop
|
2025-04-28 18:19:23 -07:00 |
|
Khushbu Agarwal
|
7cadf187e2
|
multi instance generation for CkTileEngine (#2080)
* Add support for multi-instance verification, print detail for each instance, documentation fix
* clang formatted
* Added Readme file
* updated readme
* Addressing review comments
* clang formatted
* Updated ReadMe and GPU reference code
* simplified dispatch kernel code
* indentation
|
2025-04-21 08:39:45 -07:00 |
|