Matthias Gehre
|
678298d4c7
|
Add support for gfx1153 (#3306)
|
2025-11-27 08:48:00 +01:00 |
|
Aviral Goel
|
de6466481f
|
chore(copyright): update copyright header for include directory (#3293)
|
2025-11-26 11:00:05 -07:00 |
|
Tianyuan Wu
|
68134b60e4
|
[CK_TILE] CK_TILE GEMM WMMA Support for GFX11/GFX12 (#2466)
* WMMA GEMM F16 Implementation
Signed-off-by: root <tianyuwu@amd.com>
* Self-review
Signed-off-by: root <tianyuwu@amd.com>
* ASIC check minor tweak
Signed-off-by: root <tianyuwu@amd.com>
* add missing include file
* Set GPU_TARGETS to gfx11/12 generic
Signed-off-by: root <tianyuwu@amd.com>
* INT8 GFX12
Signed-off-by: root <tianyuwu@amd.com>
* add int8x16 branch
* Fix CI script
Signed-off-by: root <tianyuwu@amd.com>
* Fix typo
Signed-off-by: root <tianyuwu@amd.com>
* Add CK_Tile WMMA example
Signed-off-by: Tianyuan Wu <tianyuwu@amd.com>
* Fix CI
Signed-off-by: Tianyuan Wu <tianyuwu@amd.com>
* fix clang format
* Set M/N_Warp Back to Constant
Signed-off-by: Tianyuan Wu <tianyuwu@amd.com>
* Use GemmConfigComputeV3 by default
Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com>
* Enable CK_TILE_USE_AMD_BUFFER_ATOMIC_ADD_FLOAT for gfx12
Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com>
* Remove CK_Tile wmma gemm examples from the CI list
Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com>
* Add atomic add fallback method for gfx11
Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com>
* Fix typo
Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com>
* Omit copyright year
Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com>
* Support non-square cases
Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com>
* Fix CI
Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com>
* Add get_device_ip()
Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com>
* Revert "Add atomic add fallback method for gfx11"
This reverts commit 07a79e797d.
Signed-off-by: Tianyuan Wu <Tianyuan.Wu@amd.com>
* Revert "Enable CK_TILE_USE_AMD_BUFFER_ATOMIC_ADD_FLOAT for gfx12"
This reverts commit ceee918007.
* Revise method name and typos
Signed-off-by: Tianyuan Wu <Tianyuan.Wu@amd.com>
* clang-format
Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com>
* Try fix CI
Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com>
* Revert "Try fix CI"
This reverts commit 7a7241085e.
* clang-format
Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com>
* Fix typo caused by merge
Signed-off-by: Tianyuan Wu <Tianyuan.Wu@amd.com>
* Fix typo caused by merging
Signed-off-by: Tianyuan Wu <Tianyuan.Wu@amd.com>
---------
Signed-off-by: root <tianyuwu@amd.com>
Signed-off-by: Tianyuan Wu <tianyuwu@amd.com>
Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com>
Signed-off-by: Tianyuan Wu <Tianyuan.Wu@amd.com>
Co-authored-by: joye <joye@amd.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
Co-authored-by: illsilin_amdeng <Illia.Silin@amd.com>
|
2025-08-15 16:22:27 -07:00 |
|
Yi DING
|
4fde1646e5
|
[CK_TILE] FMHA BWD Optimization For GFX950 (#2628)
* simplify fmha_bwd_kernel MakeKargs & dq_dram_window
* simply duplicate
* trload pipeline
* Try two-stage
* add prefetch
* optimize & iglp
|
2025-08-12 11:11:55 +08:00 |
|
Casey-Shi
|
128f5a1eab
|
[Tile Engine] Add benchmark for tile engine gemm. (#2193)
* initial commit -m benchmark
* only support profile
* fix
* fix doc
* add default config
* add ci
* fix cmake
* tmp save for gen blobs
* fix bug
* merge
* range config
* test success
* fix
* fix
* move struct
* remove config property
* fix config
* remove comment
* add cmake option & modify
* add changelog
* fix
* format
* add pydantic module to the docker image
* fix
* add benchmark for cold and warmp up
* python format
* add asm cache control
* fix README
* remove pydantic module
* modify changelog
* fix config
* recover benchmark_gemm and fix
* format python
* refactor profiler
* fix csv bug
* fix codegen bug
* add kernel instance object
* add benchmark gemm executable
* fix jenkins & delete extra header
* disable warning output & enable default config
* Disable sparsity for invalid warp tile combinations
* fix gemm host template func
* refactor gemm profiler
* filter out some inmstances
* default config test & fix codegen bug
* add sparse flag to gen more instances
---------
Co-authored-by: illsilin <Illia.Silin@amd.com>
Co-authored-by: khuagarw <khuagarw@amd.com>
Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>
|
2025-05-26 22:32:36 -07:00 |
|