Commit Graph

32 Commits

Author SHA1 Message Date
Vidyasagar Ananthan
b26a7f9b84 [DOCS] Documentation Addition (Readme updates) (#2495)
* GH-2368 Adding a basic glossary

GH-2368 Minor edits

GH-2368 Adding missing READMEs and standardization.

resolving readme updates

GH-2368 Minor improvements to documentation.

Improving some readmes.

Further improvement for readmes.

Cleaned up the documentation in 'client_example' (#2468)

Update for PR

Update ACRONYMS.md to remove trivial terms

Update ACRONYMS.md to provide detailed explanations for BF16 and BF8 formats

Apply suggestion from @spolifroni-amd

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

Apply suggestion from @spolifroni-amd

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

Update README.md to clarify CK Tile API description and remove outdated references to the Tile Engine.

revise 37_transpose readme

revise 36_copy readme

Remove references to the Tile Engine in README files for 19_gemm_multi_d and 35_batched_transpose, and update distribution links for clarity.

Remove references to the Tile Engine in multiple README files and update distribution links for consistency and clarity.

Remove references to the Tile Engine in README files across multiple examples

* GH-2368 Adding a basic glossary

GH-2368 Minor edits

GH-2368 Adding missing READMEs and standardization.

resolving readme updates

GH-2368 Minor improvements to documentation.

Improving some readmes.

Further improvement for readmes.

Cleaned up the documentation in 'client_example' (#2468)

Update for PR

Update ACRONYMS.md to remove trivial terms

Update ACRONYMS.md to provide detailed explanations for BF16 and BF8 formats

Apply suggestion from @spolifroni-amd

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

Apply suggestion from @spolifroni-amd

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

Update README.md to clarify CK Tile API description and remove outdated references to the Tile Engine.

revise 37_transpose readme

revise 36_copy readme

Remove references to the Tile Engine in README files for 19_gemm_multi_d and 35_batched_transpose, and update distribution links for clarity.

Remove references to the Tile Engine in multiple README files and update distribution links for consistency and clarity.

Remove references to the Tile Engine in README files across multiple examples

Refine README files by removing outdated references to the Tile Engine

* Updates based on PR feedback 1

* Updates based on PR feedback 2

* Updates based on PR feedback 3

* Updates based on PR feedback 4

* Updates based on PR feedback 5

* Updates based on PR feedback 6

* Updates based on PR feedback 7

* Updates based on PR feedback 8

* Content Modification of CK Tile Example

* Modify the ck_tile gemm config

---------

Co-authored-by: AviralGoelAMD <aviral.goel@amd.com>
Co-authored-by: ThomasNing <thomas.ning@amd.com>

[ROCm/composable_kernel commit: 92c67a824f]
2025-10-16 03:10:57 -07:00
Aviral Goel
2a7946bbe8 fix: nil performance results for gemm examples (#2950)
[ROCm/composable_kernel commit: 19415d0b6f]
2025-10-06 12:43:23 -07:00
linqunAMD
930f95d4a6 [CK_TILE] Enable ck_tile tests on gfx11 and gfx12 (#2821)
* [CK_TILE] Enable ck_tile test on gfx11 & gfx12

* revert an unnecessary change

* enable pk_int4 on gfx11 & gfx12

* revert .pre-commit-config.yaml

[ROCm/composable_kernel commit: b0ee317d83]
2025-09-12 12:45:14 -07:00
linqunAMD
7022f756e6 [CK_TILE] Fix flatmm on gfx11 and gfx12 (#2790)
1. Correct shuffle_b and MakeBFlatDramTileDistribution according to WMMA warp layout
2. Add FlatmmConfig16_Wmma for gfx11 and gfx12

[ROCm/composable_kernel commit: df4ee556d6]
2025-09-10 08:28:00 +08:00
Illia Silin
80cc2b7bca Fix latest AITER failure and add more AITER tests in CK CI. (#2782)
* add aiter tests and move json_dump header

* remove example/include path from cmake

* extend time for aiter and pytorch stages

[ROCm/composable_kernel commit: ef6c28e989]
2025-09-04 13:44:00 -07:00
rahjain-amd
ff4df5c158 Add json dump support to output details from CK/CKTile Examples. (#2551)
* Adding RapidJson Library

* Adding Json Dumps in all CK_Tile Examples

Not verified yet

* Adding json to cktile Batched Transpose

* adding json dumps to layernorm2d_fwd

* Adding  json dump to flatmm_basic

* Adding RapidJson Library

* Adding Json Dumps in all CK_Tile Examples

Not verified yet

* Adding json to cktile Batched Transpose

* adding json dumps to layernorm2d_fwd

* Adding  json dump to flatmm_basic

* Adding json in 03_gemm

* Add json dump to 16_batched_gemm

* Add json dump to gemm_multi_d_fp16

* Add json dump to grouped_gemm

* fix fmha_bwd/fwd

* Fix clang-format errors

exclude include/rapidjson in jenkins as its a third-party library

* Saparating function and defination.

* Update Documentation of 03_gemm

* Refactoring as per code review

* Disable fp8 instances on unsupported targets (#2592)

* Restrict building of gemm_universal_preshuffle_f8 instances to specific targets in CMakeLists.txt

* Add condition to skip gemm_xdl_universal_preshuffle_f8 instances for unsupported targets in CMakeLists.txt

* Add conditions to skip unsupported targets for gemm_universal_preshuffle_f8 and gemm_xdl_universal_preshuffle_f8 instances in CMakeLists.txt

* Refine conditions to exclude gemm_universal_preshuffle_f8 instances for unsupported targets in CMakeLists.txt

---------

Co-authored-by: AviralGoelAMD <aviralgoel@amd.com>

* fix clang format

* remove duplicate lines of code from library/src/tensor_operation_instance/gpu/CMakeLists.txt

* Fixing Readme and unifying jsondumps

* adding moe_smoothquant

* adding fused_moe

* Fixing Readme for batched_gemm

* Fixing Readme for grouped_gemm

* adding flatmm

* adding gemm_multi_d_fp16

* adding elementwise

* adding File name when json is dumped

* Fixing Reduce after merge

* adding batched_transpose

* Adding Warptile in Gemm

* Fixing Clang Format

---------

Co-authored-by: Aviral Goel <aviral.goel@amd.com>
Co-authored-by: AviralGoelAMD <aviralgoel@amd.com>
Co-authored-by: illsilin_amdeng <Illia.Silin@amd.com>

[ROCm/composable_kernel commit: 4d041837ad]
2025-09-02 23:31:29 -07:00
Cong Ma
871f68f227 [CK TILE GEMM] Fix a merge conflict (#2753)
* Fixed a merge conflict in 39b254c1
* Foramt the code

[ROCm/composable_kernel commit: cd53e2e57e]
2025-08-27 11:08:09 -07:00
Max Podkorytov
339fc3da3f invoke script directly (#2687)
[ROCm/composable_kernel commit: f38751fc2a]
2025-08-19 00:23:07 -07:00
linqunAMD
615ca9842d Support Wave32 in CK_TILE - Part 1 (#2594)
* Support wave32/wave64 in CK_TILE - Part 1

* remove blocksize in kernel launch

* fix build error

* fix clang format

* fix clang format 2

* fix clang format 3

* fix fmha build error

* fix fmha build 2

* fix fmha build 3

* fix build error 4

* address review comment

* update change log

* replace KernelBlockSize with kBlockSize

* fix CI fail

* fix clang format

* address review comment and rebase code.

* fix universal test fail

---------

Co-authored-by: Lin, Qun <Quentin.Lin+amdeng@amd.com>
Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>

[ROCm/composable_kernel commit: 9fcc1ee9fd]
2025-08-18 10:08:31 -07:00
Jimniu
d6b1452592 Jimniu/tile_example_flatmm_basic fix (#2680)
* Add stride_b validation

* run clang-format

[ROCm/composable_kernel commit: 753b6227c5]
2025-08-13 16:06:08 -07:00
Aviral Goel
8bf96c18c6 Integration of a new pipeline for weight preshuffle into gemm examples (#2516)
* something khushbu can help with

* v1 v2 works with flatmm develop

* v0 v1 v2 numerical error gone

* Fixing numerical error, and interchange preshuffle configs to match with flatmm

* Refactor GEMM pipeline configurations and integrate preshuffle support

- Updated preshuffle pipeline definitions to include multiple versions (V1, V2, V3).
- Changed the pipeline constant from CK_TILE_PIPELINE_PRESHUFFLE to CK_TILE_PIPELINE_PRESHUFFLE_V3 in relevant configurations.
- Removed obsolete code and comments

* clang format

* fix vectorloadsize bug

* add the Preshuffle3

* update kwarp calculation in gemm utils

* update vector size A and B correctly in V2 pipeline; Added few more changes to align with dteng's branch

* fix: add CK_GFX950_SUPPORT macro for gfx950 detection

* default disable rotating buffer

* docs(CHANGELOG): update changelog for rocm 7.0

* Revert "docs(CHANGELOG): update changelog for rocm 7.0"

This reverts commit 2bc16fff84.

* Remove unused Preshuffle V3 pipeline and related code; update gemm function to use Preshuffle V2; clean up comments and formatting in various files.

* revert example/ck_tile/flatmm to its original state

* remove comment added by second author

* switch to xor ALDSDescriptor

* modify the MakeALdsDescriptor()

* temporary profiling script

* getting rid of line marker compiler error

* UniversalWeightPreshufflePipelineAgBgCrPolicy now derives from UniversalGemmBasePolicy

* add a minor fix for the config

* typo fix

* Fix formatting in lambda function for WeightPreshufflePipelineAGmemBGmemCRegV2

* revert change in include/ck_tile/ops/flatmm/pipeline/flatmm_pipeline_agmem_bgmem_creg_v1.hpp

* revert change in include/ck_tile/core/arch/amd_buffer_addressing.hpp

* reenable the GemmSpatiallyLocalTilePartitioner

* make GemmConfigPreshuffle_1 for v1 pipeline, GemmConfigPreshuffle_2 for v2 pipeline

* remove hardcoded true for preshuffle bool template argument

* rename script

* remove gemm_profilie.sh script

* merge conflict resolve

* clang formatted

* typo fix

* Remove duplicate include of block_gemm_areg_bsmem_creg_v2r1.hpp in gemm.hpp

* Remove commented-out code in UniversalWeightPreshufflePipelineAgBgCrPolicy

* Fix missing newline at end of file in run_gemm_example.inc

* Remove unused barrier call in BlockWeightPreshuffleASmemBSmemCRegV1

* addressing review comments

* removing debug code

* addressing review comments

* Revert "addressing review comments"

This reverts commit 29c45192ba.

* updating tile_engine code

* addressing review comments

---------

Co-authored-by: amd-khushbu <khuagarw@amd.com>
Co-authored-by: ThomasNing <thomas.ning@amd.com>

[ROCm/composable_kernel commit: 1441a0a7ee]
2025-08-01 00:04:54 -07:00
Khushbu Agarwal
7a6efa0194 [CK_Tile] Updating gpu timer when doing flush cache (#2593)
* Missed updating function names in example

* updating timer

* code cleanup

* addressing review comments

* updating tile_engine code

* addressing review comments

[ROCm/composable_kernel commit: 88d72178d6]
2025-07-31 16:43:33 -07:00
Khushbu Agarwal
ac0a7f44e1 Fixing numerical error, and interchange preshuffle configs to match with flatmm (#2515)
[ROCm/composable_kernel commit: 579bd73435]
2025-07-16 22:33:03 -07:00
rahjain-amd
92ada43ba2 add missing condition for bf16 (#2502)
Without this DataType = unknown -
``` sh
Run Flatmm kernel with DataType = unknown M =1280 N =16384 K =1024 StrideA =1024 StrideB =1024 StrideC =16384 : 0.228837 ms, 187.687 TFlops, 341.374 GB/s,
```

after this change
```sh
Run Flatmm kernel with DataType = bf16 M =1280 N =16384 K =1024 StrideA =1024 StrideB =1024 StrideC =16384 : 0.227029 ms, 189.181 TFlops, 344.092 GB/s,
```

[ROCm/composable_kernel commit: 6b09f0823e]
2025-07-15 21:25:56 +05:30
Khushbu Agarwal
e34599e8a9 Merge flatmm Operator with universal gemm (#2434)
* Initial commit

* Adding new tile partitioner to flatmm

* intermediate changes

* debugging kernels

* Updating flatmm example to universal gemm example

* updated flatmm kernel to run via gemmKernel

* update universal gemm to incorporate flatmm

* debug

* Fix flatmm call

* Fixing other kernels and tests for API changes

* clang formatted

* fixing gemm tests

* added test for flatmm and simplify kernel arguments

* adding flatmm test

* fix test for flatmm

* simplify gemm kernel with flatmm

* remove flatmm related files

* addressing review comments and code clean up

* resolving empty file

* resolving empty file

* clang formatted

* addressing review comments

* enable persistent kernel for flatmm

* reverted the removed files for flatmm

* reverted the removed files for flatmm

* changed flatmm to weightPReshuffle; removed the _1 added in teh faltmm example

* some more renames

* clang formatted

[ROCm/composable_kernel commit: d239b91fd5]
2025-07-11 08:27:55 -07:00
linqunAMD
511f170dab [CK_TILE] Refine fp8 support in flatmm (#2239)
* [CK_TILE] Refine fp8 in flatmm

1. Replace USING_MFMA_16x16x32 & USING_MFMA_16x16x32 with constexpr
2. Add an additional const check to avoid build error in HotLoopScheduler
3. Refine shuffleb to support both tile 32x32 and 16x16
4. Support command option -init
5. Move Gemm warp defintion to a separate struct

* fix clang format

* fix clang format

* keep default bhavior unchanged (warp tile = 16x16)

* fix tile engine build error

* fix a typo in codegen_utils.py

* address review comments

* address review comments

---------

Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>

[ROCm/composable_kernel commit: 37e1a27537]
2025-06-25 01:07:45 -07:00
John Afaganis
3ef7712ee3 Add missing copyright headers (#2359)
* Add missing copyright headers

* empty commit

[ROCm/composable_kernel commit: df54667102]
2025-06-17 14:29:45 -07:00
Thomas Ning
bc6af0fa49 Fix the CK Tile related operators (#2356)
* fix the flatmm

* Fix the pipeline

* address the comment

[ROCm/composable_kernel commit: 3c4cdfac4f]
2025-06-16 17:38:52 -07:00
Illia Silin
0f4d68633b Revert "fix the flatmm (#2349)" (#2352)
This reverts commit fc65195605.

[ROCm/composable_kernel commit: 5523df4b2d]
2025-06-16 07:54:55 -07:00
Thomas Ning
fc65195605 fix the flatmm (#2349)
[ROCm/composable_kernel commit: d996bc78be]
2025-06-16 02:17:53 -07:00
Khushbu Agarwal
7afee6c536 fix flatmm kernel for bigger size for fp16 datatype (#2302)
[ROCm/composable_kernel commit: bd270fe4bc]
2025-06-10 11:13:40 -07:00
Khushbu Agarwal
2b6621fba8 Rotating buffer PR CI fix (#2257)
* Revert "Revert "[CK_tile] Add rotating buffer feature for universal gemm (#2200)" (#2256)"

This reverts commit 7baac527a1.

* fix regression

[ROCm/composable_kernel commit: 2e38eb4f1c]
2025-06-02 10:25:01 -07:00
Illia Silin
7baac527a1 Revert "[CK_tile] Add rotating buffer feature for universal gemm (#2200)" (#2256)
This reverts commit 0f77aa335d.

[ROCm/composable_kernel commit: bbdaf79a52]
2025-05-28 09:46:52 -06:00
Khushbu Agarwal
0f77aa335d [CK_tile] Add rotating buffer feature for universal gemm (#2200)
* Add rotating buffer feature for universal gemm

* adding changes in tile_engine

* Updated code to merge kernel_launch

* removing comments

* Enable rotating buffer changes to flatmm

* Created diff launch_kernel function for rotating buffer

* Simplfied calculation using macros

* merge code with new changes in tile_engine

* clang formatted

* Redefine macros

[ROCm/composable_kernel commit: 99857e10e6]
2025-05-27 23:00:58 -07:00
Aviral Goel
9c99bdede7 Add catch blocks in example GEMM apps to enable better error handling (Issue: 1928) (#2234)
* added catch statements to examples

* clang format

[ROCm/composable_kernel commit: c52649ad57]
2025-05-27 22:32:42 -07:00
Illia Silin
24057b3662 fix the buffer intrinsic names for clang >=20 (#2228)
[ROCm/composable_kernel commit: 8146e471f1]
2025-05-23 14:58:25 -07:00
Illia Silin
9c7b0a65f9 Revert "Update the buffer load/store intrinsic names for clang>=20. (#2192)" (#2227)
This reverts commit 7d92e48278.

[ROCm/composable_kernel commit: 1b846143c6]
2025-05-22 15:41:17 -07:00
Illia Silin
7d92e48278 Update the buffer load/store intrinsic names for clang>=20. (#2192)
* fix the buffer load/store intrinsic names

* fix clang format

[ROCm/composable_kernel commit: 58f9e9ffbc]
2025-05-13 10:18:14 -07:00
BingYuan.Zhou
977b7d0928 Flatmm merge (#2168)
* sync with function interface of cshuffleepiloge,fix flatmm build fail

* move code from solin/flatmm which add mfma16*16*32fp8 and optimize flatmm

---------

Co-authored-by: solin <bingzhou@amd.com>

[ROCm/composable_kernel commit: 6a3960c1e1]
2025-05-08 12:59:57 +08:00
Illia Silin
df8daef645 Revert "Add ck tile examples to package (#1880)" (#2150)
[ROCm/composable_kernel commit: 9a9f59ae69]
2025-04-30 10:20:16 -07:00
jakpiase
d48f3b585b Add ck tile examples to package (#1880)
* add ck tile examples to package

* Update jenkinsfile

* fix for jenkinsfile

* fix for building ck tile code on non gfx9

* compile ck tile examples only for gfx94

* include ck tile examples in all target

* fix for basic gemm UseStructuredSparsity

* Update CMakeLists.txt

* Update gemm_pipeline_problem.hpp

* add targets to rocm install

---------

Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>

[ROCm/composable_kernel commit: 434d19f696]
2025-04-28 09:53:19 -07:00
BingYuan.Zhou
4ec293cb4b [flatmm] implement basic fp16 flatmm (#2089)
* [flatmm] implement basic fp16 flatmm

* fix CI build fail

---------

Co-authored-by: root <root@hjbog-srdc-50.amd.com>
Co-authored-by: solin <bingzhou@amd.com>

[ROCm/composable_kernel commit: eaf1f0bf3b]
2025-04-16 16:51:17 +08:00