spolifroni-amd
d0e5159fea
Merge pull request #2990 from spolifroni-amd/spolifroni-amd/cherry-pick-702
...
Docs cherry-pick for 7.0.2
2025-10-08 11:07:44 -04:00
spolifroni-amd
86d5709ec3
Improving the contribution page ( #2804 )
...
* edited the contribution page to remove a broken link
* smoothed language; added a link
* updated link to install
* Adding contribution guide for PRs.
* additional editing
* Update docs/Contributors_Guide.rst
Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com >
* Update docs/Contributors_Guide.rst
Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com >
* Update docs/Contributors_Guide.rst
Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com >
* Update docs/Contributors_Guide.rst
Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com >
* Update docs/Contributors_Guide.rst
Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com >
* Update docs/Contributors_Guide.rst
Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com >
---------
Co-authored-by: Vidyasagar Ananthan <vidyasagar.ananthan@amd.com >
Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com >
(cherry picked from commit 4759cb5e63 )
2025-10-08 10:27:49 -04:00
spolifroni-amd
11e491ac77
first commit of the glossary ( #2702 )
...
* first commit of the glossary
* minor changes
* Update docs/reference/Composable-Kernel-Glossary.rst
Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com >
* Update docs/reference/Composable-Kernel-Glossary.rst
Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com >
* Update Composable-Kernel-Glossary.rst
---------
Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com >
Co-authored-by: Vidyasagar Ananthan <vanantha@amd.com >
(cherry picked from commit e11f694eda )
2025-10-08 10:27:38 -04:00
spolifroni-amd
6951b9cc61
removed the blog posts as these are broken links ( #2732 )
...
(cherry picked from commit 1d4a3341f0 )
2025-10-08 10:27:25 -04:00
Val Movsik
571a811391
Merge pull request #2566 from ROCm/release/fix-changelog-7.0
...
Update CHANGELOG.md for ROCm 7.0 RC2
rocm-7.0.2
rocm-7.0.1
20250912-42
rocm-7.0.0
2025-08-07 00:10:29 -04:00
AviralGoelAMD
15a527dada
docs(CHANGELOG): update changelog for rocm 7.0
2025-07-25 14:35:11 +00:00
Val Movsik
b8893b9339
Merge pull request #2508 from ROCm/rocm-rel-7.0-staging_us
...
Cherrypicking for ROCm 7.0RC1
20250912-17
2025-07-22 09:40:50 -04:00
Aviral Goel
ebd5ca8598
Merge pull request #2507 from ROCm/PRXYZ_begin
...
Cherry Picking for 7.0RC1
2025-07-15 17:33:06 -04:00
Andriy Roshchenko
8414b989da
MX GEMM - Add FP6 GEMM Test ( #2488 )
...
* Add F6 GEMM MX Test
* Add BF6 GEMM MX Test
2025-07-15 20:17:46 +00:00
Andriy Roshchenko
9330ea99fb
MX GEMM - FP6 Support in GEMM MX v3 Pipeline ( #2481 )
...
* Add GEMM MX BF6 example
* Fix BF6 type_convert
* Add type_convert for bf16x6
* Add compare operator to f4x2_pk_t
* Update README for 67_gemm_microscaling
* Fix host tensor initialization with integer values for FP8
2025-07-15 20:17:46 +00:00
illsilin_amdeng
92bb1fbd77
replace obsolete warpSize system variable with the new one
2025-07-15 20:16:42 +00:00
Andres Lugo
aadeffde18
Update FMHA recipe for Pytorch SDPA integration ( #2480 )
...
* Add receipts in splitk and appendk
* remove grouped
* Remove logits
---------
Co-authored-by: Max Podkorytov <4273004+tenpercent@users.noreply.github.com >
2025-07-10 09:00:23 -07:00
Illia Silin
1b66f3f4a3
Add declarations for atomic add for fp16 and unsigned short. ( #2483 )
...
* add template for fp16 atomic add
* add template for unsigned short atomic add
* use atomicCAS in atomic add for fp16 and unsigned short
* revert back to atomic add using casting
2025-07-10 07:18:56 -07:00
Illia Silin
d9b37c7121
Fix blockscale fp8 gemm examples ( #2476 )
...
* fix blockscale fp8 gemm examples
* refactor the compiler flags
* fix hip version calculation
2025-07-10 07:12:13 -07:00
shay-li77
d814fefe18
support y-direction step length greater than 1 for SimplifiedGenericAttentionMask ( #2338 )
...
* mask support ratio for y axis
* format code
* add notes for param y_ratio
* fix comments error
* support template and mdiv for ratio mask
* refactor y-ratio mask constructor
* optimize coordinate calculation
* add SimplifiedRatioAttentionMask
2025-07-09 23:18:55 +08:00
Yi DING
032ca60015
[CK_TILE] Avoid compile kernel in host pass ( #2475 )
2025-07-09 22:27:54 +08:00
Po Yen Chen
ad9863fe05
[CK_TILE] Low CU utilization optimization for fMHA fwd kernels ( #2402 )
...
* Wrap tile size mapping as class method
* Wrap pipeline generating as class method
* Add constraint as kernel dispatching criteria
* Support multiple tile sizes for a (hdim, hdim_v) combination
* Use smaller tile size if CU utilization is low
* Use integer as the key of the tile size map
* Fix type error
* Simply override parent class method return value
* Add attribute to eliminate warning
* Allow using environment variables to turn on/off custom factory
* Unify param naming style
* Add missing HIP runtime include directive
* Fix os.environ.get() usage
2025-07-09 22:01:33 +08:00
Vidyasagar Ananthan
e391b025a0
New ninja tracing script ( #2472 )
...
* Adding ninja log json conversion utility
* Updating to match old ninjatracing
* Updating Jenkins to use new ninjatracing
* Ensuring v7 works
* Removing old ninjatracing from dockerfile
2025-07-08 22:36:50 -07:00
Illia Silin
93420ecf89
Revert "Add templates for fp16 and unsigned short atomic add to fix FBGEMM bu…" ( #2474 )
...
This reverts commit 112b47e885 .
2025-07-08 19:01:26 -07:00
Illia Silin
112b47e885
Add templates for fp16 and unsigned short atomic add to fix FBGEMM builds. ( #2471 )
...
* add template for fp16 atomic add
* add template for unsigned short atomic add
* use atomicCAS in atomic add for fp16 and unsigned short
2025-07-08 18:09:30 -04:00
Vidyasagar Ananthan
33d704a6f9
Separating ninja build tracing and setting flag to false ( #2470 )
...
* Separating ninja build tracing and setting flag to false
* Add ftime-tracing flag
* Fix conditional issue
* Try adding a script block
* Embed Clang analysis in ftime trace block
2025-07-08 10:52:00 -07:00
Haocong WANG
5557eadce6
[CK TILE] Fix FA build filter ( #2369 )
...
* Fix for fwd/bwd kernel build filter
* fix bwd code
* cmake depends & bwd filter order fix
* revert unexpected reformat
* Avoid changing fmha bwd filter order for downstream compatibility
* Revert unexpected changes
---------
Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com >
Co-authored-by: Ding, Yi <yi.ding@amd.com >
2025-07-08 10:42:07 +08:00
Illia Silin
e033a1b4bf
fix compilation errors with clang20 ( #2464 )
2025-07-07 19:40:30 -07:00
Po Yen Chen
b2dea90116
Eliminate warning caused by failing to meet occupancy requirement ( #2389 )
...
Co-authored-by: felix <felix.li@amd.com >
2025-07-08 09:17:25 +08:00
Thomas Ning
f240ae3248
Enable Async Copy for MI355 ( #2425 )
...
* add for async load builtin
* add async load api
* fix some compiling errors
* fix a compiling error
* fix some compiling errors
* add a pipeline which copies from v4
* add a new pipeline for async load
* fix some compiling errors
* add async load tests
* fix some issues in async load
* fix
* fix async inline assembly
* fix async inline assembly
* add ignore header file
* comment some not gfx950 codes
* comment some not gfx950 codes
* fix an error
* update async load apis
* fix lds descriptor
* fix a compiling error
* fix some compiling errors
* fix a descriptor issue
* update lds descriptor
* change async pipeline's tile distribution pattern from thread to warp
* fix clang format
* update async policy
* fix a CRTP issue
* fix a typo error
* change lds layout
* fix some sync issues
* improve codes
* delete the async test
* fix a commented format issue
* avoid compiling device functions when compile host
* make gemm run
* add the copy kernel support
* finish the feature
* Address comment
* add the support for buffer_builtin
* solved the merging problem
* Comment Addressed
---------
Co-authored-by: joye <joye@amd.com >
Co-authored-by: joyeamd <John.Ye@amd.com >
2025-07-07 10:08:49 -07:00
Andriy Roshchenko
054f85ab7c
MX GEMM - FP6 Example ( #2419 )
...
Adds support for MX FP6 data type in MX GEMM block pipeline version v1.
Provides an example of MX FP6 GEMM algorithm.
---------
Co-authored-by: OscarXu <huaiguxu@amd.com >
Co-authored-by: aska-0096 <haocwang@amd.com >
Co-authored-by: mtgu0705 <mtgu@amd.com >
Co-authored-by: Your Name <you@example.com >
Co-authored-by: lalala-sh <Jiaxing.Wen@amd.com >
Co-authored-by: valarLip <340077269@qq.com >
Co-authored-by: Ding, Yi <yi.ding@amd.com >
Co-authored-by: feifei14119 <feiw@amd.com >
Co-authored-by: Lin, Qun <qlin@amd.com >
Co-authored-by: joye <joye@amd.com >
2025-07-07 10:33:26 -06:00
dependabot[bot]
bfe573d3ba
Bump sphinxcontrib-bibtex from 2.6.4 to 2.6.5 in /docs/sphinx ( #2424 )
...
---
updated-dependencies:
- dependency-name: sphinxcontrib-bibtex
dependency-version: 2.6.5
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Aviral Goel <aviral.goel@amd.com >
2025-07-07 07:30:49 -07:00
spolifroni-amd
096bf2de41
updating the doxyfile and the index.rst so that it gets the full API ( #2416 )
...
* updating the doxyfile and the index.rst so that it gets the full API
* added recommended doxygen values
2025-07-07 07:29:36 -07:00
rahjain-amd
ad593c286f
Fixing Debug build ( #2404 )
...
Failed to build `tile_example_fmha_bwd` due to the error below
```
/home/rahjain/src/composable_kernel/example/ck_tile/01_fmha/fmha_bwd.cpp:358:30: error: comparison of integers of different signs: 'size_type' (aka 'unsigned long') and 'ck_tile::index_t' (aka 'int') [-Werror,-Wsign-compare]
358 | assert(slopes.size() == nhead);
| ~~~~~~~~~~~~~ ^ ~~~~~
/usr/include/assert.h:103:27: note: expanded from macro 'assert'
103 | (static_cast <bool> (expr) \
| ^~~~
/home/rahjain/src/composable_kernel/example/ck_tile/01_fmha/fmha_bwd.cpp:989:16: note: in instantiation of function template specialization 'run<FmhaBwdFp16>' requested here
989 | return run<FmhaBwdFp16>(arg_parser) ? 0 : -2;
| ^
/home/rahjain/src/composable_kernel/example/ck_tile/01_fmha/fmha_bwd.cpp:358:30: error: comparison of integers of different signs: 'size_type' (aka 'unsigned long') and 'ck_tile::index_t' (aka 'int') [-Werror,-Wsign-compare]
358 | assert(slopes.size() == nhead);
| ~~~~~~~~~~~~~ ^ ~~~~~
/usr/include/assert.h:103:27: note: expanded from macro 'assert'
103 | (static_cast <bool> (expr) \
| ^~~~
/home/rahjain/src/composable_kernel/example/ck_tile/01_fmha/fmha_bwd.cpp:993:16: note: in instantiation of function template specialization 'run<FmhaBwdBf16>' requested here
993 | return run<FmhaBwdBf16>(arg_parser) ? 0 : -2;
| ^
2 errors generated when compiling for gfx942.
```
Fixed with proper cast
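
The log does not show the actual change, so the following is only a minimal sketch of the kind of cast that resolves a `-Wsign-compare` error like the one above. It assumes `nhead` is a signed `int` (the error text gives `ck_tile::index_t` as `int`); the `index_t` alias and the `slopes_match` helper are hypothetical stand-ins, not code from the repository.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for ck_tile::index_t, which the error text reports as 'int'.
using index_t = int;

// Hypothetical helper: true when `slopes` holds exactly `nhead` entries.
inline bool slopes_match(const std::vector<float>& slopes, index_t nhead)
{
    // Reject negative counts first, then cast to the container's unsigned
    // size type so both sides of == have the same signedness.
    return nhead >= 0 && slopes.size() == static_cast<std::size_t>(nhead);
}
```

Used inside an assertion, this would read `assert(slopes_match(slopes, nhead));`. Checking for a negative value before the cast avoids a negative `nhead` wrapping around to a large unsigned number.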
2025-07-07 14:46:22 +05:30
ltqin
9f4c5d7372
ck tile pagedkv prefill ( #2405 )
...
* add prefetching physical block id for pagedkv
* start add pagedkv prefill
* rename pipeline
* add kernel for pagedkv
* add an init version pagedkv prefill
* fix redefine issue
* add struct BlockFmhaFwdPagedKVPipelineProblem and fmha_fwd_pagedkv_args
* generate dispatch code
* add body generating code
* compiling pass
* remove dropout from pagedkv
* set lse to false in generating code
* start changing qr kernel to pagedkv
* init version of kernel with pagedkv
* change names of files that are generated
* change host validation for pagedkv prefill
* using iglp to change blockgemm
* add kernel files to op head file
* show parameters
* rewrite print parameter fun
* add fwd
* remove default parameter of GridSize
* format
* fix nhead issue and add seqlen_k_ptr to batch mode
* format code
* remove no-longer used code
* format
* fix some comments
---------
Co-authored-by: ltqin <letaoqin@amd.com >
Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com >
2025-07-07 16:16:54 +08:00
carlushuang
0aecb5ab68
default skip y point to r ( #2457 )
...
Co-authored-by: Thomas Ning <Thomas.Ning@amd.com >
2025-07-06 23:54:34 -07:00
carlushuang
a8742f7e31
[CK_TILE][CORE] enhance slice_tile api ( #2430 )
...
* support slice cross p
* fix some bug in y_len
* more case
* fix a bug when R exist
* support -1 to hint end of current length
* format
* change commit
2025-07-06 20:13:12 -07:00
Mingtao Gu
7998ae8969
[CK] Mxfp4 moe blockscale buf2lds version support ( #2455 )
...
* change cshuffle size
* added mxfp4 moe async buffer loading without B preshuffle
* added mx moe B shuffling + scale shuffling (async loads)
* minor fix
---------
Co-authored-by: mtgu0705 <mtgu@amd.com >
2025-07-06 15:42:00 +08:00
Adam Osewski
3d70c638d1
Always force output clearing for grouped conv bwd data ( #2446 )
...
* Always force output clearing
* dont run set zero for residual
---------
Co-authored-by: Bartlomiej Kocot <barkocot@amd.com >
2025-07-04 07:49:52 -06:00
Mateusz Ozga
394e5be10d
[CK-TILE DOC] Ck-tile grouped GEMM documentation ( #1939 )
...
* Ck-tile readme
* After review
* Review: part1
* Review part 3
2025-07-04 02:56:42 -07:00
Max Podkorytov
158ddeb8ce
[CK-TILE] File-level documentation for static encoding pattern ( #2433 )
...
* add file-level comment
* Finished the write-up
---------
Co-authored-by: ThomasNing <thomas.ning@amd.com >
2025-07-04 02:26:18 -07:00
Vidyasagar Ananthan
2e971eff90
Removing reference to undefined parameter for ignore statement. ( #2447 )
2025-07-03 20:10:29 -07:00
Vidyasagar Ananthan
d2536b91bc
Remove ftime tracing to avoid printing json files ( #2452 )
...
* Remove ftime tracing to avoid printing json files
* Factoring out build commands
2025-07-03 07:54:12 -07:00
Vidyasagar Ananthan
58d24a7172
Adding ddembeck to codeowners. ( #2449 )
...
Co-authored-by: Dave Dembeck <dave.dembeck@amd.com >
2025-07-02 20:47:09 -07:00
damien-lejeune
1183824573
Fix clang in ck develop branch ( #2445 )
...
Co-authored-by: Damien Lejeune <damien.lejeune@amd.com >
2025-07-02 10:07:47 -06:00
chenjun
74a34e0f50
fix KPerBlock = 64 a8w8 bpreshuffle gemm build fail in gfx950 ( #2437 )
...
Co-authored-by: valarLip <340077269@qq.com >
2025-07-02 19:12:07 +08:00
Gino Lu
60eb70f543
Fix return value bug that drops minus sign in some cases. ( #2415 )
...
* fix return value bug.
* refine change according to comment.
2025-07-02 14:53:00 +08:00
Aviral Goel
36df1cbd0a
[ckProfiler] Add infrastructure and instances to profile gemm_universal with B preshuffle ( #2427 )
...
* works on mi300
* fix(profiler): add error message for unsupported type/layout
* refactor(preshuffle.inc): add type aliases for code readability
2025-07-01 18:34:52 -07:00
Thrupti Raj Lakshmana Gowda
a03682cb80
Updating Runtime log for CK Tile Engine ( #2431 )
...
* Updating runtime log message for CK TILE ENGINE
* Fixing Clang Format
* Update tile_engine/ops/gemm/README.md
Co-authored-by: Aviral Goel <aviral.goel@amd.com >
---------
Co-authored-by: ThruptiRajLakshmanaGowda <tlakshma@amd.com >
Co-authored-by: Aviral Goel <aviral.goel@amd.com >
2025-07-01 10:59:49 -07:00
Aviral Goel
e9036a8fc2
Enhancements in precommit_install.sh for Python and CK Tile code ( #2400 )
...
* fix(precommit_install): script now installs packages in virtual env
* fix(precommit_install): installs packages in virtual env
* feat(precommit): added ruff for python linting and formatting
* feat(precommit): added ruff for python linting and formatting
* feat(precommit): run ruff when py files are committed
* feat(precommit): remod.py is run when ck_tile modified
* add empty line at the end
* style(precommit.yaml): remove empty line
---------
Co-authored-by: Max Podkorytov <4273004+tenpercent@users.noreply.github.com >
2025-07-01 01:11:10 -07:00
Vidyasagar Ananthan
2fa9270a25
Fix an earlier static check error due to assignment of variable in Jenkinsfile ( #2420 )
...
* Testing assignment of param fix
* Removing redundant changes
* Adding back unit test runs
* Ensuring Jenkins changes work on develop - to be reverted
* Revert "Ensuring Jenkins changes work on develop - to be reverted"
This reverts commit cf1cab4a43 .
2025-06-28 07:07:14 -07:00
Thomas Ning
28a63d7dcb
Revert "Enable builds on gfx942 by default and run all tests on develop branc…" ( #2418 )
...
This reverts commit 6d6f4c76c1 .
2025-06-27 16:40:10 -07:00
huaiguxu
e1c5172fdb
Huaiguxu/moe fp8 pertoken scale fix ( #2391 )
...
* fix pertoken_scale a_scale dimension
* clang-format
* Fix moe_gemm2_fp8 perTokenScale reference and example.
2025-06-27 10:24:34 +08:00
linqunAMD
1749c0409e
[CK][CONV] Support NCHW in class DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle ( #2375 )
...
1. When the conv spec is 1x1, stride 1, pad 0, NCHW input is equivalent to a column-major matrix A, so only a minor change in the conv transformer is needed to support it.
2. When the output is NKHW, it is equivalent to a column-major matrix C, so A and B need to be swapped to get the best performance.
3. Add new instance device_grouped_conv_fwd_xdl_f16_nchw_instances for NCHW.
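
The layout equivalence described in points 1 and 2 can be checked on the host with a small sketch. It assumes a single image, float data, and plain loops; the function names `conv1x1_nchw` and `gemm_col_major` are hypothetical and are not part of the library. It does not model the A/B swap or any device-side code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Direct 1x1, stride-1, pad-0 convolution on one image.
// in is [C][H*W] (NCHW), wei is [K][C], result is [K][H*W] (NKHW).
inline std::vector<float> conv1x1_nchw(const std::vector<float>& in,
                                       const std::vector<float>& wei,
                                       std::size_t C, std::size_t K, std::size_t HW)
{
    std::vector<float> out(K * HW, 0.0f);
    for(std::size_t k = 0; k < K; ++k)
        for(std::size_t p = 0; p < HW; ++p)
        {
            float acc = 0.0f;
            for(std::size_t c = 0; c < C; ++c)
                acc += wei[k * C + c] * in[c * HW + p];
            out[k * HW + p] = acc;
        }
    return out;
}

// GEMM view of the same buffers: A is M x Kdim, column-major (M = H*W, Kdim = C),
// B(k, n) is read from the weights as b_wei[n * Kdim + k],
// and the result is M x N, column-major (N = number of output channels).
inline std::vector<float> gemm_col_major(const std::vector<float>& a,
                                         const std::vector<float>& b_wei,
                                         std::size_t M, std::size_t N, std::size_t Kdim)
{
    std::vector<float> c(M * N, 0.0f);
    for(std::size_t n = 0; n < N; ++n)
        for(std::size_t m = 0; m < M; ++m)
        {
            float acc = 0.0f;
            for(std::size_t k = 0; k < Kdim; ++k)
                acc += b_wei[n * Kdim + k] * a[k * M + m];
            c[n * M + m] = acc;
        }
    return c;
}
```

Both functions read the same memory with the same strides: element (pixel `p`, channel `c`) of the NCHW input sits at offset `c * HW + p`, which is exactly element (`m = p`, `k = c`) of a column-major matrix with leading dimension `HW`. The NKHW output matches a column-major C in the same way.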
2025-06-26 08:32:39 +08:00
Khushbu Agarwal
a14753b86f
Enabling diff datatypes for tile_engine and build with more granularity ( #2392 )
...
* merging recent changes to universal gemm to tile_engine
* Reducing Linking time by generating less intermediate files
* make small libs to build faster
* Reducing the instances
* reducing instances
* Restoring default config
* Restoring default config
* warp_n reverted in default config
* Adding diff json files for fp8 and fp16, cmake changes for fp8
* Restructure the CMake File
* Added more granularity for build and some debugging code
* removed some of debugging statements
* added fp8 instances
* take datatype from command line to enable both types of json files
* updated README file
* code cleanup
* code cleanup
* updated jenkinsfile
* enable tile_engine daily builds
* updating cmake file
* updated CMakeLists.txt
* Updating CMake code fixing gfx12 build
* Updating CMake code fixing gfx12 build
* Fix CMake file null checks
* fixed traces of rebase
* Update tile_engine/ops/gemm/README.md
Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com >
* Update tile_engine/ops/gemm/README.md
Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com >
* Update tile_engine/ops/gemm/README.md
Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com >
* fixing rebase issue
---------
Co-authored-by: khushbu <khuagarw@gmail.com >
Co-authored-by: ThomasNing <thomas.ning@amd.com >
Co-authored-by: illsilin_amdeng <Illia.Silin@amd.com >
Co-authored-by: AviralGoelAMD <aviral.goel@amd.com >
Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com >
2025-06-25 15:18:24 -07:00