Commit Graph

1644 Commits

Author SHA1 Message Date
Bartłomiej Kocot
d86a5e1418 Add Conv NGCHW client example (#1831)
[ROCm/composable_kernel commit: 742f5d6b55]
2025-01-22 01:02:03 +01:00
Mateusz Ozga
91965f3411 Simplify static_cast if-lands (#1828)
[ROCm/composable_kernel commit: 3db77bc4f2]
2025-01-21 23:23:19 +01:00
Mateusz Ozga
b63e4bc4b8 CK-Tile Grouped GEMM refactor and post PR fixes (#1756)
* Grouped gemm simple code refactor

* Offset invoker

* Invoke generic Run, and replace name of partitioner variable

* Tests fix type

* Removed namespaces

* Add template param to avoid implicit cast

* Remove generic function

* Constant value

* Set enum underlying type to int16_t

* Generalize partitioner function

* Remove whitespaces

* Rename function

* Using support

* Clang-format

* Clang-format

* Fn-partitioner description fn

* Typo

* Typo 2

* Better description

* Better description

* Refactor after review

* Use ctr instead of set fn

* Invoke ctr and typo

* Comments

* Remove unnecessary comment

* Review, remove modulo

[ROCm/composable_kernel commit: 3c93d3c444]
2025-01-21 21:06:10 +01:00
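Grouped GEMM launches a single grid for several GEMMs, so each workgroup has to work out which group and which output tile it owns. The Python sketch below shows one common way to do that with a prefix sum of per-group tile counts. It is an illustration under assumptions, not CK-Tile's partitioner: the names `build_offsets` and `locate`, the row-major tile order within a group, and the use of modulo are all hypothetical.

```python
import bisect
from itertools import accumulate
from typing import List, Tuple


def ceil_div(a: int, b: int) -> int:
    return -(-a // b)


def build_offsets(shapes: List[Tuple[int, int]], tile_m: int, tile_n: int) -> List[int]:
    """Prefix sum of tile counts: workgroup ids [offsets[g], offsets[g + 1]) belong to group g."""
    counts = [ceil_div(m, tile_m) * ceil_div(n, tile_n) for m, n in shapes]
    return [0] + list(accumulate(counts))


def locate(block_id: int, offsets: List[int], shapes: List[Tuple[int, int]],
           tile_n: int) -> Tuple[int, int, int]:
    """Map a flat workgroup id to (group, tile_row, tile_col), row-major within a group."""
    # bisect_right skips groups with zero tiles, because their offsets repeat.
    g = bisect.bisect_right(offsets, block_id) - 1
    local = block_id - offsets[g]
    tiles_n = ceil_div(shapes[g][1], tile_n)
    return g, local // tiles_n, local % tiles_n
```

For two GEMMs of shape 256x256 and 128x512 with 128x128 tiles, the offsets are `[0, 4, 8]`, so workgroup 5 lands in group 1, tile (0, 1).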
deepsek
c6a6e93628 Added bf16 instances grouped gemm fixed nk (#1825)
* Feat: Add bf16 input instances

* feat: Add BF16 profiler code

* fix: reorder enum types

* fix: CI fail due to clang-format

* fix: clang script format issue

* fix: clang format broke cmakelist file

[ROCm/composable_kernel commit: e7dce4d247]
2025-01-20 09:13:09 -08:00
lucbruni-amd
80a206156b Add CK_TIME_KERNEL as toggleable CMake Variable (#1794)
* Disable CK_TIME_KERNEL by Default, Add as CMake Variable

* Enable CK_TIME_KERNEL by Default, Maintaining CMake Variable Functionality.

* Fix build error.

[ROCm/composable_kernel commit: 3fb2f5acc7]
2025-01-20 07:09:19 -08:00
Mingtao Gu
462b62be6b fix a bug for int4 scale weight only kernel (#1820)
Co-authored-by: mtgu0705 <mtgu@amd.com>

[ROCm/composable_kernel commit: 86d1b46aa6]
2025-01-19 11:18:18 +08:00
Bartłomiej Kocot
6472bdb4ed [CK_TILE] Add error threshold calculation for gemm examples (#1821)
[ROCm/composable_kernel commit: bdddf1eace]
2025-01-18 01:01:52 +01:00
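The commit message does not spell out its formula. As a hedged illustration only, a tolerance for checking a GEMM result can be scaled with the reduction length K and the unit roundoff of the output type, then used in an allclose-style comparison. The `sqrt(K)` heuristic, `atol = rtol`, and the names `gemm_tolerances` and `check_close` are assumptions, not CK_TILE's implementation.

```python
import math
from typing import Sequence, Tuple

# Unit roundoff u (half the machine epsilon) of a few output types.
UNIT_ROUNDOFF = {"fp32": 2.0 ** -24, "fp16": 2.0 ** -11, "bf16": 2.0 ** -8}


def gemm_tolerances(k: int, dtype: str) -> Tuple[float, float]:
    """Hypothetical heuristic: rounding error of a length-k dot product grows roughly
    like sqrt(k) * u for random errors, plus one final rounding to dtype."""
    u = UNIT_ROUNDOFF[dtype]
    rtol = (math.sqrt(k) + 1.0) * u
    atol = rtol  # assumption: same absolute and relative threshold
    return rtol, atol


def check_close(out: Sequence[float], ref: Sequence[float], rtol: float, atol: float) -> bool:
    """allclose-style check: |out - ref| <= atol + rtol * |ref| element-wise."""
    return len(out) == len(ref) and all(
        abs(o - r) <= atol + rtol * abs(r) for o, r in zip(out, ref))
```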
deepsek
cb909428d0 fix: preprocessor directives logic error if/else (#1764)
* fix: preprocessors logic error if/else

* fix: added macros as preferred by CK team

[ROCm/composable_kernel commit: 0fcbb25f70]
2025-01-16 20:31:15 -08:00
Aviral Goel
552845ecd0 Implementing Test Filters for Smoke and Regression Tests (#1819)
* smoke and regression targets working with tests

* test filters work for both examples and test

* removed unnecessary comments

* added a missing comment

* added a missing comment

* fixed typo in the comments

* updated README

* Update PULL_REQUEST_TEMPLATE.md

updating the template for future addition of test cases

* Update PULL_REQUEST_TEMPLATE.md

[ROCm/composable_kernel commit: 54de3e55e1]
2025-01-16 16:40:08 -08:00
Bartłomiej Kocot
e65a010b5e Fix and optimize dynamic unary elementwise (#1818)
* Fix and optimize dynamic unary elementwise

* fix

[ROCm/composable_kernel commit: 1519ce91a3]
2025-01-16 13:48:39 -08:00
carlushuang
2fec988802 [CK_TILE] Fix mock token id, support g1u1/g1u0 through same inline code block (#1808)
* fix mock token id

* prepare host for g1u1

* reformat inline-asm

* restructure uk_0

* restructure gate_up

* done

* change default to init=1

* update readme

* fix a bug in interleave pipeline

* rcp for silu

[ROCm/composable_kernel commit: 1ff50e78c6]
2025-01-16 17:51:10 +08:00
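The "rcp for silu" item refers to evaluating SiLU with a reciprocal instead of a division. A minimal Python sketch, assuming the standard definition silu(x) = x * sigmoid(x). Here `rcp` is a plain 1/x standing in for a GPU fast-reciprocal instruction, and the branch on the sign of x exists only to keep `math.exp` from overflowing; neither is taken from the kernel itself.

```python
import math


def rcp(x: float) -> float:
    """Reciprocal; stands in for a hardware fast-reciprocal instruction."""
    return 1.0 / x


def silu(x: float) -> float:
    """SiLU(x) = x * sigmoid(x), with sigmoid written as a reciprocal."""
    if x >= 0.0:
        # x * rcp(1 + exp(-x)): exp(-x) <= 1 here, so nothing overflows.
        return x * rcp(1.0 + math.exp(-x))
    # For x < 0 use the equivalent form e^x / (1 + e^x) so exp never overflows.
    e = math.exp(x)
    return x * e * rcp(1.0 + e)
```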
Illia Silin
9955ac560b disable inductor codegen tests on legacy OS (#1816)
[ROCm/composable_kernel commit: 8c29e06f3c]
2025-01-15 12:11:54 -08:00
Bartłomiej Kocot
2c4a1cce43 Add rounding for float to bf16 conversion as default (#1812)
* Add rounding for float to bf16 conversion

* Add bhalf test

* Add inf test bhalf

* Refactor

* update cmake

* Fixes

[ROCm/composable_kernel commit: 7790e8c3f7]
2025-01-15 16:41:21 +01:00
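Truncating a float32 to bfloat16 simply drops the low 16 bits, which biases results toward zero; rounding keeps the nearest representable value instead. The Python sketch below shows the widely used round-to-nearest-even bit trick on IEEE-754 binary32 patterns. It is an assumption that CK's new default rounding matches this exactly, including the NaN handling chosen here.

```python
import struct


def float_to_bits(f: float) -> int:
    """Bit pattern of f as float32 (f must be representable as float32)."""
    return struct.unpack("<I", struct.pack("<f", f))[0]


def bits_to_float(b: int) -> float:
    return struct.unpack("<f", struct.pack("<I", b & 0xFFFFFFFF))[0]


def f32_to_bf16_truncate(f: float) -> int:
    """Truncating conversion: drop the low 16 bits."""
    return float_to_bits(f) >> 16


def f32_to_bf16_rne(f: float) -> int:
    """Round-to-nearest-even conversion to a 16-bit bf16 pattern."""
    u = float_to_bits(f)
    if (u & 0x7F800000) == 0x7F800000 and (u & 0x007FFFFF):
        # NaN: keep the sign and set a quiet-NaN mantissa bit so it stays NaN.
        return (u >> 16) | 0x0040
    # Adding 0x7FFF plus the lowest kept bit rounds values above the halfway
    # point up and sends exact ties to the even neighbour.
    lsb = (u >> 16) & 1
    return ((u + 0x7FFF + lsb) >> 16) & 0xFFFF


def bf16_to_f32(h: int) -> float:
    return bits_to_float(h << 16)
```

Overflow is handled for free: the largest finite float32 rounds up to the bf16 infinity pattern `0x7F80`.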
ruanjm
9f9eddd0cf [CK_TILE] Add Various Fusion Functions to RMSNorm (#1802)
* Add shortcut to RMSNorm

* Modify test for adding shortcut for RMSNorm

* Add fused parameter into tests

* 1. Add YDataType. 2. rmsnorm2d_fwd_traits_ from rmsnorm2d_fwd.hpp to rmsnorm2d_fwd_api.cpp and rmsnorm2d_fwd_instance_common.hpp

* 1. Supports various strides and precisions.

* Add support of Epilogue

* Add fuse and epilogue support to rmsnorm ref

* Modify rmsnorm example

* Refactor tests/examples

* Bug fix for newly added tests/examples

* Bug fix for new tests 2

* Modify smoke test scripts

remove dbg code

* Supports non-smooth dynamic quant

* Update Rmsnorm2dFwd::GetName()

* rename xscale and prec_sx to smoothscale and prec_sm

Bug fix after rename

Remove files

* change example_rmsnorm2d_fwd.cpp

* update performance calculator

* Fix issue in two-pass when fuse add is enabled

* Remove comment of beta

---------

Co-authored-by: rocking <ChunYu.Lai@amd.com>

[ROCm/composable_kernel commit: 04dd314883]
2025-01-15 10:23:48 +08:00
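A reference-style Python sketch of RMSNorm with an optional fused residual add and per-row dynamic int8 quantization, assuming y = x_sum / sqrt(mean(x_sum^2) + eps) * gamma and a symmetric scale of max|y| / 127. The function names and the exact fusion semantics (for example, that the pre-norm sum is also written out for the next layer) are assumptions, not the CK_TILE API.

```python
import math
from typing import List, Optional, Sequence, Tuple


def rmsnorm_fused_add(x: Sequence[float], gamma: Sequence[float],
                      residual: Optional[Sequence[float]] = None,
                      eps: float = 1e-6) -> Tuple[List[float], List[float]]:
    """Return (y, x_sum); x_sum = x + residual is what a fused-add kernel would also store."""
    x_sum = [a + b for a, b in zip(x, residual)] if residual is not None else list(x)
    mean_sq = sum(v * v for v in x_sum) / len(x_sum)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    y = [v * inv_rms * g for v, g in zip(x_sum, gamma)]
    return y, x_sum


def dynamic_quant_int8(y: Sequence[float]) -> Tuple[List[int], float]:
    """Symmetric per-row quantization: scale = max|y| / 127, q = clamp(round(y / scale))."""
    amax = max(abs(v) for v in y)
    scale = amax / 127.0 if amax > 0.0 else 1.0
    q = [max(-128, min(127, round(v / scale))) for v in y]
    return q, scale
```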
Max Podkorytov
51c8a8e291 fix parsing instances for pt inductor (#1796)
add unit test for gen instances for gemms

add unit tests for conv and batched gemms

add unit test for preselected gemm instances

apply ruff lint

add license header for the unit test

add inductor pytest to CI

verbose pip install

switch the directory before installing python packages

move the inductor codegen test

try yet another workdir

Update Jenkinsfile

The directory looks right, fixing pip module not found by invoking pip directly

Update Jenkinsfile

invoke pytest directly since the module is not found

Update Dockerfile

Install setuptools

update package structure

bump setuptools

maybe fix data path for library sources

fix library search path for conv instances

fix path in pyproject definition

compare path used in gen_instances with one in pyproject.toml; fix the difference

Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>

[ROCm/composable_kernel commit: c0b90f130f]
2025-01-13 13:51:08 -08:00
feli
1f484facd3 Dev/merge u8w8 (#1774)
* port tiles from a8w8

* rm debug used files

* add instances

* remove all non gemm in cmake

* merge; impl fp16

* recover cmake from develop

* add missed files; fix clang format

---------

Co-authored-by: coderfeli <coderfeli@163.com>

[ROCm/composable_kernel commit: 53ab1b9047]
2025-01-13 10:25:14 -08:00
Thomas Ning
70e79bc56f CK Tile GEMM CICD fixed & register block method refactor (#1776)
* refactor the block_gemm_areg_breg_creg_v1 and add the v2 policy with 2x2 warp gemm

* Finished the 2x2 warp gemm policy and the block selection mechanism

* Clang format

* address poyen's comment

* Address feedbacks

* Fixed the compilation issue

* Change the function name

[ROCm/composable_kernel commit: 5d671a5fc4]
2025-01-13 13:10:44 +08:00
ClementLinCF
bbd54d3dfb [CK_TILE] Adjust kBlockSize of reduce example for better perf (#1779)
* Observed a 2x perf improvement with kBlockSize = 256
* Using 512 threads may lead to redundant computations

[ROCm/composable_kernel commit: 0b8f117f1a]
2025-01-12 20:50:32 -08:00
Qianfeng
3cc02417a9 Update for fmha_fwd qs_ks_vs pipeline (#1810)
* Update for fmha_fwd qs_ks_vs pipeline

* Remove __builtin_amdgcn_sched_barrier(0)

* Move p_compute to p conversion earlier to try to increase VGPR reuse

* Enable GetQKBlockGemm to use WarpGemm-16x16x16 for QLoadOnce==false situation

* Re-add __builtin_amdgcn_sched_barrier(0)

---------

Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>

[ROCm/composable_kernel commit: 3d50f57f43]
2025-01-13 12:43:05 +08:00
Bartłomiej Kocot
da7d6023cf Grouped convolution backward weight special vector size loads (#1772)
* Grouped convolution backward weight special vector size loads

* Instances and tests

* Fixes

* Add 7 and 13 special cases

* fix comments

* Fix

* Fix2

* fixes

* fix atomic add bf16

[ROCm/composable_kernel commit: fd46a01d8b]
2025-01-10 22:02:30 +08:00
Thomas Ning
dc1b18eebf Ck tile/gemm perf measure (#1750)
* Finished adding the performance benchmark for ck tile gemm

* Fix the executable rename problem

* fix the executable name error

* delete the unsupported layout combinations

* Update run_full_test.sh

* Update benchmark_mem_pipeline.sh

* Update benchmark_basic.sh

* change the executable of gemm_universal

* change ck_tile_gemm script permissions

* Addressed the comment

* Addressed the comment

* Fixed the comments

* Fixed Comment

* roll back the malfunctioned change

* Fix the Typo

* finalize the tile_gemm_fp16 performance monitoring

* fix the stash names for ck_tile gemm logs

* change the stashing logic

* change stashing syntax

---------

Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
Co-authored-by: illsilin <Illia.Silin@amd.com>

[ROCm/composable_kernel commit: 73a076eee1]
2025-01-09 17:41:49 -08:00
darren-amd
6bc57cf274 Disable building DPP kernels by default (#1804)
* Disable building DPP kernels by default

* Disable building dpp instances, examples, or tests if DPP_KERNELS is not set

* Add new DPP_KERNELS flag to readme

[ROCm/composable_kernel commit: 26b3829c02]
2025-01-08 13:50:42 -05:00
Max Podkorytov
4c98908e17 mark unused args
[ROCm/composable_kernel commit: ad697c78ac]
2025-01-08 10:09:54 -08:00
Max Podkorytov
d7a2a81051 run clang-format -style=file
[ROCm/composable_kernel commit: a2e6ad62e2]
2025-01-08 10:09:54 -08:00
Max Podkorytov
e1896982b5 run clang-format==12
[ROCm/composable_kernel commit: aa59ecaa22]
2025-01-08 10:09:54 -08:00
Max Podkorytov
715635839b update comment in the policy
[ROCm/composable_kernel commit: 82fb3f84fb]
2025-01-08 10:09:54 -08:00
Max Podkorytov
00c32ecda2 update qsksvs comment
[ROCm/composable_kernel commit: 4daa82b451]
2025-01-08 10:09:54 -08:00
Max Podkorytov
099e23be84 remove dead code
[ROCm/composable_kernel commit: 66c5b715c9]
2025-01-08 10:09:54 -08:00
Max Podkorytov
63cc962000 clang-format and remove dead code
[ROCm/composable_kernel commit: edb78a4729]
2025-01-08 10:09:54 -08:00
Max Podkorytov
25fdfe3df8 roll back splitkv
[ROCm/composable_kernel commit: 60113859fa]
2025-01-08 10:09:54 -08:00
Max Podkorytov
d3d53433aa update qsksvs pipeline
[ROCm/composable_kernel commit: bfc997a7e6]
2025-01-08 10:09:54 -08:00
Max Podkorytov
1d7c38642c qsksvs pipeline changes to mirror qrksvs
[ROCm/composable_kernel commit: f7942b993c]
2025-01-08 10:09:54 -08:00
AMD-dteng
12103d0f17 enable bias feature that add bias before adding residual (for rtpllm project) (#1741)
* 1. enable bias feature that add bias before adding residual; 2. change block size from 128->64 when m<64 in fp16

* delete comment

* 1.remove fmha change 2.change buffer name from bias to xbias

* Now bias can be used independently from fadd

* change kbias to kxbias

---------

Co-authored-by: feli <felix.li@amd.com>

[ROCm/composable_kernel commit: d5c8a334ca]
2025-01-08 17:51:06 +08:00
spolifroni-amd
76a8207ac0 Update LICENSE to 2025 (#1797)
[ROCm/composable_kernel commit: a6b761c39a]
2025-01-07 08:29:40 -08:00
dependabot[bot]
5e21ea1b59 Bump rocm-docs-core from 1.12.1 to 1.13.0 in /docs/sphinx (#1798)
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.12.1 to 1.13.0.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.12.1...v1.13.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

[ROCm/composable_kernel commit: 9f6bf9ab26]
2025-01-07 08:26:31 -08:00
Po Yen Chen
ff1d9f88fa [CK_TILE] fmha fwd splitkv optimization for decode (seqlen_q=1) (#1789)
* Update license year

* Add initial code to override decode problem

* Fix splitkv traits/args overriding error

* Reshape and transpose lse for decode

* Remove debug code

* Prettify example code

* Use better function name

* Add kMergeNumHeadGroupsSeqLenQ flag

Kernel user can use this switch to turn on/off optimization for
some problem sizes

* Add missing flag declarations

* Default turn off kMergeNumHeadGroupsSeqLenQ in codegen

* Group similar statements together

* Remove assumption of seqlen_q=1

* Remove kMergeNumHeadGroupsSeqLenQ from splitkv combine kernel

* Support kMergeNumHeadGroupsSeqLenQ=true in fmha splitkv kernel

* Run kMergeNumHeadGroupsSeqLenQ=true kernels when needed

* Fix group mode block skip logics

* Undo changes of normal fwd kernel

* Update in GridSize() and using GridSize() for splitkv kernel (#1799)

---------

Co-authored-by: Qianfeng <qianfeng.zhang@amd.com>

[ROCm/composable_kernel commit: 24b12d04af]
2025-01-07 18:49:24 +08:00
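For decode (seqlen_q = 1) with grouped-query attention, several query heads share one KV head, so folding the head-group dimension into the query-length dimension lets one tile process all of those heads against the same K/V data. The Python sketch below is a hypothetical index mapping, assuming the usual GQA convention that consecutive query heads share a KV head; it is an interpretation of the kMergeNumHeadGroupsSeqLenQ name, not the kernel's code.

```python
from typing import Tuple


def merged_index(h_q: int, s_q: int, nhead_q: int, nhead_k: int,
                 seqlen_q: int) -> Tuple[int, int]:
    """Map (query head, query position) to (kv head, row in the merged Q tile).

    Assumes nhead_q is a multiple of nhead_k and that query heads
    [h_k * group, (h_k + 1) * group) share KV head h_k.
    """
    assert nhead_q % nhead_k == 0
    group = nhead_q // nhead_k
    h_k = h_q // group
    row = (h_q % group) * seqlen_q + s_q
    return h_k, row
```

With 8 query heads, 2 KV heads and seqlen_q = 1, each KV head then sees one 4-row Q tile instead of four 1-row tiles.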
Bartłomiej Kocot
a064792e96 Fix universal gemm profiler for pk_i4_t (#1790)
* Fix universal gemm profiler for pk_i4_t

* fix

[ROCm/composable_kernel commit: 888317e698]
2025-01-04 14:01:33 +01:00
dependabot[bot]
19603c0e45 Bump rocm-docs-core from 1.12.0 to 1.12.1 in /docs/sphinx (#1788)
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.12.0 to 1.12.1.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.12.0...v1.12.1)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

[ROCm/composable_kernel commit: 37b3514648]
2025-01-03 17:47:48 -08:00
Illia Silin
68c7f53cb1 terminology clean-up (#1792)
[ROCm/composable_kernel commit: 8ea375bb58]
2025-01-03 16:38:22 -08:00
carlushuang
60e814a3ba [CK_TILE]naive attn support FP8 KVCache quant (#1747)
* quant

* fix bug

* simple smoothquant after softmax

* update kv-quant

* update stride

* fix fp8-pertoken-kvcache

* update int8/fp8 quant support

---------

Co-authored-by: so <a.com>
Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>

[ROCm/composable_kernel commit: 6df5fe2ad8]
2025-01-03 18:43:07 +08:00
Mingtao Gu
d4a8c6c2ed Implement the fp16xint4 scale weight only kernel for Ali (#1786)
* enable int4 scale (weight only) kernel

* format some files

* Add unit test for int4 weight only

* fixed and formatted code

* fixed

* formatted

* formatted

* fixed

* fixed a bug in the ckProfiler, and formatted the code

---------

Co-authored-by: mtgu0705 <mtgu@amd.com>

[ROCm/composable_kernel commit: 4f62f6e9b7]
2025-01-03 18:35:21 +08:00
feli
5ce28a1d13 Ck tile/layernorm: implement naive reduce, opt performance (#1784)
* add no welford

* enable output raw

* raw of int8

* fix build

* fix smoke test err

* [ck_tile]layernorm: fix welford ok, set int8 and bf16 small N as default and others open by generate

* [cktile]layernorm, fix err commit files and remove useless

* fix quant 8192 err & change norm_reduce class and file name

---------

Co-authored-by: coderfeli <coderfeli@163.com>
Co-authored-by: carlushuang <carlus.huang@amd.com>

[ROCm/composable_kernel commit: 4bc610416a]
2025-01-03 14:28:59 +08:00
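The "no welford" option contrasts two ways to get the mean and variance for layernorm: Welford's single-pass update, which is numerically robust, and a naive reduction of the sum and sum of squares, which is cheaper to reduce in parallel but can lose precision when the mean is large relative to the spread. A Python sketch of both (population variance, as layernorm uses); the function names are illustrative and this is not the CK_TILE code.

```python
from typing import Sequence, Tuple


def welford_mean_var(xs: Sequence[float]) -> Tuple[float, float]:
    """Single-pass Welford update; returns (mean, population variance)."""
    n, mean, m2 = 0, 0.0, 0.0
    for x in xs:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)  # uses the already-updated mean
    return mean, m2 / n


def naive_mean_var(xs: Sequence[float]) -> Tuple[float, float]:
    """Sum and sum of squares; E[x^2] - E[x]^2 can cancel catastrophically."""
    n = len(xs)
    mean = sum(xs) / n
    return mean, sum(x * x for x in xs) / n - mean * mean
```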
John Afaganis
de674980aa Add afagaj to CODEOWNERS (#1787)
[ROCm/composable_kernel commit: 17e8efb573]
2025-01-02 20:50:07 -06:00
Muhammed Emin Ozturk
222b1d6b48 BF16 GEMM Stream-K (#1541)
* initial

* Cmake file

* successful compilation but validation failed

* Cmake

* update

* gpu validation

* gemm universal

* gemm universal sk update

* sk bf16 universal instance

* gemm_universal_streamk.hpp

* only build for gfx94

* Cmakelist

* profiler update, bf16 sk only works at gfx42

* clang

* clang

* clang all

* no need flags

* cmake script

* delete comment

* gemm universal sk fix

* clang

* profiler fix

* clang

* update

* update

* delete comment

* code formatting

* cmake

* fix instance

* clang

* argument supported

* argument supported and clang

* update

* fix

* removing unnecessary comments

* clang formatting

* Update library/src/tensor_operation_instance/gpu/CMakeLists.txt

Co-authored-by: afagaj <john.afaganis@gmail.com>

* CopyRight Comment 2025

* clang reformatting

* copy right 2025

---------

Co-authored-by: Emin Ozturk <ozturk.27@osu.edu>
Co-authored-by: root <root@ctr-ubbsmc16.amd.com>
Co-authored-by: Muhammed Emin Ozturk <meozturk@t004-008.hpcfund>
Co-authored-by: root <root@splinter-126-wr-d3.amd.com>
Co-authored-by: Muhammed Emin Ozturk <meozturk@t006-001.hpcfund>
Co-authored-by: Muhammed Emin Ozturk <meozturk@login1.hpcfund>
Co-authored-by: Muhammed Emin Ozturk <meozturk@t004-004.hpcfund>
Co-authored-by: Emin Ozturk <emin.ozturk@utah.edu>
Co-authored-by: Muhammed Emin Ozturk <meozturk@t008-001.hpcfund>
Co-authored-by: afagaj <john.afaganis@gmail.com>

[ROCm/composable_kernel commit: 9e95d54cd2]
2025-01-02 10:30:04 -08:00
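Stream-K splits the flattened space of (output tile, K-iteration) work evenly across a fixed number of workgroups, so a tile may be shared by two workgroups whose partial results must later be combined. A Python sketch of that partitioning step, with hypothetical names; CK's actual scheduler, tile sizes and fix-up strategy for shared tiles are not shown.

```python
from typing import List, Tuple


def stream_k_partition(num_tiles: int, iters_per_tile: int,
                       num_workers: int) -> List[List[Tuple[int, int, int]]]:
    """Return, for each worker, a list of (tile, k_begin, k_end) segments.

    The flattened space has num_tiles * iters_per_tile units of work; each
    worker gets a contiguous, near-equal share, so a tile may be split
    between neighbouring workers.
    """
    total = num_tiles * iters_per_tile
    base, extra = divmod(total, num_workers)
    plan: List[List[Tuple[int, int, int]]] = []
    start = 0
    for w in range(num_workers):
        end = start + base + (1 if w < extra else 0)
        segments = []
        it = start
        while it < end:
            tile = it // iters_per_tile
            k_begin = it % iters_per_tile
            k_end = min(iters_per_tile, k_begin + (end - it))
            segments.append((tile, k_begin, k_end))
            it += k_end - k_begin
        plan.append(segments)
        start = end
    return plan
```

With 3 tiles of 4 K-iterations and 2 workgroups, each workgroup does 6 iterations and tile 1 is split between them.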
Adam Osewski
ac74520ff6 Jing's contribution: prototype of mixed precision gemm FP16/BF16xint4 GEMM (#1762)
* add a prototype of int4

* clean

* debug

* clean

* clean

* move packed into dynamic_buffer

* fixed coord reset

* add fast pki4 to half conversion

* fix

* fixed reference and host_tensor

* fixed tensor init

* format

* debug i4_to_f16_convert

* format

* fixed splitk

* weight permute

* add b tile permute

* clean

* weight permute with splitki

* format

* improve weight layout

* add and_or_b32

* fixed splitk crash

* add permute switch as a template

* recover v3r1

* clean

* failure with intrawave v2

* fixed

* fixed

* add ckProfiler

* add bfp16 support

* add bf16 example

* fixed int4 to bhalf_t conversion

* format

* fixed int4 to bf16 conversion

* clean

* add instances for mem

* clean

* fixed host tensor size

* fixed

* debug

* fixed

* add pk_i4_t as a struct

* fix

* Update example/01_gemm/gemm_xdl_bf16_pk_i4_v3.cpp

Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>

* Update example/01_gemm/gemm_xdl_bf16_pk_i4_v3.cpp

Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>

* Update example/01_gemm/gemm_xdl_bf16_pk_i4_v3.cpp

Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>

* revert

* Update example/01_gemm/gemm_xdl_bf16_pk_i4_v3.cpp

Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>

* Update example/01_gemm/gemm_xdl_fp16_pk_i4_v3.cpp

Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>

* Update example/01_gemm/gemm_xdl_fp16_pk_i4_v3.cpp

Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>

* Update example/01_gemm/gemm_xdl_fp16_pk_i4_v3.cpp

Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>

* Update example/01_gemm/gemm_xdl_fp16_pk_i4_v3.cpp

Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>

* fixed comments

* revert

* clean

* revert

* revert

* fixed

* Update CMakeLists.txt

* Update script/cmake-ck-dev.sh

Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>

* Update include/ck/tensor_operation/gpu/element/unary_element_wise_operation.hpp

Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>

* Update CMakeLists.txt

Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>

* fixed

* fixed

* fixed

* revert

* revert

* add comments

* format

* fixed assert

* fixed

* Fix I4 define in ckProfiler

* Fixed example_gemm_xdl_bf16_pk_i4_v3 test failed issue

---------

Co-authored-by: Jing Zhang <jizhan@fb.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>
Co-authored-by: mtgu0705 <mtgu@amd.com>

[ROCm/composable_kernel commit: 1d8e4ec2ce]
2025-01-02 11:48:06 +08:00
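pk_i4_t packs two 4-bit integers into one byte. The Python sketch below assumes signed two's-complement nibbles with the low nibble first and a single hypothetical per-tensor scale, then rounds to fp16 through `struct`'s half-precision format. CK's real encoding, its fast bit-trick conversion and its weight permutation may all differ.

```python
import struct
from typing import List, Tuple


def sign_extend4(nibble: int) -> int:
    """Interpret a 4-bit value as two's complement in [-8, 7]."""
    return nibble - 16 if nibble & 0x8 else nibble


def pack_int4_pair(lo: int, hi: int) -> int:
    """Pack two signed int4 values into one byte, low nibble first (assumed layout)."""
    assert -8 <= lo <= 7 and -8 <= hi <= 7
    return (lo & 0xF) | ((hi & 0xF) << 4)


def unpack_int4_pair(byte: int) -> Tuple[int, int]:
    return sign_extend4(byte & 0xF), sign_extend4((byte >> 4) & 0xF)


def to_fp16(x: float) -> float:
    """Round a Python float to IEEE half precision via struct's 'e' format."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def dequantize(packed: bytes, scale: float) -> List[float]:
    """Expand packed int4 weights to fp16 values using one per-tensor scale."""
    out: List[float] = []
    for b in packed:
        lo, hi = unpack_int4_pair(b)
        out.extend((to_fp16(lo * scale), to_fp16(hi * scale)))
    return out
```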
Bartłomiej Kocot
a860c20099 Add NGCHW bf16 grouped conv fwd instances (#1783)
* Add NGCHW bf16 grouped conv fwd instances

* add missed cmake

[ROCm/composable_kernel commit: 159fa31946]
2025-01-01 18:00:06 +01:00
Qianfeng
8c1883a424 Remove using partitioner for all fmha kernels (#1778)
* Remove using tile partitioner for fmha_fwd_kernel

* Remove using tile partitioner for fmha_fwd_splitkv and splitkv-combine kernels

* Remove using tile partitioner for fmha_fwd_appendkv kernel

* Unify the format of GetTileIndex

[ROCm/composable_kernel commit: 4e076909b6]
2024-12-29 14:29:56 +08:00
Bartłomiej Kocot
7fbc8a9ac1 [CK TILE] GEMM and Batched GEMM SplitK support (#1724)
* [CK TILE] Add split K support in GEMM

* Updates

* Fixes

* rebase

* fix

* Fix

* fixes

* support for batched gemm

[ROCm/composable_kernel commit: af66494880]
2024-12-28 14:40:17 +01:00
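Split-K partitions the reduction dimension K into `k_batch` chunks, computes a partial GEMM for each chunk independently (on a GPU, in separate workgroups) and then sums the partial results. A plain-Python sketch; `k_batch` and the helper names are illustrative, and the final accumulation stands in for whatever atomic-add or reduction step the kernel actually uses.

```python
from typing import List

Matrix = List[List[float]]


def gemm_partial(a: Matrix, b: Matrix, k_begin: int, k_end: int) -> Matrix:
    """C_partial = A[:, k_begin:k_end] @ B[k_begin:k_end, :]."""
    m, n = len(a), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(k_begin, k_end)) for j in range(n)]
            for i in range(m)]


def gemm_split_k(a: Matrix, b: Matrix, k_batch: int) -> Matrix:
    m, k, n = len(a), len(b), len(b[0])
    # Ceil-divide K so every split except possibly the last has the same size.
    k_per_split = -(-k // k_batch)
    c = [[0.0] * n for _ in range(m)]
    for s in range(k_batch):
        k_begin = s * k_per_split
        if k_begin >= k:
            break
        k_end = min(k, k_begin + k_per_split)
        partial = gemm_partial(a, b, k_begin, k_end)
        # On a GPU the partial tiles would be combined by atomics or a reduction pass.
        for i in range(m):
            for j in range(n):
                c[i][j] += partial[i][j]
    return c
```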
Po Yen Chen
1e65b3ab35 Correct the dtype checking logics (#1775)
[ROCm/composable_kernel commit: 4c2eff023a]
2024-12-25 23:57:28 +08:00
carlushuang
4c4be7b14f [CK_TILE] optimize moe-sorting kernel (#1771)
* opt moe sorting

* remove commented code

[ROCm/composable_kernel commit: 3d15f364b3]
2024-12-23 10:59:02 +08:00
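MoE sorting groups token indices by the expert they are routed to and pads each expert's list to a multiple of the GEMM block size, so every block works on a single expert. A Python sketch under assumptions: using `num_tokens` as the padding ("mock") token id and returning one expert id per block are hypothetical choices, and the real kernel also tracks top-k weights and runs in parallel.

```python
from typing import List, Tuple


def moe_sort(topk_ids: List[List[int]], num_experts: int,
             block_size: int) -> Tuple[List[int], List[int]]:
    """Return (sorted_token_ids, expert_id_per_block)."""
    pad_id = len(topk_ids)  # hypothetical mock token id: one past the last real token
    buckets: List[List[int]] = [[] for _ in range(num_experts)]
    for token, experts in enumerate(topk_ids):
        for e in experts:
            buckets[e].append(token)
    sorted_ids: List[int] = []
    expert_ids: List[int] = []
    for e, tokens in enumerate(buckets):
        if not tokens:
            continue  # experts with no tokens get no blocks
        padded = tokens + [pad_id] * (-len(tokens) % block_size)
        sorted_ids.extend(padded)
        expert_ids.extend([e] * (len(padded) // block_size))
    return sorted_ids, expert_ids
```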