Commit Graph

98 Commits

Author SHA1 Message Date
Thomas Ning
753232ea70 [CK Tile] Int8 Support on CK Tile GEMM (#2267)
* updates to support int8 in 03_gemm example

* added comments, using aliases, helper functions

* test(gemm_universal): add test cases for int8 gemm pipeline

* fix(test_gemm): fix for failing test unit test for int8

* test(ck_tile): add int8 unit test for gemm universal

* refactor(gemm_universal): GPU reference verification for GEMM code improved

* style(gemm_universal): removed extra comments and did clang format

* merging recent changes to universal gemm to tile_engine

* ck tile engine integration work

* feat(tile_engine): add int8 support to tile engine ops/gemm

* feat(tile_engine): added 32 32 16 mfma instances to tile engine for int8

* style: Format code with clang-format-12

* refactor(tile_engine): address review comments

* style: removed unhelpful comments & unused variables.

* build: tile engine uses default config

* feat: add int8 support for CK_TILE GEMM

* style: added trailing commas to codegen_utils.py

* refactor: tile engine

* refactor: formatting and code review

* refactor: code formatting for python files

* fix: suppress build warning

* add support for gfx950

* refactor:KWarpTile size in gemms util

* Fix the branch and wrap up the k warp tile

* Add bf8 integration

* refactor: clang format and rebase

---------

Co-authored-by: zjli2013 <leezhengjiang@gmail.com>
Co-authored-by: AviralGoelAMD <aviral.goel@amd.com>
Co-authored-by: Khushbu Agarwal <khuagarw@amd.com>

[ROCm/composable_kernel commit: e03293ebce]
2025-06-25 08:20:35 -07:00
Yi DING
9f0d3497c3 [CK_TILE] FMHA Support hdim_v to as a Multiple of 32 (#2114)
* 160+192

* Add splitkv d160

* cleanup

* fix

* Add change log

* Fix CHANGELOG

* Use static_cast

* Update ignored instance

---------

Co-authored-by: asleepzzz <hanwen.chang@amd.com>

[ROCm/composable_kernel commit: b8212864cf]
2025-06-24 01:33:31 +08:00
Po Yen Chen
7001322416 [CK_TILE] Fix compilation errors introduced in #2320, #2219 and #2214 (#2388)
* Fix compilation errors

* Fix more ck_tile example compilation errors

[ROCm/composable_kernel commit: 7d669440a6]
2025-06-23 12:29:15 +08:00
Thomas Ning
3414888f92 Transpose builtin macro defense (#2374)
* add the macro defense

* add the static assert check

[ROCm/composable_kernel commit: 107e3623c7]
2025-06-20 11:24:54 -07:00
joyeamd
3cb0dd8506 transpose load api development (#2177)
* add transpose load; no real logic

* fix some compile errors

* fix some issues

* update transpose load logic

* add some fixes

* fix a distribution issue

* update some codes

* add some fix

* can pass; but no logic

* transpose load enable

* update tile transpose

* miss output tile distribution mapping

* hack for transpose 16x16

* update output tensor distribution

* delete unused variables

* fix transpose related codes

* update transpose load example

* exchange the iteration order

* fix 16x16 related dimension transpose

* fix a transpose index issue

* fix a transpose index issue

* fix clang format check

* update load tile transpose related codes

* fix compile errors and pass 16x16 tests

* fix a typo

* update logic

* check other data types

* add transpose load api

* update transpose load api

* fix clang format check

* change file name

* refactor codes

* update code name

* delete some unused codes

* delete the unused oob flag for transpose load

* update tensor view api for transpose load

* update for testing

* fix a typo error

* move transpose ops to example directory

* update transpose api

* update include file

* fix for pr review

* fix compile errors

* add transpose load; no real logic

* fix some compile errors

* fix some issues

* update transpose load logic

* add some fixes

* fix a distribution issue

* update some codes

* add some fix

* can pass; but no logic

* transpose load enable

* update tile transpose

* miss output tile distribution mapping

* hack for transpose 16x16

* update output tensor distribution

* delete unused variables

* fix transpose related codes

* update transpose load example

* exchange the iteration order

* fix 16x16 related dimension transpose

* fix a transpose index issue

* fix a transpose index issue

* fix clang format check

* update load tile transpose related codes

* fix compile errors and pass 16x16 tests

* fix a typo

* update logic

* check other data types

* add transpose load api

* update transpose load api

* fix clang format check

* change file name

* refactor codes

* update code name

* delete some unused codes

* delete the unused oob flag for transpose load

* update tensor view api for transpose load

* update for testing

* fix a typo error

* move transpose ops to example directory

* update transpose api

* update include file

* fix for pr review

* fix compile errors

* change directory name

* delete the duplicated directory

* update cmakelists file

* delete the unused codes

* update function names

* update transpose policy

* update code after remod.py

* update codes

* add some comment

* Polish the instr infrastructure

* build up the fixed instr

* redesign the transpose api, currently it has numerical error

* add the bf16 transpose

* fix some issues

* add some comments

* update document

* Finished the refactor of API and pass through the verification

* fix the merging issue

---------

Co-authored-by: ThomasNing <thomas.ning@amd.com>

[ROCm/composable_kernel commit: a2f01141aa]
2025-06-18 01:28:34 -07:00
Satyanvesh Dittakavi
bde406245a Do not use warpSize as compile time constant as it is removed (#2320)
* Do not use warpSize as compile time constant as it is removed

* Update tile_image_to_column_shape.hpp

update warpSize usage.

* clean-up all use of warpSize, make sure code builds

* fix

---------

Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
Co-authored-by: illsilin <Illia.Silin@amd.com>
Co-authored-by: Bartlomiej Kocot <barkocot@amd.com>

[ROCm/composable_kernel commit: 4c57157d50]
2025-06-17 11:54:30 -07:00
ruanjm
1fdac8b8fe Add support for specifying valid flag when fetching elements for tile_scatter_gather (#2332)
* Add support for specifying valid flag when fetching elements for tile_scatter_gather

Add constexpr for operator[] of TrueGenerator

* Use different path when valid is enabled

[ROCm/composable_kernel commit: b34c234f51]
2025-06-16 17:17:03 +08:00
Mateusz Ozga
6b3ddd0e23 [CK_TILE] Multiple-D GEMM example (#2219)
* Multiple d, initial commit

* Check Ds Layout

* Readme and clang format

* Update branch & conflicts

* Multiple D - fix clang-formatter

* Rename elemetwise_op

* Fix CI

* Code review part1

* Remove printf

* Remove unnecessary comment

* Add new tests with Col layout

* Review part 2

* Added support for Multiple D GEMM

* Update comment

* Remove maybe_unused

* Clang-format

* Review part 3

* Add comment to function

* Add comment to function: another

* Take number of params for a refrence function

* Remove additional d param for 0 tensor

* Change name of function

* Fix CI fails

[ROCm/composable_kernel commit: bd96ac9742]
2025-06-13 19:39:11 +02:00
kylasa
afbc0625f4 Code drop for 2 warp ping pong scheduler along K dimension. (#2276)
* Code drop for 2 warp ping pong scheduler along K dimension.

* Addressing code review comments.

* Addressing Clang formatting issues.

* Addressing build issues.

* Addressing build issues of other GEMM pipelines with ping pong scheduler code drop.

* Fix for LDS memory size for GEMM pipelines.

* Addressing code review feedback comments.

* Change log update.

* Addressing code review comments and build issues.

* Added new policy for pipeline specific logic about LDS needs.

* Clang Fix during build.

[ROCm/composable_kernel commit: 5f1ad09b61]
2025-06-12 18:24:02 -07:00
Thomas Ning
8b534e4037 OCP FP8 Macro restructure (#2331)
* solved the problem

[ROCm/composable_kernel commit: f59b8c7d3d]
2025-06-12 09:46:33 -07:00
John Afaganis
2fdbade459 Remove usage of 'warpSize' variable as it has been deprecated (#2295)
* SWDEV-535598 - remove usage of 'warpSize' variable as it has been deprecated. Ideally get_warp_size() should not be constexpr but this is just a workaround

* SWDEV-535598 - remove comment from get_warp_size as constexpr is required for this repo

---------

Co-authored-by: Gerardo Hernandez <gerardo.hernandez@amd.com>

[ROCm/composable_kernel commit: 6635d1bb88]
2025-06-10 07:34:54 -07:00
MHYangAMD
22250b2784 Fix fmha fwd precision issue on MI3XX series (#2285)
* Fix fmha fwd precision issue on MI3XX series

For fmha fwd fp16 cases, we found that using
impl::cast_tile_pk_fp16_fp32 for casting P would lead to precision
issues, since it uses __builtin_amdgcn_cvt_pkrtz, which is round to zero.

For examaple, fixing K,V to be all 1, and Q is random, which outputs are
expected to be all 1. But we found that it would have some incorrect
outputs 0.9995, which are smaller than the atol 0.001. (1 - 0.9995 =
0.0005 < 0.001) Thus, ck do not report this error.

* Add option to switch rtn/rtz for fmha fwd

[ROCm/composable_kernel commit: 9fcf21a4ec]
2025-06-10 15:03:23 +08:00
Sami Remes
24beb3bc6b [CK_TILE] Tileloop persistent gemm - resubmit (#2299)
* Reapply "[CK_TILE] Tile loop persistent gemm kernel (#2191)" (#2293)

This reverts commit 0c8aea8cb4.

* Add missing header for kentry

---------

Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>

[ROCm/composable_kernel commit: 1c6f83df6c]
2025-06-06 14:18:49 -07:00
valarLip
06f87e3fb9 extend buffer load to support load 32 bf16/fp16 at same time (#2291)
[ROCm/composable_kernel commit: 8482977a37]
2025-06-06 17:21:19 +08:00
Illia Silin
0c8aea8cb4 Revert "[CK_TILE] Tile loop persistent gemm kernel (#2191)" (#2293)
This reverts commit 0385ef2437.

[ROCm/composable_kernel commit: 233e274077]
2025-06-05 09:24:00 -07:00
Sami Remes
0385ef2437 [CK_TILE] Tile loop persistent gemm kernel (#2191)
* Implement tile loop persistent gemm kernel

* Enable timing

* Add tests for persistent gemm

* Fix formatting

* Fix gemm_basic

* Rename True/False to Persistent/NonPersistent

* Use only one set of layouts for persistent tests

* Fix gemm example persistent template parameter

* Fix formatting

[ROCm/composable_kernel commit: ffb52783d0]
2025-06-04 11:46:28 +03:00
valarLip
671f579903 extend buffer load for fp16/bf16x16 (#2270)
* extend buffer load for fp16/bf16x16

* format

[ROCm/composable_kernel commit: 0fdbf6bcd1]
2025-06-02 10:29:54 +08:00
Illia Silin
24057b3662 fix the buffer intrinsic names for clang >=20 (#2228)
[ROCm/composable_kernel commit: 8146e471f1]
2025-05-23 14:58:25 -07:00
Illia Silin
9c7b0a65f9 Revert "Update the buffer load/store intrinsic names for clang>=20. (#2192)" (#2227)
This reverts commit 7d92e48278.

[ROCm/composable_kernel commit: 1b846143c6]
2025-05-22 15:41:17 -07:00
Aviral Goel
7d47d71bc3 Refactor tile_window.hpp, tile_window_linear.hpp into a CK Tile Hierarchy (#2214)
* window_origin variable now in base class

* abstracted more functions

* consolidated tile_window_static_distribution and tile_window_static_lengths

* clang format

* skeleton code for tile_window and tile_window_linear consolidation

* more abstraction

* moved variables from child to parent

* clang format

* removed comments

* removed debug code

* removed debug code

* abstracting traits WIP

* consolidated traits

* removed comments and clang formatted

[ROCm/composable_kernel commit: 534d4594d0]
2025-05-21 23:28:00 -07:00
Sami Remes
038417bd3d [CK_TILE] Grouped GEMM tile loop (#2146)
* Add trait to use a persistent kernel and split the entrypoints in grouped gemm

* Some helper functions for persistent kernel case

* Get max occupancy grid using device properties

* Implement tile loop in main entry point to grouped gemm

* Enable GridSize() on device

* Handle offset tile index using real current block index

* Add persistent kernel choice to grouped gemm example

* Use a for-loop for iterating over the group

* Reduce VGPR spills by early-exit

* Enable persistent kernel choice in grouped_gemm example

* Add persistent kernel option to grouped_gemm test

* Fix formatting with remod.py

* Remove GridUpdateBlocks as blocks are now iteratively computed

* Add comment about VGPR spilling

* Fix formatting

* Use CK_TILE_HOST instead of __host__

* Enable all Row/Col combinations in grouped gemm unit test

* Add some KBatch=2 cases to grouped gemm tests

* Fix SplitK for grouped gemm

* Enable pipeline hotloop/tailnumber selection in-kernel for grouped gemm

* Add type traits

* Split examples to regular and tileloop

* Formatting

* Use hipExtStreamGetCUMask to get current active CUs for the given stream

* Align test and example kernel config, and disable validation for splitk repeats

* Remove debug options from CMakeLists.txt

* Separate the code paths for persistent/non-persistent in test

* Fix formatting

* Address review comments

---------

Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>

[ROCm/composable_kernel commit: d1e6f0982d]
2025-05-20 17:18:57 +03:00
Illia Silin
7d92e48278 Update the buffer load/store intrinsic names for clang>=20. (#2192)
* fix the buffer load/store intrinsic names

* fix clang format

[ROCm/composable_kernel commit: 58f9e9ffbc]
2025-05-13 10:18:14 -07:00
Po Yen Chen
2944f929e7 [CK_TILE] Add logits soft-capping & customization support to the FMHA forward kernel/pipelines (#2163)
* hack for cap logits

* fix bug

* Re-format files

* Allow specifying logits_soft_cap through APIs

* Support turn on/off logits_soft_cap in async pipeline

* Do not generate non-verified kernels

* Align receipt used in Aiter

* Sync logits soft-capping across pipelines

* Re-enable some hdim pipelines

* fix perf

* Add attention variant for logits_soft_cap

* Add newline at end-of-file

* Fix performance

* Add comment to explain logits_soft_cap pre-processing

* Unify code

* Unify floating-point literal style

* Use class data member to slience the compilation error

* [CK_TILE] Update attention customizaton interface: add LogitsMask() (#2133)

* Send 'mask' along with variant params to the LogitsMask()

* Send block indices to the variant

* Add indices parameters in variant interface

* Fix fmha bwd codegen error

* Allow switch logits_soft_cap impl

* Eliminate register spills

* Fix compilation errors

* Fix wrong LSE

* Fix LSE for splitkv kernel

* Sync splitkv pipeline changes

* Add batch_prefill kernel/pipeline

* Fix codegen error

* Undo changes in CMakeLists.txt

* Merge pipeline filtering check

* Use different code path if kHasLogitsSoftCap=false

* Remove [[maybe_unused]] attribute

* Use pre-existing compile-time flag to instantiate templates

* Sync pipeline changes

* Update CHANGELOG.md

---------

Co-authored-by: Bernard <bernaliu@amd.com>
Co-authored-by: coderfeli <coderfeli@163.com>

[ROCm/composable_kernel commit: 2920604786]
2025-05-13 12:19:25 +08:00
Thomas Ning
a2866ddb54 Vectorized Transpose for Batched Transpose CK Tile Operator (#2131)
* Shared Memory for single data point

* CKTile Transpose vectorize CP1

* CKTile Transpose vectorize CP2

* CKTile Transpose vectorize CP2.1

* fixed the compile error of the transpose tile 2d

* Have the correct result for the current test sample

* Changes to printing tensor

* fp8 support added

* Debugging for transpose

* solving the corner issue

* Changed padding flag

* Intermideate Debugging

* Intermidiate Debugging

* Intermediate Debugging

* Finished debugging of the transpose op

* Code Cleanup

* Adding edge case smoke tests

* Adding Transpose test to CI/CD

* Adding Transpose test to CI/CD

* Adding Transpose test to CI/CD

* Addressing Review Comment

* Addressing Comments

* Addressing Comments

* Measuring Perf Tests

* Code Cleanup

* Changlog

* Added the running iterations

* clang format

* Fix the changelog

* Fix the compilation error

* change the printing factor

---------

Co-authored-by: ThruptiRajLakshmanaGowda <tlakshma@amd.com>

[ROCm/composable_kernel commit: 9d1e44e56a]
2025-05-12 00:41:45 -07:00
Aviral Goel
6e3c437ffb [CK_TILE] Add type traits to detect tile window types at compile time (#2158)
* added WindowType enum to tile_window_structs and static assert checks in computev4 pipeline

* added type traits instead of enum to tile_window() and tile_window_linear() with debug comments

* removed comments, added documentation and clang format

[ROCm/composable_kernel commit: 769336b640]
2025-05-07 00:00:39 -07:00
carlushuang
652263d073 [CK_TILE] optimize moe sorting kernel, boost large context case up to 20x (#2153)
* combine 2-3 as single stage

* support zeroing

* improve long tokens

* update specialization

* b16 ws

* 8bit topk optimize

* update 15 example

[ROCm/composable_kernel commit: 4e9b76f88c]
2025-05-06 17:32:07 +08:00
Aviral Goel
87f6ac8949 Add documentation for ck_tile::array<T,N> (#2078)
* addded documentation for ck_tile::array<T,N>

* clang format fix

* spelling errros

Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>

* spelling errros

Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

* Revert "spelling errros"

This reverts commit 4179e7d193.

* Revert "spelling errros"

This reverts commit 3f90733dbe.

---------

Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>
Co-authored-by: John Afaganis <john.afaganis@amd.com>

[ROCm/composable_kernel commit: 1d8ef40760]
2025-04-30 16:43:36 -07:00
Yi DING
790e7bbd4b Fix fp8 convert & add option for basic example (#2129)
[ROCm/composable_kernel commit: 8add2cf45d]
2025-04-27 16:26:05 -07:00
Thomas Ning
1b61d3a0ed Solve the Static Encoding Pattern compile error when the tile size is too small (#2079)
[ROCm/composable_kernel commit: 269f4f6af5]
2025-04-13 20:09:30 -07:00
Juan Manuel Martinez Caamaño
7a42b06988 Replace inline assembly with builtins in FHMA (#2067)
* Replace inline assembly with builtins in FHMA

---------

Co-authored-by: illsilin <Illia.Silin@amd.com>

[ROCm/composable_kernel commit: f14e648e7c]
2025-04-10 09:48:37 +02:00
Illia Silin
7546e4bafe enable gfx115x support (#2065)
[ROCm/composable_kernel commit: 3e6d21adeb]
2025-04-09 10:06:42 -07:00
MHYang-gh
62ce5b906b Make buffer coherence configurable in tensor view (#2041)
* Make buffer coherence configurable in tensor view

* Fix clang-format for tensor_view.hpp

[ROCm/composable_kernel commit: 03ce8729fd]
2025-04-08 15:34:11 -07:00
Max Podkorytov
26724086f3 simplify generate_tuple (#2043)
[ROCm/composable_kernel commit: 6ce0797dad]
2025-04-08 09:00:51 -07:00
aledudek
6dbaeb5fe8 [CK_TILE] Fix GEMM Memory Pipeline (#2034)
* [CK_TILE] Fix GEMM Memory Pipeline

* Fix transpose tile

* Add comments

[ROCm/composable_kernel commit: 80aae6119b]
2025-04-08 12:40:04 +02:00
Illia Silin
ada1b5f341 Split env.hpp header from the ck.hpp header. (#2049)
* split env.hpp out of main headers

* fix namespace logic

[ROCm/composable_kernel commit: 572cd820ce]
2025-04-03 15:30:21 -07:00
Seunghoon Lee
345ab65612 Fix Windows build. (#2012)
* Remove duplicate using uint64_t.

* Cast before shift.

[ROCm/composable_kernel commit: df32020f93]
2025-04-01 12:22:10 -07:00
Max Podkorytov
cf08db04a6 add a fast compilation path for static for (0..N) (#2005)
* add a fast compilation path for static for (0..N)

* Update functional2.hpp

add comment and put range applier into detail namespace

* Update functional.hpp

ditto for ck-tile

* prettify

* prettify more

* add comment

* clang-format

[ROCm/composable_kernel commit: c59a8bb206]
2025-04-01 12:06:25 -07:00
felix
20ffa0f474 hotfix fix sorting int64 (#2025)
* fix sorting int64

* clang format

* fix example issue

* update WA issue #

---------

Co-authored-by: coderfeli <coderfeli@163.com>
Co-authored-by: carlushuang <carlus.huang@amd.com>

[ROCm/composable_kernel commit: a82f338fb9]
2025-03-28 11:31:52 +08:00
carlushuang
f2dd57b76f Reapply "[CK_TILE] support hdim=192/128 pair for deepseekv3 (#1961)" … (#1971)
* Reapply "[CK_TILE] support hdim=192/128 pair for deepseekv3 (#1961)" (#1969)

This reverts commit b92caa3d84.

* fix codegen problem

* Update config.hpp

---------

Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>

[ROCm/composable_kernel commit: 3e81279d26]
2025-03-13 11:41:39 +08:00
Illia Silin
a7614ad594 use old instrinsics with staging compiler (#1970)
[ROCm/composable_kernel commit: 4c97cc511e]
2025-03-12 07:29:09 -07:00
Illia Silin
b92caa3d84 Revert "[CK_TILE] support hdim=192/128 pair for deepseekv3 (#1961)" (#1969)
This reverts commit 45fbd9210a.

[ROCm/composable_kernel commit: 8cbcd3e0d0]
2025-03-11 10:40:18 -07:00
carlushuang
45fbd9210a [CK_TILE] support hdim=192/128 pair for deepseekv3 (#1961)
* support hdim=192/128 pair

* remove useless print

* update

[ROCm/composable_kernel commit: 7a93b16ff6]
2025-03-11 21:07:40 +08:00
Juan Manuel Martinez Caamaño
31b75f6fdd Fix typo: v_offset used in initialization of v_offset (#1951)
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>

[ROCm/composable_kernel commit: 0f62dd9928]
2025-03-06 17:37:29 -08:00
kylasa
676d236a5e Addressing (Post Merge) code review comments for PR 1845 (#1883)
* Addressing code review comments.

* Addressing code review comments.

* Reorganized code for better readability.

* add ck_tile gemms for new types in CI

* fix jenkins syntax

* fix script syntax

* Add the test cases back

* Address the review comments

* Address review comments

* clang format

* Solve the merging issues

* Addressed the comments

* clang format

---------

Co-authored-by: illsilin <Illia.Silin@amd.com>
Co-authored-by: ThomasNing <thomas.ning@amd.com>
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>

[ROCm/composable_kernel commit: 66c5f5b0b6]
2025-03-06 11:40:30 -08:00
carlushuang
31b9e40b3d [CK_TILE][HOTFIX] WA for address space by disable it completely (#1947)
* port all moe changes from ck_moe_gemm branch

* refine codes in the pr

* fix tail odd

* fix clang format

* fix clang format2

* make hot loop scheduler compatible with 16x16 and 32x32

* clang format

* fix per token quant

* rename moe example

* clang format

* WA for address space by disable it completely

* hot fix moe gemm2

---------

Co-authored-by: coderfeli <coderfeli@163.com>
Co-authored-by: feli <felix.li@amd.com>

[ROCm/composable_kernel commit: c12fb0a624]
2025-03-06 12:01:25 +08:00
Illia Silin
9d24409070 Replace buffer load/store intrinsics with builtins (#1876)
* replace buffer load/store intrinsics with builtins

* fix clang format

* replace buffer load/store intrinsics with built-ins in ck_tile

* fix clang format

* add switch between buffer intrinsics and built-ins

* change the builtins threshold to clang20

* fix clang format

* fix some compilation errors

* revert changes in ck_tile

* revert changes in ck_tile

* delete all root files and folders when CI completes

* try changing the username in CI

* fix groovy syntax

* add user and group id info to ci dockers

* change ownership of all files in CI to jenkins at the end

* update changelog

[ROCm/composable_kernel commit: a88bf76ecc]
2025-03-05 14:33:28 -08:00
Illia Silin
53b89436b2 remove support for gfx940 and gfx941 targets (#1944)
* remove support for gfx940 and gfx941 targets

* update changelog

[ROCm/composable_kernel commit: 9b51c08bf7]
2025-03-05 11:07:33 -08:00
Juan Manuel Martinez Caamaño
c6c30f7486 Missing _ in __HIPCC__ (#1930)
[ROCm/composable_kernel commit: 57bb0e96a4]
2025-03-03 08:19:47 -08:00
aledudek
45f410af7d [CK_TILE] Add EnvLogging and missing gemm args checks (#1896)
* [CK_TILE] Add EnvLogging - refactor IsSupported error messages

* [CK_TILE] Add EnvLogging - wrap gemm kernel error messages

* [CK_TILE] Add EnvLogging - Add missing k_batch args check

* [CK_TILE] Add EnvLogging - remove debug log

* Add one check

* [CK_TILE] EnvLogging - add CK_TILE_ERROR logs

* [CK_TILE] EnvLogging quotes fix

* [CK_TILE] EngLogging use function instead of macro for err logs

* [CK_TILE] EnvLogging - refactor checking env var

[ROCm/composable_kernel commit: c9bcfd755e]
2025-02-25 11:48:38 +01:00
Bartłomiej Kocot
5705af0aad [CK TILE] GEMM with packed i4 (#1885)
* [CK TILE] GEMM with packed i4

* Fixes

* fixes

* fixes

* fixes

[ROCm/composable_kernel commit: 4d9973ec8e]
2025-02-20 09:59:49 +01:00