Commit Graph

9 Commits

Author SHA1 Message Date
Illia Silin
3873bf3b91 [CK] fix clang lifetimebound errors with staging compiler (#5921)
## Motivation

The ROCm staging compiler (newer Clang) enforces
`[[clang::lifetimebound]]` annotations on methods that return references
or pointers to internal object data. Without these annotations, the
staging compiler emits compilation errors for container accessor methods
across the CK and CK Tile namespaces.

  ## Technical Details

Adds `[[clang::lifetimebound]]` to all reference/pointer-returning
accessors in core container types:

  **`ck::` namespace:**
  - `Array` -- `At()`, `operator[]`, `operator()`, `begin()`, `end()`
  - `index_array` -- `operator[]`
  - `StaticallyIndexedArray_v2` -- `At()`, `operator[]`, `operator()`
  - `IndexLookupTable` -- `operator[]`

  **`ck_tile::` namespace:**
  - `array` -- `get(i)`, `at()`, `operator[]`, `operator()`
  - `static_array` -- `operator[]`
  - `thread_buffer` -- `get(i)`, `at()`, `operator[]`, `operator()`
  - `make_kernel()` -- parameter pack

Also removes the unused `instance_index` variable from
`batched_gemm_reduce_fp16.cpp` and simplifies its argument parsing
  accordingly.

  ## Test Plan

- Compile with the staging compiler to verify all lifetimebound errors
are resolved
- Existing tests pass unchanged -- the attribute is a compile-time
annotation with no runtime effect

 ## Test Result

<!-- Briefly summarize test outcomes. -->

## Submission Checklist

- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
2026-03-30 07:19:32 -07:00
Wojciech Laskowski
6ad65bc855 WMMA support for batched_gemm_reduce (#3332)
Summary:
- added new device impl of Batched GEMM Reduce for WMMA
- added instance library
- added WMMA impl to the Batched GEMM Reduce tests

[ROCm/composable_kernel commit: b09121f860]
2026-01-20 10:50:46 +01:00
Illia Silin
f559597c46 Split the instances by architecture. (#1223)
* parse examples inside the add_example_executable function

* fix the example 64 cmake file

* add xdl flag to the gemm_bias_softmax_gemm_permute example

* add filtering of tests based on architecture type

* enable test_grouped_gemm for gfx9 only

* enable test_transpose only for gfx9

* only linnk test_transpose if it gets built

* split the gemm instances by architectures

* split gemm_bilinear,grouped_conv_bwd_weight instances by targets

* split instances by architecture

* split grouped_conv instances by architecture

* fix clang format

* fix the if-else logic in group_conv headers

* small fix for grouped convolution instances

* fix the grouped conv bwd weight dl instances

* fix client examples

* only enable client examples 3 and 4 on gfx9

* set the gfx9 macro

* make sure the architecture macros are set by cmake

* use separate set of xdl/wmma flags for host code

* sinmplify the main cmake file

* add conv_fwd_bf8 instance declaration

[ROCm/composable_kernel commit: ae57e5938e]
2024-04-02 09:42:17 -07:00
Illia Silin
d40b8d5e2c update copyright headers (#726)
[ROCm/composable_kernel commit: b94fd0b227]
2023-05-31 18:46:57 -05:00
Po Yen Chen
3097b77236 Modularize ckProfiler operations (#514)
* Re-structure ckProfiler source files

* Rename profiler.cpp to main.cpp

* Modularize ckProfiler operations

* Add description for profiler operations

* Use longer name to avoid name collision

* Use macro to delay expansion

* Use std::move() to avoid object copying

* Prohibit users from calling dtor

* Use macro to eliminate redundant code

* Make friend function hidden

* Add missing include directive <iostream>

* Fix wrong include directives

* Remove int8 from batchnorm-forward instances since it is not needed for forward training and could fail test

Co-authored-by: Qianfeng Zhang <Qianfeng.Zhang@amd.com>

[ROCm/composable_kernel commit: 8784a72e23]
2022-12-01 15:15:02 -06:00
Chao Liu
2ef299e0ad add license in file (#303)
[ROCm/composable_kernel commit: d3051d7517]
2022-06-24 23:32:43 -05:00
Chao Liu
9df0a11a51 Absolute include path (#281)
* ad gelu and fast_gelu

* added GeLU and fast GeLU

* clean up

* add gemm+fastgelu example

* add gemm+gelu instances

* update profiler

* clean up

* clean up

* adding gemm+bias+activation

* clean

* adding bias

* clean

* adding gemm multiple d

* debugging

* add gemm bias add fastgelu

* rename, clean

* refactoring; add readme

* refactor

* refactor

* refactor

* refactor

* refactor

* refactor

* fix

* fix

* update example

* update example

* rename

* update example

* add ckProfiler

* clean

* clean

* clean

* clean

* add client app example

* update readme

* delete obselete files

* remove old client app

* delete old file

* cleaning

* clean

* remove half

* fix header path

* fix header path

* fix header path

* fix header path

* fix header path

* fix header path for all examples

* fix header path

* fix header path

* fix header path

* fix header path

* fix header path

* fix header path

* fix header path

* fix header path

* fix header path

* revert client app example

* clean build

* fix build

* temporary disable client test on Jenkins

* clean

* clean

* clean

[ROCm/composable_kernel commit: d1db6a0c3e]
2022-06-24 20:51:04 -05:00
JD
569dd9f47b Add host API (#220)
* Add host API

* manually rebase on develop

* clean

* manually rebase on develop

* exclude tests from all target

* address review comments

* update client app name

* fix missing lib name

* clang-format update

* refactor

* refactor

* refactor

* refactor

* refactor

* fix test issue

* refactor

* refactor

* refactor

* upate cmake and readme

Co-authored-by: Chao Liu <chao.liu2@amd.com>

[ROCm/composable_kernel commit: cec69bc3bc]
2022-05-12 09:21:01 -05:00
Jianfeng Yan
cb97ce68d8 Batched gemm and reduction (#156)
* adding batched_gemm_and_reduction

* batched_gemm_reduce works with bactch_count=1

* fix a bug in grid_size; batched_gemm_reduce works for batch_count > 1

* adding profiler for batched_gemm_fp16

* fixed a bug in declaration of d1 and d0; both example and profiler work

* clang-format

* cleanup

* batched_gemm_reduce: add test

* minor change

* fixed some typo in function names

[ROCm/composable_kernel commit: 34c661e71c]
2022-03-30 11:21:18 -05:00