4 Commits

Author SHA1 Message Date
Christopher Millette
e0d11b969b [rocm-libraries] ROCm/rocm-libraries#5030 (commit 8e02a26)
[CK] Replace tuple value construction with tuple_element_t
 type extraction [1A] (#5030)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

## Summary

### Rationale
CK's device operation instance registration uses
`add_device_operation_instances` at ~1,850
call sites to register GPU kernel configurations. The existing
implementation constructs
`std::tuple` values just to extract their types via `decltype`, then
copy-constructs each
instance into `make_unique`. This is wasteful — only the types matter,
not the values — and
forces the compiler to instantiate the full `std::tuple` constructor and
`std::get` machinery
at every call site.

### What changed
- Replace `remove_cvref_t<decltype(std::get<i>(tuple_obj))>` with
`std::tuple_element_t<i.value, TupleType>`, which extracts the type
directly without constructing any values
- Replace copy-from-default `make_unique<T>(value)` with direct default
construction `make_unique<T>()` — all CK device operation instances are
stateless structs with configuration encoded in template parameters
- Add `static_assert(std::is_default_constructible_v<NewOpInstance>)` to
enforce this contract at compile time with a clear error message
- Add Doxygen documentation for this high-traffic public API

### Value
- Eliminates unnecessary template instantiation of `std::tuple`
constructors and `std::get` across ~1,850 call sites
- Establishes a cleaner, more intention-revealing pattern for type-only
tuple usage
- The `static_assert` prevents silent breakage if a
non-default-constructible type is ever added
- No runtime behavior change — zero risk

### Files changed (9)
- `add_device_operation_instance.hpp`: Core pattern change
- 3 example files, 3 reduce instance headers, 1 convolution header, 1
profiler header

## Test plan
- [ ] Existing CI tests cover all ~1,850 call sites (GEMM, reduce,
softmax, convolution)
- [ ] `static_assert` provides compile-time validation stronger than
runtime tests
- [ ] No runtime behavior change — stateless struct default construction
is identical to copy-from-default
- [ ] Compatible with both `std::tuple` and `ck::type_list` containers

🤖 Generated with [Claude Code](https://claude.com/claude-code)
## Submission Checklist

- [ ] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
2026-03-06 16:28:22 +00:00
Johannes Graner
0a474aa62f [CI, CK examples] Disable time_kernel for CI tests and examples (#3464)
* Disable kernel timing in tests

* default time_kernel = false in old CK examples
2026-01-07 16:30:57 +01:00
Aviral Goel
d85f065b15 chore(copyright): update copyright header for example directory (#3273)
* chore(copyright): update copyright header for codegen directory

* chore(copyright): update copyright header for example directory
2025-11-24 18:02:41 -08:00
ltqin
c544eb4da0 Universal gemm splitk using reduce (with multi-d) (#1341)
* init for reduce_threadwise multi_d

* add reduce_threadwise_multi_d

* add reduce_multi_d

* clean

* start add an other splitk device op

* add reduce template parameter to SplitKBatchOffset

* add reduce c matrix

* clean up code

* change example data type to bf16

* add bf16Ai8B example

* remove reduce template parameter

* add splitk atomic status to v4

* example add multi d parameters

* device op add multi-d parameters

* add multi-d to reduce

* fix kbach=1 bug

* change B layout to col in  bf16Ai8B example

* remove float adding struct

* change  multi-d interface

* change file and class name

* remove multi-d of bf16Ai8B example

* change IsReduce function to IsReduceAdd

* change example layout to RRR from RCR

* according layout to set ds stride

* reset parameter layout

* add gemm universal reduce instance

* add reduce factory

* add profile_gemm_universal_reduce

* add reduce to profiler

* fix reduce instance

* fix profiler reduce compiling bug

* format

* format library instance code

* add mem instance for reduce library

* fix call instance names

* add workspace for reduce in ckProfiler

* format

* add mnpading to reduce library instance

* add fp16 instance to reduce of profiler

* change copyright time

* restore profiler cmake file

* add reduce text to instances

* add DsLayout and DsDataType to instances template parameter

* fixed gemm_reduce_multi_d

* add an example without multi_d

* Update common.hpp

* Update gtest.cmake

* Update gemm_xdl_splitk_reduce_bf16.cpp

* clean

* Update gtest.cmake

* format

* fixe api

* format

* default parameter change to RRR

* add vector_len for multi_d

* format

* Update gtest.cmake

* fix bf16A iBB elementwiseop

* add ReduceDataType

* move ReduceDataType to end position

* format

* remove googletest git method  address

* fix copyright time

* update init data

---------

Co-authored-by: root <jizhan@amd.com>
Co-authored-by: letaoqin <letaoqin@amd.com>
Co-authored-by: Jing Zhang <jizhan@meta.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>
2024-07-19 22:01:22 +08:00