Commit Graph

11 Commits

Author SHA1 Message Date
Aviral Goel
a535de0f75 chore(copyright): update copyright header for example directory (#3273)
* chore(copyright): update copyright header for codegen directory

* chore(copyright): update copyright header for example directory

[ROCm/composable_kernel commit: d85f065b15]
2025-11-24 18:02:41 -08:00
John Shumway
b5e2f26808 [CK_BUILDER] Put global CK functions in an the CK namespace (#3232)
* Wrap ck host utitlies in CK namespace.

The CK and CK-Tile source code bases are incompatible because CK is not properly using namespaces everywhere. In particular, we need to put hip_check_error in the ck namespace.

Move all functions in include/ck_/host_utility that were in global namespace into the ck namespace.

There may be additional namespace problems like this, and it's possible we'll have namespace clashes. But it is good design to properly guard our to code bases (CK and CKTile) so that they can both coexist. Moreover, estabilishing this compatiblity is essential if we are going to allow the builder to instantiate  kernels from either template library.

* Add using declarations to test code.

After moving some of the untils into the ck namespace, most examples and a few tests had to be updated to recognize the new namespace declarations. We add using declarations to individual compute units for functions that were previously in the global namespace.

* Add using declarations to client examples.

[ROCm/composable_kernel commit: ad57f6ef0b]
2025-11-19 11:23:02 +01:00
Vidyasagar Ananthan
8654f7be8d [DOCS] Documentation Addition (Readme updates) (#2495)
* GH-2368 Adding a basic glossary

GH-2368 Minor edits

GH-2368 Adding missing READMEs and standardization.

resolving readme updates

GH-2368 Minor improvements to documentation.

Improving some readmes.

Further improvement for readmes.

Cleaned up the documentation in 'client_example' (#2468)

Update for PR

Update ACRONYMS.md to remove trivial terms

Update ACRONYMS.md to provide detailed explanations for BF16 and BF8 formats

Apply suggestion from @spolifroni-amd

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

Apply suggestion from @spolifroni-amd

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

Update README.md to clarify CK Tile API description and remove outdated references to the Tile Engine.

revise 37_transpose readme

revise 36_copy readme

Remove references to the Tile Engine in README files for 19_gemm_multi_d and 35_batched_transpose, and update distribution links for clarity.

Remove references to the Tile Engine in multiple README files and update distribution links for consistency and clarity.

Remove references to the Tile Engine in README files across multiple examples

* GH-2368 Adding a basic glossary

GH-2368 Minor edits

GH-2368 Adding missing READMEs and standardization.

resolving readme updates

GH-2368 Minor improvements to documentation.

Improving some readmes.

Further improvement for readmes.

Cleaned up the documentation in 'client_example' (#2468)

Update for PR

Update ACRONYMS.md to remove trivial terms

Update ACRONYMS.md to provide detailed explanations for BF16 and BF8 formats

Apply suggestion from @spolifroni-amd

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

Apply suggestion from @spolifroni-amd

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

Update README.md to clarify CK Tile API description and remove outdated references to the Tile Engine.

revise 37_transpose readme

revise 36_copy readme

Remove references to the Tile Engine in README files for 19_gemm_multi_d and 35_batched_transpose, and update distribution links for clarity.

Remove references to the Tile Engine in multiple README files and update distribution links for consistency and clarity.

Remove references to the Tile Engine in README files across multiple examples

Refine README files by removing outdated references to the Tile Engine

* Updates based on PR feedback 1

* Updates based on PR feedback 2

* Updates based on PR feedback 3

* Updates based on PR feedback 4

* Updates based on PR feedback 5

* Updates based on PR feedback 6

* Updates based on PR feedback 7

* Updates based on PR feedback 8

* Content Modification of CK Tile Example

* Modify the ck_tile gemm config

---------

Co-authored-by: AviralGoelAMD <aviral.goel@amd.com>
Co-authored-by: ThomasNing <thomas.ning@amd.com>

[ROCm/composable_kernel commit: 92c67a824f]
2025-10-16 03:10:57 -07:00
Michal Kulikowski
f85778eab4 [CK][Examples] Extending support for rdna3/4 part 2:
-example_batched_gemm_xdl_int8
-example_batched_gemm_xdl_fp8_rowwise_v3
-example_batched_gemm_xdl_fp32
-example_batched_gemm_xdl_bf16
-example_batched_gemm_xdl_bf16_v3
-example_batched_gemm_xdl_fp16
-example_splitk_gemm_bias_e_permute_xdl_fp32
*fixing return value to return 0 as success in above examples.

Fixing cmdline parameters in:
-example_sparse_embedding3_forward_layernorm
-example_elementwise_binary_4D_fp16
-elementwise_scale_permute_amax_2D_fp16_fp8

Signed-off-by: Michal Kulikowski <Michal.Kulikowski@amd.com>


[ROCm/composable_kernel commit: 7259b9c4db]
2025-10-08 18:14:38 +02:00
linqunAMD
20333fd850 [CK] Add command option instance_index and param_mask to run partial ck test (#2889)
* [CK] Add command option instance_index and param_mask to run partial ck test

Many CK test are instance test. it will loop all instance in the instance library. It causes test often out-of-time if we run test on simulator/emulator.
This PR add option instance_index and param_mask to reduce the workload of instance test

instance_index: only run test 1 available instance with specified index.
param_mask: filter the embedded parameter with specified mask

* fix CI error

* fix clang format

---------

Co-authored-by: illsilin_amdeng <Illia.Silin@amd.com>

[ROCm/composable_kernel commit: e78a897ec0]
2025-09-30 08:24:40 -07:00
Illia Silin
3345f5f417 upgrade from clang-format-12 to clang-format-18 (#2568)
* upgrade to clang-format-18

* update to clang-format-18 in pre-commit-config

[ROCm/composable_kernel commit: 504b101da3]
2025-07-28 11:34:07 -07:00
Illia Silin
d40b8d5e2c update copyright headers (#726)
[ROCm/composable_kernel commit: b94fd0b227]
2023-05-31 18:46:57 -05:00
who who who
75c29a7c90 add multi embeddings support (#542)
* add multi embeddings support

* fix format

* optimize sqrt

* add reduce operation

* change to elementwise op

* fix name

* rename

* run ci cd

* format example

* format code

* format code

[ROCm/composable_kernel commit: 147b7db561]
2023-01-18 11:32:12 -06:00
Po Yen Chen
f2dd2e5b09 Rangify constructor of HostTensorDescriptor & Tensor<> (#445)
* Rangify STL algorithms

This commit adapts rangified std::copy(), std::fill() & std::transform()

* Rangify check_err()

By rangifying check_err(), we can not only compare values between
std::vector<>s, but also compare any ranges which have same value
type.

* Allow constructing Tensor<> like a HostTensorDescriptor

* Simplify Tensor<> object construction logics

* Remove more unnecessary 'HostTensorDescriptor' objects

* Re-format example code

* Re-write more HostTensorDescriptor ctor call

[ROCm/composable_kernel commit: 4a2a56c22f]
2022-11-11 11:36:01 -06:00
Adam Osewski
8a8f8521f9 Refactor device op implementations into impl subdirectory. (#420)
* Move kernel implementation files under impl directory.

* Update examples paths.

* Update device kernel impl include paths.

* Update tensor operation instances include paths.

* Update profiler and tests include paths.

* Clang-format

* Update include paths for batched gemm reduce

* Refactor UnitTest ConvNDBwdWeight.

* Refactor fwd and bwd data convND UT.

* Fix used test macro.

* Fix include path.

* Fix include paths.

* Fix include paths in profiler and tests.

* Fix include paths.

Co-authored-by: Adam Osewski <aosewski@amd.com>

[ROCm/composable_kernel commit: 3048028897]
2022-10-13 09:05:08 -05:00
carlushuang
ededb608ed embedding fuse layernorm (#405)
* add gridwise/device sparse embedding

* update code

* update code

* remove useless makefile

* code fix

* workable

* work properly

* emb add

* add more instance

* format

* remove useless code

* fix format

* fix clang-tidy

* clean

* fix a compile error

Co-authored-by: Chao Liu <chao.liu2@amd.com>
Co-authored-by: Chao Liu <lc.roy86@gmail.com>

[ROCm/composable_kernel commit: efd1d25733]
2022-09-09 10:41:15 -05:00