Commit Graph

15 Commits

Author SHA1 Message Date
Aviral Goel
bb41ea37e1 chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)
* chore(copyright) update library wide CMakeLists.txt files copyright header template

* Fix build

---------

Co-authored-by: Sami Remes <samremes@amd.com>

[ROCm/composable_kernel commit: 004784ef98]
2025-11-28 13:49:54 -08:00
Aviral Goel
7cee27c4a2 chore(copyright): update copyright header for test directory (#3252)
* chore(copyright): update copyright header for test directory

* chore(copyright): update copyright header for test directory

* chore(copyright): update copyright header for client_example directory

* chore(copyright): update copyright header for test directory

[ROCm/composable_kernel commit: c8563f2101]
2025-11-20 20:36:57 -05:00
Vidyasagar Ananthan
8654f7be8d [DOCS] Documentation Addition (Readme updates) (#2495)
* GH-2368 Adding a basic glossary

GH-2368 Minor edits

GH-2368 Adding missing READMEs and standardization.

resolving readme updates

GH-2368 Minor improvements to documentation.

Improving some readmes.

Further improvement for readmes.

Cleaned up the documentation in 'client_example' (#2468)

Update for PR

Update ACRONYMS.md to remove trivial terms

Update ACRONYMS.md to provide detailed explanations for BF16 and BF8 formats

Apply suggestion from @spolifroni-amd

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

Apply suggestion from @spolifroni-amd

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

Update README.md to clarify CK Tile API description and remove outdated references to the Tile Engine.

revise 37_transpose readme

revise 36_copy readme

Remove references to the Tile Engine in README files for 19_gemm_multi_d and 35_batched_transpose, and update distribution links for clarity.

Remove references to the Tile Engine in multiple README files and update distribution links for consistency and clarity.

Remove references to the Tile Engine in README files across multiple examples

* GH-2368 Adding a basic glossary

GH-2368 Minor edits

GH-2368 Adding missing READMEs and standardization.

resolving readme updates

GH-2368 Minor improvements to documentation.

Improving some readmes.

Further improvement for readmes.

Cleaned up the documentation in 'client_example' (#2468)

Update for PR

Update ACRONYMS.md to remove trivial terms

Update ACRONYMS.md to provide detailed explanations for BF16 and BF8 formats

Apply suggestion from @spolifroni-amd

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

Apply suggestion from @spolifroni-amd

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

Update README.md to clarify CK Tile API description and remove outdated references to the Tile Engine.

revise 37_transpose readme

revise 36_copy readme

Remove references to the Tile Engine in README files for 19_gemm_multi_d and 35_batched_transpose, and update distribution links for clarity.

Remove references to the Tile Engine in multiple README files and update distribution links for consistency and clarity.

Remove references to the Tile Engine in README files across multiple examples

Refine README files by removing outdated references to the Tile Engine

* Updates based on PR feedback 1

* Updates based on PR feedback 2

* Updates based on PR feedback 3

* Updates based on PR feedback 4

* Updates based on PR feedback 5

* Updates based on PR feedback 6

* Updates based on PR feedback 7

* Updates based on PR feedback 8

* Content Modification of CK Tile Example

* Modify the ck_tile gemm config

---------

Co-authored-by: AviralGoelAMD <aviral.goel@amd.com>
Co-authored-by: ThomasNing <thomas.ning@amd.com>

[ROCm/composable_kernel commit: 92c67a824f]
2025-10-16 03:10:57 -07:00
Haocong WANG
65d6442b4c [GEMM] gemm_universal related optimization (#1453)
* replace buffer_atomic with global_atomic

* fixed global_atomic_add

* added bf16 atomic_add

* format

* clang-format-12

* clean

* clean

* add guards

* Update gtest.cmake

* enabled splitk_gemm_multi_d

* format

* add ckProfiler

* format

* fixed naming

* format

* clean

* clean

* add guards

* fix clang format

* format

* add kbatch printout

* clean

* Add rocm6.2 related gemm optimization

* Limit bf16 atomic usage

* remove redundant RCR gemm_universal instance

* Add RRR fp8 gemm universal instance

* Bug fix

* Add GPU_TARGET guard to FP8/BF8 target

* bug fix

* update cmake

* remove all fp8/bf8 example if arch not support

* Enable fp8 RRR support in ckProfiler

* limit greedy-reverse flag to gemm_universal in ckProfiler

---------

Co-authored-by: Jing Zhang <jizhan@fb.com>
Co-authored-by: Jing Zhang <jizhan@meta.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
Co-authored-by: illsilin <Illia.Silin@amd.com>

[ROCm/composable_kernel commit: 3049b5467c]
2024-08-14 10:42:30 +08:00
Rostyslav Geyyer
c16cff1498 Move grouped conv fwd client examples (#1299)
* Move grouped conv fwd client examples

* Update existing examples

* Format

[ROCm/composable_kernel commit: 204da9c522]
2024-05-21 09:52:41 -05:00
Rostyslav Geyyer
4e821aa876 Add instances for conv_scale with bf8@fp8->fp8 (#1231)
* Add instances

* Add example

* Add profiler mode

* Add client example

[ROCm/composable_kernel commit: bbefc12a26]
2024-04-11 10:35:00 -05:00
Rostyslav Geyyer
c1d4988b4c Add instances for conv_scale with fp8@bf8->fp8 (#1220)
* Update device op api to support BComputeType

* Add example

* Add instances

* Add profiler mode

* Add client example

* Update copyright year

* Add BComputeType check

* Fix compute types

[ROCm/composable_kernel commit: a61e73bc56]
2024-04-03 09:08:08 -05:00
Rostyslav Geyyer
5f84554b12 Add instances for conv_scale with bf8 in / fp8 out (#1200)
* Add bf8 conv fwd instances

* Add example

* Add profiler mode

* Add client example

* Fix copyright headers

* Format

[ROCm/composable_kernel commit: fd0d093e78]
2024-03-21 13:57:34 -05:00
Rostyslav Geyyer
591cd69bb2 Add instances for conv_scale with fp8 in/out (#1193)
* Add fp8 conv instances and client example

* Format

* Add example

* Update cmakelists

* Add profiler mode

* Format

* Fix copyright headers

[ROCm/composable_kernel commit: e626d5202a]
2024-03-15 09:50:03 -07:00
amoskvic
e9d9e4d0ad Style improvement: improving type alias usage consistency in gemm-related client examples. Also copyright year update for all client examples. (#1180)
Co-authored-by: Arseny Moskvichev <amoskvic@amd.com>

[ROCm/composable_kernel commit: a776978cbe]
2024-02-28 16:39:03 -08:00
Illia Silin
2e8781c151 Split the static library into several files. (#1044)
* spolit the static library into several

* update lib paths and fix client example

* do not use device_mha_operarions for client examples

* use appropriate libs to link to client examples

* remove the gpu/transpose path from the list

* try fixing clinet examples 3,4,9

* add necessary libs for client examples

* fix the layernorm client example

* fix the client examples 23 and 24

* fix typo

* add interface library and refresh clang format

[ROCm/composable_kernel commit: 7965d66a81]
2023-11-28 11:17:37 -08:00
Bartłomiej Kocot
c80803505f Introduce multiABD api and deprecate multiD (#1035)
* Introduce multiABD api and deprecate multiD

* Replace multiD with multiABD

* Mark structures as deprecated

* Change doxygen deprecated to note to avoid warnings

[ROCm/composable_kernel commit: f2398f612d]
2023-11-14 17:00:40 +01:00
zjing14
ffaff83a2f 3d grouped conv fwd with input/output fp16 and comp fp8 (#931)
* add f8 comp instance

* fixed

* fixed comments

* rename

* fixed dtype

* format

* fixed CI

* fixed ci

* add missing ComputeType

* fixed cit

* fixed

* Update cmake-ck-dev.sh

---------

Co-authored-by: Jing Zhang <jizha@amd.com>

[ROCm/composable_kernel commit: e921e1f08d]
2023-10-03 20:04:26 -05:00
zjing14
973fc655fd Fixed Weight layout of grouped_conv 3d fwd (#743)
* Changed wei layout

* changed layout for examples

* fixed client example

---------

Co-authored-by: root <root@ctr-ubbsmc15.amd.com>

[ROCm/composable_kernel commit: 309b1c6461]
2023-06-15 10:19:33 -05:00
Adam Osewski
85acf7ac2f Conv3D FWD BWD WRW fp16 fp32 client examples (#559)
* Conv3d bwd weight client example.

* Update year in license

* Convolution bwd data 3D fp16/fp32 client example.

* Client example for convnd fwd fp16 fp32

* clang-format

* Review remarks.

* Fix compiler err.

* Update data layout to standard one.

* Add conv 3d fwd NDHWGC instances

* clang-format

* Conv3d fwd NDHWGC instances.

---------

Co-authored-by: Adam Osewski <aosewski@amd.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>

[ROCm/composable_kernel commit: e9fd122889]
2023-02-15 11:16:47 -06:00