Commit Graph

17 Commits

Author SHA1 Message Date
Aviral Goel
0861395425 chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)
* chore(copyright) update library wide CMakeLists.txt files copyright header template

* Fix build

---------

Co-authored-by: Sami Remes <samremes@amd.com>

[ROCm/composable_kernel commit: 004784ef98]
2025-11-28 13:49:54 -08:00
Aviral Goel
89e3931da8 chore(copyright): update copyright header for test directory (#3252)
* chore(copyright): update copyright header for test directory

* chore(copyright): update copyright header for test directory

* chore(copyright): update copyright header for client_example directory

* chore(copyright): update copyright header for test directory

[ROCm/composable_kernel commit: c8563f2101]
2025-11-20 20:36:57 -05:00
Vidyasagar Ananthan
b26a7f9b84 [DOCS] Documentation Addition (Readme updates) (#2495)
* GH-2368 Adding a basic glossary

GH-2368 Minor edits

GH-2368 Adding missing READMEs and standardization.

resolving readme updates

GH-2368 Minor improvements to documentation.

Improving some readmes.

Further improvement for readmes.

Cleaned up the documentation in 'client_example' (#2468)

Update for PR

Update ACRONYMS.md to remove trivial terms

Update ACRONYMS.md to provide detailed explanations for BF16 and BF8 formats

Apply suggestion from @spolifroni-amd

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

Apply suggestion from @spolifroni-amd

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

Update README.md to clarify CK Tile API description and remove outdated references to the Tile Engine.

revise 37_transpose readme

revise 36_copy readme

Remove references to the Tile Engine in README files for 19_gemm_multi_d and 35_batched_transpose, and update distribution links for clarity.

Remove references to the Tile Engine in multiple README files and update distribution links for consistency and clarity.

Remove references to the Tile Engine in README files across multiple examples

* GH-2368 Adding a basic glossary

GH-2368 Minor edits

GH-2368 Adding missing READMEs and standardization.

resolving readme updates

GH-2368 Minor improvements to documentation.

Improving some readmes.

Further improvement for readmes.

Cleaned up the documentation in 'client_example' (#2468)

Update for PR

Update ACRONYMS.md to remove trivial terms

Update ACRONYMS.md to provide detailed explanations for BF16 and BF8 formats

Apply suggestion from @spolifroni-amd

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

Apply suggestion from @spolifroni-amd

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

Update README.md to clarify CK Tile API description and remove outdated references to the Tile Engine.

revise 37_transpose readme

revise 36_copy readme

Remove references to the Tile Engine in README files for 19_gemm_multi_d and 35_batched_transpose, and update distribution links for clarity.

Remove references to the Tile Engine in multiple README files and update distribution links for consistency and clarity.

Remove references to the Tile Engine in README files across multiple examples

Refine README files by removing outdated references to the Tile Engine

* Updates based on PR feedback 1

* Updates based on PR feedback 2

* Updates based on PR feedback 3

* Updates based on PR feedback 4

* Updates based on PR feedback 5

* Updates based on PR feedback 6

* Updates based on PR feedback 7

* Updates based on PR feedback 8

* Content Modification of CK Tile Example

* Modify the ck_tile gemm config

---------

Co-authored-by: AviralGoelAMD <aviral.goel@amd.com>
Co-authored-by: ThomasNing <thomas.ning@amd.com>

[ROCm/composable_kernel commit: 92c67a824f]
2025-10-16 03:10:57 -07:00
Bartłomiej Kocot
169e3cb4f8 Add support for GKCYX grouped conv weight (#2023)
* Grouped conv bwd weight GKCYX support

* fix and changelog

* fix

* fix

* fixes

* comments

* fix

[ROCm/composable_kernel commit: 2ccf914888]
2025-04-02 23:59:49 +02:00
Bartłomiej Kocot
ba83883cd2 Add grouped conv bwd wei merged grouped instance for larger filter (#1984)
* Add grouped conv bwd wei merged grouped instance for larger filter

* Update readme

[ROCm/composable_kernel commit: fdaff5603e]
2025-03-18 16:16:24 +01:00
Bartłomiej Kocot
3b61a8314b Add Grouped Convolution and GEMM documentation (#1719)
* Add Grouped Convolution docs

* Add gemm docs

* Update docs

* fix

[ROCm/composable_kernel commit: 85d6fcd30a]
2025-02-04 16:41:49 +01:00
Haocong WANG
68d3fce998 [GEMM] gemm_universal related optimization (#1453)
* replace buffer_atomic with global_atomic

* fixed global_atomic_add

* added bf16 atomic_add

* format

* clang-format-12

* clean

* clean

* add guards

* Update gtest.cmake

* enabled splitk_gemm_multi_d

* format

* add ckProfiler

* format

* fixed naming

* format

* clean

* clean

* add guards

* fix clang format

* format

* add kbatch printout

* clean

* Add rocm6.2 related gemm optimization

* Limit bf16 atomic usage

* remove redundant RCR gemm_universal instance

* Add RRR fp8 gemm universal instance

* Bug fix

* Add GPU_TARGET guard to FP8/BF8 target

* bug fix

* update cmake

* remove all fp8/bf8 example if arch not support

* Enable fp8 RRR support in ckProfiler

* limit greedy-reverse flag to gemm_universal in ckProfiler

---------

Co-authored-by: Jing Zhang <jizhan@fb.com>
Co-authored-by: Jing Zhang <jizhan@meta.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
Co-authored-by: illsilin <Illia.Silin@amd.com>

[ROCm/composable_kernel commit: 3049b5467c]
2024-08-14 10:42:30 +08:00
Bartłomiej Kocot
b6a17bc3e2 Add two stage grouped conv bwd weight kernel (#1280)
[ROCm/composable_kernel commit: 0b6b5d1785]
2024-05-08 09:53:24 +02:00
amoskvic
4256edcdd2 Style improvement: improving type alias usage consistency in gemm-related client examples. Also copyright year update for all client examples. (#1180)
Co-authored-by: Arseny Moskvichev <amoskvic@amd.com>

[ROCm/composable_kernel commit: a776978cbe]
2024-02-28 16:39:03 -08:00
Illia Silin
6a5e94f475 Split the static library into several files. (#1044)
* spolit the static library into several

* update lib paths and fix client example

* do not use device_mha_operarions for client examples

* use appropriate libs to link to client examples

* remove the gpu/transpose path from the list

* try fixing clinet examples 3,4,9

* add necessary libs for client examples

* fix the layernorm client example

* fix the client examples 23 and 24

* fix typo

* add interface library and refresh clang format

[ROCm/composable_kernel commit: 7965d66a81]
2023-11-28 11:17:37 -08:00
Rostyslav Geyyer
c7588d46d2 Add conv bwd weight client example (#1005)
* Add conv bwd weight client example

* Update instance selector

* Fake the conversion

* Bring the conversion back

[ROCm/composable_kernel commit: 5356c4a943]
2023-11-13 11:16:04 -06:00
Bartłomiej Kocot
50dbce5272 Add wei_strides to grouped conv3d wei to keep consistency (#817)
* Add wei_strides to grouped conv3d wei to keep consistency

* Fix strides in client examples

* Unify backward weight api with forward

* Fix for example

* Fixes for examples

---------

Co-authored-by: zjing14 <zhangjing14@gmail.com>

[ROCm/composable_kernel commit: 22443f7aae]
2023-08-07 10:23:45 -05:00
Bartłomiej Kocot
a1a8901df8 Support NHWGC conv2d_bwd_weight (#769)
* Support NHWGC conv2d_bwd_weight

* Fix client example

* Fix client example

* Fix comments

* Redesign grouped_conv_bwd_weight instances

* Clang format fix

---------

Co-authored-by: zjing14 <zhangjing14@gmail.com>

[ROCm/composable_kernel commit: 1ee99dcaa6]
2023-07-12 08:25:02 -05:00
Illia Silin
b57fbee2f1 update copyright headers (#726)
[ROCm/composable_kernel commit: b94fd0b227]
2023-05-31 18:46:57 -05:00
ltqin
4d10c937f0 Grouped conv1d client example (#589)
* add conv1d fwd client example

* change 07_grouped_conv2d_fwd to 07_grouped_convnd_fwd

* add conv1d bwd weight

---------

Co-authored-by: zjing14 <zhangjing14@gmail.com>

[ROCm/composable_kernel commit: 830d37a7d5]
2023-02-22 11:55:21 -06:00
Adam Osewski
23c45ec25e Conv3D FWD BWD WRW fp16 fp32 client examples (#559)
* Conv3d bwd weight client example.

* Update year in license

* Convolution bwd data 3D fp16/fp32 client example.

* Client example for convnd fwd fp16 fp32

* clang-format

* Review remarks.

* Fix compiler err.

* Update data layout to standard one.

* Add conv 3d fwd NDHWGC instances

* clang-format

* Conv3d fwd NDHWGC instances.

---------

Co-authored-by: Adam Osewski <aosewski@amd.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>

[ROCm/composable_kernel commit: e9fd122889]
2023-02-15 11:16:47 -06:00
Po Yen Chen
44e669e0dd Add client example of grouped conv2d backward weight (data type: fp16) (#498)
* Remove redundant CMake setting

* Extract common code from files

* Rename folder 'convnd' to 'conv'

* Use std::array<> to accept compile-time kwnown # of arguments

* Fix compilation error of tuning parameter

* In example, use same setting as unit-test

* Remove no-longer used include directive

* Add interface for grouped conv bwd weight

* Add group support for conv bwd weight

* Add grouped conv bwd weight example

* Use group parameter in example

* Rename example folder

* Remove non-grouped version example source files

* Rename device op template

* Add group support to convolution backward weight

* Remove debug messages

* Use smaller group size in example

* Use named variable as loop terminate condition

* Prettify example output message

* Enlarge used grid size

* Allow real grid size exceeds expected grid size

* Rename interface file

* Add client example for grouped conv2d bwd weight

* Fix wrong include directive

* Rename client example folder

[ROCm/composable_kernel commit: 38470e0497]
2022-11-09 18:50:03 -06:00