1476 Commits

Author SHA1 Message Date
root
74f50fe0da bug was discovered successfully, which is: dst_buf_template update does not happen if DstImMemOp is atomicAdd for bf16 2024-12-05 11:58:20 +00:00
root
97e71ef025 print more incorrect value 2024-12-05 09:23:46 +00:00
root
ecaff601a0 debug print type casting problem was solved 2024-12-05 08:53:25 +00:00
root
d891a59603 off some trace 2024-12-05 07:06:52 +00:00
root
dc3519aef1 debug trace 2024-12-04 19:57:47 +00:00
root
718cadcaaa debug trace deep thread level 2024-12-02 21:09:09 +00:00
root
20e283dc5a trace copy 2024-11-14 16:51:57 +00:00
root
be8e076687 trace copy 2024-11-14 16:35:15 +00:00
root
6f2101556e Merge branch 'gemm_bf16_sk_muozturk' of https://github.com/ROCm/composable_kernel into gemm_bf16_sk_muozturk 2024-10-21 21:23:03 +00:00
Muhammed Emin Ozturk
7839419475 update 2024-10-11 22:29:09 -05:00
M.Emin Ozturk
bd68dd12b1 Merge branch 'develop' into gemm_bf16_sk_muozturk 2024-10-11 21:28:46 -06:00
Illia Silin
11444e4cf2 [CI] remove the --rm docker container flags (#1568) 2024-10-11 14:29:46 -07:00
M.Emin Ozturk
f79c19550c Merge branch 'develop' into gemm_bf16_sk_muozturk 2024-10-10 21:07:54 -06:00
Illia Silin
f46a9eee9d only build tests and examples if user sets GPU_TARGETS (#1565) 2024-10-10 15:31:56 -07:00
spolifroni-amd
14c52befda removed API usage header (#1566) 2024-10-10 13:57:23 -07:00
Rostyslav Geyyer
d18fc0797f Fix default stride value (#1559) 2024-10-10 07:37:09 -07:00
Thomas Ning
6f27bc9872 Ck tile gemm cshuffle & CK Tile GEMM restructure (#1535)
* Make the cshuffle compilable

* modify the reference on gpu and cpu. Correct access of cshuffle

* fix the cpu reference code

* Complete the in tile shuffle logic

* restructure the kernel template input

* change the naming pattern of ck_tile gemm pipeline

* Re-format files using remod.py

* Solve the fmha conflict with gemm

* Comment Addressed from Carlus

---------

Co-authored-by: Po Yen, Chen <PoYen.Chen@amd.com>
2024-10-10 18:02:22 +08:00
Illia Silin
2e1165c1a7 fix the target selection logic (#1561) 2024-10-09 15:21:57 -07:00
Illia Silin
cfac9497e2 remove gfx12 targets from daily builds with rocm6.2 (#1560) 2024-10-09 10:18:05 -07:00
Christopher Millette
ceaed8e097 Fixes small memory leak from missing hipEventDestroy (#1554) 2024-10-09 09:41:35 +02:00
M.Emin Ozturk
e35e64c5ac Merge branch 'develop' into gemm_bf16_sk_muozturk 2024-10-08 23:22:17 -06:00
Rostyslav Geyyer
aa932445ea Add a gpu gemm reference kernel (#1528)
* Add a gpu gemm reference kernel

* Switch to gpu reference in gemm examples

* Remove redundant arguments

* Update all related examples

* Update more examples

* Try less threads per block

* Try even less threads per block

* Add support for all matrix layouts

* Increase block size

* Clean up

* Remove hardcoded strides

* Clean up

* Try a column-major case

* Revert back to row-major

* Run both CPU and GPU verification

---------

Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
2024-10-08 11:05:28 -05:00
Po Yen Chen
0c094daa7e [CK_TILE] Update example README files & fix script compatibility issue (#1548)
* Fix text alignment of ArgParser::print()

* Update example README files

* Clarify make-ck-dev.sh <arch> usage

* Only keep some of the argument from '-?' output

* Undo command line output changes in README

* Only keep existing argument on doc and update description

* Fix text alignment

* Make cmake-ck-*.sh compatible with 'sh' command
2024-10-08 10:45:12 +08:00
Qianfeng
74d68e3b99 [CK_TILE] Simplify the codes in splitkv_combine pipeline (#1549)
* Simplify the codes in splitkv_combine pipeline

* Always set kPadSeqLenK=true for fmha splitkv kernels

* Change in Oacc Alignment and TileDistribution to be more adaptable to tile sizes

---------

Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
2024-10-08 10:44:34 +08:00
Illia Silin
7733ae167b add a CK_USE_CODEGEN build argument to enable codegen (#1552)
* add a CK_USE_CODEGEN build argument to enable codegen

* fix cmake codegen logic
2024-10-07 15:45:19 -07:00
M.Emin Ozturk
965021d2dc Merge branch 'develop' into gemm_bf16_sk_muozturk 2024-10-07 14:05:59 -06:00
Illia Silin
7d8ea5f08b Fix build logic using GPU_ARCHS. (#1536)
* update build logic with GPU_ARCHS

* fix the GPU_ARCHS build for codegen

* unset GPU_TARGETS when GPU_ARCHS are set
2024-10-07 08:18:23 -07:00
Bartłomiej Kocot
cc8f466a7e [CK_TILE] Fix conv param multiple definition (#1550)
Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
2024-10-07 15:21:21 +02:00
rocking
0023f01ab0 [Ck tile] Support layernorm one pass (#1512)
* Fix compile error

* Add one pass pipeline

* Extract creating tile_window to operator()

* clang format

* reduce duplicated code

* do not hardcode

* Support padding in layernorm

---------

Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
2024-10-07 14:25:53 +08:00
kylasa
c24fae2346 Adding seed and offset pointer support to the philox random number generator. (#1523)
* Adding seed and offset pointer support to the philox random number generator.

* Separating seed and offset pointer checks with different condition statements.

* Changes include, adding support for device seed and offset pointers, union is used to store seed/offset values and device pointers to minimize device SGPRs.

* Correcting a typo in the readme file

* Re-format files using remod.py

* Use STL type for API parameters

* Use simpler struct design for drop_seed & drop_offset

* Undo unnecessary changes

* Sync kargs style for fmha_fwd.hpp/.cpp

* Use templated union to reduce code

* Use structured binding to make code more readable

---------

Co-authored-by: Sudhir Kylasa <sukylasa@amd.com>
Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
2024-10-05 02:48:47 +08:00
arai713
b545de175a Codegen build (#1526)
* updating codegen build for MIOpen access: adding .cmake for codegen component

(cherry picked from commit 652a7c0463)

* updating CMake

(cherry picked from commit a685822e36)
2024-10-04 10:51:50 -07:00
Bartłomiej Kocot
6b54d2faf8 Fix grouped gemm check to avoid overflow (#1545) 2024-10-04 17:32:43 +02:00
root
41b5461197 Merge branch 'develop' into gemm_bf16_sk_muozturk 2024-10-03 07:10:01 +00:00
root
8a34c640e9 Cmake 2024-10-03 07:06:52 +00:00
macurtis-amd
aeb7c91f48 Fix compilation errors generated by forthcoming Clang changes (#1544)
Without this change, the following diagnostic is generated:
  a template argument list is expected after a name prefixed by the template
  keyword [-Wmissing-template-arg-list-after-template-kw]

See C++17 spec [temp.names] p5.
2024-10-02 13:56:22 -07:00
BrianHarrisonAMD
294cb82314 Add generating mha static library for gfx90a (#1540)
* Add generating mha static library for gfx90a

* Update comment to reflect changes
2024-10-02 09:26:11 -07:00
M.Emin Ozturk
10e8be4896 Merge branch 'develop' into gemm_bf16_sk_muozturk 2024-10-01 18:42:39 -06:00
root
b416c87787 successful compilation but validation failed 2024-10-01 21:09:37 +00:00
Muhammed Emin
0227762fd1 Cmake file 2024-10-01 20:47:59 +00:00
Illia Silin
11b7a4db00 re-enable the FMHA performance monitoring (#1539) 2024-10-01 13:17:55 -07:00
Illia Silin
8e4c3fb1bc [CK_TILE] add missing vector header (#1537)
* add missing vector header

* Re-format header using remod.py

---------

Co-authored-by: Po Yen, Chen <PoYen.Chen@amd.com>
2024-10-01 07:58:20 -07:00
Po Yen Chen
a1c07e8d91 [CK_TILE] Change output accum tensor layout of fmha fwd split-kv & combine kernels (#1527)
* Use same layout for o_acc and o tensor

* Use better param names in partitioner

* Remove redundant kargs 'max_seqlen_q'

* Use better param names in splitkv kernel

* Add comment for additional kernel arguments

* Sync empty loop early return logics between pipelines

* Pass more arguments to cmake in scripts

* Align backslashes

* Fix wrong o_acc tensor view strides

* Change o_acc layout if o_perm=0

* Handle whole row masked via attn_bias

* Use vector width = 1 for o_acc

* Use more even split sizes
2024-10-01 22:13:52 +08:00
M.Emin Ozturk
4cd1dc7f06 Complex Contraction CK Bilinear Example (#1061)
* complex type contraction

* bug fix

* update

* Tensor Contraction Complex Data Type is working

* 4D Kernel

* some change

* validation check in progress

* validation issue

* fp32 verification error is fixed

* fp32 and fp64 are done

* remove old files

* remove cmake files

* remove cmake files

* Readme

* img verification

* CMakeList

* number changed

---------

Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
Co-authored-by: Emin Ozturk <emin.ozturk@utah.edu>
2024-09-30 21:05:42 -06:00
Bartłomiej Kocot
de3e3b6424 [CK_TILE] Image to Column kernel (#1532)
* [CK_TILE] Image to Column kernel

* Fixes

* Vector loads and stores

* Fixes

* Fixes

* change test dir name
2024-09-27 22:57:38 +02:00
Emin Ozturk
eb52bc1ece initial 2024-09-27 20:23:52 +00:00
Dan Yao
9d69a099a4 [CK_TILE] Fix compiler related FA bwd issues (#1530)
* add barriers

* tail bias barriers

* adjust bf16/hd256 tol

* continue adjust bf16/hd256 tol
2024-09-26 12:18:39 -07:00
Illia Silin
42e6dceacc Fix compilation errors with Clang20.0. (#1533)
* fix clang20 compilation errors for gfx90a

* fix clang20 compilation errors for gfx11 targets
2024-09-25 13:45:38 -07:00
Illia Silin
65f8d1440f make CK CI use different git credentials (#1529) 2024-09-25 09:05:48 -07:00
dependabot[bot]
1c5a4d1b9f Bump rocm-docs-core from 1.8.1 to 1.8.2 in /docs/sphinx (#1531)
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.8.1 to 1.8.2.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/v1.8.2/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.8.1...v1.8.2)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-09-25 06:59:25 -07:00
BrianHarrisonAMD
3528a523ff Add additional instances to device_mha_instance (#1522)
* Add additional instances to device_mha_instance

* Add comment to describe what receipt 3 option filters

---------

Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
2024-09-24 10:15:30 -06:00