Commit Graph

1482 Commits

Author SHA1 Message Date
Aleksander Dudek
93c115574f CK Tile Batched Gemm 2024-10-29 20:54:29 +00:00
Aleksander Dudek
3c171550f6 Batched gemm - messy validation check 2024-10-29 15:58:02 +00:00
Aleksander Dudek
71eea17c5f Batched gemm - counting strides 2024-10-29 09:47:50 +00:00
Aleksander Dudek
5ab7607525 Batched gemm - passed batch args 2024-10-24 11:15:42 +00:00
Aleksander Dudek
533204d6a2 Batched gemm - initial example setup 2024-10-24 09:36:18 +00:00
Aleksander Dudek
6cb12c614c Add batched gemm kernel - initial stride parameters 2024-10-22 11:22:22 +00:00
Adam Osewski
9d709a68e1 Add load tile overload which accepts output tensor as parameter.
* This gives an 8% perf boost at the cost of using more registers.
2024-10-14 11:59:20 +00:00
Adam Osewski
93c30d2cf9 Add missing include. 2024-10-11 14:35:50 +00:00
Adam Osewski
824809c194 Fixes after merge. 2024-10-10 13:33:23 +00:00
Adam Osewski
4085e3d074 Merge branch 'develop' into aosewski/ck_tile_universal_gemm_p1 2024-10-10 13:33:09 +00:00
Adam Osewski
1b2bf88d8f Refactoring and review comments. 2024-10-10 12:33:45 +00:00
Adam Osewski
ba676917f8 Use currently available pipeline policy. 2024-10-10 12:18:21 +00:00
Adam Osewski
be7cd73566 Switch over to current block gemm. 2024-10-10 12:07:49 +00:00
Adam Osewski
bb1298c699 Refactor gemm examples. 2024-10-10 11:48:43 +00:00
Thomas Ning
6f27bc9872 Ck tile gemm cshuffle & CK Tile GEMM restructure (#1535)
* Make the cshuffle compilable

* modify the reference on gpu and cpu. Correct access of cshuffle

* fix the cpu reference code

* Complete the in tile shuffle logic

* restructure the kernel template input

* change the naming pattern of ck_tile gemm pipeline

* Re-format files using remod.py

* Solve the fmha conflict with gemm

* Comment Addressed from Carlus

---------

Co-authored-by: Po Yen, Chen <PoYen.Chen@amd.com>
2024-10-10 18:02:22 +08:00
Adam Osewski
60bdc10c04 Formatting. 2024-10-10 09:49:04 +00:00
Adam Osewski
a60cf0d0ee Use AccDataType for Output of MFMA instruction. 2024-10-10 09:48:33 +00:00
Illia Silin
2e1165c1a7 fix the target selection logic (#1561) 2024-10-09 15:21:57 -07:00
Illia Silin
cfac9497e2 remove gfx12 targets from daily builds with rocm6.2 (#1560) 2024-10-09 10:18:05 -07:00
Adam Osewski
b045fad59a Merge remote-tracking branch 'origin/develop' into aosewski/ck_tile_universal_gemm_p1 2024-10-09 10:41:47 +00:00
Adam Osewski
6465914598 Add gtests. 2024-10-09 10:41:23 +00:00
Adam Osewski
611064a13f Do not use macro. 2024-10-09 10:37:26 +00:00
Adam Osewski
41fc6a2433 Few small changes & formatting. 2024-10-09 10:37:01 +00:00
Christopher Millette
ceaed8e097 Fixes small memory leak from missing hipEventDestroy (#1554) 2024-10-09 09:41:35 +02:00
Rostyslav Geyyer
aa932445ea Add a gpu gemm reference kernel (#1528)
* Add a gpu gemm reference kernel

* Switch to gpu reference in gemm examples

* Remove redundant arguments

* Update all related examples

* Update more examples

* Try fewer threads per block

* Try even fewer threads per block

* Add support for all matrix layouts

* Increase block size

* Clean up

* Remove hardcoded strides

* Clean up

* Try a column-major case

* Revert back to row-major

* Run both CPU and GPU verification

---------

Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
2024-10-08 11:05:28 -05:00
Po Yen Chen
0c094daa7e [CK_TILE] Update example README files & fix script compatibility issue (#1548)
* Fix text alignment of ArgParser::print()

* Update example README files

* Clarify make-ck-dev.sh <arch> usage

* Only keep some of the arguments from '-?' output

* Undo command line output changes in README

* Only keep existing arguments in doc and update description

* Fix text alignment

* Make cmake-ck-*.sh compatible with 'sh' command
2024-10-08 10:45:12 +08:00
Qianfeng
74d68e3b99 [CK_TILE] Simplify the code in splitkv_combine pipeline (#1549)
* Simplify the code in splitkv_combine pipeline

* Always set kPadSeqLenK=true for fmha splitkv kernels

* Change in Oacc Alignment and TileDistribution to be more adaptable to tile sizes

---------

Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
2024-10-08 10:44:34 +08:00
Illia Silin
7733ae167b add a CK_USE_CODEGEN build argument to enable codegen (#1552)
* add a CK_USE_CODEGEN build argument to enable codegen

* fix cmake codegen logic
2024-10-07 15:45:19 -07:00
Illia Silin
7d8ea5f08b Fix build logic using GPU_ARCHS. (#1536)
* update build logic with GPU_ARCHS

* fix the GPU_ARCHS build for codegen

* unset GPU_TARGETS when GPU_ARCHS are set
2024-10-07 08:18:23 -07:00
Bartłomiej Kocot
cc8f466a7e [CK_TILE] Fix conv param multiple definition (#1550)
Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
2024-10-07 15:21:21 +02:00
Adam Osewski
7ffb092155 Merge branch 'develop' into aosewski/ck_tile_universal_gemm_p1 2024-10-07 10:28:59 +00:00
Adam Osewski
4cf45f1b9c Add comment to load_tile_raw and change variable naming style. 2024-10-07 10:25:33 +00:00
rocking
0023f01ab0 [Ck tile] Support layernorm one pass (#1512)
* Fix compile error

* Add one pass pipeline

* Extract creating tile_window to operator()

* clang format

* reduce duplicated code

* do not hardcode

* Support padding in layernorm

---------

Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
2024-10-07 14:25:53 +08:00
kylasa
c24fae2346 Adding seed and offset pointer support to the philox random number generator. (#1523)
* Adding seed and offset pointer support to the philox random number generator.

* Separating seed and offset pointer checks with different condition statements.

* Changes include adding support for device seed and offset pointers; a union is used to store seed/offset values and device pointers to minimize device SGPRs.

* Correcting a typo in the readme file

* Re-format files using remod.py

* Use STL type for API parameters

* Use simpler struct design for drop_seed & drop_offset

* Undo unnecessary changes

* Sync kargs style for fmha_fwd.hpp/.cpp

* Use templated union to reduce code

* Use structured binding to make code more readable

---------

Co-authored-by: Sudhir Kylasa <sukylasa@amd.com>
Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
2024-10-05 02:48:47 +08:00
arai713
b545de175a Codegen build (#1526)
* updating codegen build for MIOpen access: adding .cmake for codegen component

(cherry picked from commit 652a7c0463)

* updating CMake

(cherry picked from commit a685822e36)
2024-10-04 10:51:50 -07:00
Bartłomiej Kocot
6b54d2faf8 Fix grouped gemm check to avoid overflow (#1545) 2024-10-04 17:32:43 +02:00
macurtis-amd
aeb7c91f48 Fix compilation errors generated by forthcoming Clang changes (#1544)
Without this change, the following diagnostic is generated:
  a template argument list is expected after a name prefixed by the template
  keyword [-Wmissing-template-arg-list-after-template-kw]

See C++17 spec [temp.names] p5.
2024-10-02 13:56:22 -07:00
BrianHarrisonAMD
294cb82314 Add generating mha static library for gfx90a (#1540)
* Add generating mha static library for gfx90a

* Update comment to reflect changes
2024-10-02 09:26:11 -07:00
Adam Osewski
6ea43353f4 Fixes in pipeline. 2024-10-02 14:12:02 +00:00
Illia Silin
11b7a4db00 re-enable the FMHA performance monitoring (#1539) 2024-10-01 13:17:55 -07:00
Illia Silin
8e4c3fb1bc [CK_TILE] add missing vector header (#1537)
* add missing vector header

* Re-format header using remod.py

---------

Co-authored-by: Po Yen, Chen <PoYen.Chen@amd.com>
2024-10-01 07:58:20 -07:00
Adam Osewski
4f18c2ded3 Get hot loop and TailNum information before launching kernel. 2024-10-01 14:40:03 +00:00
Adam Osewski
8bd49370f1 Refactoring & Move Layout info to pipeline problem. 2024-10-01 14:38:19 +00:00
Po Yen Chen
a1c07e8d91 [CK_TILE] Change output accum tensor layout of fmha fwd split-kv & combine kernels (#1527)
* Use same layout for o_acc and o tensor

* Use better param names in partitioner

* Remove redundant kargs 'max_seqlen_q'

* Use better param names in splitkv kernel

* Add comment for additional kernel arguments

* Sync empty loop early return logic between pipelines

* Pass more arguments to cmake in scripts

* Align backslashes

* Fix wrong o_acc tensor view strides

* Change o_acc layout if o_perm=0

* Handle whole row masked via attn_bias

* Use vector width = 1 for o_acc

* Use more even split sizes
2024-10-01 22:13:52 +08:00
M.Emin Ozturk
4cd1dc7f06 Complex Contraction CK Bilinear Example (#1061)
* complex type contraction

* bug fix

* update

* Tensor Contraction Complex Data Type is working

* 4D Kernel

* some change

* validation check in progress

* validation issue

* fp32 verification error is fixed

* fp32 and fp64 are done

* remove old files

* remove cmake files

* remove cmake files

* Readme

* img verification

* CMakeList

* number changed

---------

Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
Co-authored-by: Emin Ozturk <emin.ozturk@utah.edu>
2024-09-30 21:05:42 -06:00
Adam Osewski
d3689b0686 Do not use ck_tile:: within ck_tile namespace. 2024-09-30 13:43:46 +00:00
Bartłomiej Kocot
de3e3b6424 [CK_TILE] Image to Column kernel (#1532)
* [CK_TILE] Image to Column kernel

* Fixes

* Vector loads and stores

* Fixes

* Fixes

* change test dir name
2024-09-27 22:57:38 +02:00
Dan Yao
9d69a099a4 [CK_TILE] Fix compiler related FA bwd issues (#1530)
* add barriers

* tail bias barriers

* adjust bf16/hd256 tol

* continue adjust bf16/hd256 tol
2024-09-26 12:18:39 -07:00
Illia Silin
42e6dceacc Fix compilation errors with Clang20.0. (#1533)
* fix clang20 compilation errors for gfx90a

* fix clang20 compilation errors for gfx11 targets
2024-09-25 13:45:38 -07:00
Illia Silin
65f8d1440f make CK CI use different git credentials (#1529) 2024-09-25 09:05:48 -07:00