Commit Graph

1481 Commits

Author SHA1 Message Date
mtgu0705
45c6c584a0 Modify the kernel to 128x128x64, and use mfma_32x32x4
Add int4+scale based on Zhang, Jing pk_i4. Compile pass, function pass.
2024-11-04 15:01:00 +08:00
Jing Zhang
f03dda4826 add ckProfiler 2024-10-27 07:27:23 -07:00
Jing Zhang
e463256fc7 fixed 2024-10-24 08:10:16 -07:00
Jing Zhang
9e15aa34a8 fixed 2024-10-24 08:09:43 -07:00
Jing Zhang
f16f55af0c failure with intrawave v2 2024-10-23 14:39:43 -07:00
Jing Zhang
e9b7f26799 clean 2024-10-23 14:00:36 -07:00
Jing Zhang
7cb3d6fd58 recover v3r1 2024-10-23 13:57:52 -07:00
Jing Zhang
786a0faaac add permute switch as a template 2024-10-23 13:42:00 -07:00
Jing Zhang
6a2521ea5d fixed splitk crash 2024-10-23 10:23:14 -07:00
Jing Zhang
af2c016631 add and_or_b32 2024-10-22 18:30:35 -07:00
Jing Zhang
6d0e78bdee improve weight layout 2024-10-22 18:09:30 -07:00
Jing Zhang
5d42067e90 format 2024-10-22 14:30:52 -07:00
Jing Zhang
9fed0adea8 weight permute with splitki 2024-10-22 14:29:01 -07:00
Jing Zhang
35d8627b39 clean 2024-10-22 11:58:40 -07:00
Jing Zhang
be98313d80 add b tile permute 2024-10-22 10:25:53 -07:00
Jing Zhang
e053e94764 weight permute 2024-10-21 21:18:07 -07:00
Jing Zhang
82bb8dde6e fixed splitk 2024-10-21 12:42:15 -07:00
Jing Zhang
65cfb2a15c format 2024-10-21 12:26:13 -07:00
Jing Zhang
398f8851c5 debug i4_to_f16_convert 2024-10-21 12:25:39 -07:00
Jing Zhang
222e968893 format 2024-10-20 09:59:32 -07:00
Jing Zhang
2807c69eff fixed tensor init 2024-10-20 09:42:00 -07:00
Jing Zhang
05ab9105f5 fixed reference and host_tensor 2024-10-19 19:53:17 -07:00
Jing Zhang
205e0365e3 fix 2024-10-18 10:12:37 -07:00
Jing Zhang
c13366af6d add fast pki4 to half conversion 2024-10-18 10:10:40 -07:00
Jing Zhang
24e18ae830 fixed coord reset 2024-10-15 19:48:46 -07:00
Jing Zhang
c3a4652a68 move packed into dynamic_buffer 2024-10-15 11:30:09 -07:00
Jing Zhang
77ad000e8a clean 2024-10-15 10:08:44 -07:00
Jing Zhang
40d038e90d clean 2024-10-14 22:10:44 -07:00
Jing Zhang
c3d05c0cf2 debug 2024-10-13 22:17:30 -07:00
Jing Zhang
3ef4d2c2c9 clean 2024-10-13 15:36:43 -07:00
Jing Zhang
0f3b88bf57 add a prototype of int4 2024-10-11 15:07:47 -07:00
Illia Silin
cfac9497e2 remove gfx12 targets from daily builds with rocm6.2 (#1560) 2024-10-09 10:18:05 -07:00
Christopher Millette
ceaed8e097 Fixes small memory leak from missing hipEventDestroy (#1554) 2024-10-09 09:41:35 +02:00
Rostyslav Geyyer
aa932445ea Add a gpu gemm reference kernel (#1528)
* Add a gpu gemm reference kernel

* Switch to gpu reference in gemm examples

* Remove redundant arguments

* Update all related examples

* Update more examples

* Try less threads per block

* Try even less threads per block

* Add support for all matrix layouts

* Increase block size

* Clean up

* Remove hardcoded strides

* Clean up

* Try a column-major case

* Revert back to row-major

* Run both CPU and GPU verification

---------

Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
2024-10-08 11:05:28 -05:00
Po Yen Chen
0c094daa7e [CK_TILE] Update example README files & fix script compatibility issue (#1548)
* Fix text alignment of ArgParser::print()

* Update example README files

* Clarify make-ck-dev.sh <arch> usage

* Only keep some of the argument from '-?' output

* Undo command line output changes in README

* Only keep existing argument on doc and update description

* Fix text alignment

* Make cmake-ck-*.sh compatible with 'sh' command
2024-10-08 10:45:12 +08:00
Qianfeng
74d68e3b99 [CK_TILE] Simplify the codes in splitkv_combine pipeline (#1549)
* Simplify the codes in splitkv_combine pipeline

* Always set kPadSeqLenK=true for fmha splitkv kernels

* Change in Oacc Alignment and TileDistribution to be more adaptable to tile sizes

---------

Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
2024-10-08 10:44:34 +08:00
Illia Silin
7733ae167b add a CK_USE_CODEGEN build argument to enable codegen (#1552)
* add a CK_USE_CODEGEN build argument to enable codegen

* fix cmake codegen logic
2024-10-07 15:45:19 -07:00
Illia Silin
7d8ea5f08b Fix build logic using GPU_ARCHS. (#1536)
* update build logic with GPU_ARCHS

* fix the GPU_ARCHS build for codegen

* unset GPU_TARGETS when GPU_ARCHS are set
2024-10-07 08:18:23 -07:00
Bartłomiej Kocot
cc8f466a7e [CK_TILE] Fix conv param multiple definition (#1550)
Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
2024-10-07 15:21:21 +02:00
rocking
0023f01ab0 [Ck tile] Support layernorm one pass (#1512)
* Fix compile error

* Add one pass pipeline

* Extract creating tile_window to operator()

* clang format

* reduce duplicated code

* do not hardcode

* Support padding in layernorm

---------

Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
2024-10-07 14:25:53 +08:00
kylasa
c24fae2346 Adding seed and offset pointer support to the philox random number generator. (#1523)
* Adding seed and offset pointer support to the philox random number generator.

* Separating seed and offset pointer checks with different condition statements.

* Changes include, adding support for device seed and offset pointers, union is used to store seed/offset values and device pointers to minimize device SGPRs.

* Correcting a typo in the readme file

* Re-format files using remod.py

* Use STL type for API parameters

* Use simpler struct design for drop_seed & drop_offset

* Undo unnecessary changes

* Sync kargs style for fmha_fwd.hpp/.cpp

* Use templated union to reduce code

* Use structured binding to make code more readable

---------

Co-authored-by: Sudhir Kylasa <sukylasa@amd.com>
Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
2024-10-05 02:48:47 +08:00
arai713
b545de175a Codegen build (#1526)
* updating codegen build for MIOpen access: adding .cmake for codegen component

(cherry picked from commit 652a7c0463)

* updating CMake

(cherry picked from commit a685822e36)
2024-10-04 10:51:50 -07:00
Bartłomiej Kocot
6b54d2faf8 Fix grouped gemm check to avoid overflow (#1545) 2024-10-04 17:32:43 +02:00
macurtis-amd
aeb7c91f48 Fix compilation errors generated by forthcoming Clang changes (#1544)
Without this change, the following diagnostic is generated:
  a template argument list is expected after a name prefixed by the template
  keyword [-Wmissing-template-arg-list-after-template-kw]

See C++17 spec [temp.names] p5.
2024-10-02 13:56:22 -07:00
BrianHarrisonAMD
294cb82314 Add generating mha static library for gfx90a (#1540)
* Add generating mha static library for gfx90a

* Update comment to reflect changes
2024-10-02 09:26:11 -07:00
Illia Silin
11b7a4db00 re-enable the FMHA performance monitoring (#1539) 2024-10-01 13:17:55 -07:00
Illia Silin
8e4c3fb1bc [CK_TILE] add missing vector header (#1537)
* add missing vector header

* Re-format header using remod.py

---------

Co-authored-by: Po Yen, Chen <PoYen.Chen@amd.com>
2024-10-01 07:58:20 -07:00
Po Yen Chen
a1c07e8d91 [CK_TILE] Change output accum tensor layout of fmha fwd split-kv & combine kernels (#1527)
* Use same layout for o_acc and o tensor

* Use better param names in partitioner

* Remove redundant kargs 'max_seqlen_q'

* Use better param names in splitkv kernel

* Add comment for additional kernel arguments

* Sync empty loop early return logics between pipelines

* Pass more arguments to cmake in scripts

* Align backslashes

* Fix wrong o_acc tensor view strides

* Change o_acc layout if o_perm=0

* Handle whole row masked via attn_bias

* Use vector width = 1 for o_acc

* Use more even split sizes
2024-10-01 22:13:52 +08:00
M.Emin Ozturk
4cd1dc7f06 Complex Contraction CK Bilinear Example (#1061)
* complex type contraction

* bug fix

* update

* Tensor Contraction Complex Data Type is working

* 4D Kernel

* some change

* validation check in progress

* validation issue

* fp32 verification error is fixed

* fp32 and fp64 are done

* remove old files

* remove cmake files

* remove cmake files

* Readme

* img verification

* CMakeList

* number changed

---------

Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
Co-authored-by: Emin Ozturk <emin.ozturk@utah.edu>
2024-09-30 21:05:42 -06:00
Bartłomiej Kocot
de3e3b6424 [CK_TILE] Image to Column kernel (#1532)
* [CK_TILE] Image to Column kernel

* Fixes

* Vector loads and stores

* Fixes

* Fixes

* change test dir name
2024-09-27 22:57:38 +02:00