Commit Graph

371 Commits

Author SHA1 Message Date
PoYen, Chen
391210ed9e Pass RoPE kernel args 2024-07-14 23:18:32 +00:00
PoYen, Chen
b5ad1411b0 Merge branch 'feature/cond-add-splitkv' into feature/fmha-fwd-appendkv 2024-07-14 22:13:17 +00:00
PoYen, Chen
8c1647d778 Avoid invoking deprecated method 'find_module' 2024-07-14 22:10:30 +00:00
PoYen, Chen
55f55025ee Fix wrong tensor size 2024-07-14 15:40:56 +00:00
PoYen, Chen
93e5125d7a Rename RoPE utility function 2024-07-14 14:48:06 +00:00
PoYen, Chen
83d6acc111 Apply RoPE on host side 2024-07-14 14:45:17 +00:00
PoYen, Chen
3183b68921 Simplify v_host_ref definition 2024-07-12 06:42:41 +00:00
PoYen, Chen
e5885cab83 Simplify K appending logics 2024-07-12 06:37:23 +00:00
PoYen, Chen
3578c6f836 Append K/V in the host verification code 2024-07-12 06:32:35 +00:00
PoYen, Chen
4107bf03a6 Merge remote-tracking branch 'origin/feature/cond-add-splitkv' into feature/fmha-fwd-appendkv 2024-07-12 04:43:04 +00:00
PoYen, Chen
b34ddf5f71 Merge remote-tracking branch 'origin/feature/cond-add-splitkv' into feature/fmha-fwd-appendkv 2024-07-12 04:42:45 +00:00
Po Yen Chen
b4306af655 Merge branch 'develop' into feature/cond-add-splitkv 2024-07-12 12:34:31 +08:00
zjing14
13c1e64daa add gemm_bias_add example (#1361)
* add gemm_bias_add example

* changed strideD

* clang-format

---------

Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
2024-07-11 18:08:07 -07:00
Rostyslav Geyyer
7a46a91c84 Add instances for grouped conv fwd 3d with ConvScale for bf8@fp8->fp8 (#1369)
* Add an example

* Add instances

* Add a client example
2024-07-11 13:31:39 -07:00
Illia Silin
98a01bbc72 Add CK_TILE tests to daily CI builds. (#1381)
* add ck_tile tests to CI

* build and run ck_tile tests on gfx90a and gfx942 in parallel

* fix groovy syntax

* turn ck_tile tests OFF by default

* skip creating the build folder

* build ck_tile examples with 64 threads

* build ck_tile examples with cmake-ck-dev.sh script

* add video group to docker on mi300

* do not retry to rebuild the early CI stages

* help prevent jenkins false failure

* restore cron trigger
2024-07-11 13:22:40 -07:00
carlushuang
bbdb0a5dc0 Merge branch 'develop' into feature/cond-add-splitkv 2024-07-11 16:01:19 +08:00
PoYen, Chen
8c733fb3be Fix compilation errors 2024-07-10 10:53:58 +00:00
PoYen, Chen
e939082bdc Add RoPE example utilities 2024-07-09 05:20:47 +00:00
Illia Silin
a328df25a1 Fix the cmake logic when building with INSTANCES_ONLY=ON. (#1376)
* fix the cmake logic when building for various targets

* another minor fix
2024-07-08 21:21:16 -07:00
Po Yen Chen
dc72074ec7 Merge branch 'develop' into feature/cond-add-splitkv 2024-07-09 03:42:25 +08:00
carlushuang
8182976c37 [CK_TILE] wa prec, remove sgpr offset for inline asm (#1356)
* wa prec, remove sgpr offset for inline asm

* macro for set tile

* ignore unused param if no kernel instances in host API

* fix more prec issue

* cache buffer resource

* fix

* support pre-nop

* clear tile by vector type members

* add workaround to reduce scratch memory

* conditionally enable workaround code

* enable workaround start from certain build version

* fallback set_tile() implementation from certain build version

* undo template argument changes

* put dummy asm in load_raw()

* fix comments, refactor s_nop inside buffer_load

---------

Co-authored-by: PoYen, Chen <PoYen.Chen@amd.com>
2024-07-08 11:09:55 -07:00
PoYen, Chen
18a3834fb4 Set num_splits=1 if split-kv is not supported 2024-07-08 10:27:32 +00:00
PoYen, Chen
8ac6bacf26 Unify CMakeLists.txt coding style 2024-07-08 10:19:31 +00:00
PoYen, Chen
5d21b4d736 Merge branch 'feature/cond-add-splitkv' into feature/fmha-fwd-appendkv 2024-07-08 10:18:28 +00:00
PoYen, Chen
6ca3910199 Show message if we are ignoring option 2024-07-08 10:17:55 +00:00
PoYen, Chen
fe4ae5dcd9 Early return if 0 < s_k_new is not supported 2024-07-08 10:09:36 +00:00
PoYen, Chen
be076db91c Merge branch 'feature/cond-add-splitkv' into feature/fmha-fwd-appendkv 2024-07-08 10:03:58 +00:00
PoYen, Chen
aba46cd655 Register API handlers automatically 2024-07-08 09:39:15 +00:00
PoYen, Chen
3aefb560e0 Remove "EXAMPLE_" prefix of cmake variables 2024-07-08 07:17:24 +00:00
PoYen, Chen
1c070380fa Merge branch 'feature/cond-add-splitkv' into feature/fmha-fwd-appendkv 2024-07-08 07:13:34 +00:00
PoYen, Chen
82f3b3d0a0 Conditionally add call to fmha_fwd_splitkv() 2024-07-08 06:40:18 +00:00
PoYen, Chen
efd18fa887 Conditionally add fwd_splitkv API in fmha_fwd example 2024-07-08 06:27:44 +00:00
Harisankar Sadasivan
75e622f02f Universal streamk with atomics (#1360)
* universal streamk with atomics, with ckprofiler support. grid_size and streamk strategy are tunable. A grid_size of -1 sets #WGs = maximum occupancy x num_CUs. The implementation supports several streamk policies: 1-tile, 2-tile, 3-tile and 4-tile. A streamk strategy of -1 selects the default streamk policy (4-tile).

* Update README.md

* fixing clang-format issues

* removed conflicts in struct members between streamk and universal streamk

* corrected arg parsing for streamk and universal streamk

* added stream-k policies for 3 tile and 4 tile

* fixed argument type issue with parsing cmd args

* changes suggested in PR review are made- removing comments and correcting copyright

* file permissions updated

* added default value support for grid_size and streamk-policy selection set to -1

* print messages for arguments

* print messages for arguments

* print messages for arguments1
2024-07-05 21:40:30 -07:00
Jun Liu
959073842c Fix issue with multiple targets and remove smfmac tests from unsupported test targets (#1372) 2024-07-03 23:34:38 -07:00
Ruturaj Vaidya
2525864fda Update CMakeLists.txt (#1364)
It is a good practice to check if the file CMakeLists.txt is in fact in the directory.
2024-06-27 12:34:25 -07:00
Illia Silin
941d1f7ce0 Merging the gfx12 code into public repo. (#1362) 2024-06-27 00:33:34 -07:00
PoYen, Chen
8fb567c286 Fix vnew append error 2024-06-26 17:00:07 +00:00
Po Yen Chen
0cb2e06ddc [CK_TILE] fmha forward split-kv + combine kernels (#1338)
* FA fwd dropout

* FA bwd

* epilogue reuse

* CMakeLists update

* [CK_TILE] support alibi (#1269)

* add alibi support

* fix code

* update code based on comment

* Support more hdim

* fix fp8 bias

* support seqlen_k=0 case

* remove unused printf

* fix format

---------

Co-authored-by: rocking <ChunYu.Lai@amd.com>

* now fwd/bwd can build

* bwd alibi

* add bwd validation stream_config

* update generated filenames

* update bwd kernel launch

* CK_TILE_HOST_DEVICE in philox

* Transpose -> transpose

* format

* format

* format

* Generate the instance for FA required

* format

* fix error in WarpGemm

* Add num_splits option and dummy split-kv api method

* Generate fmha_fwd_splitkv()

* Add SplitKV kernel codegen logics

* Add SplitKV combine kernel codegen logics

* Fix mismatched return type

* Clean-up code

* Replace sentinel value before storing

* Fix wrong layout of LSE/LSEacc/Oacc

* Format codes

* Fix o_acc memory error

* Fix wrong kBlockSize used in policy

* Reduce # of combine kernels

* Fix split-kv combine kernel name

* Fix wrong LDS indexing logics

* Fix wrong loop counter step logic

* Undo vector size changes

* Remove no-longer used field

* Remove in-consistent comment

* Remove debug statements in example

* Remove more debug statements

* Add constness to local variables

* Clean up generate.py

* Fix unstable clang-format comment

* Remove unused include directive

* Use shorter template parameter name

* Enable non-split-kv blobs

* Update license date

* Print num_splits conditionally

* Undo disabling data types

* Remove unnecessary tile size for fp8

* Fix wrong pipeline args for fp8

* Fix example output format

* Remove more debug code in combine pipeline

* Add stride kernel arguments for LSE/O acc workspace

* Re-order split-kv pipeline call operator arguments

* Pass LSE/O strides in kernel argument

* Re-order pipeline call operator arguments

* Use tensor_descriptor to locate LSEacc elements

* Support providing invalid element for tensor view

* Set invalid element value for LSEacc tensor view

* Remove hand-written store_tile() code

* Remove necessary value-overwrite logic

* Add transposed lds descriptor

* Support load_tile() for tile_window_with_static_lengths<>

* Undo removing necessary value-overwrite logic

* Use read descriptor to locate lds elements

* Simplify pipeline source code

* Add constraint to kMaxSplits

* Default use kMaxSplits=64 in generate.py

* Revert "Add constraint to kMaxSplits"

This reverts commit 0a2132d758.

* Revert "Default use kMaxSplits=64 in generate.py"

This reverts commit c7d9c80b77.

* Decide alignment by the padding parameter

* Remove no-longer used utility functions

* Remove not-working code

* Add comment & remove no-longer used code

* Fix computation errors

* Add heuristic to override num_splits option

* Add constraint to kMaxSplits

* Fix compilation error

* Clean up pipeline code

* Wrap pointer access as lambda function

* Rename confusing methods

* Use kLogMasSplits as template parameter

* Finish splitkv combine kernel codegen

* Update kMaxSplits limit

* Use smaller kM0 for splitkv combine kernel

* Ignore dropout flag in splitkv pipeline

* Unify flag usage

* Add back flag kStoreLSE

* Merge lambda calls in pipeline

* Fix compilation errors

* Avoid all empty splits

* Always check for empty loop in splitkv pipelines

* Re-order parameters

* Remove redundant p_drop option check

* Add traits/problem for fwd splitkv kernel

* Conditionally enable uneven split boundary checks

* Add comment for the splitkv traits field

* Change even split criteria

* Re-order statements

* Refine occupancy value for hdim=128&256

* Refine occupancy value for hdim=32&64

* Remove redundant kernel argument

* Separate fmha bwd codegen logics

* Separate fmha fwd codegen logics

* Remove redundant direction parameter in fwd&bwd codegen logics

* Support generate multiple APIs for an example

* Let 'api' an alias of 'direction' option

* Remove choices for the 'direction' option

* Use dictionary to config all the functions

* Move fmha splitkv codegen logics to other file

* Add fwd_splitkv api for tile_example_fmha_fwd

---------

Co-authored-by: danyao12 <danyao12>
Co-authored-by: carlushuang <carlus.huang@amd.com>
Co-authored-by: rocking <ChunYu.Lai@amd.com>
Co-authored-by: Jing Zhang <jizhan@amd.com>
2024-06-26 17:41:15 +08:00
PoYen, Chen
4e6c28522c Fix wrong K values after appending 2024-06-25 10:12:13 +00:00
PoYen, Chen
1ac17dae50 Add knew/vnew tensors to the kernel argument 2024-06-25 07:56:36 +00:00
PoYen, Chen
344902732a Sync kernel name with the codegen 2024-06-24 14:50:25 +00:00
PoYen, Chen
eee035ade5 Setup meaningful arguments 2024-06-24 14:34:31 +00:00
PoYen, Chen
bace0e5df0 Add init codegen logic for fmha fwd appendkv 2024-06-24 12:33:51 +00:00
rocking
cb13839425 layernorm2d forward (#1339)
* Add layernorm2d forward

* Refind file path

* clang format

* Exclude ck_tile op from all

* use add_executable instead

* refactor layernorm2d_fwd example

---------

Co-authored-by: carlushuang <carlus.huang@amd.com>
2024-06-24 08:45:52 +08:00
PoYen, Chen
4060416c97 Use dictionary to config all the functions 2024-06-23 19:22:25 +00:00
PoYen, Chen
d0b9fd0c5c Merge branch 'develop' into feature/refactor-fmha-codegen 2024-06-23 18:54:08 +00:00
Andriy Roshchenko
05b10e0e5a Add instances of grouped convolution 3d forward with a ConvScale element-wise op for bf8@bf8->fp8 (#1326)
We are adding more instances of grouped convolution 3d forward with a ConvScale element-wise operation.
This commit handles bf8@bf8->fp8 data types combination.

* Included an example.
* Added instances.
* Added a client example.

---------

Co-authored-by: Rostyslav Geyyer <rosty.geyyer@amd.com>
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>
2024-06-21 19:02:57 -06:00
PoYen, Chen
9fe165fe1d Remove choices for the 'direction' option 2024-06-21 12:45:28 +00:00
PoYen, Chen
2d57c34348 Let 'api' an alias of 'direction' option 2024-06-21 10:38:25 +00:00
PoYen, Chen
51487f238a Support generate multiple APIs for an example 2024-06-21 10:04:17 +00:00