Commit Graph

79 Commits

Author SHA1 Message Date
PoYen, Chen
f053ae2b5b Add missing init code 2024-07-24 07:12:06 +00:00
PoYen, Chen
c50c36a07f Re-arrange the 'set +x' command 2024-07-24 06:56:53 +00:00
PoYen, Chen
8fb015b83f Remove more debug statements 2024-07-24 06:48:40 +00:00
PoYen, Chen
2126d4d88d Add append-kv smoke tests 2024-07-24 06:35:53 +00:00
PoYen, Chen
f7fb3fafaa Allow only apply RoPE on Q (without append KV) 2024-07-24 06:26:00 +00:00
PoYen, Chen
08b4e8a125 Fix wrong rope key for fp8 pipeline 2024-07-24 06:06:07 +00:00
PoYen, Chen
d84c915549 Disable host verification if API not exist 2024-07-24 06:02:41 +00:00
PoYen, Chen
8a73d334b8 Rename utility function 2024-07-24 05:19:05 +00:00
PoYen, Chen
d59e098ec4 Fix wrong pipeline 2024-07-24 05:17:57 +00:00
PoYen, Chen
29c9b650b5 Align commit message to the real comment 2024-07-24 05:14:00 +00:00
PoYen, Chen
c7b7b44883 Add comment for why I just use 't' for all padding flags 2024-07-24 05:13:16 +00:00
PoYen, Chen
59e1d9b84f Shift rotary_cos/rotary_sin by cache_seqlen_k 2024-07-24 05:06:47 +00:00
PoYen, Chen
a4da1e7f22 Remove RoPEComputeDataType type alias 2024-07-24 04:45:28 +00:00
PoYen, Chen
eb4ea3ac2a Fix wrong rotary_cos/rotary_sin memory size for Q 2024-07-23 16:22:25 +00:00
PoYen, Chen
85bac93951 Fix wrong index into knew_host/vnew_host 2024-07-23 15:31:15 +00:00
PoYen, Chen
b11f92dc4c Fix wrong shape of knew_host/vnew_host 2024-07-23 14:52:42 +00:00
PoYen, Chen
ca4b208b60 Fix wrong grid size 2024-07-23 14:20:52 +00:00
PoYen, Chen
2192bbc68a Rename RotaryEmbeddingEnum 2024-07-23 07:50:50 +00:00
PoYen, Chen
48c70720b5 Apply RoPE to q_tile 2024-07-23 03:54:11 +00:00
PoYen, Chen
1dbed18555 Remove constness from q_ptr 2024-07-23 03:11:31 +00:00
PoYen, Chen
631f29d527 Handle RoPE half-rotated logics 2024-07-22 08:50:03 +00:00
PoYen, Chen
fffd6799e6 Instantiate multiple kernels for RoPE approaches 2024-07-20 02:28:21 +00:00
PoYen, Chen
23450526c0 Only apply interleaved RoPE on Knew for now 2024-07-18 19:42:14 +00:00
PoYen, Chen
e83c3c7fa0 Add constraint to the rotary_dim option 2024-07-16 06:54:37 +00:00
PoYen, Chen
879710a495 Fix wrong seqlen_k for kvcache 2024-07-16 03:42:51 +00:00
PoYen, Chen
65dac9fb90 Fix wrong boundaries 2024-07-15 01:42:53 +00:00
PoYen, Chen
4e01307e04 Fix compilation error in debug mode 2024-07-15 01:26:46 +00:00
PoYen, Chen
1a093f94b2 Add minimum seqlen_k to generate compliance kvcache 2024-07-15 01:11:16 +00:00
PoYen, Chen
57c6a4125c Fix seqlen_knew enabling check logic 2024-07-15 00:40:39 +00:00
PoYen, Chen
ad61d9d4b2 Randomly generate seqlen_knew if needed 2024-07-15 00:39:03 +00:00
PoYen, Chen
f6850aef29 Add compute data type alias for RoPE 2024-07-15 00:05:33 +00:00
PoYen, Chen
391210ed9e Pass RoPE kernel args 2024-07-14 23:18:32 +00:00
PoYen, Chen
b5ad1411b0 Merge branch 'feature/cond-add-splitkv' into feature/fmha-fwd-appendkv 2024-07-14 22:13:17 +00:00
PoYen, Chen
8c1647d778 Avoid invoking deprecated method 'find_module' 2024-07-14 22:10:30 +00:00
PoYen, Chen
55f55025ee Fix wrong tensor size 2024-07-14 15:40:56 +00:00
PoYen, Chen
93e5125d7a Rename RoPE utility function 2024-07-14 14:48:06 +00:00
PoYen, Chen
83d6acc111 Apply RoPE on host side 2024-07-14 14:45:17 +00:00
PoYen, Chen
3183b68921 Simplify v_host_ref definition 2024-07-12 06:42:41 +00:00
PoYen, Chen
e5885cab83 Simplify K appending logics 2024-07-12 06:37:23 +00:00
PoYen, Chen
3578c6f836 Append K/V in the host verification code 2024-07-12 06:32:35 +00:00
PoYen, Chen
4107bf03a6 Merge remote-tracking branch 'origin/feature/cond-add-splitkv' into feature/fmha-fwd-appendkv 2024-07-12 04:43:04 +00:00
PoYen, Chen
b34ddf5f71 Merge remote-tracking branch 'origin/feature/cond-add-splitkv' into feature/fmha-fwd-appendkv 2024-07-12 04:42:45 +00:00
Po Yen Chen
b4306af655 Merge branch 'develop' into feature/cond-add-splitkv 2024-07-12 12:34:31 +08:00
Illia Silin
98a01bbc72 Add CK_TILE tests to daily CI builds. (#1381) 2024-07-11 13:22:40 -07:00
* add ck_tile tests to CI
* build and run ck_tile tests on gfx90a and gfx942 in parallel
* fix groovy syntax
* turn ck_tile tests OFF by default
* skip creating the build folder
* build ck_tile examples with 64 threads
* build ck_tile examples with cmake-ck-dev.sh script
* add video group to docker on mi300
* do not retry to rebuild the early CI stages
* help prevent jenkins false failure
* restore cron trigger
PoYen, Chen
8c733fb3be Fix compilation errors 2024-07-10 10:53:58 +00:00
PoYen, Chen
e939082bdc Add RoPE example utilities 2024-07-09 05:20:47 +00:00
Po Yen Chen
dc72074ec7 Merge branch 'develop' into feature/cond-add-splitkv 2024-07-09 03:42:25 +08:00
carlushuang
8182976c37 [CK_TILE] wa prec, remove sgpr offset for inline asm (#1356) 2024-07-08 11:09:55 -07:00
* wa prec, remove sgpr offset for inline asm
* macro for set tile
* ignore unused param if no kernel instances in host API
* fix more prec issue
* cache buffer resource
* fix
* support pre-nop
* clear tile by vector type members
* add workaround to reduce scratch memory
* conditionally enable workaround code
* enable workaround start from certain build version
* fallback set_tile() implementation from certain build version
* undo template argument changes
* put dummy asm in load_raw()
* fix comments, refactor s_nop inside buffer_load

---------

Co-authored-by: PoYen, Chen <PoYen.Chen@amd.com>
PoYen, Chen
18a3834fb4 Set num_splits=1 if split-kv is not supported 2024-07-08 10:27:32 +00:00
PoYen, Chen
8ac6bacf26 Unify CMakeLists.txt coding style 2024-07-08 10:19:31 +00:00