Commit Graph

514 Commits

Author SHA1 Message Date
PoYen, Chen
2192bbc68a Rename RotaryEmbeddingEnum 2024-07-23 07:50:50 +00:00
PoYen, Chen
d4606cf3c3 Rename header 2024-07-23 07:45:25 +00:00
PoYen, Chen
b275732128 Remove always true static_assert() 2024-07-23 07:25:50 +00:00
PoYen, Chen
eb649a2f25 Move thread locating logics into policy 2024-07-23 07:21:20 +00:00
PoYen, Chen
0e5cb6f913 Skip code if # of block is more than needed 2024-07-23 06:53:24 +00:00
PoYen, Chen
7124f3eda5 Add make_tile_window() for adding distribution only 2024-07-23 06:52:38 +00:00
PoYen, Chen
0925c0e941 Use better naming for tile indices 2024-07-23 06:40:53 +00:00
PoYen, Chen
bc7c7ee0c5 Fix wrong knew/vnew appending positions 2024-07-23 04:46:53 +00:00
PoYen, Chen
56df4d6397 Remove debug print code in kernel 2024-07-23 04:01:55 +00:00
PoYen, Chen
48c70720b5 Apply RoPE to q_tile 2024-07-23 03:54:11 +00:00
PoYen, Chen
e88253a2f4 Add code blocks for q_tile 2024-07-23 03:28:40 +00:00
PoYen, Chen
1dbed18555 Remove constness from q_ptr 2024-07-23 03:11:31 +00:00
PoYen, Chen
c26c60db4c Unify parameter/variable naming style 2024-07-23 02:59:17 +00:00
PoYen, Chen
c0bc097758 Apply elementwise function to the loaded tiles 2024-07-23 02:50:07 +00:00
PoYen, Chen
df352f955a Add comment 2024-07-23 02:31:45 +00:00
PoYen, Chen
d1ecfdc700 Support 8x rotary_dim under half-rotated RoPE 2024-07-23 02:19:16 +00:00
PoYen, Chen
631f29d527 Handle RoPE half-rotated logics 2024-07-22 08:50:03 +00:00
PoYen, Chen
1136e6b560 Fix error in RoPE host reference 2024-07-22 08:39:09 +00:00
PoYen, Chen
01865d2ae4 Clean-up pipeline 2024-07-22 03:14:10 +00:00
PoYen, Chen
fffd6799e6 Instantiate multiple kernels for RoPE approaches 2024-07-20 02:28:21 +00:00
PoYen, Chen
27b5141706 Fix wrong thread starting offset 2024-07-18 20:02:06 +00:00
PoYen, Chen
23450526c0 Only apply interleaved RoPE on Knew for now 2024-07-18 19:42:14 +00:00
PoYen, Chen
85bfed07fa Add dram distribution for rotary_cos/rotary_sin (interleaved) 2024-07-18 09:11:22 +00:00
PoYen, Chen
39ef09bb23 Remove unused inner namespace 2024-07-18 09:10:51 +00:00
PoYen, Chen
99f863e4cd Fix rotary cos/sin tensor/tile size 2024-07-16 06:31:17 +00:00
PoYen, Chen
b32fd8d3f4 Rename variables used in distribution encoding 2024-07-16 06:27:28 +00:00
PoYen, Chen
879710a495 Fix wrong seqlen_k for kvcache 2024-07-16 03:42:51 +00:00
PoYen, Chen
b0925bb7f6 Create Rotary Cos/Sin tile windows in kernel 2024-07-14 23:47:40 +00:00
PoYen, Chen
391210ed9e Pass RoPE kernel args 2024-07-14 23:18:32 +00:00
PoYen, Chen
b5ad1411b0 Merge branch 'feature/cond-add-splitkv' into feature/fmha-fwd-appendkv 2024-07-14 22:13:17 +00:00
Bartłomiej Kocot
82e8a78a3f Support access per groups and filter3x3 in grouped conv fwd (#1382)
* Support access per groups and filter3x3 in grouped conv fwd

* Fixes for large cases

* Fixes for large tensors
2024-07-12 11:08:42 -07:00
PoYen, Chen
44c9bacff7 Rename function: add "batched" prefix 2024-07-12 06:51:31 +00:00
PoYen, Chen
ff75eff3bf Reduce input/output dimensions 2024-07-12 06:49:43 +00:00
PoYen, Chen
b34ddf5f71 Merge remote-tracking branch 'origin/feature/cond-add-splitkv' into feature/fmha-fwd-appendkv 2024-07-12 04:42:45 +00:00
PoYen, Chen
ee365bbc66 Fix wrong answer when interleaved=true 2024-07-11 00:26:18 +00:00
PoYen, Chen
52da00acd6 Fix wrong answer when interleaved=false 2024-07-10 12:50:00 +00:00
PoYen, Chen
8c733fb3be Fix compilation errors 2024-07-10 10:53:58 +00:00
PoYen, Chen
03b6d99be0 Fix typo of HostTensor<>::get_length() 2024-07-10 09:33:15 +00:00
PoYen, Chen
9d29311da0 Finish reference_rotary_position_embedding() impl 2024-07-10 09:16:54 +00:00
PoYen, Chen
f2d28e8ab4 Add reference_rotary_position_embedding() (not implemented) 2024-07-09 05:22:08 +00:00
PoYen, Chen
2e164f1b79 Add length/stride getters for HostTensor 2024-07-09 05:20:04 +00:00
carlushuang
8182976c37 [CK_TILE] wa prec, remove sgpr offset for inline asm (#1356)
* wa prec, remove sgpr offset for inline asm

* macro for set tile

* ignore unused param if no kernel instances in host API

* fix more prec issue

* cache buffer resource

* fix

* support pre-nop

* clear tile by vector type members

* add workaround to reduce scratch memory

* conditionally enable workaround code

* enable workaround start from certain build version

* fallback set_tile() implementation from certain build version

* undo template argument changes

* put dummy asm in load_raw()

* fix comments, refactor s_nop inside buffer_load

---------

Co-authored-by: PoYen, Chen <PoYen.Chen@amd.com>
2024-07-08 11:09:55 -07:00
PoYen, Chen
1c070380fa Merge branch 'feature/cond-add-splitkv' into feature/fmha-fwd-appendkv 2024-07-08 07:13:34 +00:00
Harisankar Sadasivan
75e622f02f Universal streamk with atomics (#1360)
* universal streamk with atomics with ckprofiler support. grid_size and streamk strategy are tunable. grid_size of -1 leads to #WGs = maximum occupancy X num_CUs. implementation supports many different streamk policies: 1-tile, 2-tile, 3-tile and 4-tile. streamk strategy of -1 leads to default streamk policy (4-tile). 

* Update README.md

* fixing clang-format issues

* removed conflicts in struct members between streamk and universal streamk

* corrected arg parsing for streamk and universal streamk

* added stream-k policies for 3 tile and 4 tile

* fixed argument type issue with parsing cmd args

* changes suggested in PR review are made- removing comments and correcting copyright

* file permissions updated

* added default value support for grid_size and streamk-policy selection set to -1

* print messages for arguments

* print messages for arguments

* print messages for arguments1
2024-07-05 21:40:30 -07:00
jakpiase
eaa870a1ab Add structural sparsity xdlops (#1363)
* Implemented smfmac xdlops

* add reviewer comments
2024-07-04 12:00:14 +02:00
Jun Liu
959073842c Fix issue with multiple targets and remove smfmac tests from unsupported test targets (#1372) 2024-07-03 23:34:38 -07:00
PoYen, Chen
34a3ff849f Fix Vnew tile dstr for row major case 2024-06-27 09:55:35 +00:00
jakpiase
ed21948bcd Add structural sparsity gemm instruction tests (#1309)
* first version of smfmac test

* add reviewer comments

* add reviewer suggestions
2024-06-27 11:30:32 +02:00
Illia Silin
941d1f7ce0 Merging the gfx12 code into public repo. (#1362) 2024-06-27 00:33:34 -07:00
PoYen, Chen
c40c1daff0 Extract common logics 2024-06-26 18:02:28 +00:00