Commit Graph

1451 Commits

Author SHA1 Message Date
PoYen, Chen
8fb015b83f Remove more debug statements 2024-07-24 06:48:40 +00:00
PoYen, Chen
5c733dc568 Remove debug statements 2024-07-24 06:47:52 +00:00
PoYen, Chen
2126d4d88d Add append-kv smoke tests 2024-07-24 06:35:53 +00:00
PoYen, Chen
f7fb3fafaa Allow only apply RoPE on Q (without append KV) 2024-07-24 06:26:00 +00:00
PoYen, Chen
08b4e8a125 Fix wrong rope key for fp8 pipeline 2024-07-24 06:06:07 +00:00
PoYen, Chen
d84c915549 Disable host verification if API not exist 2024-07-24 06:02:41 +00:00
PoYen, Chen
8a73d334b8 Rename utility function 2024-07-24 05:19:05 +00:00
PoYen, Chen
d59e098ec4 Fix wrong pipeline 2024-07-24 05:17:57 +00:00
PoYen, Chen
29c9b650b5 Align commit message to the real comment 2024-07-24 05:14:00 +00:00
PoYen, Chen
c7b7b44883 Add comment for why I just 't' for all padding flags 2024-07-24 05:13:16 +00:00
PoYen, Chen
59e1d9b84f Shift rotary_cos/rotary_sin by cache_seqlen_k 2024-07-24 05:06:47 +00:00
PoYen, Chen
a4da1e7f22 Remove RoPEComputeDataType type alias 2024-07-24 04:45:28 +00:00
PoYen, Chen
251f8cfea9 Merge branch 'develop' into feature/fmha-fwd-appendkv 2024-07-24 04:16:35 +00:00
PoYen, Chen
3348131699 Fix wrong data type for Q rotary_cos/rotary_sin 2024-07-24 04:10:43 +00:00
PoYen, Chen
5ea60715ea Update host/device specifiers 2024-07-24 03:45:19 +00:00
PoYen, Chen
6f95239229 Use different rotary_cos/rotary_sin distr for Q/Knew 2024-07-24 03:40:29 +00:00
PoYen, Chen
47a74f282d Extract Q/Knew vector size to helper methods 2024-07-24 03:23:18 +00:00
PoYen, Chen
eb4ea3ac2a Fix wrong rotary_cos/rotary_sin memory size for Q 2024-07-23 16:22:25 +00:00
Haocong WANG
d22713a719 disable bad instance (#1410) 2024-07-23 09:05:03 -07:00
PoYen, Chen
85bac93951 Fix wrong index into knew_host/vnew_host 2024-07-23 15:31:15 +00:00
PoYen, Chen
b11f92dc4c Fix wrong shape of knew_host/vnew_host 2024-07-23 14:52:42 +00:00
PoYen, Chen
ca4b208b60 Fix wrong grid size 2024-07-23 14:20:52 +00:00
PoYen, Chen
52b47810bb Rename more tile size constants 2024-07-23 09:30:05 +00:00
PoYen, Chen
99c1d463de Align naming of some tile size constants 2024-07-23 09:24:38 +00:00
PoYen, Chen
ce5e0f1d67 Re-order parameters 2024-07-23 09:02:41 +00:00
PoYen, Chen
fb80c7b2cb Extract rotary embedding logic out 2024-07-23 08:51:59 +00:00
PoYen, Chen
2192bbc68a Rename RotaryEmbeddingEnum 2024-07-23 07:50:50 +00:00
PoYen, Chen
d4606cf3c3 Rename header 2024-07-23 07:45:25 +00:00
PoYen, Chen
b275732128 Remove always true static_assert() 2024-07-23 07:25:50 +00:00
PoYen, Chen
eb649a2f25 Move thread locating logics into policy 2024-07-23 07:21:20 +00:00
PoYen, Chen
0e5cb6f913 Skip code if # of block is more than needed 2024-07-23 06:53:24 +00:00
PoYen, Chen
7124f3eda5 Add make_tile_window() for adding distribution only 2024-07-23 06:52:38 +00:00
PoYen, Chen
0925c0e941 Use better naming for tile indices 2024-07-23 06:40:53 +00:00
PoYen, Chen
bc7c7ee0c5 Fix wrong knew/vnew appending positions 2024-07-23 04:46:53 +00:00
PoYen, Chen
56df4d6397 Remove debug print code in kernel 2024-07-23 04:01:55 +00:00
PoYen, Chen
48c70720b5 Apply RoPE to q_tile 2024-07-23 03:54:11 +00:00
PoYen, Chen
e88253a2f4 Add code blocks for q_tile 2024-07-23 03:28:40 +00:00
PoYen, Chen
1dbed18555 Remove constness from q_ptr 2024-07-23 03:11:31 +00:00
PoYen, Chen
c26c60db4c Unify parameter/variable naming style 2024-07-23 02:59:17 +00:00
PoYen, Chen
c0bc097758 Apply elementwise function to the loaded tiles 2024-07-23 02:50:07 +00:00
PoYen, Chen
df352f955a Add comment 2024-07-23 02:31:45 +00:00
PoYen, Chen
d1ecfdc700 Support 8x rotary_dim under half-rotated RoPE 2024-07-23 02:19:16 +00:00
Bartłomiej Kocot
5d8c3d8190 Revert Support access per groups and filter2x3 in grouped conv fwd (#1382) (#1406) 2024-07-22 14:21:24 +02:00
PoYen, Chen
631f29d527 Handle RoPE half-rotated logics 2024-07-22 08:50:03 +00:00
PoYen, Chen
1136e6b560 Fix error in RoPE host reference 2024-07-22 08:39:09 +00:00
PoYen, Chen
01865d2ae4 Clean-up pipeline 2024-07-22 03:14:10 +00:00
PoYen, Chen
fffd6799e6 Instantiate multiple kernels for RoPE approaches 2024-07-20 02:28:21 +00:00
Haocong WANG
8c90f25be3 [GEMM] F8 GEMM, performance optimized. (#1384)
* add ab_scale init support

* enabled interwave

* add scale type; update isSupport

* adjust example

* clean

* enable f8 pure gemm rcr ckprofiler

* Add gemm_multiply_multiply instances

* clang format

* Optimize for ScaleBlockMNK=128

* enable abscale f8 gemm ck profiler

* Add pure f8 gemm test suite

* Reverting to the state of project at f60fd77

* update copyright

* clang format

* update copyright

---------

Co-authored-by: root <jizhan@amd.com>
2024-07-19 22:06:52 +08:00
ltqin
c544eb4da0 Universal gemm splitk using reduce (with multi-d) (#1341)
* init for reduce_threadwise multi_d

* add reduce_threadwise_multi_d

* add reduce_multi_d

* clean

* start add an other splitk device op

* add reduce template parameter to SplitKBatchOffset

* add reduce c matrix

* clean up code

* change example data type to bf16

* add bf16Ai8B example

* remove reduce template parameter

* add splitk atomic status to v4

* example add multi d parameters

* device op add multi-d parameters

* add multi-d to reduce

* fix kbach=1 bug

* change B layout to col in  bf16Ai8B example

* remove float adding struct

* change  multi-d interface

* change file and class name

* remove multi-d of bf16Ai8B example

* change IsReduce function to IsReduceAdd

* change example layout to RRR from RCR

* according layout to set ds stride

* reset parameter layout

* add gemm universal reduce instance

* add reduce factory

* add profile_gemm_universal_reduce

* add reduce to profiler

* fix reduce instance

* fix profiler reduce compiling bug

* format

* format library instance code

* add mem instance for reduce library

* fix call instance names

* add workspace for reduce in ckProfiler

* format

* add mnpading to reduce library instance

* add fp16 instance to reduce of profiler

* change copyright time

* restore profiler cmake file

* add reduce text to instances

* add DsLayout and DsDataType to instances template parameter

* fixed gemm_reduce_multi_d

* add an example without multi_d

* Update common.hpp

* Update gtest.cmake

* Update gemm_xdl_splitk_reduce_bf16.cpp

* clean

* Update gtest.cmake

* format

* fixe api

* format

* default parameter change to RRR

* add vector_len for multi_d

* format

* Update gtest.cmake

* fix bf16A iBB elementwiseop

* add ReduceDataType

* move ReduceDataType to end position

* format

* remove googletest git method  address

* fix copyright time

* update init data

---------

Co-authored-by: root <jizhan@amd.com>
Co-authored-by: letaoqin <letaoqin@amd.com>
Co-authored-by: Jing Zhang <jizhan@meta.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>
2024-07-19 22:01:22 +08:00
Bartłomiej Kocot
70a814f163 Refactor transform conv to gemm fwd (#1391)
* Refactor transform conv to gemm fwd

* fixes codegen

* wmma fixes

* fix wmma

* Fix copyright
2024-07-19 09:29:25 +02:00