1385 Commits

Author SHA1 Message Date
PoYen, Chen
4e01307e04 Fix compilation error in debug mode 2024-07-15 01:26:46 +00:00
PoYen, Chen
1a093f94b2 Add minimum seqlen_k to generate compliance kvcache 2024-07-15 01:11:16 +00:00
PoYen, Chen
57c6a4125c Fix seqlen_knew enabling check logic 2024-07-15 00:40:39 +00:00
PoYen, Chen
ad61d9d4b2 Randomly generate seqlen_knew if needed 2024-07-15 00:39:03 +00:00
PoYen, Chen
f6850aef29 Add compute data type alias for RoPE 2024-07-15 00:05:33 +00:00
PoYen, Chen
b0925bb7f6 Create Rotary Cos/Sin tile windows in kernel 2024-07-14 23:47:40 +00:00
PoYen, Chen
391210ed9e Pass RoPE kernel args 2024-07-14 23:18:32 +00:00
PoYen, Chen
b5ad1411b0 Merge branch 'feature/cond-add-splitkv' into feature/fmha-fwd-appendkv 2024-07-14 22:13:17 +00:00
PoYen, Chen
c6717bb300 Merge branch 'feature/cond-add-splitkv' of github.com:ROCm/composable_kernel into feature/cond-add-splitkv 2024-07-14 22:11:39 +00:00
PoYen, Chen
8c1647d778 Avoid invoking deprecated method 'find_module' 2024-07-14 22:10:30 +00:00
Po Yen Chen
5ce0fecf36 Merge branch 'develop' into feature/cond-add-splitkv 2024-07-15 05:48:51 +08:00
PoYen, Chen
55f55025ee Fix wrong tensor size 2024-07-14 15:40:56 +00:00
PoYen, Chen
93e5125d7a Rename RoPE utility function 2024-07-14 14:48:06 +00:00
PoYen, Chen
83d6acc111 Apply RoPE on host side 2024-07-14 14:45:17 +00:00
Bartłomiej Kocot
82e8a78a3f Support access per groups and filter3x3 in grouped conv fwd (#1382)
* Support access per groups and filter3x3 in grouped conv fwd

* Fixes for large cases

* Fixes for large tensors
2024-07-12 11:08:42 -07:00
PoYen, Chen
44c9bacff7 Rename function: add "batched" prefix 2024-07-12 06:51:31 +00:00
PoYen, Chen
ff75eff3bf Reduce input/output dimensions 2024-07-12 06:49:43 +00:00
PoYen, Chen
3183b68921 Simplify v_host_ref definition 2024-07-12 06:42:41 +00:00
PoYen, Chen
e5885cab83 Simplify K appending logics 2024-07-12 06:37:23 +00:00
PoYen, Chen
3578c6f836 Append K/V in the host verification code 2024-07-12 06:32:35 +00:00
PoYen, Chen
4107bf03a6 Merge remote-tracking branch 'origin/feature/cond-add-splitkv' into feature/fmha-fwd-appendkv 2024-07-12 04:43:04 +00:00
PoYen, Chen
b34ddf5f71 Merge remote-tracking branch 'origin/feature/cond-add-splitkv' into feature/fmha-fwd-appendkv 2024-07-12 04:42:45 +00:00
Po Yen Chen
b4306af655 Merge branch 'develop' into feature/cond-add-splitkv 2024-07-12 12:34:31 +08:00
zjing14
13c1e64daa add gemm_bias_add example (#1361)
* add gemm_bias_add example

* changed strideD

* clang-format

---------

Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
2024-07-11 18:08:07 -07:00
Rostyslav Geyyer
7a46a91c84 Add instances for grouped conv fwd 3d with ConvScale for bf8@fp8->fp8 (#1369)
* Add an example

* Add instances

* Add a client example
2024-07-11 13:31:39 -07:00
Illia Silin
98a01bbc72 Add CK_TILE tests to daily CI builds. (#1381)
* add ck_tile tests to CI

* build and run ck_tile tests on gfx90a and gfx942 in parallel

* fix groovy syntax

* turn ck_tile tests OFF by default

* skip creating the build folder

* build ck_tile examples with 64 threads

* build ck_tile examples with cmake-ck-dev.sh script

* add video group to docker on mi300

* do not retry to rebuild the early CI stages

* help prevent jenkins false failure

* restore cron trigger
2024-07-11 13:22:40 -07:00
Illia Silin
f914c228c6 [Jenkins] restore cron jobs (#1380)
* test the cron trigger

* fix the cron jobs

* restore the list of cron jobs
2024-07-11 10:28:11 -07:00
carlushuang
bbdb0a5dc0 Merge branch 'develop' into feature/cond-add-splitkv 2024-07-11 16:01:19 +08:00
PoYen, Chen
ee365bbc66 Fix wrong answer when interleaved=true 2024-07-11 00:26:18 +00:00
Illia Silin
a8eb872055 [gfx12] add gfx12 to the default target list (#1379) 2024-07-10 14:54:04 -07:00
Sam Wu
860f957c22 Update changelog release headers (#1378)
* Update doc codeowner syntax

* Add doc link to changelog

* Update changelog formatting for markdownlint

Also change headings for releases
2024-07-10 09:36:10 -06:00
PoYen, Chen
52da00acd6 Fix wrong answer when interleaved=false 2024-07-10 12:50:00 +00:00
PoYen, Chen
8c733fb3be Fix compilation errors 2024-07-10 10:53:58 +00:00
PoYen, Chen
03b6d99be0 Fix typo of HostTensor<>::get_length() 2024-07-10 09:33:15 +00:00
PoYen, Chen
9d29311da0 Finish reference_rotary_position_embedding() impl 2024-07-10 09:16:54 +00:00
dependabot[bot]
da42a88964 Bump rocm-docs-core from 1.4.1 to 1.5.0 in /docs/sphinx (#1374)
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.4.1 to 1.5.0.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.4.1...v1.5.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Sam Wu <22262939+samjwu@users.noreply.github.com>
2024-07-09 12:48:23 -07:00
carlushuang
ccfdc53022 update owner (#1377)
* remove zjing14, add poyenc

* remove yigex
2024-07-09 20:30:07 +08:00
PoYen, Chen
f2d28e8ab4 Add reference_rotary_position_embedding() (not implemented) 2024-07-09 05:22:08 +00:00
PoYen, Chen
e939082bdc Add RoPE example utilities 2024-07-09 05:20:47 +00:00
PoYen, Chen
2e164f1b79 Add length/stride getters for HostTensor 2024-07-09 05:20:04 +00:00
Illia Silin
a328df25a1 Fix the cmake logic when building with INSTANCES_ONLY=ON. (#1376)
* fix the cmake logic when building for various targets

* another minor fix
2024-07-08 21:21:16 -07:00
Po Yen Chen
dc72074ec7 Merge branch 'develop' into feature/cond-add-splitkv 2024-07-09 03:42:25 +08:00
carlushuang
8182976c37 [CK_TILE] wa prec, remove sgpr offset for inline asm (#1356)
* wa prec, remove sgpr offset for inline asm

* macro for set tile

* ignore unused param if no kernel instances in host API

* fix more prec issue

* cache buffer resource

* fix

* support pre-nop

* clear tile by vector type members

* add workaround to reduce scratch memory

* conditionally enable workaround code

* enable workaround start from certain build version

* fallback set_tile() implementation from certain build version

* undo template argument changes

* put dummy asm in load_raw()

* fix comments, refactor s_nop inside buffer_load

---------

Co-authored-by: PoYen, Chen <PoYen.Chen@amd.com>
2024-07-08 11:09:55 -07:00
Andriy Roshchenko
eb44e0472a Add ckProfiler support for forward 3D convolutions with OUT element-wise operations. (#1354) 2024-07-08 10:55:54 -07:00
PoYen, Chen
18a3834fb4 Set num_splits=1 if split-kv is not supported 2024-07-08 10:27:32 +00:00
PoYen, Chen
8ac6bacf26 Unify CMakeLists.txt coding style 2024-07-08 10:19:31 +00:00
PoYen, Chen
5d21b4d736 Merge branch 'feature/cond-add-splitkv' into feature/fmha-fwd-appendkv 2024-07-08 10:18:28 +00:00
PoYen, Chen
6ca3910199 Show message if we are ignoring option 2024-07-08 10:17:55 +00:00
PoYen, Chen
fe4ae5dcd9 Early return if 0 < s_k_new is not supported 2024-07-08 10:09:36 +00:00
PoYen, Chen
be076db91c Merge branch 'feature/cond-add-splitkv' into feature/fmha-fwd-appendkv 2024-07-08 10:03:58 +00:00