PoYen, Chen
fffd6799e6
Instantiate multiple kernels for RoPE approaches
2024-07-20 02:28:21 +00:00
PoYen, Chen
27b5141706
Fix wrong thread starting offset
2024-07-18 20:02:06 +00:00
PoYen, Chen
23450526c0
Only apply interleaved RoPE on Knew for now
2024-07-18 19:42:14 +00:00
PoYen, Chen
85bfed07fa
Add dram distribution for rotary_cos/rotary_sin (interleaved)
2024-07-18 09:11:22 +00:00
PoYen, Chen
39ef09bb23
Remove unused inner namespace
2024-07-18 09:10:51 +00:00
PoYen, Chen
e83c3c7fa0
Add constraint to the rotary_dim option
2024-07-16 06:54:37 +00:00
PoYen, Chen
99f863e4cd
Fix rotary cos/sin tensor/tile size
2024-07-16 06:31:17 +00:00
PoYen, Chen
b32fd8d3f4
Rename variables used in distribution encoding
2024-07-16 06:27:28 +00:00
PoYen, Chen
879710a495
Fix wrong seqlen_k for kvcache
2024-07-16 03:42:51 +00:00
PoYen, Chen
65dac9fb90
Fix wrong boundaries
2024-07-15 01:42:53 +00:00
PoYen, Chen
4e01307e04
Fix compilation error in debug mode
2024-07-15 01:26:46 +00:00
PoYen, Chen
1a093f94b2
Add minimum seqlen_k to generate compliant kvcache
2024-07-15 01:11:16 +00:00
PoYen, Chen
57c6a4125c
Fix seqlen_knew enabling check logic
2024-07-15 00:40:39 +00:00
PoYen, Chen
ad61d9d4b2
Randomly generate seqlen_knew if needed
2024-07-15 00:39:03 +00:00
PoYen, Chen
f6850aef29
Add compute data type alias for RoPE
2024-07-15 00:05:33 +00:00
PoYen, Chen
b0925bb7f6
Create Rotary Cos/Sin tile windows in kernel
2024-07-14 23:47:40 +00:00
PoYen, Chen
391210ed9e
Pass RoPE kernel args
2024-07-14 23:18:32 +00:00
PoYen, Chen
b5ad1411b0
Merge branch 'feature/cond-add-splitkv' into feature/fmha-fwd-appendkv
2024-07-14 22:13:17 +00:00
PoYen, Chen
c6717bb300
Merge branch 'feature/cond-add-splitkv' of github.com:ROCm/composable_kernel into feature/cond-add-splitkv
2024-07-14 22:11:39 +00:00
PoYen, Chen
8c1647d778
Avoid invoking deprecated method 'find_module'
2024-07-14 22:10:30 +00:00
Po Yen Chen
5ce0fecf36
Merge branch 'develop' into feature/cond-add-splitkv
2024-07-15 05:48:51 +08:00
PoYen, Chen
55f55025ee
Fix wrong tensor size
2024-07-14 15:40:56 +00:00
PoYen, Chen
93e5125d7a
Rename RoPE utility function
2024-07-14 14:48:06 +00:00
PoYen, Chen
83d6acc111
Apply RoPE on host side
2024-07-14 14:45:17 +00:00
Bartłomiej Kocot
82e8a78a3f
Support access per groups and filter3x3 in grouped conv fwd (#1382)
* Support access per groups and filter3x3 in grouped conv fwd
* Fixes for large cases
* Fixes for large tensors
2024-07-12 11:08:42 -07:00
PoYen, Chen
44c9bacff7
Rename function: add "batched" prefix
2024-07-12 06:51:31 +00:00
PoYen, Chen
ff75eff3bf
Reduce input/output dimensions
2024-07-12 06:49:43 +00:00
PoYen, Chen
3183b68921
Simplify v_host_ref definition
2024-07-12 06:42:41 +00:00
PoYen, Chen
e5885cab83
Simplify K appending logic
2024-07-12 06:37:23 +00:00
PoYen, Chen
3578c6f836
Append K/V in the host verification code
2024-07-12 06:32:35 +00:00
PoYen, Chen
4107bf03a6
Merge remote-tracking branch 'origin/feature/cond-add-splitkv' into feature/fmha-fwd-appendkv
2024-07-12 04:43:04 +00:00
PoYen, Chen
b34ddf5f71
Merge remote-tracking branch 'origin/feature/cond-add-splitkv' into feature/fmha-fwd-appendkv
2024-07-12 04:42:45 +00:00
Po Yen Chen
b4306af655
Merge branch 'develop' into feature/cond-add-splitkv
2024-07-12 12:34:31 +08:00
zjing14
13c1e64daa
add gemm_bias_add example (#1361)
* add gemm_bias_add example
* changed strideD
* clang-format
---------
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
2024-07-11 18:08:07 -07:00
Rostyslav Geyyer
7a46a91c84
Add instances for grouped conv fwd 3d with ConvScale for bf8@fp8->fp8 (#1369)
* Add an example
* Add instances
* Add a client example
2024-07-11 13:31:39 -07:00
Illia Silin
98a01bbc72
Add CK_TILE tests to daily CI builds. (#1381)
* add ck_tile tests to CI
* build and run ck_tile tests on gfx90a and gfx942 in parallel
* fix groovy syntax
* turn ck_tile tests OFF by default
* skip creating the build folder
* build ck_tile examples with 64 threads
* build ck_tile examples with cmake-ck-dev.sh script
* add video group to docker on mi300
* do not retry to rebuild the early CI stages
* help prevent jenkins false failure
* restore cron trigger
2024-07-11 13:22:40 -07:00
Illia Silin
f914c228c6
[Jenkins] restore cron jobs (#1380)
* test the cron trigger
* fix the cron jobs
* restore the list of cron jobs
2024-07-11 10:28:11 -07:00
carlushuang
bbdb0a5dc0
Merge branch 'develop' into feature/cond-add-splitkv
2024-07-11 16:01:19 +08:00
PoYen, Chen
ee365bbc66
Fix wrong answer when interleaved=true
2024-07-11 00:26:18 +00:00
Illia Silin
a8eb872055
[gfx12] add gfx12 to the default target list (#1379)
2024-07-10 14:54:04 -07:00
Sam Wu
860f957c22
Update changelog release headers (#1378)
* Update doc codeowner syntax
* Add doc link to changelog
* Update changelog formatting for markdownlint
Also change headings for releases
2024-07-10 09:36:10 -06:00
PoYen, Chen
52da00acd6
Fix wrong answer when interleaved=false
2024-07-10 12:50:00 +00:00
PoYen, Chen
8c733fb3be
Fix compilation errors
2024-07-10 10:53:58 +00:00
PoYen, Chen
03b6d99be0
Fix typo of HostTensor<>::get_length()
2024-07-10 09:33:15 +00:00
PoYen, Chen
9d29311da0
Finish reference_rotary_position_embedding() impl
2024-07-10 09:16:54 +00:00
dependabot[bot]
da42a88964
Bump rocm-docs-core from 1.4.1 to 1.5.0 in /docs/sphinx (#1374)
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.4.1 to 1.5.0.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.4.1...v1.5.0)
---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Sam Wu <22262939+samjwu@users.noreply.github.com>
2024-07-09 12:48:23 -07:00
carlushuang
ccfdc53022
update owner (#1377)
* remove zjing14, add poyenc
* remove yigex
2024-07-09 20:30:07 +08:00
PoYen, Chen
f2d28e8ab4
Add reference_rotary_position_embedding() (not implemented)
2024-07-09 05:22:08 +00:00
PoYen, Chen
e939082bdc
Add RoPE example utilities
2024-07-09 05:20:47 +00:00
PoYen, Chen
2e164f1b79
Add length/stride getters for HostTensor
2024-07-09 05:20:04 +00:00