PoYen, Chen
40a4d96cf5
Return earlier if split is empty
2024-08-19 10:16:23 +00:00
PoYen, Chen
b9a4ab0d7e
Only randomize kvcache seqlen_k if 1 < batch
2024-08-19 09:28:52 +00:00
PoYen, Chen
8166aa58aa
Fix wrong uneven split checking logic
2024-08-19 08:14:45 +00:00
PoYen, Chen
3f0dab6a77
Revert "Avoid seqlen_k=0 for kvcache"
This reverts commit 21c4df89e4.
2024-08-19 07:50:50 +00:00
PoYen, Chen
21c4df89e4
Avoid seqlen_k=0 for kvcache
2024-08-19 07:13:57 +00:00
PoYen, Chen
3fb77a0ebb
Remove type argument
2024-08-18 19:29:40 +00:00
PoYen, Chen
f37cd416e3
Fix typo in comment
2024-08-18 19:07:36 +00:00
PoYen, Chen
e8cd975d6a
Fix compilation errors
2024-08-18 19:00:05 +00:00
PoYen, Chen
9d5c33da04
Add more comments
2024-08-18 18:56:37 +00:00
PoYen, Chen
90c2008fe5
Add comment
2024-08-18 18:42:31 +00:00
PoYen, Chen
8a856f57ab
Add seqlen_q & seqlen_k rules
2024-08-18 18:38:08 +00:00
PoYen, Chen
a93c5e820f
Rename parameter
2024-08-18 18:37:25 +00:00
PoYen, Chen
4cd3432361
Avoid using too small rotary_cos & rotary_sin
2024-08-18 18:27:37 +00:00
PoYen, Chen
e5db71cc59
Use randomized seqlen_k for kvcache
2024-08-18 17:42:32 +00:00
PoYen, Chen
996f46b0d1
Randomize seqlen_k if kvcache is used
2024-08-18 17:31:51 +00:00
PoYen, Chen
3d3d73bee2
Fix wrong parameter name
2024-08-18 17:25:39 +00:00
PoYen, Chen
48b7a5bad2
Fix mode overriding logic
2024-08-18 14:44:28 +00:00
PoYen, Chen
05157bf3a3
Force batch mode when invoking appendkv & splitkv APIs
2024-08-18 06:05:42 +00:00
PoYen, Chen
cc52587bcc
Remove macro checking
2024-08-18 05:50:51 +00:00
PoYen, Chen
6b361f5a4b
Clarify the case in warning message
2024-08-18 00:42:22 +00:00
PoYen, Chen
c30d7f9d29
Remove 0 < seqlen_knew constraint
2024-08-16 22:14:05 +00:00
PoYen, Chen
352f6d58b0
Merge branch 'feature/fmha-fwd-appendkv' of github.com:ROCm/composable_kernel into feature/fmha-fwd-appendkv
2024-08-16 22:13:02 +00:00
rocking
34fea2935a
Fix nonexistent attribute
2024-08-16 20:30:49 +00:00
PoYen, Chen
d52278a5ef
Add more cases for appendkv
2024-08-16 18:23:55 +00:00
PoYen, Chen
d3fd64cd26
Add more appendkv tests
2024-08-16 18:03:28 +00:00
PoYen, Chen
51062cae0b
Merge remote-tracking branch 'origin/develop' into feature/fmha-fwd-appendkv
2024-08-16 16:47:06 +00:00
PoYen, Chen
41fdf9b2bc
Fix compilation error
2024-08-16 16:39:11 +00:00
PoYen, Chen
43b8100b7f
Support cache_batch_idx in example
2024-08-16 16:27:56 +00:00
PoYen, Chen
9c904b0e4c
Pass cache_batch_idx to kernels
2024-08-16 15:32:24 +00:00
Bartłomiej Kocot
2581727d2a
Add performance and large tensor tests for grouped conv (#1456)
* Add performance and large tensor tests for grouped conv
* Resize tests
* Resize tests
* update the python script to parse the grouped_conv results
* Remove int8 tests
* change bwd wei layout
---------
Co-authored-by: illsilin <Illia.Silin@amd.com>
2024-08-16 07:48:30 -07:00
PoYen, Chen
e6239e14f7
Re-organize bash functions
2024-08-16 12:46:16 +00:00
PoYen, Chen
2523c8e36c
Fix more format
2024-08-16 10:32:17 +00:00
PoYen, Chen
5728c0be65
Fix formatting
2024-08-16 10:25:46 +00:00
PoYen, Chen
095819a387
Remove options
2024-08-16 10:22:44 +00:00
PoYen, Chen
f2b3620511
Use meaningful options in smoke test
2024-08-16 10:18:14 +00:00
PoYen, Chen
aadd3ec63e
Fix wrong syntax in skcheck expr
2024-08-16 10:09:46 +00:00
PoYen, Chen
a4c6029a3d
Fix skcheck logic
2024-08-16 10:08:01 +00:00
PoYen, Chen
5805f5aa73
Remove group mode from appendkv kernel
2024-08-16 10:04:48 +00:00
dependabot[bot]
76bd0af6af
Bump rocm-docs-core from 1.6.2 to 1.7.0 in /docs/sphinx (#1467)
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.6.2 to 1.7.0.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.6.2...v1.7.0)
---
updated-dependencies:
- dependency-name: rocm-docs-core
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-08-15 13:59:40 -07:00
trixirt
49769ec889
Check compiler flags before using (#1403)
* Check compiler flags before using
The user's compiler may not support these flags, so check for support before using them.
Resolves failures on Fedora.
Signed-off-by: Tom Rix <trix@redhat.com>
* Fix syntax in CMakeLists.txt
Fix syntax in the check_cxx_compiler_flag call.
---------
Signed-off-by: Tom Rix <trix@redhat.com>
Co-authored-by: Tom Rix <trix@redhat.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
2024-08-14 20:43:10 -07:00
Haocong WANG
3049b5467c
[GEMM] gemm_universal related optimization (#1453)
* replace buffer_atomic with global_atomic
* fixed global_atomic_add
* added bf16 atomic_add
* format
* clang-format-12
* clean
* clean
* add guards
* Update gtest.cmake
* enabled splitk_gemm_multi_d
* format
* add ckProfiler
* format
* fixed naming
* format
* clean
* clean
* add guards
* fix clang format
* format
* add kbatch printout
* clean
* Add rocm6.2 related gemm optimization
* Limit bf16 atomic usage
* remove redundant RCR gemm_universal instance
* Add RRR fp8 gemm universal instance
* Bug fix
* Add GPU_TARGET guard to FP8/BF8 target
* bug fix
* update cmake
* remove all fp8/bf8 example if arch not support
* Enable fp8 RRR support in ckProfiler
* limit greedy-reverse flag to gemm_universal in ckProfiler
---------
Co-authored-by: Jing Zhang <jizhan@fb.com>
Co-authored-by: Jing Zhang <jizhan@meta.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
Co-authored-by: illsilin <Illia.Silin@amd.com>
2024-08-14 10:42:30 +08:00
AngryLoki
50c423481b
Fix compilation errors with libc++ (#1461)
This fixes two issues when compiling with libc++.
The first issue is an attempt to call std::numeric_limits<ranges::range_value_t<_Float16>>::min().
_Float16 is an extension of libstdc++; it does not exist in the C++ standard[2].
Luckily, composable_kernel has its own NumericLimits class, which provides everything needed.
The second issue is that the call to 'check_err' is ambiguous: there are two candidates.
This happens because composable_kernel relies on the assumption that f8_t (defined as _BitInt(8)) does not satisfy the is_integral trait.
However, libc++ treats _BitInt(N) as integral (per the standard, "any implementation-defined extended integer types" may be integral).
Closes: #1460
Signed-off-by: Sv. Lockal <lockalsash@gmail.com>
2024-08-13 14:31:15 -05:00
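The two libc++ issues described in #1461 can be illustrated with a minimal C++17 sketch. Every name below (half_like, NumericLimitsSketch, f8_like_t, check_err_sketch) is a hypothetical stand-in, not composable_kernel's actual NumericLimits or check_err, and signed char is used as an assumed model of the libc++ case where the 8-bit type counts as integral. Without the `!is_f8_like_v<T>` term, a call with f8_like_t would match both overloads and be ambiguous, mirroring the reported error.

```cpp
#include <limits>
#include <string>
#include <type_traits>

// ---- Issue 1: std::numeric_limits is only meaningful for types it is specialized for. ----
// Hypothetical stand-in for a non-standard floating type such as _Float16.
struct half_like {
    float value;
};

// Hypothetical project-owned limits trait: defer to std::numeric_limits by default,
// and provide explicit specializations for extension types.
template <typename T>
struct NumericLimitsSketch {
    static constexpr T Min() { return std::numeric_limits<T>::min(); }
};

template <>
struct NumericLimitsSketch<half_like> {
    // 2^-14: the smallest positive normal IEEE binary16 value.
    static constexpr half_like Min() { return half_like{6.103515625e-05f}; }
};

// ---- Issue 2: overloads constrained on std::is_integral can collide. ----
// Assumption: model the libc++ case, where the 8-bit type counts as integral,
// with an ordinary integral type.
using f8_like_t = signed char;

template <typename T>
inline constexpr bool is_f8_like_v = std::is_same_v<T, f8_like_t>;

// Generic integral overload. The `!is_f8_like_v<T>` term removes the overlap;
// without it, a call with f8_like_t matches both overloads and is ambiguous.
template <typename T>
std::enable_if_t<std::is_integral_v<T> && !is_f8_like_v<T>, std::string>
check_err_sketch(T, T) {
    return "integral";
}

// Dedicated overload for the 8-bit type.
template <typename T>
std::enable_if_t<is_f8_like_v<T>, std::string>
check_err_sketch(T, T) {
    return "f8";
}
```

For example, check_err_sketch(f8_like_t{1}, f8_like_t{2}) selects the "f8" overload and check_err_sketch(1, 2) selects the "integral" one. Constraining the generic overload explicitly, instead of relying on how a particular standard library classifies extended integer types, keeps the dispatch the same under libstdc++ and libc++.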
Mateusz Ozga
0606e5498e
Support large: 12d tensor size for reduction kernel (#1465)
2024-08-13 16:15:47 +02:00
PoYen, Chen
9de0f35ebc
Remove unused template parameter
2024-08-13 09:29:20 +00:00
PoYen, Chen
370babc996
Make tile window directly via PageBlockNavigator
2024-08-13 09:18:24 +00:00
PoYen, Chen
a8a2275aca
Fix wrong argument count
2024-08-13 08:42:23 +00:00
PoYen, Chen
d96752d0f5
Refine smoke_test_fwd.sh
2024-08-13 08:36:04 +00:00
PoYen, Chen
3dd6ef61ef
Re-order pipeline parameters
2024-08-13 07:38:41 +00:00
PoYen, Chen
19c19d8bd3
Only expose necessary methods (not attributes)
2024-08-13 07:26:26 +00:00
PoYen, Chen
c54de6416a
Rename TileWindowNavigator to PageBlockNavigator
2024-08-13 07:23:40 +00:00