Commit Graph

1575 Commits

Author SHA1 Message Date
PoYen, Chen
d3fd64cd26 Add more appendkv test 2024-08-16 18:03:28 +00:00
PoYen, Chen
51062cae0b Merge remote-tracking branch 'origin/develop' into feature/fmha-fwd-appendkv 2024-08-16 16:47:06 +00:00
PoYen, Chen
41fdf9b2bc Fix compilation error 2024-08-16 16:39:11 +00:00
PoYen, Chen
43b8100b7f Support cache_batch_idx in example 2024-08-16 16:27:56 +00:00
PoYen, Chen
9c904b0e4c Pass cache_batch_idx to kernels 2024-08-16 15:32:24 +00:00
Bartłomiej Kocot
2581727d2a Add performance and large tensor tests for grouped conv (#1456)
* Add performance and large tensor tests for grouped conv

* Resize tests

* Resize tests

* update the python script to parse the grouped_conv results

* Remove int8 tests

* change bwd wei layout

---------

Co-authored-by: illsilin <Illia.Silin@amd.com>
2024-08-16 07:48:30 -07:00
PoYen, Chen
e6239e14f7 Re-organize bash functions 2024-08-16 12:46:16 +00:00
PoYen, Chen
2523c8e36c Fix more format 2024-08-16 10:32:17 +00:00
PoYen, Chen
5728c0be65 Fix formatting 2024-08-16 10:25:46 +00:00
PoYen, Chen
095819a387 Remove options 2024-08-16 10:22:44 +00:00
PoYen, Chen
f2b3620511 Use meaningful options in smoke test 2024-08-16 10:18:14 +00:00
PoYen, Chen
aadd3ec63e Fix wrong syntax in skcheck expr 2024-08-16 10:09:46 +00:00
PoYen, Chen
a4c6029a3d Fix skcheck logic 2024-08-16 10:08:01 +00:00
PoYen, Chen
5805f5aa73 Remove group mode from appendkv kernel 2024-08-16 10:04:48 +00:00
dependabot[bot]
76bd0af6af Bump rocm-docs-core from 1.6.2 to 1.7.0 in /docs/sphinx (#1467)
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.6.2 to 1.7.0.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.6.2...v1.7.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-08-15 13:59:40 -07:00
trixirt
49769ec889 Check compiler flags before using (#1403)
* Check compiler flags before using

The user's compiler may not support these flags, so check.
Resolves failures on Fedora.

Signed-off-by: Tom Rix <trix@redhat.com>

* fix syntax CMakeLists.txt

Fix syntax in the check_cxx_compiler_flag.

---------

Signed-off-by: Tom Rix <trix@redhat.com>
Co-authored-by: Tom Rix <trix@redhat.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
2024-08-14 20:43:10 -07:00
Haocong WANG
3049b5467c [GEMM] gemm_universal related optimization (#1453)
* replace buffer_atomic with global_atomic

* fixed global_atomic_add

* added bf16 atomic_add

* format

* clang-format-12

* clean

* clean

* add guards

* Update gtest.cmake

* enabled splitk_gemm_multi_d

* format

* add ckProfiler

* format

* fixed naming

* format

* clean

* clean

* add guards

* fix clang format

* format

* add kbatch printout

* clean

* Add rocm6.2 related gemm optimization

* Limit bf16 atomic usage

* remove redundant RCR gemm_universal instance

* Add RRR fp8 gemm universal instance

* Bug fix

* Add GPU_TARGET guard to FP8/BF8 target

* bug fix

* update cmake

* remove all fp8/bf8 example if arch not support

* Enable fp8 RRR support in ckProfiler

* limit greedy-reverse flag to gemm_universal in ckProfiler

---------

Co-authored-by: Jing Zhang <jizhan@fb.com>
Co-authored-by: Jing Zhang <jizhan@meta.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
Co-authored-by: illsilin <Illia.Silin@amd.com>
2024-08-14 10:42:30 +08:00
AngryLoki
50c423481b Fix compilation errors with libc++ (#1461)
This fixes 2 issues when compiled with libc++.

First issue is attempt to call std::numeric_limits<ranges::range_value_t<_Float16>>::min().
_Float16 is extension of libstdc++, it does not exist in C++ standard[2].
Luckily, there is NumericLimits class in composable_kernel, which does everything needed.

Second issue with call to 'check_err' is ambiguous: there are 2 candidates.
It happens because composable_kernel relies on idea that f8_t (defined as _BitInt(8)) does not pass is_integral trait.
However, libc++ treats _BitInt(N) as integral (per standard "any implementation-defined extended integer types" can be integral).

Closes: #1460

Signed-off-by: Sv. Lockal <lockalsash@gmail.com>
2024-08-13 14:31:15 -05:00
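
The second issue described in the commit above (an ambiguous `check_err` call once the 8-bit type counts as integral) can be illustrated with a minimal sketch. This is not the code from the commit: `f8_like` and `which_check` are hypothetical names, and `signed char` stands in for `f8_t` only because it is integral on every standard library, which mimics what libc++ reports for `_BitInt(8)`. Excluding the type from the integer overload is one common way to disambiguate; the actual fix in #1461 may differ.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for an 8-bit float storage type. The real type in the
// commit is f8_t (_BitInt(8)); signed char is used here only because it is
// always integral, which mimics the libc++ behaviour described above.
using f8_like = signed char;

// Overload meant for ordinary integers. The extra !is_same condition is the
// disambiguation: without it, f8_like would satisfy both overloads and the
// call which_check(f8_like{}) would be ambiguous.
template <typename T>
constexpr std::enable_if_t<std::is_integral_v<T> && !std::is_same_v<T, f8_like>, int>
which_check(T)
{
    return 1;
}

// Overload meant for the 8-bit float storage type.
template <typename T>
constexpr std::enable_if_t<std::is_same_v<T, f8_like>, int>
which_check(T)
{
    return 2;
}

// Compile-time checks: each argument type selects exactly one overload.
static_assert(which_check(42) == 1, "plain int selects the integer overload");
static_assert(which_check(f8_like{}) == 2, "f8_like selects the dedicated overload");
```

Dropping the `!std::is_same_v<T, f8_like>` condition from the first overload reproduces the ambiguity, since both templates then accept `f8_like`.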
Mateusz Ozga
0606e5498e Support large: 12d tensor size for reduction kernel (#1465) 2024-08-13 16:15:47 +02:00
PoYen, Chen
9de0f35ebc Remove unused template parameter 2024-08-13 09:29:20 +00:00
PoYen, Chen
370babc996 Make tile window directly via PageBlockNavigator 2024-08-13 09:18:24 +00:00
PoYen, Chen
a8a2275aca Fix wrong argument count 2024-08-13 08:42:23 +00:00
PoYen, Chen
d96752d0f5 Refine smoke_test_fwd.sh 2024-08-13 08:36:04 +00:00
PoYen, Chen
3dd6ef61ef Re-order pipeline parameters 2024-08-13 07:38:41 +00:00
PoYen, Chen
19c19d8bd3 Only expose necessary methods (not attributes) 2024-08-13 07:26:26 +00:00
PoYen, Chen
c54de6416a Rename TileWindowNavigator to PageBlockNavigator 2024-08-13 07:23:40 +00:00
Illia Silin
cbb6f2ab8c Disable inapplicable xdl and mha instances for gfx12 (#1464) 2024-08-12 15:11:58 -07:00
Mateusz Ozga
ab60b390f8 Rewrite *sh reduce unit tests to gtest: part 1 (#1407)
* Rewrite .sh test to Gtest

* review changes

* Remove unused comments

* Review v2

* Typo

* Separate UT: AMAX, MAX, MIN; added template params to trigger them

* Update test/reduce/reduce_no_index.cpp

---------

Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>
2024-08-12 16:28:10 +02:00
Bartłomiej Kocot
4a870942e6 Fix bug with n block id calculation in DeviceGroupedConvXdlCShuffle (#1457)
* Fix typo in TransformConvFwdToGemm

* Fix bug in n offset calculation
2024-08-10 13:12:05 +02:00
arai713
da214a5a58 Codegen build w/CK (#1428)
* initial push

* cleaned up compiler errors

* removed commented code

* build codegen folder only for gfx9 targets

* remove separate stage for codegen tests from CI

* removed commented code from CMake

---------

Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
Co-authored-by: illsilin <Illia.Silin@amd.com>
2024-08-09 08:15:06 -07:00
Jun Liu
5ff8eeebf9 Revert "Revert Revert Support access per groups and filter2x3 in grouped conv fwd (#1382) (#1406) (#1415)" (#1455)
This reverts commit 33b399cc15.
2024-08-08 19:09:33 -07:00
Illia Silin
4a5ab67871 Enable CI on gfx12. (#1454)
* enable CI build and test on gfx1201

* skip DL kernels in CI for gfx12

* only run CI on gfx12 if rocm version >= 6.2

* remove the rocm version check for CI on gfx12

* add a switch for CI builds on gfx12
2024-08-08 16:29:15 -07:00
PoYen, Chen
e8603dc21a Add missing comment 2024-08-08 20:40:50 +00:00
PoYen, Chen
822d5dcd8e Fix wrong seqlen for kvcache 2024-08-08 20:39:36 +00:00
PoYen, Chen
6a399ea47e Use generic lambda to init all the api traits/args 2024-08-08 19:22:53 +00:00
PoYen, Chen
9206808835 Move functors to the beginning of validation code 2024-08-08 18:01:10 +00:00
PoYen, Chen
028d89862a Wrap code by #if directives 2024-08-08 17:58:49 +00:00
PoYen, Chen
d2f5d0910a Remove no-longer used pipeline files 2024-08-08 17:40:05 +00:00
PoYen, Chen
9dddf6e437 Rename 'max_num_blocks' to 'max_num_page_blocks' 2024-08-08 17:38:08 +00:00
PoYen, Chen
e3a4bfba88 Show more detailed warning message 2024-08-08 17:35:36 +00:00
PoYen, Chen
d3624a03de Merge branch 'develop' into feature/fmha-fwd-appendkv 2024-08-08 17:26:53 +00:00
PoYen, Chen
3e2b69e163 Display more info for specific kernels 2024-08-08 17:26:09 +00:00
PoYen, Chen
c8f63d4848 Separate more non-splitkv & splitkv traits/args 2024-08-08 16:54:00 +00:00
PoYen, Chen
677d9b28dd Use generic lambda to init traits objects 2024-08-08 16:38:17 +00:00
Illia Silin
ae3b8ff86c check if the coerce-illegal-types flag is supported (#1451) 2024-08-08 07:29:29 -07:00
Illia Silin
8a75728406 add rocm-llvm-dev package to docker image (#1452) 2024-08-08 07:29:13 -07:00
PoYen, Chen
2f42e4460f Allow problem types without define kHasDropout attr 2024-08-08 10:53:42 +00:00
PoYen, Chen
a0d2163045 Remove dropout code in splitkv kernel 2024-08-08 10:21:34 +00:00
PoYen, Chen
9d9c5a6c24 Fix compilation errors 2024-08-08 08:26:55 +00:00
PoYen, Chen
247e135cfc Remove fmha_fwd_dispatch() 2024-08-08 08:15:04 +00:00