PoYen, Chen
ff75eff3bf
Reduce input/output dimensions
2024-07-12 06:49:43 +00:00
PoYen, Chen
3183b68921
Simplify v_host_ref definition
2024-07-12 06:42:41 +00:00
PoYen, Chen
e5885cab83
Simplify K appending logic
2024-07-12 06:37:23 +00:00
PoYen, Chen
3578c6f836
Append K/V in the host verification code
2024-07-12 06:32:35 +00:00
PoYen, Chen
4107bf03a6
Merge remote-tracking branch 'origin/feature/cond-add-splitkv' into feature/fmha-fwd-appendkv
2024-07-12 04:43:04 +00:00
PoYen, Chen
b34ddf5f71
Merge remote-tracking branch 'origin/feature/cond-add-splitkv' into feature/fmha-fwd-appendkv
2024-07-12 04:42:45 +00:00
Po Yen Chen
b4306af655
Merge branch 'develop' into feature/cond-add-splitkv
2024-07-12 12:34:31 +08:00
zjing14
13c1e64daa
add gemm_bias_add example (#1361)
...
* add gemm_bias_add example
* changed strideD
* clang-format
---------
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
2024-07-11 18:08:07 -07:00
Rostyslav Geyyer
7a46a91c84
Add instances for grouped conv fwd 3d with ConvScale for bf8@fp8->fp8 (#1369)
...
* Add an example
* Add instances
* Add a client example
2024-07-11 13:31:39 -07:00
Illia Silin
98a01bbc72
Add CK_TILE tests to daily CI builds. (#1381)
...
* add ck_tile tests to CI
* build and run ck_tile tests on gfx90a and gfx942 in parallel
* fix groovy syntax
* turn ck_tile tests OFF by default
* skip creating the build folder
* build ck_tile examples with 64 threads
* build ck_tile examples with cmake-ck-dev.sh script
* add video group to docker on mi300
* do not retry to rebuild the early CI stages
* help prevent jenkins false failure
* restore cron trigger
2024-07-11 13:22:40 -07:00
Illia Silin
f914c228c6
[Jenkins] restore cron jobs (#1380)
...
* test the cron trigger
* fix the cron jobs
* restore the list of cron jobs
2024-07-11 10:28:11 -07:00
carlushuang
bbdb0a5dc0
Merge branch 'develop' into feature/cond-add-splitkv
2024-07-11 16:01:19 +08:00
PoYen, Chen
ee365bbc66
Fix wrong answer when interleaved=true
2024-07-11 00:26:18 +00:00
Illia Silin
a8eb872055
[gfx12] add gfx12 to the default target list (#1379)
2024-07-10 14:54:04 -07:00
Sam Wu
860f957c22
Update changelog release headers (#1378)
...
* Update doc codeowner syntax
* Add doc link to changelog
* Update changelog formatting for markdownlint
Also change headings for releases
2024-07-10 09:36:10 -06:00
PoYen, Chen
52da00acd6
Fix wrong answer when interleaved=false
2024-07-10 12:50:00 +00:00
PoYen, Chen
8c733fb3be
Fix compilation errors
2024-07-10 10:53:58 +00:00
PoYen, Chen
03b6d99be0
Fix typo of HostTensor<>::get_length()
2024-07-10 09:33:15 +00:00
PoYen, Chen
9d29311da0
Finish reference_rotary_position_embedding() impl
2024-07-10 09:16:54 +00:00
dependabot[bot]
da42a88964
Bump rocm-docs-core from 1.4.1 to 1.5.0 in /docs/sphinx (#1374)
...
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.4.1 to 1.5.0.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.4.1...v1.5.0)
---
updated-dependencies:
- dependency-name: rocm-docs-core
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Sam Wu <22262939+samjwu@users.noreply.github.com>
2024-07-09 12:48:23 -07:00
carlushuang
ccfdc53022
update owner (#1377)
...
* remove zjing14, add poyenc
* remove yigex
2024-07-09 20:30:07 +08:00
PoYen, Chen
f2d28e8ab4
Add reference_rotary_position_embedding() (not implemented)
2024-07-09 05:22:08 +00:00
PoYen, Chen
e939082bdc
Add RoPE example utilities
2024-07-09 05:20:47 +00:00
PoYen, Chen
2e164f1b79
Add length/stride getters for HostTensor
2024-07-09 05:20:04 +00:00
Illia Silin
a328df25a1
Fix the cmake logic when building with INSTANCES_ONLY=ON. (#1376)
...
* fix the cmake logic when building for various targets
* another minor fix
2024-07-08 21:21:16 -07:00
Po Yen Chen
dc72074ec7
Merge branch 'develop' into feature/cond-add-splitkv
2024-07-09 03:42:25 +08:00
carlushuang
8182976c37
[CK_TILE] wa prec, remove sgpr offset for inline asm (#1356)
...
* wa prec, remove sgpr offset for inline asm
* macro for set tile
* ignore unused param if no kernel instances in host API
* fix more prec issue
* cache buffer resource
* fix
* support pre-nop
* clear tile by vector type members
* add workaround to reduce scratch memory
* conditionally enable workaround code
* enable workaround start from certain build version
* fallback set_tile() implementation from certain build version
* undo template argument changes
* put dummy asm in load_raw()
* fix comments, refactor s_nop inside buffer_load
---------
Co-authored-by: PoYen, Chen <PoYen.Chen@amd.com>
2024-07-08 11:09:55 -07:00
Andriy Roshchenko
eb44e0472a
Add ckProfiler support for forward 3D convolutions with OUT element-wise operations. (#1354)
2024-07-08 10:55:54 -07:00
PoYen, Chen
18a3834fb4
Set num_splits=1 if split-kv is not supported
2024-07-08 10:27:32 +00:00
PoYen, Chen
8ac6bacf26
Unify CMakeLists.txt coding style
2024-07-08 10:19:31 +00:00
PoYen, Chen
5d21b4d736
Merge branch 'feature/cond-add-splitkv' into feature/fmha-fwd-appendkv
2024-07-08 10:18:28 +00:00
PoYen, Chen
6ca3910199
Show message if we are ignoring option
2024-07-08 10:17:55 +00:00
PoYen, Chen
fe4ae5dcd9
Early return if 0 < s_k_new is not supported
2024-07-08 10:09:36 +00:00
PoYen, Chen
be076db91c
Merge branch 'feature/cond-add-splitkv' into feature/fmha-fwd-appendkv
2024-07-08 10:03:58 +00:00
PoYen, Chen
aba46cd655
Register API handlers automatically
2024-07-08 09:39:15 +00:00
PoYen, Chen
3aefb560e0
Remove "EXAMPLE_" prefix of cmake variables
2024-07-08 07:17:24 +00:00
PoYen, Chen
1c070380fa
Merge branch 'feature/cond-add-splitkv' into feature/fmha-fwd-appendkv
2024-07-08 07:13:34 +00:00
PoYen, Chen
82f3b3d0a0
Conditionally add call to fmha_fwd_splitkv()
2024-07-08 06:40:18 +00:00
PoYen, Chen
efd18fa887
Conditionally add fwd_splitkv API in fmha_fwd example
2024-07-08 06:27:44 +00:00
Harisankar Sadasivan
75e622f02f
Universal streamk with atomics (#1360)
...
* universal streamk with atomics, with ckProfiler support. grid_size and the streamk strategy are tunable. A grid_size of -1 gives #WGs = maximum occupancy x num_CUs. The implementation supports several streamk policies: 1-tile, 2-tile, 3-tile and 4-tile. A streamk strategy of -1 selects the default streamk policy (4-tile).
* Update README.md
* fixing clang-format issues
* removed conflicts in struct members between streamk and universal streamk
* corrected arg parsing for streamk and universal streamk
* added stream-k policies for 3 tile and 4 tile
* fixed argument type issue with parsing cmd args
* made the changes suggested in PR review: removed comments and corrected copyright
* file permissions updated
* added default value support for grid_size and streamk-policy selection set to -1
* print messages for arguments
* print messages for arguments
* print messages for arguments1
2024-07-05 21:40:30 -07:00
jakpiase
eaa870a1ab
Add structural sparsity xdlops (#1363)
...
* Implemented smfmac xdlops
* add reviewer comments
2024-07-04 12:00:14 +02:00
Jun Liu
959073842c
Fix issue with multiple targets and remove smfmac tests from unsupported test targets (#1372)
2024-07-03 23:34:38 -07:00
Illia Silin
497ccb872b
fix the optional ckProfiler grouped_gemm arguments (#1368)
2024-06-28 06:50:46 -07:00
dependabot[bot]
614ebd050a
Bump rocm-docs-core from 1.4.0 to 1.4.1 in /docs/sphinx (#1367)
...
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.4.0 to 1.4.1.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.4.0...v1.4.1)
---
updated-dependencies:
- dependency-name: rocm-docs-core
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-06-27 22:14:36 -07:00
Ruturaj Vaidya
2525864fda
Update CMakeLists.txt (#1364)
...
It is good practice to check that the file CMakeLists.txt is in fact present in the directory.
2024-06-27 12:34:25 -07:00
Illia Silin
fafa567b3c
Adding a private docker for ROCm6.2 release candidate. (#1365)
...
* add private docker for rocm6.2_rc1
* update dockerfile
2024-06-27 11:09:00 -07:00
alexxu-amd
3bb0fe6c7e
remove PR trigger for now due to high cost (#1329)
2024-06-27 09:57:58 -04:00
PoYen, Chen
34a3ff849f
Fix Vnew tile dstr for row major case
2024-06-27 09:55:35 +00:00
jakpiase
ed21948bcd
Add structural sparsity gemm instruction tests (#1309)
...
* first version of smfmac test
* add reviewer comments
* add reviewer suggestions
2024-06-27 11:30:32 +02:00
Illia Silin
941d1f7ce0
Merging the gfx12 code into public repo. (#1362)
2024-06-27 00:33:34 -07:00