Bartłomiej Kocot
b1f8ae379b
Fix contraction IsSupported checks ( #1257 )
2024-04-23 22:59:39 +02:00
rocking
43879b89e4
Small refactor ( #1246 )
...
* Remove kIsFp8
* Extract alias
* Fix K, V and corresponding acc type
---------
Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com >
2024-04-22 20:28:49 +08:00
Bartłomiej Kocot
ad1597c499
Refactor elementwise kernels ( #1222 )
...
* Refactor elementwise kernels
* Instances fixes
* Fix cmake
* Fix max pool bwd test
* Update two stage gemm split k
* Restore elementwise scale for hiptensor backward compatibility
* Fix Acc data type check in conv fwd multiple abd
* Disable conv fp64 fwd example
* Update grouped conv weight multi d
2024-04-19 13:31:17 +02:00
jakpiase
e0f3f918f1
Add bf16 and bf16@int8 mk_nk_mn instances for grouped gemm two stage ( #1228 )
...
* added bf16 and bf16@int8 mk_nk_mn instances
* fix preprocessor guards
2024-04-19 13:16:10 +02:00
Bartłomiej Kocot
fd923b6d86
Add grouped conv bwd weight multi d kernel ( #1237 )
...
* Add grouped conv bwd weight multi d kernel
* Reference fix
* Fix cmake files
* bwd weight scale only xdl
* Fixes
* Fix client conv fwd example
2024-04-18 23:35:04 +02:00
Illia Silin
930f889c34
Make daily cron jobs use the rocm6.1 compiler. ( #1253 )
...
* add rocm6.1 docker and make it default for CI
* fix typo
* move the rocm6.1 image into public dockerhub repo
* upgrade daily cron jobs to use rocm6.1
2024-04-18 09:40:21 -07:00
Illia Silin
caae537d8e
Upgrade to ROCm6.1 and turn on the -enable-post-misched=0 compiler flag. ( #1250 )
...
* add rocm6.1 docker and make it default for CI
* fix typo
* move the rocm6.1 image into public dockerhub repo
2024-04-18 11:10:23 -05:00
peter
501a6b68eb
docs: fix broken contributing link ( #1244 )
2024-04-16 11:32:14 -04:00
zjing14
12865fbf28
Added Multi_ABD support into Gemm and GroupedGemmFixedNK ( #978 )
...
* added an example grouped_gemm_multi_abd
* fixed ci
* add setElementwiseOp
* changed API
* clean code: add multiA into example
* fixed v7r2 copy
* add transpose
* clean
* fixed vector_load check
* Update example/15_grouped_gemm/grouped_gemm_multi_abd_xdl_fixed_nk_bias_fp16.cpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com >
* Update example/15_grouped_gemm/grouped_gemm_multi_abd_xdl_fixed_nk_bias_fp16.cpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com >
* Update example/15_grouped_gemm/grouped_gemm_multi_abd_xdl_fixed_nk_bias_fp16.cpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com >
* Update include/ck/tensor_operation/gpu/grid/gridwise_gemm_multiple_abd_xdl_cshuffle.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com >
* Update include/ck/tensor_operation/gpu/grid/gridwise_gemm_multiple_abd_xdl_cshuffle.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com >
* Update include/ck/tensor_operation/gpu/device/device_grouped_gemm_multi_abd.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com >
* Update include/ck/tensor_operation/gpu/device/device_grouped_gemm_multi_abd.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com >
* Update include/ck/tensor_operation/gpu/device/device_grouped_gemm_multi_abd.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com >
* Update include/ck/tensor_operation/gpu/device/device_grouped_gemm_multi_abd.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com >
* Update include/ck/tensor_operation/gpu/device/device_grouped_gemm_multi_abd.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com >
* Update include/ck/tensor_operation/gpu/device/device_grouped_gemm_multi_abd.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com >
* Update include/ck/tensor_operation/gpu/device/device_grouped_gemm_multi_abd_fixed_nk.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com >
* Update include/ck/tensor_operation/gpu/device/device_grouped_gemm_multi_abd_fixed_nk.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com >
* add reduce
* testing
* add example_b16_i8
* refactor example
* clean
* add mpading
* disable reduce for kbatch = 1
* separate reduce device op
* add reduce op
* add guard for workspace_size
* add instances
* format
* fixed
* add client example
* add a colmajor
* add instances
* Update cmake-ck-dev.sh
* Update profile_gemm_splitk.cpp
* Update gridwise_gemm_xdlops_v2r4r2.hpp
* format
* Update profile_gemm_splitk.cpp
* fixed
* fixed
* adjust test
* adjust precision loss
* adjust test
* fixed
* add bf16_i8 scale bias
* fixed scale
* fixed scale elementwise_op
* revert contraction deviceop changes
* fixed
* Add AddFastGelu
* Revert "Merge branch 'jizhan/gemm_splitk_reduce' into grouped_gemm_multi_abd_fixed_nk_example"
This reverts commit 3b5d001efd , reversing
changes made to 943199a991 .
* add Scales into elementwise
* add gemm_multi_abd client example
* add client examples
* add rcr and crr
* add grouped gemm client example
* add grouped gemm client example
* add instance for rcr crr
* format
* fixed
* fixed cmake
* fixed
* fixed client_example
* format
* fixed contraction isSupport
* Update include/ck/tensor_operation/gpu/device/device_grouped_gemm_multi_abd_fixed_nk.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com >
* Update device_reduce_threadwise.hpp
* clean
* Fixes
* Fix example
---------
Co-authored-by: Jing Zhang <jizha@amd.com >
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com >
2024-04-15 21:09:45 -05:00
carlushuang
db376dd8a4
introducing ck_tile! ( #1216 )
...
* enable gfx940
* switch between intrinsic mfma routines on mi100/200 and mi300
* fix mfma_int8 on MI300
* disable 2 int8 examples on MI300
* Update cmake-ck-dev.sh
* restore gitignore file
* modify Jenkinsfile to the internal repo
* Bump rocm-docs-core from 0.24.0 to 0.29.0 in /docs/sphinx
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core ) from 0.24.0 to 0.29.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases )
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md )
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.24.0...v0.29.0 )
---
updated-dependencies:
- dependency-name: rocm-docs-core
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com >
* initial enablement of gfx950
* fix clang format
* disable examples 31 and 41 int8 on gfx950
* add code
* fix build wip
* fix xx
* now can build
* naming
* minor fix
* wip fix
* fix macro for exp2; fix warpgemm a/b in transposedC
* unify as tuple_array
* Update the required Python version to 3.9
* Update executable name in test scripts
* re-structure tuple/array to avoid spill
* Merge function templates
* Fix format
* Add constraint to array<> ctor
* Re-use function
* Some minor changes
* remove wrong code in store_raw()
* fix compile issue in transpose
* Rename enum
Rename 'cood_transform_enum' to 'coord_transform_enum'
* let more integral_constant->constant, and formatting
* make sure thread_buffer can be tuple/array
* temp fix buffer_store spill
* not using custom data type by default, now we can have ISA-level same code as opt_padding
* fix compile error, fp8 not ready now
* fix fp8 duplicated move/shift/and/or problem
* Default use CK_TILE_FLOAT_TO_FP8_STOCHASTIC rounding mode
* fix scratch in fp8 kernel
* update some readme
* fix merge from upstream
* sync with upstream
* sync upstream again
* sync 22
* remove unused
* fix clang-format
* update README of ck_tile example
* fix several issue
* let python version to be 3.8 as minimal
* remove ck_tile example from default cmake target like all/install/check
* remove mistake
* 1) support recipe in generate.py 2) use simplified mask type 3) change left/right to pass into karg
* fix some bug in group-mode masking and codegen. update README
* F8 quantization for FMHA forward (#1224 )
* Add SAccElementFunction, PComputeElementFunction, OAccElementFunction in pipeline
* Add element function to fmha api
* Adjust P elementwise function
* Fix bug of elementwise op, our elementwise op is not inout
* Add some elementwise op, prepare to quantization
* Let generate.py can generate different elementwise function
* To prevent compiler issue, remove the elementwise function we have not used.
* Remove f8 pipeline, we should share the same pipeline even in f8
* Remove remove_cvref_t
* Avoid warning
* Fix wrong fp8 QK/KV block gemm setting
* Check fp8 rounding error in check_err()
* Set fp8 rounding error for check_err()
* Use CK_TILE_FLOAT_TO_FP8_STANDARD as default fp8 rounding mode
* 1. codegen the f8 api and kernel
2. f8 host code
* prevent warning in filter mode
* Remove not-in-use elementwise function kargs
* Remove more not-in-use elementwise function kargs
* Small refinements in C++ source files
* Use conditional_t<> to simplify code
* Support heterogeneous argument for binary function types
* Re-use already-existing scales<> functor template
* Fix wrong value produced by saturating
* Generalize the composes<> template
* Unify saturates<> implementation
* Fix type errors in composes<>
* Extend less_equal<>
* Reuse the existing template less_equal<> in check_err()
* Add equal<float> & equal<double>
* Rename check_err() parameter
* Rename check_err() parameter
* Add FIXME comment for adding new macro in future
* Remove unnecessary cast to void
* Eliminate duplicated code
* Avoid dividing api pool into more than 2 groups
* Use more clear variable names
* Use affirmative condition in if stmt
* Remove blank lines
* Do not use perfect forwarding in composes<>
* To fix compile error, revert generate.py back to 4439cc107d
* Fix bug of p element function
* Add compute element op to host softmax
* Remove element function in api interface
* Extract user parameter
* Rename pscale and oscale variable
* rename f8 to fp8
* rename more f8 to fp8
* Add pipeline::operator() without element_functor
* 1. Remove deprecated pipeline enum
2. Refine host code parameter
* Use quantization range as input
* 1. Rename max_dtype to dtype_max.
2. Rename scale to scale_s
3. Add init description
* Refine description
* prevent early return
* unify _squant kernel name in cpp, update README
* Adjust the default range.
* Refine error message and bias range
* Add fp8 benchmark and smoke test
* fix fp8 swizzle_factor=4 case
---------
Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com >
Co-authored-by: carlushuang <carlus.huang@amd.com >
---------
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: illsilin <Illia.Silin@amd.com >
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com >
Co-authored-by: Jing Zhang <jizha@amd.com >
Co-authored-by: zjing14 <zhangjing14@gmail.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Po-Yen, Chen <PoYen.Chen@amd.com >
Co-authored-by: rocking <ChunYu.Lai@amd.com >
2024-04-15 19:27:12 -05:00
Illia Silin
dd34ab6e64
add CK_USE_XDL/WMMA for client examples ( #1238 )
2024-04-15 10:01:22 -05:00
Haocong WANG
f83e9701e9
[GEMM] Gemm universal device operation ( #1154 )
...
* Optimize GEMM on MI200/300:
1. Add new blockwise gemm pipeline
2. Add irregular splitk instances
* clang format + typo fix
* Fix a bug
* initial commit
* Add more instances to irregular splitk
* blkgemm pipeline v1~4 prototype
* Sanity Checked. Known issue:
1. Poor performance of splitk
2. Register spill on blkgemmpipeline v3
* Sanity and Performance fix:
1. fix a bug related to sanity in grouped b2c mapping
2. fix a bug related to sanity and performance in splitk offset
* Sanity and API update:
1. Remove prefetch stage
2. Fix valid check bug
3. Add first gemm_universal instance into ckProfiler
* Add NN instances for gemm universal
* 1. Add NT instances for gemm_universal
2. Fix a bug about Kpadding in gemm_universal
* Fix a bug regarding padding Odd K number
* remove kernel print
* Fix KPadding bug...
* Update safety check
* another try to fix kpadding..
* Sanity checked
* new instances..
* clang format+typo fix
* remove clang format script's change
* Add non-hotloop compile option
* 1. Add fp16xfp8 example
2. pull packed convert f8 from pr1150
* Some miscs.. opt and fix
* Add pipeline description docs
* Split universal gemm instance library to cut profiler compiling time
* uncomment cmakefile
* Fix a bug caused by blockwise_gemm_pipe_v2
* reduce default splitk to 1
* Add 224x256x64 tile size
* update, including:
1. Experiment pipeline 5~7
2. Optimization for pipeline 4
3. Organized instance library
* temp save
* temp save
* Permuted lds layout, sanity and function checked
* clang format
* Move OOB check from RunRead to RunWrite, for better software pipeline.
TODO: agpr spill when NN layout
* clangformat
* A/B splitpipe scheduler for v3
* Fix two bugs
* bug fix
* fix a bug in oob check
* Example for mixed fp16_fp8 gemm
* Clean experimental code blocks
* Add mixed precision gemm into profiler
* tempsave
* optimize m/n major lds layout
* Add RRR GEMM mixed precision instances
* Optimize f8 matrix transpose
* Add test_gemm_universal
* A/B split schedule for blkpip v5
* Take ds_read2 into iglp scheduling scheme
* format
* fixed cmake
* Add llvm-option into CI cmake flag
---------
Co-authored-by: Jing Zhang <jizhan@amd.com >
2024-04-13 21:03:18 -05:00
Illia Silin
7cdf5a96d2
Update the config.h after the CK_USE_XDL/WMMA are set. ( #1236 )
...
* pass XDL and WMMA macros to libs that use CK
* update config.h after XDL and WMMA macros get set
2024-04-12 10:55:02 -07:00
Illia Silin
d7f05fb996
[HotFix] pass XDL and WMMA macros to libs that use CK ( #1234 )
2024-04-11 16:40:45 -07:00
Rostyslav Geyyer
bbefc12a26
Add instances for conv_scale with bf8@fp8->fp8 ( #1231 )
...
* Add instances
* Add example
* Add profiler mode
* Add client example
2024-04-11 10:35:00 -05:00
dependabot[bot]
b2735caf46
Bump rocm-docs-core from 0.38.0 to 0.38.1 in /docs/sphinx ( #1232 )
...
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core ) from 0.38.0 to 0.38.1.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases )
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md )
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.38.0...v0.38.1 )
---
updated-dependencies:
- dependency-name: rocm-docs-core
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-04-11 07:39:44 -07:00
zjing14
381d44aa60
add yigex ( #1230 )
2024-04-09 21:32:02 -05:00
Bartłomiej Kocot
ced5af16f7
Extend support for contraction 6D ( #1207 )
...
* Extend support for contraction up to 5D
* Extend contraction bilinear instances
* Fix interface test
* Add 6d support, remove 3d,4d,5d
* Fixes
* Fix readme
* Make default dim for contraction instances
2024-04-09 23:46:21 +02:00
Rostyslav Geyyer
366592b0ff
Add an example ( #1227 )
2024-04-09 13:57:32 -05:00
Rostyslav Geyyer
50cc0a13a6
Add an example ( #1225 )
2024-04-09 13:56:54 -05:00
Illia Silin
7e5c81fed2
fix the latest errors with staging compiler ( #1229 )
2024-04-04 11:33:29 -07:00
jakpiase
c701071666
Add Grouped Gemm Multiple D SplitK TwoStage ( #1212 )
...
* Support A/B/C elementwise ops.
* First part of GGEMM multiD splitk two stage.
* WIP - changes for debugging.
* tmp save
* working version
* added bf16@int8 version
* fixes
* add reviewers' suggestions
* pre-committed missing files
* switched to ifs from elseifs
---------
Co-authored-by: Adam Osewski <Adam.Osewski@amd.com >
2024-04-04 11:01:33 +02:00
Rostyslav Geyyer
a61e73bc56
Add instances for conv_scale with fp8@bf8->fp8 ( #1220 )
...
* Update device op api to support BComputeType
* Add example
* Add instances
* Add profiler mode
* Add client example
* Update copyright year
* Add BComputeType check
* Fix compute types
2024-04-03 09:08:08 -05:00
Bartłomiej Kocot
9a194837af
Introduce combined elementwise ops ( #1217 )
...
* Introduce combined elementwise ops
* Introduce reference elementwise
2024-04-02 17:23:49 -05:00
Illia Silin
ae57e5938e
Split the instances by architecture. ( #1223 )
...
* parse examples inside the add_example_executable function
* fix the example 64 cmake file
* add xdl flag to the gemm_bias_softmax_gemm_permute example
* add filtering of tests based on architecture type
* enable test_grouped_gemm for gfx9 only
* enable test_transpose only for gfx9
* only link test_transpose if it gets built
* split the gemm instances by architectures
* split gemm_bilinear,grouped_conv_bwd_weight instances by targets
* split instances by architecture
* split grouped_conv instances by architecture
* fix clang format
* fix the if-else logic in group_conv headers
* small fix for grouped convolution instances
* fix the grouped conv bwd weight dl instances
* fix client examples
* only enable client examples 3 and 4 on gfx9
* set the gfx9 macro
* make sure the architecture macros are set by cmake
* use separate set of xdl/wmma flags for host code
* simplify the main cmake file
* add conv_fwd_bf8 instance declaration
2024-04-02 09:42:17 -07:00
zjing14
303d4594f4
improved zeroing ( #1221 )
2024-04-02 11:02:52 -05:00
dependabot[bot]
5f2c89e8b4
Bump rocm-docs-core from 0.37.1 to 0.38.0 in /docs/sphinx ( #1218 )
...
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core ) from 0.37.1 to 0.38.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases )
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md )
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.37.1...v0.38.0 )
---
updated-dependencies:
- dependency-name: rocm-docs-core
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-03-27 10:23:54 -07:00
Illia Silin
cc1f733d0e
allow the CI to pass even if can't connect to db ( #1214 )
2024-03-22 15:39:11 -07:00
dependabot[bot]
2ae16e901f
Bump rocm-docs-core from 0.37.0 to 0.37.1 in /docs/sphinx ( #1211 )
...
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core ) from 0.37.0 to 0.37.1.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases )
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md )
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.37.0...v0.37.1 )
---
updated-dependencies:
- dependency-name: rocm-docs-core
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-03-22 07:58:36 -07:00
Bartłomiej Kocot
9c052804a7
Add elementwise with dynamic vector dim ( #1198 )
...
* Add elementwise with dynamic vector dim
* Reduce number of instances
* Fixes
* Fixes
2024-03-22 10:40:43 +01:00
Rostyslav Geyyer
fd0d093e78
Add instances for conv_scale with bf8 in / fp8 out ( #1200 )
...
* Add bf8 conv fwd instances
* Add example
* Add profiler mode
* Add client example
* Fix copyright headers
* Format
2024-03-21 13:57:34 -05:00
dependabot[bot]
9e50426915
Bump rocm-docs-core from 0.36.0 to 0.37.0 in /docs/sphinx ( #1208 )
...
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core ) from 0.36.0 to 0.37.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases )
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md )
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.36.0...v0.37.0 )
---
updated-dependencies:
- dependency-name: rocm-docs-core
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-03-20 09:28:03 -06:00
Illia Silin
f52109531b
Fix a couple of docker issues. ( #1206 )
...
* do not install sccache by default, only install rocm-llvm-dev for rocm6.1
* add sccache flag to docker build options
2024-03-19 08:38:52 -07:00
Illia Silin
9e011bcd6e
update the changelog for ROCm6.1 release ( #1205 )
...
* update the changelog for ROCm6.1 release
* modify the order of items in changelog, capitalize GEMMs
2024-03-18 10:16:45 -07:00
Illia Silin
bdcd037428
Re-enable the performance tracking in CI. ( #1203 )
...
* test CK with rocm6.1 RC2
* add docker credentials for pull
* update the performance db name
* use environment variable for db name
* add rocm-llvm-dev package to ck docker
* turn off verification for daily performance runs
* do not stash ckProfiler on MI300 node
* add processing of mixed gemms to qa, fix parsing of splitk gemm logs
* fix the splitk gemm log file name
* turn the timing on for splitk gemm performance
2024-03-18 09:48:29 -07:00
Rostyslav Geyyer
e626d5202a
Add instances for conv_scale with fp8 in/out ( #1193 )
...
* Add fp8 conv instances and client example
* Format
* Add example
* Update cmakelists
* Add profiler mode
* Format
* Fix copyright headers
2024-03-15 09:50:03 -07:00
Bartłomiej Kocot
285251768e
Add conv fwd/bwd data scale instances, extend bilinear instances ( #1178 )
...
* Add conv fwd/bwd data scale instances
* Fix cmake client example file
---------
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com >
2024-03-13 23:09:08 +01:00
randyh62
12441af014
Doc reorg2 ( #1189 )
...
* doc_reorg2 updated TOC
* doc_reorg2 updates
* fix conflicts, add grid
2024-03-12 18:25:48 -07:00
dependabot[bot]
8e97e85ac6
Bump rocm-docs-core from 0.35.1 to 0.36.0 in /docs/sphinx ( #1194 )
...
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core ) from 0.35.1 to 0.36.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases )
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md )
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.35.1...v0.36.0 )
---
updated-dependencies:
- dependency-name: rocm-docs-core
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-03-12 08:21:14 -07:00
Bartłomiej Kocot
42fc8eddd2
Fix warnings during wrapper docs generation ( #1192 )
...
* Fix warnings during wrapper docs generation
* Fixes
2024-03-08 17:13:03 -08:00
zjing14
1837040a9c
Navi3 rel ( #1176 )
...
* wmma_op + unit test
* add arch limitation to wmma test
* change arch limitation
* Refactor + Add all type unit test(int4 compile failed)
* Add f32_16x16x16_bf16 unit test
* tempsave
* tempsave
* tempsave
* runtime bug, cannot find symbol
* workaround for incorrect HIP warpSize return value
* debugging
* tempsave
* Correctness OK, waiting for optimization
* Tidy up + format
* temp save
* temp save, reproduce the v_bfi_b32 issue
* add inline asm for wmmaop test
* tidy up
* clean some debug purpose code
* discard some codes
* clang format
* clang format
* compiler issue fixed + increase tile size
* navi3x_multipleD+example
* temp save
* workable
* batchedgemm[OK], groupconv[debug]
* groupconv: Sanity check[OK], Performance[Bad]
* navi3x_groupconv_need_optimization
* create necessary files
* save progress
* Add Inter-Row thread transfer
* save progress
* save debugging progress
* sanity check pass
* fix a host tensor bug and clean up flash-attn code
* format
* cancel unnecessary change
* cancel unnecessary change
* cancel unnecessary change
* temp save, add asm backend flag to amd_wmma
* Mat-A LDS Bypass sanity pass
* temp save
* gemm sanity fix
* Porting new blockwise gemm to flash attention
* Example branch provide to compiler team
* tempsave
* Fix a bug
* batched gemm ported
* conv A-skip lds ported
* Skip B-Lds real gemm
* Skip B Lds Gemm + MulD
* batched gemm, conv, skip b lds
* format
* Attn, skip b lds
* Change GridwiseOp nam
* fix a typo caused bug
* Skip A_Lds sanity pass, Skip B_Lds scratch occurred
* Bug found, intra-row permute off caused
* bug found
* a fix
* disable buffer load due to incorrect 3rd dword
* update fmha config, no scratch generated
* update 3rd dword
* fmha config update
* FMHA, add support to gfx1101/gfx1102
* Merge origin dev (#2 )
* [Navi3x] Fix Gridwise_multiple_d operation (#649 )
* Add CMake Option "USE_OPT_NAVI3X"
* fix bug
* standardize docs (#655 )
* Separate bibtex requirement from rocm-docs-core (#656 )
* separate bibtex requirement from rocm-docs-core
* point requirements to source rocm-docs-core repo
* Add CMake Option "USE_OPT_NAVI3X" (#647 )
* Add CMake Option "USE_OPT_NAVI3X"
* remove navi3x opt compile option from cmake script
* Conv + quantization + tanh (#645 )
* Rename file. Prepare to support another activation
* Add comment for quantization
* Extract out_elementop
* Add tanh example
* Add conv + bias + tanh quantization instance
* Add missing parameter
* Refine cmake
* Add external api and client example
* Extract variable in example
* Fix the comment
---------
Co-authored-by: zjing14 <zhangjing14@gmail.com >
* Add a denorm test fix (#603 )
* Add type_convert implementations for bf16
* Add the fix for conv_fwd
* Add the fix for conv_bwd_data
* Add the fix for conv_bwd_weight
* Format
* Format
* Another format
* Add a macro to use workaround on MI200 only
* Format
---------
Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com >
Co-authored-by: zjing14 <zhangjing14@gmail.com >
* simplify karg in device/grid of split-k op (#644 )
* simplify karg in device/grid split-k op
* fix mk_kn_mn instances
* add more instances
* use name from tensor layout
* fix 3rd dword of buffer source descriptor (#659 )
* add fp64 instances (#658 )
Co-authored-by: root <root@ctr-ubbsmc15.amd.com >
* Issue #666 : Revert "simplify karg in device/grid of split-k op (#644 )" (#665 )
This reverts commit bb5530af91 .
* Groupnorm + swish external api (#668 )
* Rename to proper naming
* Add example of groupnorm + swish
* Extract duplicate code in example
* Add groupnorm + swish instances
* Refactor instance generation, split into multiple cpp files
* Add external api and client example
* Refine profiler message
* Use ck math version of exp
* Refine problem size in example
* Add host version of exp
* add a macro to turn on/off denorm fix (off by default) (#673 )
* add a macro to turn off denorm fix by default
* expose the macro
---------
Co-authored-by: root <root@ctr-ubbsmc15.amd.com >
* fixed quant example (#672 )
Co-authored-by: root <root@ctr-ubbsmc15.amd.com >
* Add dependabot config and pin rocm-docs-core (#663 )
* [gtest] suppress unsafe buffer warn (#670 )
ref: https://github.com/ROCmSoftwarePlatform/MIOpen/pull/1912
* Add memory index guard in wmma device ops (#667 )
* Add more macros to turn on/off denorm fix (#678 )
Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com >
* Fix a typo (#676 )
* Add (#677 )
* Allow using ROCm release candidate compilers. (#679 )
* enable use of rocm5.5 release candidate 4
* upgrade to ROCM5.5 RC5
* try fix the PUB_KEY error, remove the cmake-data package
* upgrade to latest cmake version
* use private dockerhub repo for rocm5.5 rc5
* add missing bracket
* add vector load check
* solve conflicts
---------
Co-authored-by: Sam Wu <sjwu@ualberta.ca >
Co-authored-by: Sam Wu <sam.wu2@amd.com >
Co-authored-by: rocking5566 <ChunYu.Lai@amd.com >
Co-authored-by: zjing14 <zhangjing14@gmail.com >
Co-authored-by: Rostyslav Geyyer <46627076+geyyer@users.noreply.github.com >
Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com >
Co-authored-by: carlushuang <carlus.huang@amd.com >
Co-authored-by: root <root@ctr-ubbsmc15.amd.com >
Co-authored-by: Jun Liu <Liu.Jun@amd.com >
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com >
* Disable SkipLDS & Align AIT api (#3 )
* fix layernorm, reduction Ops (#4 )
* [Navi3x] Fix Gridwise_multiple_d operation (#649 )
* Add CMake Option "USE_OPT_NAVI3X"
* fix bug
* standardize docs (#655 )
* Separate bibtex requirement from rocm-docs-core (#656 )
* separate bibtex requirement from rocm-docs-core
* point requirements to source rocm-docs-core repo
* Add CMake Option "USE_OPT_NAVI3X" (#647 )
* Add CMake Option "USE_OPT_NAVI3X"
* remove navi3x opt compile option from cmake script
* Conv + quantization + tanh (#645 )
* Rename file. Prepare to support another activation
* Add comment for quantization
* Extract out_elementop
* Add tanh example
* Add conv + bias + tanh quantization instance
* Add missing parameter
* Refine cmake
* Add external api and client example
* Extract variable in example
* Fix the comment
---------
Co-authored-by: zjing14 <zhangjing14@gmail.com >
* Add a denorm test fix (#603 )
* Add type_convert implementations for bf16
* Add the fix for conv_fwd
* Add the fix for conv_bwd_data
* Add the fix for conv_bwd_weight
* Format
* Format
* Another format
* Add a macro to use workaround on MI200 only
* Format
---------
Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com >
Co-authored-by: zjing14 <zhangjing14@gmail.com >
* simplify karg in device/grid of split-k op (#644 )
* simplify karg in device/grid split-k op
* fix mk_kn_mn instances
* add more instances
* use name from tensor layout
* fix 3rd dword of buffer source descriptor (#659 )
* add fp64 instances (#658 )
Co-authored-by: root <root@ctr-ubbsmc15.amd.com >
* Issue #666 : Revert "simplify karg in device/grid of split-k op (#644 )" (#665 )
This reverts commit bb5530af91 .
* Groupnorm + swish external api (#668 )
* Rename to proper naming
* Add example of groupnorm + swish
* Extract duplicate code in example
* Add groupnorm + swish instances
* Refactor instance generation, split into multiple cpp files
* Add external api and client example
* Refine profiler message
* Use ck math version of exp
* Refine problem size in example
* Add host version of exp
* add a macro to turn on/off denorm fix (off by default) (#673 )
* add a macro to turn off denorm fix by default
* expose the macro
---------
Co-authored-by: root <root@ctr-ubbsmc15.amd.com >
* fixed quant example (#672 )
Co-authored-by: root <root@ctr-ubbsmc15.amd.com >
* Add dependabot config and pin rocm-docs-core (#663 )
* [gtest] suppress unsafe buffer warn (#670 )
ref: https://github.com/ROCmSoftwarePlatform/MIOpen/pull/1912
* Add memory index guard in wmma device ops (#667 )
* Add more macros to turn on/off denorm fix (#678 )
Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com >
* Fix a typo (#676 )
* Add (#677 )
* Allow using ROCm release candidate compilers. (#679 )
* enable use of rocm5.5 release candidate 4
* upgrade to ROCM5.5 RC5
* try fix the PUB_KEY error, remove the cmake-data package
* upgrade to latest cmake version
* use private dockerhub repo for rocm5.5 rc5
* add missing bracket
* Disable SkipLDS & Align AIT api
* Update dependabot config (#682 )
Co-authored-by: samjwu <samjwu@users.noreply.github.com >
* update attn api
* solve type_convert bug + enable
---------
Co-authored-by: Sam Wu <sjwu@ualberta.ca >
Co-authored-by: Sam Wu <sam.wu2@amd.com >
Co-authored-by: rocking5566 <ChunYu.Lai@amd.com >
Co-authored-by: zjing14 <zhangjing14@gmail.com >
Co-authored-by: Rostyslav Geyyer <46627076+geyyer@users.noreply.github.com >
Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com >
Co-authored-by: carlushuang <carlus.huang@amd.com >
Co-authored-by: root <root@ctr-ubbsmc15.amd.com >
Co-authored-by: Jun Liu <Liu.Jun@amd.com >
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com >
Co-authored-by: samjwu <samjwu@users.noreply.github.com >
Co-authored-by: haocwang <Haocong.WANG@amd.com >
* fix typo
* Fix attention with causal mask
* multiple fix, try ait compile
* Add A/B not use LDS pipeline
* Clang format, Add gfx1101, gfx1102 support of FMHA example
* cancel change of format script
* 1. Enable 2-stage global Prefetch ( May cause VGPR spilling)
2. Enable FP16 accumulator blockwise_gemm
* clang-format
* 1. change blockwise gemm loopover direction from kmn to mnk ( ~1% improvement)
2. change kernel timing mode to 50 warmup + 50 timed repeat
* Update low level abstraction of blockwise gemm wmma
* (2/5) bilinear gemm pass, perf bug: skipping A LDS has lower performance than skipping B LDS
* (3/5) batched gemm pass, perf bug: skipping A LDS has lower performance than skipping B LDS
* (4/5) grouped conv pass
* (5/5) attention pass, todo: debug lds perf bug
* AIT Attention API refactor (#8 )
* sanity pass
* sanity pass 2
* confirm significant performance regression.
* turn on all instances
* turn off instance format
* Fix bug & tuning & format
* DML meta, self_attn+cross_attn
* sanity pass
* remove useless flag
* update tile and problem size used in AIT attention
* bug fix in grouped conv supporting check
* deprecate inline asm wmma
* Bug fix: double lds skip
* clang-format
* Fix errors in
1. example, fmha
2. gridwise pipeline
3. deviceop, fmha, change some containers from vector to array
* part2 of previous commit
* clang format
* API fix of gridwisegemmpipeline
* separate array base and vector base attention tensor transformation
* fix gemm
* clang format
* add gemm fp16 instances
* Temp save
* fpAintB kernel compile pass
* Sanity pass.
* Temp save
* debug code enabled
* Fp16AInt8B_GEMM sanity
* MQA implementation
* GQA-4 example
* tempsave
* Compile pass
* New implementation of fp16Aint8B Gemm, achieve similar math throughput to native fp16 Gemm
* format
* Todo: fix gemm_bilinear_wmma instances compilation bug
* Solve a bug when K1=16
* remove unnecessary changes
* Remove tensor layout limitation to LDS usage in tensor contraction
* update self-attention and cross-attention
* fix a typo of name
* Add arch limiter for fp8 gemm
* enable fp8 gemm_xdl for all gfx9 targets
* temporarily disable gemm_xdl_fp16_fp8 on MI100/200
* fix the cmake logic for gemm_xdl_fp16_fp8
* re-enable the gemm_xdl_fp16_fp8 on MI100/200
---------
Co-authored-by: aska-0096 <haocwang@amd.com >
Co-authored-by: Sam Wu <sjwu@ualberta.ca >
Co-authored-by: Sam Wu <sam.wu2@amd.com >
Co-authored-by: rocking5566 <ChunYu.Lai@amd.com >
Co-authored-by: Rostyslav Geyyer <46627076+geyyer@users.noreply.github.com >
Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com >
Co-authored-by: carlushuang <carlus.huang@amd.com >
Co-authored-by: root <root@ctr-ubbsmc15.amd.com >
Co-authored-by: Jun Liu <Liu.Jun@amd.com >
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com >
Co-authored-by: samjwu <samjwu@users.noreply.github.com >
Co-authored-by: haocwang <Haocong.WANG@amd.com >
Co-authored-by: illsilin <Illia.Silin@amd.com >
2024-03-08 17:11:51 -08:00
Rostyslav Geyyer
363feb482d
Refactor tolerances for correctness check in gemm op ( #1188 )
...
* Refactor tolerances for correctness check
* Update tolerances
* Update host-side gemm
* Update reference gemm call
2024-03-08 12:05:05 -08:00
Lisa
0e28de9766
Update link ( #1186 )
2024-03-07 10:09:17 -08:00
yhuiYH
adb3615d1a
Update CODEOWNERS to use documentation group ( #1190 )
...
Also had to remove a name
2024-03-07 10:08:37 -08:00
dependabot[bot]
1ddc8a841a
Bump rocm-docs-core from 0.35.0 to 0.35.1 in /docs/sphinx ( #1187 )
...
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core ) from 0.35.0 to 0.35.1.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases )
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md )
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.35.0...v0.35.1 )
---
updated-dependencies:
- dependency-name: rocm-docs-core
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-03-05 21:55:01 -08:00
Paul Fultz II
8eff4d62b6
Add host lib ( #1134 )
...
* Format
* Format
* Format
* Remove const
* Use the right template
* Format
* Format
* add row/col instances
* Add missing file
* fixed
* Format
* Updates
* Format
* fixed rrr layout
* Format
* Update test and embed modules
* Restore older version
* Update year
* Set -fPIC
* Format
* Use double for isnan
* rename host folder to codegen + minor fix
* add codegen CI test
* add option to build components without building CK
* fix the groovy syntax
* fix typo
* use the correct function for the codegen stage
---------
Co-authored-by: Jing Zhang <jizha@amd.com >
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com >
Co-authored-by: illsilin <Illia.Silin@amd.com >
2024-03-05 17:08:43 -08:00
Illia Silin
cf86621170
[CI] Add CI build and test stage on MI300. ( #1185 )
2024-03-05 10:42:16 -08:00
Rostyslav Geyyer
9ce18b045d
Fix example_gemm_xdl_fp8 ( #1183 )
2024-03-01 16:42:15 -08:00
Rostyslav Geyyer
acfb339238
Update clipping for fp8/bf8 conversion ( #1182 )
...
* Update clipping for fp8 conversion
* Add clipping for bf8 conversion
* Format
2024-03-01 10:30:38 -08:00
amoskvic
a776978cbe
Style improvement: improving type alias usage consistency in gemm-related client examples. Also copyright year update for all client examples. ( #1180 )
...
Co-authored-by: Arseny Moskvichev <amoskvic@amd.com >
2024-02-28 16:39:03 -08:00