Commit Graph

885 Commits

Author SHA1 Message Date
Sam Wu
4bb12a1bd8 Documentation Updates (#710)
* update documentation dependencies

add version number to docs

rename doc config directories

enable more doc formats on rtd

add license section in docs

[ROCm/composable_kernel commit: 3cff340423]
2023-05-18 11:08:38 -06:00
Bartłomiej Kocot
993c671395 Add contraction profiler and tests (#701)
* Add contraction profiler and tests

* Build and style fixes

* Allow to use any elementwise operator for ref_contraction

* Introduce profile_contraction_scale and profile_contraction_bilinear

* Make ref_contraction generic and extend interface tests

* Stylistic minor fixes

* Extend test_contraction_interface

[ROCm/composable_kernel commit: 642d5e9155]
2023-05-15 09:46:52 -05:00
rocking
e57089f861 Normalization/split k (#615)
[ROCm/composable_kernel commit: a1e344b1ae]
2023-05-11 07:15:02 -05:00
Rostyslav Geyyer
7d92b0fb64 Optimize bf16 conversion (#664)
* Add TypeConvert class and start refactoring

* Refactor TypeConvert as a struct

* Get back to template functions type_convert

* Add a type_convert_bf16_rtn, set rtz as default

* Clean up

* Add UnaryConvertPrecision struct for high-precision workloads

* Format

* Update type_convert to UnaryConvert on threadwise level

* Update UnaryConvertPrecision

* Format

* Fix chmod

* Add a flag to pick conversion method

* Format

* Remove the added flag

* Merge elementwise op with type conversion

* Move type_convert to elemwise op, update the op

* Update type_convert_precision -> bf16_convert_rtn

* Clean up

* Update comments

* Update the CK_WORKAROUND_DENORM_FIX flag handling

* Update the unneeded op to work but warn user

* Remove the message

* Use a PassThrough instead of ConvertBF16RTN to calculate reference

* Format

* Add missing include

[ROCm/composable_kernel commit: b076a02ad2]
2023-05-04 10:25:47 -05:00
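
Note on the rtn/rtz terms used in the commit above: "rtz" (round toward zero) drops the low 16 bits of an fp32 value, while "rtn" (round to nearest, ties to even) adds a rounding bias before dropping them. The following is a minimal host-side sketch of that difference, not the code from this commit. The function names `float_to_bf16_rtz`, `float_to_bf16_rtn`, `bf16_to_float`, and `float_from_bits` are hypothetical and are not CK's `type_convert` API.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: reinterpret a 32-bit pattern as a float.
inline float float_from_bits(std::uint32_t bits)
{
    float x;
    std::memcpy(&x, &bits, sizeof(x));
    return x;
}

// Round toward zero: keep the upper 16 bits, discard the rest.
// Caveat: a NaN whose payload lives only in the low 16 bits becomes inf.
inline std::uint16_t float_to_bf16_rtz(float x)
{
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof(bits));
    return static_cast<std::uint16_t>(bits >> 16);
}

// Round to nearest, ties to even, with NaN preserved.
inline std::uint16_t float_to_bf16_rtn(float x)
{
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof(bits));
    if((bits & 0x7fffffffu) > 0x7f800000u)
    {
        // NaN: truncate and set a mantissa bit so the result stays NaN.
        return static_cast<std::uint16_t>((bits >> 16) | 0x0040u);
    }
    // Bias is 0x7fff, plus 1 when the kept LSB is odd (breaks ties to even).
    const std::uint32_t bias = 0x7fffu + ((bits >> 16) & 1u);
    return static_cast<std::uint16_t>((bits + bias) >> 16);
}

// Widen bf16 back to fp32 by placing it in the upper 16 bits.
inline float bf16_to_float(std::uint16_t h)
{
    return float_from_bits(static_cast<std::uint32_t>(h) << 16);
}
```

The two modes agree on values that are exactly representable in bf16 and differ only when the discarded low bits are nonzero. That makes rtz cheaper and rtn more accurate, which is presumably why the commit keeps both.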
Illia Silin
a2d3ef1536 Fix the group of quantization_int8 kernels on MI300. (#695)
* replace amd_buffer_atomic_add with hip_atomic_add

* fix grouped_gemm_splitk kernels on mi300

* fix syntax

* revert experimental atomic_add changes

* fix the group of kernels from ticket 723 on MI300

---------

Co-authored-by: Jing Zhang <jizhan@amd.com>

[ROCm/composable_kernel commit: b8635a25b2]
2023-05-03 18:27:04 -05:00
Illia Silin
5406c5254e Fix grouped_gemm_splitk kernels on MI300. (#694)
* replace amd_buffer_atomic_add with hip_atomic_add

* fix grouped_gemm_splitk kernels on mi300

* fix syntax

* revert experimental atomic_add changes

---------

Co-authored-by: Jing Zhang <jizhan@amd.com>

[ROCm/composable_kernel commit: 4a51d2da9d]
2023-05-03 08:25:25 -07:00
Illia Silin
358f58f14b update daily build from rocm 5.4.3 to 5.5 (#693)
[ROCm/composable_kernel commit: 86e0190ec9]
2023-05-03 08:18:10 -07:00
zjing14
38cb16791b fixed init range (#691)
[ROCm/composable_kernel commit: f53ede26e5]
2023-05-02 08:30:23 -07:00
Illia Silin
da61da8b4a Syncing up from internal repo to enable MI300. (#690)
* enable gfx940

* switch between intrinsic mfma routines on mi100/200 and mi300

* fix mfma_int8 on MI300

* disable 2 int8 examples on MI300

* Update cmake-ck-dev.sh

* restore gitignore file

* modify Jenkinsfile to the internal repo

---------

Co-authored-by: Jing Zhang <jizha@amd.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>

[ROCm/composable_kernel commit: 4feebedd41]
2023-04-28 18:22:59 -05:00
Haocong WANG
1dc0de1c00 add vector load check (#680)
Co-authored-by: zjing14 <zhangjing14@gmail.com>

[ROCm/composable_kernel commit: 54c90aae13]
2023-04-26 15:58:57 -05:00
Jun Liu
aea315a7c4 [CK] suppress unsafe buffer warn (#687)
incomplete fix from https://github.com/ROCmSoftwarePlatform/composable_kernel/pull/670

So it does not only happen in gtest but also in CK code:

We need to fix them as a quality improvement, but for now suppressing this warning in immediate releases:
http://compiler-ci.amd.com/blue/rest/organizations/jenkins/pipelines/compiler-psdb-amd-stg-open/runs/2540/nodes/282/steps/3202/log/?start=0

e.g.
```
[2023-04-26T17:26:31.524Z] /jenkins/workspace/compiler-psdb-amd-stg-open/Libs/MIOpen/deps_hip/cget/build/tmp-a3db5da587a64213bde99fb856db1b43/composable_kernel-9084a068fb4f5fe7d58cc80e08b9769da1f64556/include/ck/utility/generic_memory_space_atomic.hpp:52:19: error: unsafe pointer arithmetic [-Werror,-Wunsafe-buffer-usage]
[2023-04-26T17:26:31.524Z]         atomicAdd(c_style_pointer_cast<float*>(p_dst) + 1, vx.template AsType<float>()[I1]);
[2023-04-26T17:26:31.524Z]                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
```
```
[2023-04-26T17:26:31.523Z] /jenkins/workspace/compiler-psdb-amd-stg-open/Libs/MIOpen/deps_hip/cget/build/tmp-a3db5da587a64213bde99fb856db1b43/composable_kernel-9084a068fb4f5fe7d58cc80e08b9769da1f64556/include/ck/utility/amd_inline_asm.hpp:62:20: error: 'p_a_half2' is an unsafe pointer used for buffer access [-Werror,-Wunsafe-buffer-usage]
[2023-04-26T17:26:31.523Z]     const half2_t* p_a_half2  = c_style_pointer_cast<const half2_t*>(&a);
[2023-04-26T17:26:31.523Z]     ~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
```

[ROCm/composable_kernel commit: 7613c1d9b9]
2023-04-26 15:41:03 -05:00
Adam Osewski
d9fe87efbd Grouped Gemm + SplitK + simplified Kernel Args (#669)
* simplify karg in device/grid split-k op

* fix mk_kn_mn instances

* add more instances

* B2C with 3D grid for KSplit

* Remove unused code.

* Use default B2C (3D grid) in grid gemm v2r4r2.

* Device gemm splitk use B2C map.

* Device GroupedGemmXdlSplitKCShuffle

* Example for GroupedGemm Xdl SplitK

* Introduce Device GroupedGemmSplitK

* Fix updating kbatch size.

* Add instance mk-nk-mn

* Enable set kbatch in profiler.

* Add GGemmSplitK mk-kn-mn instances

* Add more instances & split into multiple files.

* minor fix

* tuning

* clean

* disabled failed instances

* use pipe v2

* Ignore arg on not supported arch.

* fix warning

---------

Co-authored-by: carlushuang <carlus.huang@amd.com>
Co-authored-by: Adam Osewski <aosewski@amd.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>
Co-authored-by: Jing Zhang <jizhan@amd.com>
Co-authored-by: root <root@ctr-ubbsmc15.amd.com>

[ROCm/composable_kernel commit: 8bb2bb4a05]
2023-04-24 15:43:36 -05:00
zjing14
f28c43b544 reduce initial number for half_t splitk (#685)
[ROCm/composable_kernel commit: 8b9cbba823]
2023-04-24 08:07:39 -05:00
rocking
cff08cbc72 Revise layout of group convolution (#675)
* [What] Remove pure conv int8 instance
[Why] We will never use pure int8 conv in AI, use int8 quantization instead

* Change layout

* Share the kernel parameter

* Support more type of NHWGC for group conv

* Revise client example of conv 2d, use NHWGC layout

* Add instance to cmake

* Revise layout of group conv quantization instance

* Revise layout of external api of group conv quantization

* Revise layout of group conv quantization client example

* Fix clang format

* Add comment to describe meaning of each parameter

[ROCm/composable_kernel commit: 3eecbfb6ec]
2023-04-23 23:40:00 -05:00
Illia Silin
55d16b3400 Put back the split-k gemm code. (#684)
* simplify karg in device/grid split-k op

* fix mk_kn_mn instances

* add more instances

* use name from tensor layout

---------

Co-authored-by: carlushuang <carlus.huang@amd.com>

[ROCm/composable_kernel commit: 903cd19ce3]
2023-04-21 19:37:00 -05:00
Illia Silin
9ed5ad0f21 Switch to the new rocm5.6 compiler. (#681)
* switch to the new rocm5.6 compiler and docker

* fix syntax

[ROCm/composable_kernel commit: 9afa44d40b]
2023-04-21 07:59:26 -07:00
Sam Wu
11168111ba Update dependabot config (#682)
Co-authored-by: samjwu <samjwu@users.noreply.github.com>

[ROCm/composable_kernel commit: 938a5e0e41]
2023-04-20 21:55:56 -06:00
Illia Silin
ebc7fabbe5 Allow using ROCm release candidate compilers. (#679)
* enable use of rocm5.5 release candidate 4

* upgrade to ROCM5.5 RC5

* try fix the PUB_KEY error, remove the cmake-data package

* upgrade to latest cmake version

* use private dockerhub repo for rocm5.5 rc5

* add missing bracket

[ROCm/composable_kernel commit: bb0b772da9]
2023-04-18 09:22:49 -07:00
rocking5566
ee4b893928 Add (#677)
[ROCm/composable_kernel commit: fd11a4a12a]
2023-04-17 10:12:10 -05:00
Haocong WANG
f0f697ae4a Fix a typo (#676)
[ROCm/composable_kernel commit: fc26d42a2e]
2023-04-15 21:57:34 -05:00
Rostyslav Geyyer
6e1df339c9 Add more macros to turn on/off denorm fix (#678)
Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com>

[ROCm/composable_kernel commit: 03eaee6ae6]
2023-04-15 21:56:07 -05:00
Haocong WANG
000176b5fc Add memory index guard in wmma device ops (#667)
[ROCm/composable_kernel commit: e85178b4ca]
2023-04-11 15:42:47 -05:00
Jun Liu
b4df986264 [gtest] suppress unsafe buffer warn (#670)
ref: https://github.com/ROCmSoftwarePlatform/MIOpen/pull/1912

[ROCm/composable_kernel commit: f532988713]
2023-04-11 15:41:49 -05:00
Sam Wu
e5a82c403a Add dependabot config and pin rocm-docs-core (#663)
[ROCm/composable_kernel commit: fd497f0e79]
2023-04-11 09:18:38 -06:00
zjing14
53b28d2146 fixed quant example (#672)
Co-authored-by: root <root@ctr-ubbsmc15.amd.com>

[ROCm/composable_kernel commit: c203bf6711]
2023-04-11 07:46:46 -05:00
zjing14
b18d739672 add a macro to turn on/off denorm fix (off by default) (#673)
* add a macro to turn off denorm fix by default

* expose the macro

---------

Co-authored-by: root <root@ctr-ubbsmc15.amd.com>

[ROCm/composable_kernel commit: c54f8bcc25]
2023-04-11 07:44:43 -05:00
rocking5566
356c1cc17b Groupnorm + swish external api (#668)
* Rename to proper naming

* Add example of groupnorm + swish

* Extract duplicate code in example

* Add groupnorm + swish instances

* Refactor instance generation, split into multiple cpp files

* Add external api and client example

* Refine profiler message

* Use ck math version of exp

* Refine problem size in example

* Add host version of exp

[ROCm/composable_kernel commit: ed3a2e5226]
2023-04-10 08:02:17 -05:00
Jun Liu
89d6f8a65f Issue #666: Revert "simplify karg in device/grid of split-k op (#644)" (#665)
This reverts commit 1108f64591.

[ROCm/composable_kernel commit: 3248387bbb]
2023-04-06 17:14:11 -07:00
zjing14
696991c923 add fp64 instances (#658)
Co-authored-by: root <root@ctr-ubbsmc15.amd.com>

[ROCm/composable_kernel commit: fde6d2742b]
2023-03-30 13:30:43 -05:00
Haocong WANG
37f95442f9 fix 3rd dword of buffer source descriptor (#659)
[ROCm/composable_kernel commit: 091570f594]
2023-03-29 19:03:55 -05:00
carlushuang
1108f64591 simplify karg in device/grid of split-k op (#644)
* simplify karg in device/grid split-k op

* fix mk_kn_mn instances

* add more instances

* use name from tensor layout

[ROCm/composable_kernel commit: bb5530af91]
2023-03-29 19:03:07 -05:00
Rostyslav Geyyer
15ac3fc064 Add a denorm test fix (#603)
* Add type_convert implementations for bf16

* Add the fix for conv_fwd

* Add the fix for conv_bwd_data

* Add the fix for conv_bwd_weight

* Format

* Format

* Another format

* Add a macro to use workaround on MI200 only

* Format

---------

Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>

[ROCm/composable_kernel commit: dbd8f94bef]
2023-03-29 15:05:32 -05:00
rocking5566
cbce8b77da Conv + quantization + tanh (#645)
* Rename file. Prepare to support another activation

* Add comment for quantization

* Extract out_elementop

* Add tanh example

* Add conv + bias + tanh quantization instance

* Add missing parameter

* Refine cmake

* Add external api and client example

* Extract variable in example

* Fix the comment

---------

Co-authored-by: zjing14 <zhangjing14@gmail.com>

[ROCm/composable_kernel commit: 389e84a83b]
2023-03-29 14:50:23 -05:00
Haocong WANG
8a984b4e3f Add CMake Option "USE_OPT_NAVI3X" (#647)
* Add CMake Option "USE_OPT_NAVI3X"

* remove navi3x opt compile option from cmake script

[ROCm/composable_kernel commit: 4e097ad283]
2023-03-29 14:07:33 -05:00
Sam Wu
5a8db87383 Separate bibtex requirement from rocm-docs-core (#656)
* separate bibtex requirement from rocm-docs-core

* point requirements to source rocm-docs-core repo

[ROCm/composable_kernel commit: 88d474323b]
2023-03-27 17:14:36 -06:00
Sam Wu
2268a29786 standardize docs (#655)
[ROCm/composable_kernel commit: f80776d937]
2023-03-23 20:58:59 -07:00
Haocong WANG
84f096c844 [Navi3x] Fix Gridwise_multiple_d operation (#649)
* Add CMake Option "USE_OPT_NAVI3X"

* fix bug

[ROCm/composable_kernel commit: e5376be4ac]
2023-03-23 11:22:10 -05:00
Po Yen Chen
57c8d94bf7 Reduce group & batch of the tested convolutions (#648)
[ROCm/composable_kernel commit: fe96e8fbf2]
2023-03-22 10:49:11 -07:00
Illia Silin
b3c1e83276 Get rid of XDL parameters in WMMA kernel string. (#646)
* remove XDL parameters from WMMA kernel string

* get rid of two more parameters

[ROCm/composable_kernel commit: 36750a5763]
2023-03-22 08:05:48 -07:00
Dan Yao
a84d2f5d81 rtn in ternary way (#632)
* rtn in ternary way

* Check both flags to preserve NaN

* Format

* Rearrange flag1

* Apply suggestions from code review

Co-authored-by: Ronan Keryell <ronan@keryell.fr>

---------

Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com>
Co-authored-by: Rostyslav Geyyer <46627076+geyyer@users.noreply.github.com>
Co-authored-by: Ronan Keryell <ronan@keryell.fr>

[ROCm/composable_kernel commit: 8a659a2e4c]
2023-03-20 14:30:24 -05:00
ltqin
fc10856d4b workaround 637 (#640)
* add workaround 637

* format

* change id

---------

Co-authored-by: zjing14 <zhangjing14@gmail.com>

[ROCm/composable_kernel commit: 6ae12434d2]
2023-03-20 11:49:31 -05:00
Rostyslav Geyyer
5c8eb78a25 Update cmake-ck-dev.sh script (#641)
Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com>

[ROCm/composable_kernel commit: fa998675fc]
2023-03-15 18:38:11 -05:00
rocking5566
6a1403d82d gemm/Conv xdlops + dlops quantization (#625)
* Add conv perlayer quantization

* Add gemm_dlops quantization

* Support int8 for innerproduct

* Refine gemm dlops int8 kernel parameter

* Support gfx908(MI100) and gfx90a(MI200)

* clang-format

* Rename example number

* Support different layout for d tensor

* Add conv dlops perchannel quantization example

* Move to example 40

* Extract the common code for different platform (dlops and xdlops)

* Move to subfolder. Prepare to add other op of quantization

* Refine the quantization instance library

* Add conv dl instances and client example

* Remove unnecessary type

* Add gemm quantization instance

* Add external api and client example

* Refine num_bytes

* Separate different layout to different cpp

* Add more xdl instances

* Revert "Remove unnecessary type"

This reverts commit 820869182f.

* Remove CShuffleDataType in dlops
Let acc and CShuffleDataType be the same in xdlops

---------

Co-authored-by: zjing14 <zhangjing14@gmail.com>

[ROCm/composable_kernel commit: 16dc18e0f9]
2023-03-15 15:29:40 -05:00
Adam Osewski
512ec3ac4d Device Op GroupedGemmMultipleD + example fp16 (#633)
* Pass shared mem pointer as pointer to void.

* Device Op GroupedGEMM Multiple D

* Example for grouped gemm multiple d.

* Add MI200 to supported archs.

---------

Co-authored-by: Adam Osewski <aosewski@amd.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>

[ROCm/composable_kernel commit: a2d5ca8e95]
2023-03-15 11:22:59 -05:00
Rostyslav Geyyer
6e6482b9cd Add layout check to IsSupportedArgument (#627)
* Add layout check to IsSupportedArgument

* Format

---------

Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>

[ROCm/composable_kernel commit: c10a6e8293]
2023-03-15 11:12:12 -05:00
Illia Silin
87113ad617 Update GetTypeString function to generate unique kernel IDs. (#638)
* make conv_fwd_bias_activation kernel id unique

* add more parameters to conv and gemm kernel names

* update GetTypeString for conv and gemm kernels

* fix two more kernel strings

[ROCm/composable_kernel commit: 14b3504d95]
2023-03-15 10:44:42 -05:00
Haocong WANG
459469f66a Fix arch limitation bug (#639)
[ROCm/composable_kernel commit: ea028ac65a]
2023-03-15 07:44:13 -07:00
Rostyslav Geyyer
b78f3ba805 Remove debug asserts (#629)
Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com>

[ROCm/composable_kernel commit: 5b57ab96a8]
2023-03-10 17:34:44 -06:00
Haocong WANG
9687ad0b61 [Navi3x] Multiple issue fix (#612)
* Change gridwise gemm mD blockwise gemm to naive

* RRR Gemm fix

* Fix RCR gemm bug

* Isolate wmma instructions

* Update amd_inline_asm.hpp

* Update amd_wmma.hpp

* Update amd_wmma.hpp

* fix syntax and update Jenkinsfile

---------

Co-authored-by: zjing14 <zhangjing14@gmail.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
Co-authored-by: illsilin <Illia.Silin@amd.com>

[ROCm/composable_kernel commit: 087e310589]
2023-03-10 17:04:28 -06:00
carlushuang
ca7b3a4f58 fix a bug with non-dword-aligned offset when OOB, in case crash (#616)
Co-authored-by: zjing14 <zhangjing14@gmail.com>

[ROCm/composable_kernel commit: 76fcdc60e9]
2023-03-09 08:07:24 -06:00