Follow-up to the incomplete fix from https://github.com/ROCmSoftwarePlatform/composable_kernel/pull/670
The warning is triggered not only in gtest but also in CK code itself.
These occurrences should be fixed as a quality improvement; for now, the warning is suppressed in the immediate releases. CI log:
http://compiler-ci.amd.com/blue/rest/organizations/jenkins/pipelines/compiler-psdb-amd-stg-open/runs/2540/nodes/282/steps/3202/log/?start=0
For example:
```
[2023-04-26T17:26:31.524Z] /jenkins/workspace/compiler-psdb-amd-stg-open/Libs/MIOpen/deps_hip/cget/build/tmp-a3db5da587a64213bde99fb856db1b43/composable_kernel-0f98035df1cc5ba3e90ab03187e672b426a25b00/include/ck/utility/generic_memory_space_atomic.hpp:52:19: error: unsafe pointer arithmetic [-Werror,-Wunsafe-buffer-usage]
[2023-04-26T17:26:31.524Z] atomicAdd(c_style_pointer_cast<float*>(p_dst) + 1, vx.template AsType<float>()[I1]);
[2023-04-26T17:26:31.524Z] ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
```
```
[2023-04-26T17:26:31.523Z] /jenkins/workspace/compiler-psdb-amd-stg-open/Libs/MIOpen/deps_hip/cget/build/tmp-a3db5da587a64213bde99fb856db1b43/composable_kernel-0f98035df1cc5ba3e90ab03187e672b426a25b00/include/ck/utility/amd_inline_asm.hpp:62:20: error: 'p_a_half2' is an unsafe pointer used for buffer access [-Werror,-Wunsafe-buffer-usage]
[2023-04-26T17:26:31.523Z] const half2_t* p_a_half2 = c_style_pointer_cast<const half2_t*>(&a);
[2023-04-26T17:26:31.523Z] ~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
```
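Clang's diagnostic pragmas can silence this warning around specific code. The following is a minimal sketch that assumes Clang 16 or newer, where `-Wunsafe-buffer-usage` exists. The `SKETCH_*` macros, `Float2` and `second_lane` are hypothetical names for illustration, not CK's actual identifiers:

```cpp
#include <cassert>

// Hypothetical macros (not CK's names) that silence -Wunsafe-buffer-usage
// around a block of legacy pointer arithmetic. The nested #if keeps GCC and
// older Clang versions, which lack this warning group, from complaining.
#if defined(__clang__)
#if __has_warning("-Wunsafe-buffer-usage")
#define SKETCH_SUPPRESS_UNSAFE_BUFFER_BEGIN \
    _Pragma("clang diagnostic push")        \
    _Pragma("clang diagnostic ignored \"-Wunsafe-buffer-usage\"")
#define SKETCH_SUPPRESS_UNSAFE_BUFFER_END _Pragma("clang diagnostic pop")
#endif
#endif

// Fallback: expand to nothing when the warning group is unavailable.
#ifndef SKETCH_SUPPRESS_UNSAFE_BUFFER_BEGIN
#define SKETCH_SUPPRESS_UNSAFE_BUFFER_BEGIN
#define SKETCH_SUPPRESS_UNSAFE_BUFFER_END
#endif

// Stand-in for a two-lane vector type such as the one behind p_dst above.
struct Float2 {
    float data[2];
};

SKETCH_SUPPRESS_UNSAFE_BUFFER_BEGIN
// Reads lane 1 through raw pointer arithmetic, the same pattern that
// generic_memory_space_atomic.hpp:52 is flagged for in the log above.
inline float second_lane(const Float2& v)
{
    const float* p = v.data;
    return *(p + 1);
}
SKETCH_SUPPRESS_UNSAFE_BUFFER_END
```

Keeping the push/pop region small makes each suppressed site easy to locate and remove once the underlying code is fixed.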
* [What] Remove the pure int8 conv instances
[Why] Pure int8 conv will never be used in AI; int8 quantization is used instead
* Change layout
* Share the kernel parameter
* Support more types for the NHWGC layout in group conv
* Revise the conv 2d client example to use the NHWGC layout
* Add instances to CMake
* Revise layout of group conv quantization instance
* Revise layout of external api of group conv quantization
* Revise layout of group conv quantization client example
* Fix clang-format
* Add comments describing the meaning of each parameter
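In the NHWGC layout used above, the G groups of C channels for one pixel are stored next to each other. The following minimal sketch shows the element-offset arithmetic. It assumes a fully packed tensor with no padding or custom strides. `NHWGCShape` and `nhwgc_offset` are hypothetical helpers, not CK's tensor-descriptor API:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical shape of a packed NHWGC tensor: C varies fastest, N slowest.
struct NHWGCShape {
    std::size_t N, H, W, G, C;
};

// Linear offset of element (n, h, w, g, c) in a packed NHWGC tensor.
inline std::size_t nhwgc_offset(const NHWGCShape& s, std::size_t n, std::size_t h,
                                std::size_t w, std::size_t g, std::size_t c)
{
    return (((n * s.H + h) * s.W + w) * s.G + g) * s.C + c;
}

// Same offset viewed as a plain NHWC tensor with G * C channels,
// where channel index is g * C + c.
inline std::size_t nhwc_equivalent_offset(const NHWGCShape& s, std::size_t n, std::size_t h,
                                          std::size_t w, std::size_t g, std::size_t c)
{
    return ((n * s.H + h) * s.W + w) * (s.G * s.C) + (g * s.C + c);
}
```

Because G and C are the two innermost dimensions, both functions return the same value. The strides in a 2x3x4x5x6 tensor are 1 (c), 6 (g), 30 (w), 120 (h) and 360 (n).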
* simplify karg in device/grid split-k op
* fix mk_kn_mn instances
* add more instances
* use name from tensor layout
---------
Co-authored-by: carlushuang <carlus.huang@amd.com>
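Some context on the split-K op whose kernel arguments are simplified here: split-K partitions the reduction dimension K into k_batch slices, computes a partial product per slice, and accumulates the partials into C. On a GPU, that accumulation is typically an atomic add. The following host-side sketch illustrates the idea. It is not CK's device/grid implementation, and `gemm_split_k` and `k_batch` are names chosen here for illustration:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host sketch of split-K for C[M x N] = A[M x K] * B[K x N],
// all row-major. Slices run sequentially here; on a GPU each slice would be
// handled by separate workgroups that accumulate into C concurrently.
std::vector<float> gemm_split_k(const std::vector<float>& a, const std::vector<float>& b,
                                std::size_t M, std::size_t N, std::size_t K,
                                std::size_t k_batch)
{
    std::vector<float> c(M * N, 0.0f);
    const std::size_t k_per_batch = (K + k_batch - 1) / k_batch; // ceil-div
    for (std::size_t kb = 0; kb < k_batch; ++kb) {
        const std::size_t k_begin = kb * k_per_batch;
        const std::size_t k_end = std::min(K, k_begin + k_per_batch); // empty if k_begin >= K
        for (std::size_t m = 0; m < M; ++m)
            for (std::size_t n = 0; n < N; ++n) {
                float partial = 0.0f;
                for (std::size_t k = k_begin; k < k_end; ++k)
                    partial += a[m * K + k] * b[k * N + n];
                c[m * N + n] += partial; // the step a GPU kernel would do atomically
            }
    }
    return c;
}
```

For any k_batch, the result matches an ordinary GEMM up to floating-point reassociation. Split-K trades that extra accumulation traffic for more parallelism when M and N are small relative to K.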
* enable use of ROCm 5.5 release candidate 4
* upgrade to ROCm 5.5 RC5
* try to fix the PUB_KEY error; remove the cmake-data package
* upgrade to the latest cmake version
* use a private Docker Hub repo for ROCm 5.5 RC5
* add missing bracket
* Rename to proper naming
* Add example of groupnorm + swish
* Extract duplicate code in example
* Add groupnorm + swish instances
* Refactor instance generation; split into multiple cpp files
* Add external api and client example
* Refine profiler message
* Use ck math version of exp
* Refine problem size in example
* Add host version of exp
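The fused groupnorm + swish operation added above works in three steps. It normalizes each (sample, group) slice to zero mean and unit variance. It then applies a per-channel affine transform, followed by swish(y) = y * sigmoid(y). The following is a minimal host-reference sketch with three assumptions: the input is NHWGC with H and W flattened into one HW dimension, gamma/beta are indexed per (group, channel), and the variance is biased. The function names are hypothetical. This is neither CK's kernel nor its reference implementation:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Swish with beta = 1 (also known as SiLU): x * sigmoid(x).
inline float swish(float x) { return x / (1.0f + std::exp(-x)); }

// x is [N][HW][G][C] (NHWGC with H*W flattened); gamma/beta are [G][C].
std::vector<float> groupnorm_swish(const std::vector<float>& x,
                                   const std::vector<float>& gamma,
                                   const std::vector<float>& beta,
                                   std::size_t N, std::size_t HW,
                                   std::size_t G, std::size_t C, float eps)
{
    std::vector<float> y(x.size());
    auto idx = [&](std::size_t n, std::size_t p, std::size_t g, std::size_t c) {
        return ((n * HW + p) * G + g) * C + c;
    };
    const float count = static_cast<float>(HW * C);
    for (std::size_t n = 0; n < N; ++n)
        for (std::size_t g = 0; g < G; ++g) {
            // Mean and (biased) variance over every pixel and channel of group g.
            float mean = 0.0f, var = 0.0f;
            for (std::size_t p = 0; p < HW; ++p)
                for (std::size_t c = 0; c < C; ++c) mean += x[idx(n, p, g, c)];
            mean /= count;
            for (std::size_t p = 0; p < HW; ++p)
                for (std::size_t c = 0; c < C; ++c) {
                    const float d = x[idx(n, p, g, c)] - mean;
                    var += d * d;
                }
            var /= count;
            const float inv_std = 1.0f / std::sqrt(var + eps);
            // Normalize, apply the per-channel affine transform, then swish.
            for (std::size_t p = 0; p < HW; ++p)
                for (std::size_t c = 0; c < C; ++c) {
                    const std::size_t i = idx(n, p, g, c);
                    const float norm = (x[i] - mean) * inv_std;
                    y[i] = swish(gamma[g * C + c] * norm + beta[g * C + c]);
                }
        }
    return y;
}
```

Fusing the activation into the normalization pass avoids writing the normalized tensor to memory and reading it back just to apply swish.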
* Add type_convert implementations for bf16
* Add the fix for conv_fwd
* Add the fix for conv_bwd_data
* Add the fix for conv_bwd_weight
* Format
* Format
* Another format
* Add a macro to use workaround on MI200 only
* Format
---------
Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>
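Here is a reference point for the bf16 type_convert change above: a common float-to-bf16 conversion rounds to nearest, ties to even. The following minimal sketch assumes bf16 values are carried as raw `std::uint16_t` bits. The function names are hypothetical, and it is not confirmed that CK's type_convert uses this exact rounding:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reinterpret 32 raw bits as an IEEE-754 float without undefined behavior.
inline float bits_to_float(std::uint32_t u)
{
    float f;
    std::memcpy(&f, &u, sizeof(f));
    return f;
}

// Hypothetical float -> bf16 conversion, round to nearest, ties to even.
// bf16 keeps the upper 16 bits of a float (sign, 8 exponent, 7 mantissa bits).
inline std::uint16_t float_to_bf16_rne(float f)
{
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof(u));
    if ((u & 0x7fffffffu) > 0x7f800000u) // NaN: force the quiet bit so that
        return static_cast<std::uint16_t>((u >> 16) | 0x0040u); // it cannot become inf
    const std::uint32_t lsb = (u >> 16) & 1u; // ties round toward an even result
    u += 0x7fffu + lsb;
    return static_cast<std::uint16_t>(u >> 16);
}

// bf16 -> float is exact: place the 16 bits in the upper half of a float.
inline float bf16_to_float(std::uint16_t h)
{
    return bits_to_float(static_cast<std::uint32_t>(h) << 16);
}
```

Plain truncation (`u >> 16`) is cheaper but biased toward zero. Adding 0x7fff plus the retained LSB before shifting gives unbiased rounding. Finite values that round past the largest bf16 become infinity, as IEEE round-to-nearest requires.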
* Add conv perlayer quantization
* Add gemm_dlops quantization
* Support int8 for innerproduct
* Refine gemm dlops int8 kernel parameter
* Support gfx908 (MI100) and gfx90a (MI200)
* clang-format
* Rename example number
* Support different layout for d tensor
* Add conv dlops perchannel quantization example
* Move to example 40
* Extract the common code for different platform (dlops and xdlops)
* Move to subfolder. Prepare to add other quantization ops
* Refine the quantization instance library
* Add conv dl instances and client example
* Remove unnecessary type
* Add gemm quantization instance
* Add external api and client example
* Refine num_bytes
* Separate different layouts into different cpp files
* Add more xdl instances
* Revert "Remove unnecessary type"
This reverts commit 820869182f.
* Remove CShuffleDataType in dlops
Make the acc type and CShuffleDataType the same in xdlops
---------
Co-authored-by: zjing14 <zhangjing14@gmail.com>
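An int8 conv/GEMM produces an int32 accumulator, and per-layer and per-channel quantization differ mainly in how many scales map it back to int8. Per-layer quantization uses one scale for the whole output; per-channel uses one scale per output channel. The following minimal sketch shows that requantization step, assuming rounding half away from zero followed by saturation to [-128, 127]. The function names are hypothetical, and the rounding mode is not confirmed to match CK's kernels:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Scale an int32 accumulator, round (ties away from zero), saturate to int8.
inline std::int8_t requant(std::int32_t acc, float scale)
{
    const float r = std::round(static_cast<float>(acc) * scale);
    return static_cast<std::int8_t>(std::clamp(r, -128.0f, 127.0f));
}

// Per-layer: a single scale shared by every output element.
std::vector<std::int8_t> requant_per_layer(const std::vector<std::int32_t>& acc, float scale)
{
    std::vector<std::int8_t> out(acc.size());
    for (std::size_t i = 0; i < acc.size(); ++i) out[i] = requant(acc[i], scale);
    return out;
}

// Per-channel: acc is M x N row-major, scales[n] applies to output channel n.
std::vector<std::int8_t> requant_per_channel(const std::vector<std::int32_t>& acc,
                                             const std::vector<float>& scales,
                                             std::size_t M, std::size_t N)
{
    std::vector<std::int8_t> out(M * N);
    for (std::size_t m = 0; m < M; ++m)
        for (std::size_t n = 0; n < N; ++n)
            out[m * N + n] = requant(acc[m * N + n], scales[n]);
    return out;
}
```

Per-channel scales cost one extra load per output column. They usually preserve accuracy better when weight ranges vary widely across output channels.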
* Pass shared mem pointer as pointer to void.
* Device Op GroupedGEMM Multiple D
* Example for grouped gemm multiple d.
* Add MI200 to supported archs.
---------
Co-authored-by: Adam Osewski <aosewski@amd.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>
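A grouped GEMM with multiple D tensors runs several independent GEMMs, each with its own M, N and K, typically in a single kernel launch. It also fuses an elementwise epilogue that combines each accumulator with the matching D tensors. The following host-reference sketch uses a single D tensor per group and assumes row-major storage. `GemmGroup` and `grouped_gemm_d` are hypothetical names, not the DeviceGroupedGemm API:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// One problem in the group; every group may have different sizes.
struct GemmGroup {
    std::size_t M, N, K;
    std::vector<float> a; // M x K, row-major
    std::vector<float> b; // K x N, row-major
    std::vector<float> d; // M x N, row-major
};

// Computes E_i = cde_op(A_i * B_i, D_i) for every group i.
template <typename CDEOp>
std::vector<std::vector<float>> grouped_gemm_d(const std::vector<GemmGroup>& groups, CDEOp cde_op)
{
    std::vector<std::vector<float>> results;
    for (const GemmGroup& gr : groups) {
        std::vector<float> e(gr.M * gr.N);
        for (std::size_t m = 0; m < gr.M; ++m)
            for (std::size_t n = 0; n < gr.N; ++n) {
                float acc = 0.0f;
                for (std::size_t k = 0; k < gr.K; ++k)
                    acc += gr.a[m * gr.K + k] * gr.b[k * gr.N + n];
                e[m * gr.N + n] = cde_op(acc, gr.d[m * gr.N + n]); // fused epilogue
            }
        results.push_back(std::move(e));
    }
    return results;
}
```

Passing the epilogue as a callable mirrors how a device op can take an elementwise functor as a template parameter. With `[](float c, float d) { return c + d; }`, this computes a bias-add per group.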
* make conv_fwd_bias_activation kernel id unique
* add more parameters to conv and gemm kernel names
* update GetTypeString for conv and gemm kernels
* fix two more kernel strings
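Kernel ids become unique when the type string includes every template parameter that distinguishes one instance from another. The following minimal sketch of that idea assumes a GetTypeString-style static member built with `std::ostringstream`. `DeviceGemmSketch` and its four tiling parameters are hypothetical, and CK's real type strings may contain different fields:

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical device op: two instances that differ in any tiling
// parameter produce different type strings, so their ids never collide.
template <int BlockSize, int MPerBlock, int NPerBlock, int KPerBlock>
struct DeviceGemmSketch {
    static std::string GetTypeString()
    {
        std::ostringstream str;
        str << "DeviceGemmSketch<" << BlockSize << ", " << MPerBlock << ", "
            << NPerBlock << ", " << KPerBlock << ">";
        return str.str();
    }
};
```

Omitting any parameter from the string would let two distinct instances report the same name. That kind of collision makes profiler output and kernel selection ambiguous.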
* Grouped gemm + Gelu instances.
* Device Instance Factory for GroupedGemm+Gelu
* Client example
* Rangify fill helper functions.
* Fix name clash.
* Profiler for grouped_gemm+gelu
* No need to use full namespace name.
* Add check for MRaw divisible by vector load.
* Ugly fix for big errors.
* Add grouped_gemm+gelu to profiler CMakeLists.
* Store additional info in the argument.
* Information about the MRaw, NRaw, KRaw values.
* Use FastGelu instead of Gelu.
* Change client ex to use FastGelu
* Remove relaxed error precision.
* Remove duplicate output elementwise-op
---------
Co-authored-by: Adam Osewski <aosewski@amd.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>
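The grouped GEMM + GELU instances above switched from Gelu to FastGelu. The usual tanh-based GELU approximation is sketched below, next to the exact erf form for comparison. It is an assumption, not confirmed from the source, that CK's FastGelu uses this formulation, and the function names are hypothetical:

```cpp
#include <cassert>
#include <cmath>

// Exact GELU: 0.5 * x * (1 + erf(x / sqrt(2))).
inline float gelu_exact(float x)
{
    return 0.5f * x * (1.0f + std::erf(x / std::sqrt(2.0f)));
}

// Tanh approximation: 0.5 * x * (1 + tanh(u)), u = sqrt(2/pi) * (x + 0.044715 x^3).
// Uses the identity 0.5 * (1 + tanh(u)) = 1 / (1 + exp(-2u)), so one exp suffices.
// For very negative x, exp overflows to inf and the result is -0, not NaN.
inline float fast_gelu(float x)
{
    const float k = 0.7978845608f; // sqrt(2 / pi)
    const float u = k * (x + 0.044715f * x * x * x);
    return x / (1.0f + std::exp(-2.0f * u));
}
```

The approximation stays within about 1e-3 of the exact form for moderate inputs. It avoids `erf`, which is typically more expensive than a single `exp` on GPUs.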
* Modify Doxygen config to pick up include directories recursively
* Add DeviceMem struct to API Reference guide
* Add classes that are used in Flash Attention kernel
* Add a reference and config for generating bibliography
Co-authored-by: Philip Maybank <Philip.Maybank@amd.com>
* add new parallel stage on navi node
* don't run performance tests on navi; get rid of the 9110 compiler
* only run navi build when not doing QA
* fix syntax
* use navi21 label
* don't stash profiler on navi nodes; scp deb package to ginger
* disable tests on navi nodes
* test posting a binary to ginger
* add sshpass and use it to copy deb package
* fix the scp example
* fix syntax
* debug the scp issues
* add jenkins user to docker
* don't try whoami
* change jenkins uid and add user with uid=1002
* try scp from the last stage on micimaster
* rename and stash the package, scp from micimaster