Po-Yen, Chen
990eed11b7
Handle the case where the user specifies all the strides
2022-08-19 16:32:37 -04:00
Po-Yen, Chen
7558d14442
Fix wrong program return value of GEMM examples
2022-08-19 16:29:48 -04:00
Po-Yen, Chen
1ce791ea05
Use stricter condition to add code in examples
2022-08-19 15:39:50 -04:00
Po-Yen, Chen
75a30f8b18
Mark Tensor<> special member functions as 'default'
2022-08-19 15:30:43 -04:00
Po-Yen, Chen
1626a6e376
Remove unnecessary copy ctor for Tensor<>
2022-08-19 15:27:21 -04:00
Po-Yen, Chen
cd395646fa
Fix compilation error in check_err()
2022-08-19 15:22:26 -04:00
Po-Yen, Chen
47770c857b
Allow unsigned integer arguments for check_err()
2022-08-19 15:19:34 -04:00
Po-Yen, Chen
3b0f97f6eb
Revert "Add type traits 'is_signed_integral<>'"
...
This reverts commit f2c148efae.
2022-08-19 15:14:12 -04:00
Po-Yen, Chen
103ae7d126
Use reinterpret_cast<>() for cross-type pointer conversion
2022-08-19 15:01:32 -04:00
Po-Yen, Chen
a177ad758f
Unify structured comment in examples
2022-08-19 14:57:21 -04:00
Po-Yen, Chen
e37f4ab9cc
Re-format common.hpp
2022-08-19 14:50:44 -04:00
Po-Yen, Chen
f7288bc2b1
Reuse same implementation code for most of GEMM examples
2022-08-19 14:47:09 -04:00
Po-Yen, Chen
ed51c0638b
Re-format template argument in example code
2022-08-19 14:31:46 -04:00
Po-Yen, Chen
5931c7ebe6
Move common code together
2022-08-19 13:49:22 -04:00
Po-Yen, Chen
68a57e71e6
Move #include directives into new header
2022-08-19 13:24:00 -04:00
Po-Yen, Chen
42d75f356c
Sort include directives
2022-08-19 12:59:46 -04:00
Po-Yen, Chen
dd5b139401
Extract common code of int4 examples
2022-08-19 12:57:36 -04:00
Po-Yen, Chen
3e2f37a148
Re-format GEMM instance template arguments
2022-08-19 12:02:57 -04:00
Po-Yen, Chen
c1fbabea04
Avoid over-generalizing check_err()
2022-08-19 11:59:21 -04:00
Po-Yen, Chen
4d4a659cd6
Use ""_uz to simplify example code
2022-08-19 11:54:51 -04:00
Po-Yen, Chen
3e2371c554
Align design with other PR
2022-08-19 11:44:08 -04:00
Po-Yen, Chen
503f07c1e0
Add constraint to check_err() input reference type
2022-08-19 11:34:19 -04:00
Po-Yen, Chen
2fb766e852
Simplify tensor usages in examples
2022-08-19 11:33:25 -04:00
Po-Yen, Chen
0d5025befe
Add #error directive to prevent compiling sources with wrong settings
2022-08-19 10:51:30 -04:00
Po-Yen, Chen
625f95ade4
Remove debug messages
2022-08-19 10:05:44 -04:00
Po-Yen, Chen
84843aa36f
Avoid compilation error when ck::int4_t support is disabled
2022-08-19 09:54:03 -04:00
Po-Yen, Chen
51d0c6794c
Remove constraint of Tensor<>::CopyAsType()
2022-08-19 05:31:04 -04:00
Po-Yen, Chen
c34f8411c4
Check converted Tensor<int4_t> with golden Tensor<int8_t>
2022-08-19 04:40:13 -04:00
Po-Yen, Chen
a83c006098
Allow comparing different-sized integral types in check_err()
2022-08-19 04:39:20 -04:00
Po-Yen, Chen
726c115393
Add type constraints for integer version check_err<>()
2022-08-19 03:48:20 -04:00
Po-Yen, Chen
f2c148efae
Add type traits 'is_signed_integral<>'
2022-08-19 03:47:22 -04:00
Po-Yen, Chen
463d15f9b5
Add constraint to Tensor<> templated methods
2022-08-19 03:27:41 -04:00
Po-Yen, Chen
f3f61f836b
Complete the int4 examples
2022-08-19 02:19:50 -04:00
Po-Yen, Chen
2dc3357a20
Fix typo in alias names
2022-08-19 01:41:20 -04:00
Po-Yen, Chen
79480f0aee
Re-use element-wise operation type alias
2022-08-19 01:39:46 -04:00
Po-Yen, Chen
dd849a8736
Re-use CopyAsType<>() to implement copy ctor
2022-08-19 01:02:36 -04:00
Po-Yen, Chen
e03cece9c4
Use different type for host tensors
2022-08-19 00:32:57 -04:00
Po-Yen, Chen
89a827cab9
Re-format source files
2022-08-19 00:32:24 -04:00
Po-Yen, Chen
cbbe2485b2
Allow conversion between Tensor<> specializations
2022-08-19 00:30:53 -04:00
Po-Yen, Chen
30ed3e218c
Add int4_t support for check_err()
2022-08-19 00:30:28 -04:00
Po-Yen, Chen
194faf7837
Distinguish user-side type from kernel-side type
2022-08-18 23:43:19 -04:00
Po-Yen, Chen
70c87970ec
Re-use pre-defined alias in int4 examples
2022-08-18 23:29:38 -04:00
Po-Yen, Chen
4b153bd974
Add GEMM examples for int4
...
Currently the source files are just copied from int8 examples
2022-08-18 23:03:36 -04:00
Illia Silin
9efd033bee
restart the stages on MI200 in case of failures (#366)
...
* restart the stages on MI200
* fix the docker image storage issue
2022-08-18 14:54:47 -05:00
Adam Osewski
e00149ac67
int4 data type (#364)
...
* Introduce int4 data type.
* Add unit-tests for int4
* Compile int4 UT only when int4 enabled.
* clang-format
Co-authored-by: Adam Osewski <aosewski@amd.com>
2022-08-18 14:53:47 -05:00
Chao Liu
bac7df8faf
use scale (#363)
2022-08-17 10:38:00 -05:00
Anthony Chang
c961ce9226
Hotfix LDS data hazard in fused attention (#360)
...
* avoid LDS data hazard in gemm_softmax_gemm pipeline
* trivial refactors
* comments
* shrink blockwise gemm v2 thread buffer size
* reclaim A block LDS space during 2nd gemm
* amend
* amend
2022-08-15 12:04:20 -05:00
Qianfeng
53ea4713af
Batchnorm-forward and Batchnorm-infer implemented using generic kernels (#320)
...
* Implement multiple-reduction in one kernel (kernels, device ops, examples)
* Add generic elementwise kernel and device interface
* Add generator for normal-distributed data initialization
* Add host reference implementation of batchnorm-forward and batchnorm-infer
* Add examples for implementing batchnorm-forward and batchnorm-infer using generic kernels
* Remove unneeded include in batchnorm example
* Rename generic_elementwise to elementwise in kernel and device classes/functions
* Change in gemm_layernorm examples to use DeviceElementwise instead of Device5AryElementwise
* Change in example 19_binary_elementwise to use DeviceElementwise instead of DeviceBinaryElementwise
* Change in device_cgemm_4gemm_xdl_cshuffle.hpp to use kernel_elementwise instead of kernel_binary_elementwise
* Add DeviceElementwiseBase and use it in device_normalize_instance.cpp
* Removing and renaming files
* Update to synchronize gemm_layernorm client example to the generic element-wise device op API
* Update to synchronize with the latest headers directory and HostTensorDescriptor interface renaming
* Merge two static member functions in device_elementwise.hpp
* Remove unary_elementwise_1d kernel and device
2022-08-15 10:11:02 -05:00
Chao Liu
5ee304595c
fix build issue (#357)
...
* fix build
* exclude example_gemm_max_xdl_fp16 from testing due to random failure on gfx908
2022-08-13 15:58:31 -05:00
cloudhan
fb1cbf025b
Change all device operations to use add_instance_library (#338)
...
* Change all device operations to use add_instance_library to avoid duplicated cmake configuration.
* update DeviceMem
Co-authored-by: Chao Liu <chao.liu2@amd.com>
2022-08-13 12:17:58 -05:00