Commit Graph

806 Commits

Author SHA1 Message Date
who who who
2b3fd10f2b remove unused variable (#564)
* remove unused variable

* format code

[ROCm/composable_kernel commit: ba40c2ce9d]
2023-01-31 10:34:35 +08:00
Adam Osewski
dcc84da1cf Use defined seed for deterministic test runs. (#562)
Co-authored-by: Adam Osewski <aosewski@amd.com>

[ROCm/composable_kernel commit: 274108d6e6]
2023-01-30 13:03:59 -06:00
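The idea behind seeding tests with a defined value can be sketched as follows. This is a minimal illustration, not the code from the commit: `kTestSeed`, its value, and `make_test_data` are hypothetical names, and the choice of `std::mt19937` is an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical fixed seed; the value used by the actual commit is not shown in this log.
constexpr std::uint32_t kTestSeed = 11939;

// Generate test data from an engine seeded with a known value, so that every
// run sees the same sequence and a failing case can be reproduced.
inline std::vector<std::uint32_t> make_test_data(std::size_t n, std::uint32_t seed = kTestSeed)
{
    std::mt19937 engine(seed);
    std::vector<std::uint32_t> data(n);
    for(auto& x : data)
    {
        // Use the raw engine output: the std::mt19937 sequence is fixed by the
        // standard, whereas distribution classes may differ between libraries.
        x = static_cast<std::uint32_t>(engine() % 10);
    }
    return data;
}
```

Two calls with the same seed return identical data, which is the property a deterministic test run relies on.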
Adam Osewski
a203d3db7b Add more instances for irregular GEMM sizes. (#560)
Co-authored-by: Adam Osewski <aosewski@amd.com>

[ROCm/composable_kernel commit: 7494c1c611]
2023-01-26 13:42:20 -06:00
Qianfeng
2c1a324b99 Batchnorm inference instances, external API, client examples and gtests (#531)
* File renaming and class renaming for device element-wise operation

* Add batchnorm-infer instances, external API and client example

* Add batchnorm-infer profiler module and gtests

* Remove file device_elementwise_extension.hpp and move NormalizeInInfer operation to element_wise_operation.hpp

* Remove the using of class aliasing for DeviceElementwiseForBatchNormInfer

* Rename class and file due to conflict from device_elementwise_2d.hpp

* Fix namespace in batchnorm_infer_nhwc client example

[ROCm/composable_kernel commit: a1b2441f8d]
2023-01-25 17:09:04 -06:00
Qianfeng
fc8fa0992f Use double for all scaling values and floating-point constant values in the Device Op API (#557)
* Use double as alpha/beta values type in reduce device op api

* Use double as alpha/beta values type in softmax device op api

* Use double as alpha/beta values type in multiple-reduce device op api

* Use double as epsilon value type in normalization/elementwise-normalization device op api

[ROCm/composable_kernel commit: 52abc2f371]
2023-01-18 12:02:50 -06:00
Raman R jana
21a146fb2f Wavelet (inter-wave consumer-producer) GEMM (#310)
* wavelet gemm programming model support for CK

* GEMM pipeline update for wavelet programming model

* Updated wavelet programming pipeline

* fixes for global-write for math-wave

* fixed bug in global writes

* Updated comments for better readability

* fixed clang format errors

* added block_lds without barrier sync

* clean

* clean

* clean

* clean

* refactor

* prototype

4 layouts

fix default stride

all problem sizes

tidy

move file

update build script

restore old file

fix build

* refactor standalone test to use gemm test harness

* simplify gemm test

* update build script

* remove redundant

* early return when cmd arg doesn't match

* tidy

* report failure when result not validated

* tidy

* Add comment depicting B2C mapping pattern.

* Formatting & comments.

* Comparison with custom B2C mapping pattern.

* Example for wavelet gemm.

* Add wavelet to Gemm standalone test.

* Remove debug code.

* Remove dangling #endif directive.

Co-authored-by: root <Raman Jana>
Co-authored-by: Chao Liu <chao.liu2@amd.com>
Co-authored-by: Adam Osewski <aosewski@amd.com>
Co-authored-by: Anthony Chang <ac.chang@outlook.com>
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>

[ROCm/composable_kernel commit: 1cfa87608a]
2023-01-18 12:00:02 -06:00
ltqin
c87d5b6832 Add multiD Gemm client APIs (#534)
* start add example

* fix config

* fix showinfo bug

* add an elementop

* change to padding

* add xdl example

* change elementwiseop

* add instance

* add instance to profiler

* change file name

* fix device not supported issue

* add client example

* fix client gemm_add_multiply name

* change AddMultiply elementwiseop

* fix elementwiseop

* fix client example

* fix addmultiply op

* fix comments and function name

Co-authored-by: letaoqin <letaoqin@amd.com>

[ROCm/composable_kernel commit: d66421fe34]
2023-01-18 11:53:56 -06:00
Illia Silin
b0c9e3340b fix a bug for 6-dim kernels (#555)
[ROCm/composable_kernel commit: 00ff30af8c]
2023-01-18 11:44:11 -06:00
who who who
2484caa010 add multi embeddings support (#542)
* add multi embeddings support

* fix format

* optimize sqrt

* add reduce operation

* change to elementwise op

* fix name

* rename

* run ci cd

* format example

* format code

* format code

[ROCm/composable_kernel commit: 147b7db561]
2023-01-18 11:32:12 -06:00
ltqin
26767954fd Add client API/examples for 3xGemm+Bias+Add+Permute{0, 2, 3, 1} (#550)
* add example

* fix example

* add instance for gemm permute

* add to client example

* change configs

* change instance file name

* format

* change client example file name and remove example

[ROCm/composable_kernel commit: 55236709e2]
2023-01-18 10:52:52 -06:00
Qianfeng
46a0aceec1 Reduction external API and client examples (#493)
* Change to the DeviceReduce base class template to include all problem description information

* Add external api for reduction

* Add client example to test the reduction external api

* Spelling correction

* Re-implement the host_reduction to follow the DeviceReduce base API format

* Change the reduce profiler to call the external API for collecting device instances

* Rename reduce client example directory from 08_reduce to 12_reduce

* Remove (void) before the function call

* Tiny update in reduce client example

* Tiny update in profile_reduce_impl.hpp

* Rename the reduce client example directory

Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>

[ROCm/composable_kernel commit: 80e0526741]
2023-01-16 22:18:06 -06:00
rocking5566
8d6f3a2b81 Gemm layernorm welford (#413)
* Add device op of gemm layernorm

* [What] Rename F to H
[Why] F and G prepare for welford tensor

* Add gridwise gemm + welford

* Extract template parameter

* Rename kernel. Prepare to add second half kernel

* Extract var

* Add second kernel for gemm+layernorm

* Move to the gemm_layernorm folder

* Rename F and G to mean and var

* Do not use snakeCurved; it makes determining the padding for welford difficult

* Rewrite the device interface and rename some var

* Add welford count

* Update interface

* Sync code, prepare to test on MI200

* Clean the code

* Implement layernorm

* Add comment to mention hipFree

* Write out the e for debug.
This could be removed and h used instead

* 1. Allocate mean, var and count by SetWorkSpacePointer.
2. Add GetWorkSpaceSize to calculate the space size

* Add gemm layernorm host code

* use reference layernorm

* Fix bug of blockwise welford for first kernel

* Fix bug of mean var padding for layernorm

* Use sgpr for shuffleM_index

* padding for GemmMeanVarCountGridDescriptor_M_NBlock

* Add layout parameter

* Check argument for gemm

* calculate max count for tail block

* Share E and H memory in device op

* Hard code the vector dim

* Refine the MakeDescriptor

* 1. Remove E parameter, because E is inside of device op
2. Check vector size

* [What] Rename MakeMeanVarDescriptor_M_N
[Why] Prepare to add count version of make descriptor

* Use 1D global memory for count

* Prevent redundant IO

* Update parameter

* Add pipeline v1/v2 selector

* Rename the example name

* Add base class for gemm layernorm

* Refine naming to distinguish naive and welford

* Add comment to explain in detail

* We don't need to pad in N dimension in gemm for mean/var/count. Set NPerTile 1

* Rewrite the 2nd kernel, use multiple blocks along N dimension in layernorm kernel

* Share the vector size

* Refine var name

* [What] Force LayernormThreadSliceSize_N = vector size.
[Why] Memory coalesce

* Add comment

* Extract divisor out of the loop in reference layernorm

* Pad different size for E and H in layernorm kernel according to different block tile

* Refine naming

* Refine naming

* Prevent implicit cast

* [What] use ck::math::sqrt instead of __builtin_amdgcn_sqrtf
[Why] __builtin_amdgcn_sqrtf only supports float; double will cause casting

* Cast only constant

* Change of post shuffle thread descriptor

* Add EMeanVarDataType parameter.

* Merge the mean and var threadwise copy

* Add missing index

* Fix Typo

* Sync the variable with previous if

* 1. Declare e inside the host_gemm_layernorm()
2. Prevent implicit cast in reference code

Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>

[ROCm/composable_kernel commit: 7829d729fb]
2023-01-16 20:08:25 -06:00
Haocong WANG
9d5e41b586 [Navi3x-LWPCK-545] Block-wise GEMM + Real GEMM_WMMA_FP16 (#541)
* wmma_op + unit test

* add arch limitation to wmma test

* change arch limitation

* Refactor + Add unit tests for all types (int4 compile failed)

* Add f32_16x16x16_bf16 unit test

* tempsave

* tempsave

* tempsave

* runtime bug, cannot find symbol

* workaround for incorrect HIP warpSize return value

* debugging

* tempsave

* Correctness OK, waiting for optimization

* Tidy up + format

* temp save

* temp save, reproduce the v_bfi_b32 issue

* add inline asm for wmmaop test

* tidy up

* clean some debug purpose code

* discard some codes

* clang format

* clang format

* compiler issue fixed + increase tile size

[ROCm/composable_kernel commit: 919aeb1f52]
2023-01-16 20:06:01 -06:00
Illia Silin
dbc281041a Add a flag to enable/disable debug output in many kernels. (#549)
* add DEBUG_LOG macro to enable/disable debug output

* fix syntax

* fix syntax again

* fix syntax one more time

* remove blank spaces

* use ifdefs

* add the Print argument

* move the definition of DEBUG_LOG to ck.hpp

* add the missing argument to Print()

[ROCm/composable_kernel commit: 715e8dd241]
2023-01-11 19:55:56 -06:00
Qianfeng
be8d157e6d Remove including of cmath (#551)
* Let cmath included when compiling host codes in math_v2.hpp

* Remove including of cmath in device_base.hpp and device_permute.hpp

[ROCm/composable_kernel commit: a17b041486]
2023-01-11 19:52:47 -06:00
zjing14
afa7c8eea1 Add MNK padding, M = 0 support into grouped_gemm (#539)
* add mnk padding, support m=0

* clean code

* clean code

Co-authored-by: Rostyslav Geyyer <46627076+geyyer@users.noreply.github.com>

[ROCm/composable_kernel commit: 0345963eef]
2022-12-15 15:07:24 -06:00
Illia Silin
b0806dacbd disable the attention test that fails on MI100 (#540)
[ROCm/composable_kernel commit: 1115117503]
2022-12-15 10:20:21 -06:00
Qianfeng
990e8b78d2 Add interface GetTypeIdName() and GetTypeIdHashCode() for Device Op (#533)
[ROCm/composable_kernel commit: 10c72aced8]
2022-12-14 18:34:02 -06:00
Rostyslav Geyyer
9cdd223de5 Add padding device_gemm_add_add_fastgelu_xdl_c_shuffle instances to enable arbitrary problem size (#535)
* Add padding device_gemm_add_add_fastgelu_xdl_c_shuffle instances

* Add padding device_gemm_add_fastgelu_xdl_c_shuffle instances

* Add gemm_add_fastgelu profiler impl

* Add padding device_gemm_fastgelu_xdl_c_shuffle instances

* Add gemm_fastgelu profiler impl

[ROCm/composable_kernel commit: 9a1f2475e3]
2022-12-14 18:12:09 -06:00
Rostyslav Geyyer
46be71bf44 Add a docker hub doc file (#538)
[ROCm/composable_kernel commit: 74744cab3e]
2022-12-14 12:17:28 -08:00
arai713
2e92e52137 Gridwise elementwise 2d (#466)
* added 2d gridwise elementwise

* added 2d version of device elementwise

* added example file with updated device elementwise call

* added Cmake file

* changed NumDim into 2D

* fixed compiler issues

* fixed indexing for loop step

* fixed NumDim dimension error

* changed blockID to 2D

* updated Grid Desc

* updated kernel call

* fixed 2d thread indexing

* added dimensions for example file

* commented out unused code

* changed vector load

* removed extra code

* temporarily removing vector load on 2nd dim

* changed vector load back, still causing errors

* altered indexing

* changed isSupportedArgument for 2D

* changed indexing + do/while

* fixed isSupportedArgument

* changed dimension for debugging

* fixed

* added testing printouts

* testing change

* added variables to distribute threads through both dimensions

* testing changes

* integrated variable for thread distribution into device elementwise and added as parameter for gridwise elementwise

* removed most of the extraneous code, testing with different dimensions

* testing

* removed debugging print statements

* moved 2d elementwise permute into elementwise permute directory

* fixed formatting

* removed debugging comments from threadwise transfer

Co-authored-by: Jing Zhang <jizhan@amd.com>
Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>

[ROCm/composable_kernel commit: 0e5c264c3e]
2022-12-12 09:18:10 -06:00
Illia Silin
79c4b5d928 Make sure that GEMM sizes in K dimension are supported. (#527)
* apply new K-dimension check in gemm_xdl_cshuffle

* add K-dim check to gemm_xdl and batched_gemm_xdl

* fix syntax

* fix syntax

* clean-up the debug output

[ROCm/composable_kernel commit: d58b7f5155]
2022-12-08 11:48:43 -06:00
Po Yen Chen
e25360c38f Fix Grouped ConvBwdWeight test case failure (#524)
* Use smaller tensor size in test

* Use even smaller tensor size

* Touch only failing test case inputs

[ROCm/composable_kernel commit: 614a7b1bb0]
2022-12-07 17:46:28 -06:00
Rostyslav Geyyer
7355f95afe Add padding device_gemm_xdl instances (#529)
Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com>
Co-authored-by: Chao Liu <chao.liu2@amd.com>

[ROCm/composable_kernel commit: c7a4d36147]
2022-12-07 17:46:03 -06:00
guangzlu
db0a1032b1 modified half function in math_v2.hpp (#528)
Co-authored-by: Chao Liu <chao.liu2@amd.com>

[ROCm/composable_kernel commit: ce87b4f765]
2022-12-07 17:43:02 -06:00
Illia Silin
cbc6b1c823 Fix CI error. (#530)
* ignore .git folder when doing clang-format

* fix syntax

* add backslashes before quotes

* add path filter for several extensions

[ROCm/composable_kernel commit: d072790fe2]
2022-12-06 15:09:51 -06:00
Anthony Chang
96c07fc27d Fix bug where scaling may not be applied in some code path (#526)
* fix bug where scaling may not be applied in some code path

* more test

* revert accidental example code changes

[ROCm/composable_kernel commit: d156709432]
2022-12-02 11:43:34 -06:00
ltqin
621c12302f Add multiple d gridwise gemm on Navi21 for ResNet50 (#517)
* start add example

* add multiple d fp16 example

* device transfer elementwiseop to gridwise

* gridwise add multiple d

* change example for multiple d

* fix spill registers

* fix for passthrough element op

* fix int8 overflow

* change example file name

* add instance for dl multiple d

* example add DsDataType

* remove grouped_convolution_forward_dl.hpp

* add header file (was deleted before)

* fix device not supported issue

* format

* remove passthrough check

Co-authored-by: letaoqin <letaoqin@amd.com>

[ROCm/composable_kernel commit: 23ecf0fa9e]
2022-12-02 11:42:31 -06:00
Haocong WANG
3baad464d0 [Navi3x-LWPCK-449] wmma_op + unit test (#484)
* wmma_op + unit test

* add arch limitation to wmma test

* change arch limitation

* Refactor + Add unit tests for all types (int4 compile failed)

* Add f32_16x16x16_bf16 unit test

* Remove int4 related

* delete deprecated test

Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
Co-authored-by: Chao Liu <chao.liu2@amd.com>

[ROCm/composable_kernel commit: abf9cc6c5c]
2022-12-02 11:41:13 -06:00
Po Yen Chen
02db748e74 Modularize ckProfiler operations (#514)
* Re-structure ckProfiler source files

* Rename profiler.cpp to main.cpp

* Modularize ckProfiler operations

* Add description for profiler operations

* Use longer name to avoid name collision

* Use macro to delay expansion

* Use std::move() to avoid object copying

* Prohibit users from calling dtor

* Use macro to eliminate redundant code

* Make friend function hidden

* Add missing include directive <iostream>

* Fix wrong include directives

* Remove int8 from batchnorm-forward instances since it is not needed for forward training and could fail test

Co-authored-by: Qianfeng Zhang <Qianfeng.Zhang@amd.com>

[ROCm/composable_kernel commit: 8784a72e23]
2022-12-01 15:15:02 -06:00
rocking5566
8e868bf880 gemm, conv perchannel quantization (#503)
* Use gemm_multiple_D instead

* Add gemm bias relu quantization example

* Add pure gemm quantization example

* Add quantization of perchannel conv + bias + relu example

* Refine the code

* Rename multiplier to requant_scale

* Rename the folder

* Remove redundant comment

* Rename the file. Prepare to add perchannel

* Add conv perchannel instance

* Move to quantization folder

* Add conv perchannel client example

* Apply Rangify constructor of HostTensorDescriptor & Tensor<>

* Fix merge error

[ROCm/composable_kernel commit: ad541ad6b9]
2022-11-30 14:13:04 -06:00
Qianfeng
c3bb3db252 BatchNorm backward instance/external API/profiler/tests (#519)
* Refine the device batchnorm-backward base API templates and data type assignments

* Remove duplicated kernel file

* Add batchnorm backward instances and external API

* Add batchnorm-backward profiler and tests

* Add client example which uses batchnorm backward external API

* Merge test/batchnorm_fwd and test/batchnorm_bwd into one directory

* Loosen the threshold for batchnorm-backward check_err()

[ROCm/composable_kernel commit: 63af525c06]
2022-11-30 13:32:20 -06:00
Anthony Chang
eae37a7b6f Fix split-k gemm test (#231)
* properly return error flag; reveals bug in split-k gemm

* fix bug in split k

* update split-k test case

Co-authored-by: Chao Liu <chao.liu2@amd.com>

[ROCm/composable_kernel commit: 236bd148b9]
2022-11-29 10:57:26 -06:00
fsx950223
26bbca370b fix GetTypeString
[ROCm/composable_kernel commit: 0e9c88cecf]
2022-11-29 14:18:10 +08:00
Qianfeng
c036714248 BatchNorm backward implementation (#461)
* Implemented batchnorm-backward Blockwise and Multiblock kernels

* Add batchnorm-backward device op

* Add batchnorm-backward host-reference op

* Add batchnorm-backward example

* Parameters renaming in batchnorm backward kernels and device op

* Change the example to loosen the threshold for ScaleDiff checking

* Add comments to explain the implementation of batchnorm-backward

* Parameters renaming again in batchnorm backward kernels

* Improve the expression calculation for performance

* Add batchnorm backward to README

* Add comments to explain inv-variance in batchnorm forward and backward

* Renaming the batchnorm forward training and inferring examples

* Add/update the comments for batchnorm-backward kernels

* Renaming again

* Add block_sync_lds between two consecutive blockwise reductions

* Move common expression 1/N out of the static_for loops

* Add dy_elementwise_op

* Renaming in backward example again

* Add checking for reduceDims in reference_batchnorm_backward

* Update to comments and codes format

* Rename in the comments

* Remove common expression out of the loop in reference_batchnorm_backward_nhwc_c

* Add block_sync_lds() between blockwise reduction again

* Fix comments again

* Remove int8 from batchnorm-forward instances since it is not needed for forward training and could fail test

[ROCm/composable_kernel commit: 44789d992a]
2022-11-28 20:51:10 -06:00
Qianfeng
1a6febf03a Remove int8 from batchnorm-forward instances since it is not needed for forward training and could fail test (#516)
[ROCm/composable_kernel commit: 5bf0475afd]
2022-11-28 14:33:00 -06:00
Qianfeng
144efbf9b6 BatchNorm forward instance/external api/profiler/tests/client example (#511)
* Update to device_batchnorm_forward base class to include all template parameters for problem description

* Add batchnorm forward instances and external api

* Add batchnorm forward profiler module which uses the external api

* Add some comments in batchnorm_forward example to explain the dimensions in lengths[]

* Replace the reference_batchnorm_forward_nhwc_c by generic reference_batchnorm_forward

* Improvement to the batchnorm infer base API

* Add batchnorm forward client example which shows using the batchnorm forward external API

* Add test for batchnorm forward

* Tuning the batchnorm profiler initialized values and error threshold

* Add support for bhalf_t in instances/external api/tests

* Add support for int8_t in instances/external api/tests

* Add support for double in instances/external api/tests

* Let ScaleDataType and BiasDataType be same as XDataType and YDataType when creating instances

* Checking before running best instance in batchnorm_fwd_nhwc client example

* Add checking for YElementwiseOp in batchnorm_forward external API

* Add more types in batchnorm forward profiler

* Add more test lengths

Co-authored-by: rocking5566 <ChunYu.Lai@amd.com>

[ROCm/composable_kernel commit: 4e6a5575be]
2022-11-24 18:02:27 -06:00
Adam Osewski
0f3d9639a8 Client examples AddFastGelu and FastGelu + instances. (#509)
* FastGelu support for more data types.

* AddFastGelu & FastGelu instances.

* Client example.

* clang-format

* Remove unused stride variable.

* Add new line at EOF.

Co-authored-by: Adam Osewski <aosewski@amd.com>

[ROCm/composable_kernel commit: 43a889b72e]
2022-11-19 22:08:26 -06:00
Anthony Chang
b607362c9a Work around develop validation failure (#513)
* workaround bf16 atten fwd issue on gfx908

* typo

[ROCm/composable_kernel commit: 892a8d769d]
2022-11-17 08:38:13 -08:00
guangzlu
3941d1e815 Add BF16 tests for batched_gemm_softmax_gemm_permute (#504)
* fixed bug in softmax reference & add bf16 examples for batched_gemm_scale_softmax_gemm

* added bf16 tests for batched_gemm_softmax_gemm_permute

* changed format of device_batched_gemm_softmax_gemm_permute_xdl_cshuffle_bf16_bf16_bf16_bf16_gmk_gnk_gno_gmo_instance.cpp

* changed format device_batched_gemm_softmax_gemm_permute_xdl_cshuffle_bf16_bf16_bf16_bf16_gmk_gnk_gno_gmo_instance.cpp

* aligned annotations

* modified CMakeLists for examples

* add common example code of fp16/bf16 version for batched_gemm_scale_softmax_gemm_xdl

* use macro to control the instances

* added macro control into instances

* clang-format some files

* changed error tolerance for bf16

* changed index for 10_elementwise_normalization

* fixed xdlops code bug in amd_xdlops.hpp

Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>

[ROCm/composable_kernel commit: 4c4c7328a6]
2022-11-15 16:30:23 -06:00
ltqin
3073d82b47 Add Conv Backward Data on Navi21 for ResNet50 (#499)
* start add example

* add device dl

* change launch kernel

* change init data method

* change example config

* add config valid check

* add instance for dl bwd

* add instance to ckProfiler

* reserver to profiler and cmakelist

* add instance to ckProfiler2

* change instance f32 config

* fix example return value

Co-authored-by: letaoqin <letaoqin@amd.com>
Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>

[ROCm/composable_kernel commit: db0eb1ea9c]
2022-11-15 16:22:20 -06:00
Po Yen Chen
2837e81e40 Avoid reporting unused member function error (#507)
[ROCm/composable_kernel commit: 7038723a46]
2022-11-14 19:54:37 -06:00
Po Yen Chen
e418b29268 Introduce ck::accumulate_n() (#439)
We can use this template to eliminate duplicated iterator computation
logic. By providing the return type to ck::accumulate_n(), we can avoid
type conversion operations.

[ROCm/composable_kernel commit: 730204eed0]
2022-11-14 19:53:39 -06:00
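The helper described in the commit above can be sketched as follows. This is an illustration written for this log, not the library source: it lives in a hypothetical `sketch` namespace, the exact signature of `ck::accumulate_n()` is an assumption, and `element_count` is an invented usage example.

```cpp
#include <cstddef>
#include <functional>
#include <iterator>
#include <numeric>
#include <vector>

namespace sketch {

// Accumulate the first `count` elements starting at `first`.
// T is both the accumulator type and the return type, so the caller picks the
// result type up front instead of converting it afterwards.
template <typename T, typename ForwardIterator, typename Size, typename BinaryOperation>
T accumulate_n(ForwardIterator first, Size count, T init, BinaryOperation op)
{
    using diff_t = typename std::iterator_traits<ForwardIterator>::difference_type;
    // The end iterator is computed in one place rather than at every call site.
    return std::accumulate(first, std::next(first, static_cast<diff_t>(count)), init, op);
}

} // namespace sketch

// Usage: number of elements in a tensor with the given lengths, computed as
// std::size_t even though the lengths are stored as int.
inline std::size_t element_count(const std::vector<int>& lengths)
{
    return sketch::accumulate_n<std::size_t>(
        lengths.begin(), lengths.size(), std::size_t{1}, std::multiplies<>{});
}
```

Without such a helper, each caller would repeat the `std::next(first, count)` arithmetic and cast the result of `std::accumulate()` itself.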
Po Yen Chen
a8a4bdb756 Rangify STL algorithms (#438)
* Rangify STL algorithms

This commit adopts rangified std::copy(), std::fill() & std::transform()

* Re-write more std::copy() calls

* Re-write std::copy() calls in profiler

[ROCm/composable_kernel commit: dc663fae29]
2022-11-14 15:17:28 -06:00
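"Rangified" here means overloads that take a whole range instead of a begin/end iterator pair. A minimal sketch of such wrappers is shown below; the `sketch::ranges` namespace and the `doubled` helper are hypothetical, and the real wrappers in the repository may differ in name and signature.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

namespace sketch {
namespace ranges {

// Range form of std::copy(): the caller passes the container itself.
template <typename InputRange, typename OutputIterator>
OutputIterator copy(const InputRange& range, OutputIterator out)
{
    return std::copy(std::begin(range), std::end(range), out);
}

// Range form of std::fill().
template <typename Range, typename T>
void fill(Range& range, const T& value)
{
    std::fill(std::begin(range), std::end(range), value);
}

// Range form of the unary std::transform().
template <typename InputRange, typename OutputIterator, typename UnaryOperation>
OutputIterator transform(const InputRange& range, OutputIterator out, UnaryOperation op)
{
    return std::transform(std::begin(range), std::end(range), out, op);
}

} // namespace ranges
} // namespace sketch

// Usage: no begin()/end() pair to get wrong at the call site.
inline std::vector<int> doubled(const std::vector<int>& in)
{
    std::vector<int> out(in.size());
    sketch::ranges::transform(in, out.begin(), [](int x) { return 2 * x; });
    return out;
}
```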
Po Yen Chen
6b0cb67348 Rangify check_err() (#444)
* Rangify check_err()

By rangifying check_err(), we can compare values not only between
std::vector<>s, but also between any ranges that have the same value
type.

* Re-format example code

[ROCm/composable_kernel commit: b79bbbc22f]
2022-11-11 11:39:39 -06:00
Po Yen Chen
9d8396c05c Fix build errors on CI server (#506)
* Add missing ignore expression

* Add missing include directive

[ROCm/composable_kernel commit: 4382b41469]
2022-11-11 11:36:55 -06:00
Po Yen Chen
a4776782a5 Rangify constructor of HostTensorDescriptor & Tensor<> (#445)
* Rangify STL algorithms

This commit adopts rangified std::copy(), std::fill() & std::transform()

* Rangify check_err()

By rangifying check_err(), we can compare values not only between
std::vector<>s, but also between any ranges that have the same value
type.

* Allow constructing Tensor<> like a HostTensorDescriptor

* Simplify Tensor<> object construction logic

* Remove more unnecessary 'HostTensorDescriptor' objects

* Re-format example code

* Re-write more HostTensorDescriptor ctor calls

[ROCm/composable_kernel commit: 4a2a56c22f]
2022-11-11 11:36:01 -06:00
Lauren Wrubleski
87ab07799d Add packages for examples and profiler (#502)
* Add packages for example and profiler

* correct TEST_NAME -> EXAMPLE_NAME

[ROCm/composable_kernel commit: 37f2e91832]
2022-11-10 13:19:33 -06:00
Po Yen Chen
acbe363156 Rangify FillUniformDistributionIntegerValue<> (#443)
Allow passing forward range to its call operator

[ROCm/composable_kernel commit: 6f0564f013]
2022-11-10 13:03:01 -06:00
guangzlu
4ff0f25c68 add client example for elementwise_normalization (#501)
* add client example for elementwise_normalization

* clang format elementwise_layernorm2d.cpp

* changed some naming to make it more understandable

* changed naming of input into ab_input

* fixed bug for threadwise_x_store

* add elementwise operation to reference

[ROCm/composable_kernel commit: 7045632885]
2022-11-10 12:30:36 -06:00