Commit Graph

657 Commits

Author SHA1 Message Date
Po Yen Chen
a670a5a092 Move literal ""_uz & ""_zu into namespace 'ck::literals' (#354)
* Move literal ""_uz & ""_zu into namespace 'literals'

* Move namespace 'literals' as 'ck::literals'
2022-08-12 17:48:35 -05:00
Rostyslav Geyyer
0c6ef7c14e Add example of conv_fwd_bias_relu_add for int4, int8, bf16, fp16, and fp32 (#343)
* [LWPCK-359] Initial commit

* Working version for fp16, add results to readme

* Update according to PR #341

* Update results in readme

* Add fp32 example

* Add bf16 example

* Update fp16 and fp32 examples

* Add int8 example

* Add separate lengths and strides tensors for D tensors

Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com>
2022-08-12 15:30:27 -05:00
zjing14
35e49f2de6 add g; fixed strides (#355) 2022-08-12 15:22:39 -05:00
Illia Silin
de60d290b6 Build docker only once in CI, fix conv_bwd logfile names. (#353)
* build docker in separate stage

* build docker with only one prefix

* add parallel statement

* add docker repo url

* fix the name of perf_conv_bwd_data log file
2022-08-12 12:30:37 -05:00
Po Yen Chen
68b61504a3 Add examples for GEMM + AddAddFastGelu (data type: int8, bf16, fp32) (#340)
* Add always_false<> util to delay symbol resolution

* Use always_false<> to prevent trying to instantiate unwanted method

* Add new specializations of AddAddFastGelu::operator() method

* Add GEMM + AddAddFastGelu examples for data types: int8, bf16, fp32

* Use floating point literal to simplify code

* Remove unnecessary capture in lambda expressions

* Extract fast GeLU calculation as standalone method

* Mark methods as 'constexpr'

* Add constraint for HostTensorDescriptor templated ctors

* Simplify HostTensorDescriptor ctor calls

* Add C++23 std::size_t literal suffix

* Use _uz suffix to shorten example code

* Remove unnecessary conversion to std::array<>

* Re-order include directives

* Remove C-style casting by literal suffix

* Remove unnecessary statements in main()

* Remove unused type parameter of always_false<>

* Remove unused include directive

* Exit main() by returning meaningful value

* Use 'if constexpr' to switch example flow

* Use std::is_same_v<> to shorten example code

* Add 'inline' specifier to literal functions

* Unify output methods in example

* Move common code into .inc file

* Add type check in type_convert<>()

* Add type_convert<float>() before computation

* Merge AddAddFastGelu method specializations

* Remove always_false<>

* Add constraint to AddAddFastGelu::operator() parameter types
2022-08-11 17:31:28 -05:00
rocking5566
fdfd7eb597 ckProfiler for layernorm (#330)
* Refine parameter

* Add base class for layernorm

* Add layernorm instance

* Add layernorm to ckProfiler

* Remove redundant

* Add verification

* Fix compile error due to merge
2022-08-11 17:03:54 -05:00
zjing14
e08d68d25d Add batched/grouped_gemm contraction deviceOps (#349)
* convnd_fwd fp16 example

* update example

* update example

* update instance

* updating reference conv

* update reference conv

* update conv fwd profiler

* update conv 1d and 3d instance

* update include path

* clean

* update profiler for conv bwd data and weight

* update conv bwd weight

* clean

* update conv example

* update profiler for conv bwd weight

* update ckprofiler for conv bwd data

* fix reference conv bwd data bug; update conv bwd data test

* update examples

* fix initialization issue

* update test for conv fwd

* clean

* clean

* remove test case too sensitive to error threshold

* fix test

* clean

* fix build

* adding conv multiple d

* adding conv multiple D

* add matrix padder

* add gemm padding to convnd

* adding group conv

* update gemm multi-d

* refactor

* refactor

* refactor

* clean

* clean

* refactor

* refactor

* reorg

* add ds

* add bias

* clean

* add G

* adding group

* adding group

* adding group

* update Tensor

* clean

* update example

* update DeviceGemmMultipleD_Xdl_CShuffle

* update conv bwd-data and bwd-weight

* update contraction example

* update gemm and batch gemm with e permute

* fix example build

* instance for grouped conv1d

* update example

* adding group conv instance

* update gemm bilinear instance

* update gemm+add+add+fastgelu instance

* update profiler

* update profiler

* update test

* update test and client example

* clean

* add grouped conv into profiler

* update profiler

* clean

* add test grouped conv, update all conv test to gtest

* update test

* change gemm_c_permute with contraction

* add grouped_contraction

* add contraction in group_gemm

* add example of grouped_gemm with contraction

* add example of grouped_contraction_bias_e_permute

* clean

* fixed ds

* add m3n2 m2n3 examples into gemm_bias_e_permute

Co-authored-by: Chao Liu <chao.liu2@amd.com>
2022-08-10 12:20:29 -05:00
Illia Silin
aba7fefce7 Fix QA, allow switching compiler versions, fix google test compilation error. (#348)
* allow selecting compiler version

* fix typo

* add Wno-deprecated flag for google tests

* change git repo, fix qa log files names

* change the git clone syntax

* use Omkar's git credentials

* try to use jenkins as git user

* try using illsilin username for gerrit repo with ssh key

* try new gerrit authorization

* change ssh key syntax

* try another way of passing ssh key to docker

* add mount ssh in dockerfile

* create .ssh folder

* move ssh-keyscan to later

* get rid of npm call

* build first docker image on master

* check the contents of the .ssh folder

* try replacing omkars creds with gerrit creds

* use open repo, clean up changes

* get rid of ssh default argument
2022-08-08 13:49:14 -05:00
Chao Liu
146972f447 fix bug in gemm profiler (#344) 2022-08-07 12:23:32 -05:00
Chao Liu
75ab874e02 Update Group convolution (#341)
* add conv oddC

* update example

* update example

* fix bug in example

* fix bug in group conv example
2022-08-03 12:28:33 -05:00
Adam Osewski
fb0dc35861 CGEMM examples bf16, fp32, int8 (#332)
* Add int8 specialization for elementwise Add and Subtract.

* CGEMM examples bf16, fp32, int8

* Add convert reference output to CDataType.

* Skip BF16 data type during testing.

* Lower K value to get rid of accumulation error.

* Fix merge artifact.

* Fix changed function name: GetElementSpaceSize()

* Fix merge artifact.

Co-authored-by: Adam Osewski <aosewski@amd.com>
2022-08-02 14:52:27 -05:00
Illia Silin
984b3722bf Run CI on MI100 nodes only, run daily QA on MI200 nodes. (#339)
* turn on full qa only on gfx90a, use int initialization

* change script syntax

* update script parsing clinfo, throw exception if 0 devices

* fix syntax

* try using toBoolean for the QA conditions

* run regular CI on MI100 only, use MI200 only for daily QA

* evaluate when conditions before agent

* launch QA on develop branch and update profile_reduce script

* update test script

* update script

* remove false dependency from dockerfile

* try removing rbuild completely

Co-authored-by: Chao Liu <chao.liu2@amd.com>
Co-authored-by: Chao Liu <lc.roy86@gmail.com>
2022-08-02 09:17:11 -05:00
Chao Liu
500fa99512 Clean up conv example, Instances, profiler and test (#324)
* convnd_fwd fp16 example

* update example

* update example

* update instance

* updating reference conv

* update reference conv

* update conv fwd profiler

* update conv 1d and 3d instance

* update include path

* clean

* update profiler for conv bwd data and weight

* update conv bwd weight

* clean

* update conv example

* update profiler for conv bwd weight

* update ckprofiler for conv bwd data

* fix reference conv bwd data bug; update conv bwd data test

* update examples

* fix initialization issue

* update test for conv fwd

* clean

* clean

* remove test case too sensitive to error threshold

* fix test

* clean

* fix build

* adding conv multiple d

* adding conv multiple D

* add matrix padder

* add gemm padding to convnd

* adding group conv

* update gemm multi-d

* refactor

* refactor

* refactor

* clean

* clean

* refactor

* refactor

* reorg

* add ds

* add bias

* clean

* add G

* adding group

* adding group

* adding group

* update Tensor

* clean

* update example

* update DeviceGemmMultipleD_Xdl_CShuffle

* update conv bwd-data and bwd-weight

* update contraction example

* update gemm and batch gemm with e permute

* fix example build

* instance for grouped conv1d

* update example

* adding group conv instance

* update gemm bilinear instance

* update gemm+add+add+fastgelu instance

* update profiler

* update profiler

* update test

* update test and client example

* clean

* add grouped conv into profiler

* update profiler

* clean

* add test grouped conv, update all conv test to gtest

* update test
2022-07-29 18:19:25 -05:00
Illia Silin
85978e0201 comment out cron trigger (#334) 2022-07-22 13:52:10 -05:00
zjing14
d7d7829096 Batched Gemm with multiD (#329)
* add batched_gemm_multiD

* add ds

* rename file

* add batched_gemm_bias example

* add batch_strides into bmm_c_permute

* clean

* rename example_28 to example_29

Co-authored-by: Chao Liu <chao.liu2@amd.com>
2022-07-22 09:33:50 -05:00
Illia Silin
d8415a96b3 Add full QA with verification option, few other changes. (#331)
* add verify flag and update scripts

* replace old check_error function with the new check_err

* fix syntax

* remove blank spaces

* remove empty line

* add check_err for tensors

* fix syntax

* replace tensors with vectors in check_err calls

* fix syntax

* remove blank spaces

* fix syntax

* add new line at end of file

* disable conv2d_bwd_weight test, add gpu check

* set check_gpu using export

* check GPU using runShell

* add definition of runShell

* fix script syntax

* reduce the number of threads, add full qa option

* run processing scripts in bash

* fix the branch and host names in performance scripts, add chronos

* replace parameterizedCron with cron

* archive the perf log files

* try to fix git call

* pass branch and host names as arguments into scripts

* fix script arguments

* fix script arguments

* process results on master

* fix pipeline

* add definition of gpu_arch

* run processing scripts in docker

* fix the brackets

* add agent master for the processing stage

* get rid of show_node_info call on master

* try using mici label instead of master, disable MI100 tests for now

* fix syntax

* simplify container for results processing

* remove node(master) from the process_results stage

* put all stages in original order

* change the agent label from master to mici for gfx908
2022-07-21 15:25:46 -05:00
zjing14
7959dad566 Grouped Gemm device with multiD grid (#319)
* replace gridwise_v2r3 with multiD

* adjust parameters

* add instances

* fixed test_grouped_gemm

* fix standalone softmax race condition around blockwise reduction

* fixed ci

* fixed comment: remove redundant workspace

* use instanceFactory

* add test layout

* add empty Ds

* add bias example

* use array

* separate examples

Co-authored-by: Anthony Chang <ac.chang@outlook.com>
2022-07-21 10:07:01 -05:00
Anthony Chang
a11680cce6 fix standalone softmax race condition around blockwise reduction (#323) 2022-07-14 22:52:45 -05:00
rocking5566
7f21662089 Standalone layernorm (#315)
* Implement layernorm kernel and deviceOp

* verify gpu kernel with host code

* 1. Separate gamma and beta from affine
2. Check if argument is valid

* clean

* Sync the naming

* Support sweep once mode if we can put k dimension data inside one block

* [What] Get length from upper length.
[Why] if we get length directly, we may get length after padding.

* We only use one block in K dimension.
Hence, we can simplify the indexing of global R/W.

* Use 1d descriptor for gamma and beta

* Add accElementwiseOp

* Extract layernorm host code

* Support different YVectorDim in GridwiseLayernorm

* Rename XSrcVectorDim to XYSrcVectorDim. Because we use same parameter in deviceOp

* Gamma and beta can share the VGPR.

* Add test for fp32 and fp16

* Fix bug of concurrency and add test case which may fail originally

* Propagate NaN for layernorm

Co-authored-by: Chao Liu <chao.liu2@amd.com>
2022-07-13 11:16:14 -05:00
Daming Feng
c5620ed0ca minor fix in gemm client example (#328) 2022-07-13 10:54:38 -05:00
Illia Silin
39acaea36d Add switch between compilers, make 9110 compiler default, add full QA scripts. (#322)
* adding scripts for full perf test suite

* uncomment the sql queries

* fix typo and chmod a+x for scripts

* dos2unix for all new scripts

* disable verification in full performance test

* fix reduction scripts, add grouped_gemm hotfix

* fix the grouped_gemm hotfix and only run reduction for fp16

* change compiler flag syntax

* fix syntax

* add predefinition of dockerArgs

* avoid redefinitions of dockerArgs

* add blank space at the end of dockerArgs

* try to build with release compiler

* adding spaces inside if condition

* limit the number of threads for building 9110 compiler

* change the way HIP_CLANG_PATH is set

* remove the export command

* change the conditional ENV syntax

* set HIP_CLANG_PATH at docker run time

* update scripts for full qa

* enable the sql write query

* fix typo

* remove a comment from a script
2022-07-13 09:27:43 -05:00
Po Yen Chen
639147432b GEMM pipeline v2 (#317)
* format

* improving pipeline

* fix typo

* format

* adding thread group

* adding thread group

* adding thread group

* adding gemm pipeline

* tweak

* refactor

* refactor

* add missing type convert

* refactor

* refactor

* refactor

* clean

* fix build

* refactor

* format

* clean up

* use remove_cvref_t

* clean

* use pipeline_v2 for gemm kernel

* Remove inconsistent indent

* Fix compilation errors due to incomplete merge process

* Add missing include directives

* Fix compilation errors in currently unused files

* Add license in newly added files

* Re-format touched files by clang-format-10

* Fix wrong template argument count of DeviceGemm<>

* Use language construct to choose between types

* Use language construct to choose GEMM example instance

* Fix compilation error due to interface change

* Re-use type alias to avoid duplication

* Unify type alias usage in source file

* Only use v2 pipeline in one gridwise GEMM type

* Remove no-longer used include directives

* Add static_assert() to check pipeline type requirements

* Revert "Add static_assert() to check pipeline type requirements"

This reverts commit f0985f0a13.

* clean

* clean

* clean

* clean

Co-authored-by: Chao Liu <chao.liu2@amd.com>
Co-authored-by: shaojiewang <wsjmessi@163.com>
2022-07-08 15:55:14 -05:00
Shaojie WANG
763ca61581 add conv1d/3d bwd weight instances (#318)
* add conv1d/3d bwd weight instances

* add profiler code
2022-07-08 15:42:20 -05:00
Chao Liu
4fe9c393b8 N-D Tensor Contraction example, instance, and client example (#270)
* adding contraction

* add contraction example

* update example

* update example

* format

* update readme

* clean header

* clean header

* contraction with multiple D

* rename

* fix naming issue; add instances for contraction+bilinear

* change assumed virtual layout of contraction; add client example

* update example

* update

* contraction+scale

* use type_convert

* rename
2022-07-07 14:31:11 -05:00
zjing14
334361cbde Batched Gemm with C Permute (#305)
* init commit

* add c_permute

* add mnk padding

* fixed comments

* Fixed comments

Co-authored-by: Chao Liu <chao.liu2@amd.com>
2022-07-06 10:38:29 -05:00
Chao Liu
9e4429f9c3 Gemm+Bilinear (#316)
* refactor

* update example

* update example

* gemm bilinear

* clean

* update
2022-07-02 09:15:38 -05:00
guangzlu
8e374781d5 modified grouped gemm addressing method (#307)
* modified grouped gemm addressing method

* modified addressing method in device_grouped_gemm_xdl.hpp

Co-authored-by: root <root@dc-smc-13.amd.com>
Co-authored-by: Chao Liu <chao.liu2@amd.com>
2022-07-01 01:38:21 -05:00
Anthony Chang
63fd5da637 Single-kernel GEMM + layernorm (#263)
* dump lds content in appropriate precision type

* add squared add reduction op; allows sq sum

* initial stub from regular gemm impl

* layernorm example code & host verification

* initial layernorm implementation

* tidy up

* make C0 precision type consistent with C

* clang-tidy and additional comments

* tighten up example code

* account for extra flops/bytes from normalization

* clang-format

* c0 bias/beta/gamma now have their own precision type

* AccElemOp for gemm outputs prior to feeding to layernorm

* update workgroup mapping

* rename kernel template param to reflect its dual use

* use LDS mem pool for reduction workspace

* change cshuffle precision type to f16; clean up

* clang-format

* correct naming

* explicit cast

* fully implemented gemm + bias + activation + add + norm

* activation in correct order

* reflect reduction API's recent change

* amend

* clean up; add comment

* keep up with recent changes in reduction API

* format

* resolve merge conflicts

Co-authored-by: Chao Liu <chao.liu2@amd.com>
2022-07-01 01:38:00 -05:00
zjing14
1c8126a4c2 add batch_stride into batched gemm (#314)
* add batch_stride

* fixed test

Co-authored-by: Chao Liu <chao.liu2@amd.com>
2022-07-01 01:35:37 -05:00
Chao Liu
0dcb3496cf Improve external interface for GEMM and GEMM+add+add+fastgelu (#311)
* interface for GEMM and GEMM+add+add+fastgelu

* rename namespace

* instance factory

* fix build

* fix build; add GEMM client example

* clean
2022-06-30 22:11:00 -05:00
zjing14
fa9a0a5cfb Gemm + bias + c_permute (#312)
* init commit

* add desc

* finished c permute

* fixed vector lens
2022-06-30 19:55:09 -05:00
zjing14
ab6c82c984 Grouped Gemm ckProfiler hotfix (#313)
* add setWorkspace in profiler

* fix
2022-06-30 16:37:37 -05:00
Anthony Chang
93c99f3d87 Standalone sweep once softmax kernel w/ ckProfiler (#295)
* use 'sweep once' softmax kernel where applicable

* threadwise copy's dst buffer can specify invalid element value

* add int8 in/out float compute softmax support

give a bit of leeway for int absolute tolerance as there's a single data point of all test cases showing off-by-1 error

* format

* softmax inherits DeviceNormalization

* softmax profiler stub

* tighten up reference softmax interface

* example prints tensor dimension

* add fp32 to softmax profiler

* rename header

* hook with ckProfiler

* format

* resolve merge conflict

* resolve merge conflicts

* update normalization profiler help string

* resolve conflict

* typo

* remove residual

* softmax profiler: address feedback

* test for mixed precision input/output

* fully qualify ck::math::isnan

* add comment for device normalization interface

* revise wording

* constness for alpha/beta scalar pointer
2022-06-30 12:08:50 -05:00
Liam Wrubleski
eccf8773a6 Remove incorrect old packaging statement (#308) 2022-06-30 09:40:03 -05:00
rocking5566
12235112a1 external api for gemm + layernorm (#285)
* Extract base class for elementwise

* Refactor interface of DeviceGemmReduce. Do not use tuple in interface

* [What] Rename d into reduce in gemm + reduction related code
[Why] Prepare to add d term for add

* Unify base class of gemm + reduce and gemm + bias + add + reduce

* 1. Rename gemm_bias_add_reduce for external api
 2. Refine cmake

* Add normalize device operation

* [What] Reorder the argument
[Why] Because d0 is also the input of c.

* Add type string

* Add example of gemm_bias_add_layernorm via external api

* Refactor example code

* clang-format

* Fix compile error

* clang-format

* Add external api for gemm_add_add_layernorm and normalize

* Add client example

* clang-format
2022-06-27 14:25:10 -05:00
Chao Liu
aebd211c36 External Interface (#304)
* add client example

* clean

* clean

* reorg

* clean up profiler

* reorg

* clean

* fix profiler

* function for getinstances

* update client example

* update client example

* update client example

* update

* update example

* update Jenkins file

* update cmake

* update Jenkins
2022-06-26 19:39:02 -05:00
Liam Wrubleski
b653c5eb2e Switch to standard ROCm packaging (#301)
* Switch to standard ROCm packaging

* Revert .gitignore changes

* install new rocm-cmake version

* update readme

Co-authored-by: illsilin <Illia.Silin@amd.com>
Co-authored-by: Chao Liu <chao.liu2@amd.com>
2022-06-25 09:35:16 -05:00
Chao Liu
d3051d7517 add license in file (#303) 2022-06-24 23:32:43 -05:00
Chao Liu
d1db6a0c3e Absolute include path (#281)
* add gelu and fast_gelu

* added GeLU and fast GeLU

* clean up

* add gemm+fastgelu example

* add gemm+gelu instances

* update profiler

* clean up

* clean up

* adding gemm+bias+activation

* clean

* adding bias

* clean

* adding gemm multiple d

* debugging

* add gemm bias add fastgelu

* rename, clean

* refactoring; add readme

* refactor

* refactor

* refactor

* refactor

* refactor

* refactor

* fix

* fix

* update example

* update example

* rename

* update example

* add ckProfiler

* clean

* clean

* clean

* clean

* add client app example

* update readme

* delete obsolete files

* remove old client app

* delete old file

* cleaning

* clean

* remove half

* fix header path

* fix header path

* fix header path

* fix header path

* fix header path

* fix header path for all examples

* fix header path

* fix header path

* fix header path

* fix header path

* fix header path

* fix header path

* fix header path

* fix header path

* fix header path

* revert client app example

* clean build

* fix build

* temporarily disable client test on Jenkins

* clean

* clean

* clean
2022-06-24 20:51:04 -05:00
Chao Liu
a49115b95e update license (#297)
* update license

* update license

* update license

* update license
2022-06-23 01:27:30 -05:00
Adam Osewski
a2edd7d802 Testing all fwd convolution specializations. (#259)
* UniforFill with integer values.

* Log tested instance type string.

* Add UT for all convolution specializations.

* debugging conv

* Fix dangling reference bug.

* Small refinements.

* Fix call to error checking function.

* Small refinements to tests.

* Configure error tolerance
* Change problem size.
* Remove OddC case from types that do not support it.

* Add helper traits for AccumulatorDataType.

* Print first 5 errs in check_err for integral types.

* Rename FillUniform to FillUniformDistribution

* Refactor

* Do not use typed tests.
* Instead use plain fixture class with templatized member functions.
* Initialize tensors with integer values.

* Refine test instances.

* Properly set accumulator data type.
* Add another "big" instance.

* Refactor convolution tests.

* Revert "debugging conv"

This reverts commit b109516455.

* Add pragma once + format + small refinement.

* Fix some unwanted changes.

* Clang-format

* Fix profile_convnd to use renamed tensor initializer.

* Add instances for ConvFWDND kernel case 2D

* Helpers to get ConvNDFwd 2D instances.

* Refactoring.

* Remove "small block" instance as it was generating compiler errors.
* Remove default template parameters values.

* Refine and fix test.

* Fix problem with default template parameter types.
* Adjust error thresholds for floating point values test.
* Use integer values initialization for instances test.
* Add tests for ConvNDFwd 2D case.

* Remove AccumulatorDataType type trait.

* Update unit-tests.

* Remove operator<< overload.

* Unlock conv1d/3d nd fwd instances.

* Enable skipping calculating reference using flag.

* Fix number of channels for first ResNet50 layer.

* Clang-format.

Co-authored-by: Adam Osewski <aosewski@amd.com>
Co-authored-by: Chao Liu <chao.liu2@amd.com>
2022-06-22 22:05:04 -05:00
Shaojie WANG
4634b12043 fix Issue 291 (#294)
* rename for typeconvert functor

* refine code
2022-06-21 17:10:56 -05:00
Anthony Chang
15c89e81f0 Standalone softmax kernel (#284)
* initial stub for standalone softmax

* start device_softmax_mk_to_mk as a wrapper to device_reduce_mk_to_m

* host softmax validates

* compiles; to implement beta scaling

* use NaN trick to efficiently ignore OOB values during sum of exponentials

* freeload device_reduce's utility functions

* clean up interface

* adding prior value (beta scaling)

* remove restriction related to perf considerations

* apply clang-format

* clean; disable diagnostics

* resolve conflicts

* add exp wrapper

* honor HostTensorDesc interface; allow implicit cast from different vector<T> type

* test softmax for fp16/fp32

* update readme

* amend commit NaN trick

* remove redundant param added during development

* format

* replace ScalarDataType with AccDataType

* separate out test programs by precision type

* move softmax sample code to its own folder

* format

* keep up with recent changes in reduction API

* remove extra header
2022-06-21 14:59:19 -05:00
Chao Liu
be60d60d7a Create MIT LICENSE (#229)
* Create LICENSE

* add contributors, add license into config.hpp

* update
2022-06-21 14:55:56 -05:00
Anthony Chang
1ae241092f bring up to date with the usage of __builtin_amdgcn_sched_barrier (#293) 2022-06-21 10:15:31 -05:00
Chao Liu
ccbd8d907b update readme and script (#290) 2022-06-20 23:34:32 -05:00
Chao Liu
56adf7e9cc GEMM with Multiple Source, GEMM+Bias+Add+FastGeLU example and ckProfiler (#241)
* add gelu and fast_gelu

* added GeLU and fast GeLU

* clean up

* add gemm+fastgelu example

* add gemm+gelu instances

* update profiler

* clean up

* clean up

* adding gemm+bias+activation

* clean

* adding bias

* clean

* adding gemm multiple d

* debugging

* add gemm bias add fastgelu

* rename, clean

* refactoring; add readme

* refactor

* refactor

* refactor

* refactor

* refactor

* refactor

* fix

* fix

* update example

* update example

* rename

* update example

* add ckProfiler

* clean

* clean

* clean

* clean

* add comment

* use type_convert

* clean

* clean element wise op
2022-06-19 03:07:28 -05:00
Illia Silin
e4584d91ac Don't look up the /sys/module/amdgpu/version file. (#287)
* use pre-built docker instead of building a new one

* try docker.image.pull

* change syntax in docker.image()

* add 30 min timeout

* increase timeout to 3 hours

* move performance tests to first stage for testing

* set image variable to the new container name

* update image name

* check available images

* check available images in both places

* try different image name

* use image ID to refer to image

* run performance on gfx90a

* fix the gpu_arch labeling, add parameter

* move env vars out of stages

* add stand-alone performance script, MI200 tests, CU numbers

* dos2unix for run_perf_tests.sh

* try the new git credentials

* use env var for git credentials

* don't look up /sys/module/amdgpu/version

Co-authored-by: Chao Liu <chao.liu2@amd.com>
2022-06-17 15:11:21 -05:00
Qianfeng
1f543bfa79 Regulate reduction accumulator operations and Element-wise operations (#274)
* Remove template from Reduction operation classes and add template to their operator() and GetIdentityValue() interfaces

* Change to unary elementwise operators and the reduce_unary_operator (class for mapping) and dependent variations in all host layers

* Remove the data type template parameter from reduce_binary_operator (class for mapping) and dependent variations in host layers

* Add InMemoryDataOperatonSupportedOnDataType to check the matching between data type and InMemoryDataOperation

* Use struct-scope operator template instantiation for binary and unary element-wise operations

* Change a few more elementwise operations to use template for operator()

* Tiny correction in Normalize operator

* Add static_assert to check the data type applicability for some reduction accumulator and element-wise operations

* Correction in some examples with regard to using ReduceAccDataType

* Use static_assert for UnaryDivide

* Update to merged codes to use Element-wise operations and Reduction Accumulator operations correctly

* Tiny fix with regard to SetWorkSpacePointer()
2022-06-17 15:10:25 -05:00
Shaojie WANG
63cdd92398 use universal workspace pointer in bwd-weight (#286) 2022-06-17 14:11:20 -05:00