Commit Graph

71 Commits

Author SHA1 Message Date
Illia Silin
1dd455d633 Update docker (#744)
* update dockerfile to build rocm5.6 rc3

* fix couple of docker issues
2023-06-07 09:35:14 -07:00
Illia Silin
9afa44d40b Switch to the new rocm5.6 compiler. (#681)
* switch to the new rocm5.6 compiler and docker

* fix syntax
2023-04-21 07:59:26 -07:00
Illia Silin
bb0b772da9 Allow using ROCm release candidate compilers. (#679)
* enable use of rocm5.5 release candidate 4

* upgrade to ROCM5.5 RC5

* try fix the PUB_KEY error, remove the cmake-data package

* upgrade to latest cmake version

* use private dockerhub repo for rocm5.5 rc5

* add missing bracket
2023-04-18 09:22:49 -07:00
Illia Silin
e6cda9f8ff Change the CI workflow. (#611)
* add new parallel stage on navi node

* dont run performance tests on navi, get rid of 9110 compiler

* only run navi build when not doing QA

* fix syntax

* use navi21 label

* dont stash profiler on navi nodes, scp deb package to ginger

* disable tests on navi nodes

* test posting a binary to ginger

* add sshpass and use it to copy deb package

* fix the scp example

* fix syntax

* debug the scp issues

* add jenkins user to docker

* dont try whoami

* change jenkins uid and add user with uid=1002

* try scp from the last stage on micimaster

* rename and stash the package, scp from micimaster
2023-03-02 11:24:31 -06:00
Illia Silin
f73574ffdd Fix CI issues. (#572)
* switch to recent staging compiler as default for CI

* fix the baseline query

* roll back sqlalchemy to version 1.4.46
2023-02-06 13:15:45 -06:00
Illia Silin
7fc3ed761a Allow setting ROCM version, activate cchache, etc. (#462)
* enable ccache and decouple it from MIOpen ccache use

* fix the ccache check script

* use another method to get server name

* fix syntax

* add quotes around the server name variable

* use check_host as function

* change syntax

* fix syntax

* test if server name is parsed correctly

* try different syntax

* check the env var value

* test new check node function

* add ROCMVERSION parameter and fix script syntax

* fix script syntax

* add missing instances of rocm version

* install ccache in the docker image

* do not check GPU in clang format stage, clean up old code

* update defaults and clean up
2022-10-01 18:48:19 -05:00
Illia Silin
b882554758 Fix build issues, set new compiler default, etc. (#451)
* add an option to select specific compiler commit

* change the logic of forcing building a docker

* add check for compiler commit in dockerfile

* compiler check syntax fix

* change compiler selection logic

* fix the new compiler build issue

* set new compiler as default, update dev-requirements

* fix jenkins syntax

* fix docker syntax

* get rid of hipcc.pl editing in jenkinsfile

* fix the hipcc.pl in both places

* try to fix the 10738 compiler linking bug

* fix syntax

* use dockerhub to store images

* use newer amd-stg-open commit as default
2022-09-27 15:26:56 -05:00
Illia Silin
85b0920dc8 Build the CK targets only once. (#433)
* build CK only once, use deb package in all subsequent stages

* update jenkins file

* change prefix for build_CK stage

* update writing deb metadata to control file

* update ubuntu source for docker, script syntax for deb package metadata

* try different way to create deb metadata

* clean up DEBIAN before creating one

* fix the CI folder names, fix splitK qa

* use correct docker in all stages, separate tests for splitK verification and performance

* clean old comments, change dir before packaging

* use different package syntax

* change packaging syntax

* package with cmake

* remove unnecessary build prefix

* get rid of unnecessary paths

* change paths during unpacking

* change script syntax while unpacking

* get rid of unneccesary steps

* get rid of comments in the scripts

* use double quotes for scripts

* add ccache during build, try dpkg -x

* pull and install each package separately

* use full package names

* try to use stashing for packages

* change stash/unstash syntax

* move unstash out of shell, run tests on any gpu node

* unpack each package separately

* try re-using existing workspace

* merge the build and test stages, only stash ckProfiler

* merge the build and test stages, only stash zipped ckProfiler

* fix syntax

* add GPU check before build and test, rename docker to usual name
2022-09-21 14:30:13 -05:00
Illia Silin
b22ebd4485 Upgrade the OS and ROCM versions. (#411)
* upgrade the OS and ROCM versions in CK docker

* add cxx flags to link code with rocm5.2 and ck-9110 compiler

* rename the docker image

* run ONNX gemms using init=1
2022-09-13 10:39:14 -05:00
Illia Silin
aba7fefce7 Fix QA, allow switching compiler versions, fix google test compilation error. (#348)
* allow selecting compiler version

* fix typo

* add Wno-deprecated flag for google tests

* change git repo, fix qa log files names

* change the git clone syntax

* use Omkar's git credentials

* try to use jenkins as git user

* try using illsilin username for gerrit repo with ssh key

* try new gerrit authorization

* change ssh key syntax

* try another way of passing ssh key to docker

* add mount ssh in dockerfile

* create .ssh folder

* move ssh-keyscan to later

* get rid of npm call

* build first docker image on master

* check the contents of the .ssh folder

* try replacing omkars creds with gerrit creds

* use open repo, clean up changes

* get rid of ssh default argument
2022-08-08 13:49:14 -05:00
Chao Liu
75ab874e02 Update Group convolution (#341)
* add conv oddC

* update example

* update example

* fix bug in example

* fix bug in group conv example
2022-08-03 12:28:33 -05:00
Illia Silin
984b3722bf Run CI on MI100 nodes only, run daily QA on MI200 nodes. (#339)
* turn on full qa only on gfx90a, use int initialization

* change script syntax

* update script parsing clinfo, throw exception if 0 devices

* fix syntax

* try using toBoolean for the QA conditions

* run regular CI on MI100 only, use MI200 only for daily QA

* evaluate when conditions before agent

* launch QA on develop branch and update profile_reduce script

* update test script

* update script

* remove false dependency from dockerfile

* try removing rbuild completely

Co-authored-by: Chao Liu <chao.liu2@amd.com>
Co-authored-by: Chao Liu <lc.roy86@gmail.com>
2022-08-02 09:17:11 -05:00
Illia Silin
39acaea36d Add switch between compilers, make 9110 compiler default, add full QA scripts. (#322)
* adding scripts for full perf test suite

* uncomment the sql queries

* fix typo and chmod a+x for scripts

* dos2unix for all new scripts

* disable verification in full performance test

* fix reduction scripts, add gfrouped_gemm hotfix

* fix the grouped_gemm hotfix and only run reduction for fp16

* change compiler flag syntax

* fix syntax

* add predefinition of dockerArgs

* avoid redefinitions of dockerArgs

* add blank space at the end of dockerArgs

* try to build with release compiler

* adding spaces inside if condition

* limit the number of threads for building 9110 compiler

* change the way HIP_CLANG_PATH is set

* remove the export command

* change the conditional ENV syntax

* set HIP_CLANG_PATH at docker run time

* update scripts for full qa

* enable the sql write query

* fix typo

* remove a comment from a script
2022-07-13 09:27:43 -05:00
Liam Wrubleski
b653c5eb2e Switch to standard ROCm packaging (#301)
* Switch to standard ROCm packaging

* Revert .gitignore changes

* install new rocm-cmake version

* update readme

Co-authored-by: illsilin <Illia.Silin@amd.com>
Co-authored-by: Chao Liu <chao.liu2@amd.com>
2022-06-25 09:35:16 -05:00
Illia Silin
1085794df3 Add performance tests as a stage of CI. (#247)
* modify ckProfiler_gemm output

* fix syntax

* change ckProfiler output and return 0

* fix syntax

* output datatype

* fix syntax

* output datatype in another way

* fix syntax

* fix syntax

* test return values of ckProfiler

* add layout info and tests, make sure ckprofiler returns 0

* fix syntax

* change layout output

* fix syntax

* fix syntax again

* update script to process perf results

* rearrange jenkins stages

* fix typo

* add python packages to Docker file

* adding setuptools-rust package

* modify parsing for new test parameters

* test db credentials on jenkins

* fix syntax

* update python script to handle incomplete lines

* ungrade python to 3.8 and write the gemm_params table

* add sqlalchemy package to docker

* move perf data processing to master node

* move the master node inside a steps region

* add new stage for result processing

* move results processing to separate stage

* reduce number of tests to speedup debugging

* pass config to processPerfResults stage

* run script on master in a docker container

* replace show_node_info

* try loading docker on master node again

* use ansible node instead of master

* get rid of pymysql package

* try ssh connection using paramiko

* put back pymysql

* put the perf data processing back on the gpu node

* put back artifact definition

* archive the perf_log before parsing

* clean up jenkinsfile, fix parsing

* fix typo

* enable all perf tests

* put all stages in original order, finalize script

* fix gpu_arch version

* update parsing script

* remove obsolete file causing merge conflict
2022-05-24 11:14:50 -05:00
JD
cec69bc3bc Add host API (#220)
* Add host API

* manually rebase on develop

* clean

* manually rebase on develop

* exclude tests from all target

* address review comments

* update client app name

* fix missing lib name

* clang-format update

* refactor

* refactor

* refactor

* refactor

* refactor

* fix test issue

* refactor

* refactor

* refactor

* upate cmake and readme

Co-authored-by: Chao Liu <chao.liu2@amd.com>
2022-05-12 09:21:01 -05:00
JD
97d8c5045e Add gfx90a CI stage for tests (#208)
* Add gfx90a CI stage

* upgrade to ROCm 5.1 and fix formatting
2022-04-29 10:36:19 -05:00
JD
7353ec0c25 Fix clang-format (#189)
* Fix clang-format filepath

* update docker and fix format
2022-04-21 17:02:15 -05:00
Qianfeng
e17c0d8008 Reduction in Composable Kernel (#82)
* Initial adding of generic reduction

* Initial adding of generic reduction ...

* Updates to make compiling done

* clang-format all files

* clang-format some files again

* Renaming in profiler/include/profile_reduce.hpp

* Updates and make BlockWise cases passed

* Updates and make ThreadWise and MultiBlockTwoCall cases passed

* Remove the support for MUL and NORM1 reduceOp from the profiler and the device instances

* Change to replace the dim0_max_vector_size/dim1_max_vector_size template argument in the device reduce classes

* format

* adding pooling

* added max and average pooling

* comment out cout and kernel timing

* Tiny simplification in profiler/reduce_profiler.cpp

* Add example for reduce_blockwise

* Tiny updates

* Change to pass the ElementWiseOp from device layer to kernel

* Fix the vectorDim and vectorSize in Device layer

* Enable vector load on both dim0 and dim1 for Threadwise method

* Tiny updates

* Change to let the user to pass the preUnaryOp and posUnaryOp

* Make pooling example work

* split device_reduce_instance into two libraries

* Tiny update

* Replace nanPropaOpt enum by boolean propagate_nan

* Simplification in DeviceReduce layer codes

* update build

* Change to clarify the difference between ck::half_t and half_float::half

* Renaming in all the reduction codes

* Add VectorSize as template parameter for device layer

* Add BetaIsZero as kernel template and as AccDataType for alpha

* print

* Small updates for pooling

* Updates for host_generic_reduction for reference

* Update to make AVG pooling pass

* Update to make MAX pooling with indices output pass

* fix

* add OutDst vector store to threadwise reduction and pooling

* tweak

* turn off check_indices that caused build issue

* refactor pooling

* clean up

* turn off check_indices for building issue for php-compiler

* add more tile size for odd C

* tweak conv for odd C

* update script

* clean up elementwise op

* add hack in reduction_operator.hpp to avoid compile error. To fix it, need to use element_wise_op in reduction op

* Add OutVectorSize as device and kernel tunable, also update to Elementwise Operations

* Move reduce operator mapping to host layer file reduction_operator_mapping.hpp from reduction_operator.hpp

* Change to the unary operators

* Move the definitions of unary operations to element_wise_operation.hpp

* re-org files

* Refine in device interfaces and multiblock kernels

* Split the reduction configurations into instances for specific methods

* Update in getTypeString() of device pool2d

* Renaming in host and kernel

* Tiny update in profiler/src/profiler.cpp

* Uncomment in device_operation/CMakeLists.txt to enable the building of all operations

* Make check_indices a templated function to remove some linking issue

* Renaming in the profiler reduce module

* Add support for double Reduction (but disable MultiblockAtomicAdd for double)

* Tiny correction of literal string

* Rename DevicePoolFwd to DevicePool2dFwd

* Split device_reduce_instance_xxx.cpp files according to the data types to speed up compiling

* Add comments for lists of configurations, lists of instances and references of add_reduce_instances_xxx

* Remove un-used header file gridwise_generic_reduction_wrapper_common.hpp

* Renaming and refining in the Reduction codes

* Tiny change in the unary operators

* Renaming symbols and files

* Renaming symbols in the kernels

* Move kernel kernel_set_buffer_value to separate file

* Add IndexDataType template parameter for kernels and use int32_t as index data type in device layer

* Tiny update in the kernels

* Remove definition of sqrtf()/isnan()/abs() for half_t due to some ADL issue

* Simplify a helper function in device layer

* Tiny adjustment in testing data initialization

* Renaming in kernel/device/host

* Add two testing scripts for reduction

* Refine the Unary operators in element_wise_operation.hpp

* Update in the reduce profiler module

* Update to the reduction testing scripts

* reduce compile parallelism

* change CI docker to rocm5.0

* remove unused variables

* fix build

Co-authored-by: Chao Liu <chao.liu2@amd.com>
2022-03-05 16:46:51 -06:00
JD
992f71e371 Update test CMakeLists to add new tests automatically and add Jenkins stage for tests (#88)
* add docker file and make default target buildable

* add Jenkinsfile

* remove empty env block

* fix package stage

* remove render group from docker run

* clean up Jenkins file

* add cppcheck as dev dependency

* update cmake file

* Add profiler build stage

* add hip_version config file for reduction operator

* correct jenkins var name

* Build release instead of debug

* Update test CMakeLists.txt
reorg test dir
add test stage

* reduce compile threads to prevent compiler crash

* add optional debug stage, update second test

* remove old test target

* fix tests to return proper results and self review

* Fix package name and make test run without args

* change Dockerfile to ues rocm4.3.1

* remove parallelism from build

* Lower paralellism

Co-authored-by: Chao Liu <chao.liu2@amd.com>
2022-03-03 16:59:42 -06:00
JD
2778e99758 Initial Setup for CI (#86)
* add docker file and make default target buildable

* add Jenkinsfile

* remove empty env block

* fix package stage

* remove render group from docker run

* clean up Jenkins file

* add cppcheck as dev dependency

* update cmake file

* Add profiler build stage

* add hip_version config file for reduction operator

* correct jenkins var name

* Build release instead of debug

* clean up

Co-authored-by: Chao Liu <chao.liu2@amd.com>
2022-02-18 21:44:11 -06:00