* enable use of rocm5.5 release candidate 4
* upgrade to ROCM5.5 RC5
* try to fix the PUB_KEY error, remove the cmake-data package
* upgrade to latest cmake version
* use private dockerhub repo for rocm5.5 rc5
* add missing bracket
* add new parallel stage on navi node
* don't run performance tests on navi, get rid of 9110 compiler
* only run navi build when not doing QA
* fix syntax
* use navi21 label
* don't stash profiler on navi nodes, scp deb package to ginger
* disable tests on navi nodes
* test posting a binary to ginger
* add sshpass and use it to copy deb package
* fix the scp example
* fix syntax
* debug the scp issues
* add jenkins user to docker
* don't try whoami
* change jenkins uid and add user with uid=1002
* try scp from the last stage on micimaster
* rename and stash the package, scp from micimaster
* enable ccache and decouple it from MIOpen ccache use
* fix the ccache check script
* use another method to get server name
* fix syntax
* add quotes around the server name variable
* use check_host as function
* change syntax
* fix syntax
* test if server name is parsed correctly
* try different syntax
* check the env var value
* test new check node function
* add ROCMVERSION parameter and fix script syntax
* fix script syntax
* add missing instances of rocm version
* install ccache in the docker image
* do not check GPU in clang format stage, clean up old code
* update defaults and clean up
* add an option to select specific compiler commit
* change the logic of forcing building a docker
* add check for compiler commit in dockerfile
* compiler check syntax fix
* change compiler selection logic
* fix the new compiler build issue
* set new compiler as default, update dev-requirements
* fix jenkins syntax
* fix docker syntax
* get rid of hipcc.pl editing in jenkinsfile
* fix the hipcc.pl in both places
* try to fix the 10738 compiler linking bug
* fix syntax
* use dockerhub to store images
* use newer amd-stg-open commit as default
* build CK only once, use deb package in all subsequent stages
* update jenkins file
* change prefix for build_CK stage
* update writing deb metadata to control file
* update ubuntu source for docker, script syntax for deb package metadata
* try different way to create deb metadata
* clean up DEBIAN before creating one
* fix the CI folder names, fix splitK qa
* use correct docker in all stages, separate tests for splitK verification and performance
* clean old comments, change dir before packaging
* use different package syntax
* change packaging syntax
* package with cmake
* remove unnecessary build prefix
* get rid of unnecessary paths
* change paths during unpacking
* change script syntax while unpacking
* get rid of unnecessary steps
* get rid of comments in the scripts
* use double quotes for scripts
* add ccache during build, try dpkg -x
* pull and install each package separately
* use full package names
* try to use stashing for packages
* change stash/unstash syntax
* move unstash out of shell, run tests on any gpu node
* unpack each package separately
* try re-using existing workspace
* merge the build and test stages, only stash ckProfiler
* merge the build and test stages, only stash zipped ckProfiler
* fix syntax
* add GPU check before build and test, rename docker to usual name
* upgrade the OS and ROCM versions in CK docker
* add cxx flags to link code with rocm5.2 and ck-9110 compiler
* rename the docker image
* run ONNX gemms using init=1
* allow selecting compiler version
* fix typo
* add Wno-deprecated flag for google tests
* change git repo, fix qa log files names
* change the git clone syntax
* use Omkar's git credentials
* try to use jenkins as git user
* try using illsilin username for gerrit repo with ssh key
* try new gerrit authorization
* change ssh key syntax
* try another way of passing ssh key to docker
* add mount ssh in dockerfile
* create .ssh folder
* move ssh-keyscan to later
* get rid of npm call
* build first docker image on master
* check the contents of the .ssh folder
* try replacing Omkar's creds with gerrit creds
* use open repo, clean up changes
* get rid of ssh default argument
* turn on full qa only on gfx90a, use int initialization
* change script syntax
* update script parsing clinfo, throw exception if 0 devices
* fix syntax
* try using toBoolean for the QA conditions
* run regular CI on MI100 only, use MI200 only for daily QA
* evaluate when conditions before agent
* launch QA on develop branch and update profile_reduce script
* update test script
* update script
* remove false dependency from dockerfile
* try removing rbuild completely
Co-authored-by: Chao Liu <chao.liu2@amd.com>
Co-authored-by: Chao Liu <lc.roy86@gmail.com>
* adding scripts for full perf test suite
* uncomment the sql queries
* fix typo and chmod a+x for scripts
* dos2unix for all new scripts
* disable verification in full performance test
* fix reduction scripts, add grouped_gemm hotfix
* fix the grouped_gemm hotfix and only run reduction for fp16
* change compiler flag syntax
* fix syntax
* add predefinition of dockerArgs
* avoid redefinitions of dockerArgs
* add blank space at the end of dockerArgs
* try to build with release compiler
* adding spaces inside if condition
* limit the number of threads for building 9110 compiler
* change the way HIP_CLANG_PATH is set
* remove the export command
* change the conditional ENV syntax
* set HIP_CLANG_PATH at docker run time
* update scripts for full qa
* enable the sql write query
* fix typo
* remove a comment from a script
* Switch to standard ROCm packaging
* Revert .gitignore changes
* install new rocm-cmake version
* update readme
Co-authored-by: illsilin <Illia.Silin@amd.com>
Co-authored-by: Chao Liu <chao.liu2@amd.com>
* modify ckProfiler_gemm output
* fix syntax
* change ckProfiler output and return 0
* fix syntax
* output datatype
* fix syntax
* output datatype in another way
* fix syntax
* fix syntax
* test return values of ckProfiler
* add layout info and tests, make sure ckProfiler returns 0
* fix syntax
* change layout output
* fix syntax
* fix syntax again
* update script to process perf results
* rearrange jenkins stages
* fix typo
* add python packages to Docker file
* adding setuptools-rust package
* modify parsing for new test parameters
* test db credentials on jenkins
* fix syntax
* update python script to handle incomplete lines
* upgrade python to 3.8 and write the gemm_params table
* add sqlalchemy package to docker
* move perf data processing to master node
* move the master node inside a steps region
* add new stage for result processing
* move results processing to separate stage
* reduce number of tests to speed up debugging
* pass config to processPerfResults stage
* run script on master in a docker container
* replace show_node_info
* try loading docker on master node again
* use ansible node instead of master
* get rid of pymysql package
* try ssh connection using paramiko
* put back pymysql
* put the perf data processing back on the gpu node
* put back artifact definition
* archive the perf_log before parsing
* clean up jenkinsfile, fix parsing
* fix typo
* enable all perf tests
* put all stages in original order, finalize script
* fix gpu_arch version
* update parsing script
* remove obsolete file causing merge conflict
* Initial adding of generic reduction
* Initial adding of generic reduction ...
* Updates to make compilation succeed
* clang-format all files
* clang-format some files again
* Renaming in profiler/include/profile_reduce.hpp
* Updates to make BlockWise cases pass
* Updates to make ThreadWise and MultiBlockTwoCall cases pass
* Remove the support for MUL and NORM1 reduceOp from the profiler and the device instances
* Change to replace the dim0_max_vector_size/dim1_max_vector_size template argument in the device reduce classes
* format
* adding pooling
* added max and average pooling
* comment out cout and kernel timing
* Tiny simplification in profiler/reduce_profiler.cpp
* Add example for reduce_blockwise
* Tiny updates
* Change to pass the ElementWiseOp from device layer to kernel
* Fix the vectorDim and vectorSize in Device layer
* Enable vector load on both dim0 and dim1 for Threadwise method
* Tiny updates
* Change to let the user pass the preUnaryOp and posUnaryOp
* Make pooling example work
* split device_reduce_instance into two libraries
* Tiny update
* Replace nanPropaOpt enum by boolean propagate_nan
* Simplification in DeviceReduce layer codes
* update build
* Change to clarify the difference between ck::half_t and half_float::half
* Renaming in all the reduction codes
* Add VectorSize as template parameter for device layer
* Add BetaIsZero as kernel template and as AccDataType for alpha
* print
* Small updates for pooling
* Updates for host_generic_reduction for reference
* Update to make AVG pooling pass
* Update to make MAX pooling with indices output pass
* fix
* add OutDst vector store to threadwise reduction and pooling
* tweak
* turn off check_indices that caused a build issue
* refactor pooling
* clean up
* turn off check_indices for building issue for php-compiler
* add more tile sizes for odd C
* tweak conv for odd C
* update script
* clean up elementwise op
* add hack in reduction_operator.hpp to avoid compile error. To fix it, need to use element_wise_op in reduction op
* Add OutVectorSize as device and kernel tunable, also update to Elementwise Operations
* Move reduce operator mapping to host layer file reduction_operator_mapping.hpp from reduction_operator.hpp
* Change to the unary operators
* Move the definitions of unary operations to element_wise_operation.hpp
* re-org files
* Refine in device interfaces and multiblock kernels
* Split the reduction configurations into instances for specific methods
* Update in getTypeString() of device pool2d
* Renaming in host and kernel
* Tiny update in profiler/src/profiler.cpp
* Uncomment in device_operation/CMakeLists.txt to enable the building of all operations
* Make check_indices a templated function to resolve a linking issue
* Renaming in the profiler reduce module
* Add support for double Reduction (but disable MultiblockAtomicAdd for double)
* Tiny correction of literal string
* Rename DevicePoolFwd to DevicePool2dFwd
* Split device_reduce_instance_xxx.cpp files according to the data types to speed up compiling
* Add comments for lists of configurations, lists of instances and references of add_reduce_instances_xxx
* Remove un-used header file gridwise_generic_reduction_wrapper_common.hpp
* Renaming and refining in the Reduction codes
* Tiny change in the unary operators
* Renaming symbols and files
* Renaming symbols in the kernels
* Move kernel kernel_set_buffer_value to separate file
* Add IndexDataType template parameter for kernels and use int32_t as index data type in device layer
* Tiny update in the kernels
* Remove definition of sqrtf()/isnan()/abs() for half_t due to some ADL issue
* Simplify a helper function in device layer
* Tiny adjustment in testing data initialization
* Renaming in kernel/device/host
* Add two testing scripts for reduction
* Refine the Unary operators in element_wise_operation.hpp
* Update in the reduce profiler module
* Update to the reduction testing scripts
* reduce compile parallelism
* change CI docker to rocm5.0
* remove unused variables
* fix build
Co-authored-by: Chao Liu <chao.liu2@amd.com>
* add docker file and make default target buildable
* add Jenkinsfile
* remove empty env block
* fix package stage
* remove render group from docker run
* clean up Jenkins file
* add cppcheck as dev dependency
* update cmake file
* Add profiler build stage
* add hip_version config file for reduction operator
* correct jenkins var name
* Build release instead of debug
* Update test CMakeLists.txt
* reorg test dir
* add test stage
* reduce compile threads to prevent compiler crash
* add optional debug stage, update second test
* remove old test target
* fix tests to return proper results and self review
* Fix package name and make test run without args
* change Dockerfile to use rocm4.3.1
* remove parallelism from build
* Lower parallelism
Co-authored-by: Chao Liu <chao.liu2@amd.com>