Commit Graph

4 Commits

Author SHA1 Message Date
Illia Silin
cc19431b2c Run CI on MI100 nodes only, run daily QA on MI200 nodes. (#339)
* turn on full qa only on gfx90a, use int initialization

* change script syntax

* update script parsing clinfo, throw exception if 0 devices

* fix syntax

* try using toBoolean for the QA conditions

* run regular CI on MI100 only, use MI200 only for daily QA

* evaluate when conditions before agent

* launch QA on develop branch and update profile_reduce script

* update test script

* update script

* remove false dependency from dockerfile

* try removing rbuild completely

Co-authored-by: Chao Liu <chao.liu2@amd.com>
Co-authored-by: Chao Liu <lc.roy86@gmail.com>

[ROCm/composable_kernel commit: 984b3722bf]
2022-08-02 09:17:11 -05:00
Illia Silin
6ee5aa4d12 Add Benchmark test into CI (#226)
* add performance test to jenkins pipeline

* fix typo

* fix the syntax in conv_fwd_util.cpp

* fix the error message syntax spacing

* fix the error message syntax spacing again

* run profile_gemm and archive results

* fix typo

* try to figure out the paths

* try to figure out the paths one more time

* skip the copying step

* build ckProfiler release only once

* change directory using dir

* fix dir syntax

* change the gemm parameters

* do not pipe script output to file

* try running ckProfiler directly

* fix typo

* use set +e

* run profile_gemm.sh || true

* run multiple gemms and parse results

* fix typo in jenkinsfile

* fix syntax

* add new gemm sizes, update scripts

* put all jenkins steps in original order

Co-authored-by: Chao Liu <chao.liu2@amd.com>
Co-authored-by: Chao Liu <lc.roy86@gmail.com>

[ROCm/composable_kernel commit: a3c910ac6c]
2022-05-08 02:44:18 -05:00
zjing14
35a48e1bc6 add args for packed gemm (#54)
[ROCm/composable_kernel commit: 567f5e9c5f]
2021-11-24 12:33:55 -06:00
Chao Liu
b827099a27 FP16 data in-register transpose (#41)
* start fixing 16bit data packing

* adding StaticTensor

* adding StaticTensor

* adding StaticTensor

* add missing constexpr

* adding static tensor

* adding static tensor

* adding transpose

* add inline asm for transpose 2x2 of half_t

* add general transpose_vectors(), but have unnecessary register initialization using v_mov

* fix unnecessary register initialization in transpose_vector by using more pass-by-reference

* add hardcoded logic for NHWC wrw

* improve asm for v_pack

* make ThreadwiseTensorSliceTransfer_v3r2 support any tensor

* tweak

* reorganize file

[ROCm/composable_kernel commit: b491ebf384]
2021-11-15 10:05:58 -06:00