* Few small fixes.
* New GroupedGemm instances (BF16)
* Unify and refactor GroupedGEMM device API.
* Adapt changes to new API.
* Adapt grouped gemm profiler.
* Accept multiple kbatches for grouped gemm profiler.
- delete obsolete two stage as it is now covered by grouped gemm
* Update unit test for grouped gemm.
* Fix thresholds for BF16 and F8. Unblock tests.
* Fix few instances.
* Multiple small fixes.
* Adapt to new API, check dynamic casting.
* Uncomment few data types in grouped gemm profiler.
* Fix call to SetDeviceArgs.
* Fix profile grouped gemm multiply tile loop.
* Fix grouped gemm tile loop kernel args in client examples.
* Review comments.
* experiment with config file
* experiment with version.h config
* add more info to version.h
* minor updates
* minor updates
* fix case where DTYPE is not used
* large amount of files but minor changes
* remove white space
* minor changes to add more MACROs
* fix cmakedefine01
* fix issue with CK internal conflict
* fix define and define value
* fix clang-format
* fix formatting issue
* experiment with cmake
* clang format v12 to be consistent with miopen
* avoid clang-format for config file
* properly split conv_nd_bwd_data instances
* split conv2d_fwd instance data types
* split the gemm, conv2d_fwd and batched_gemm_softamx_gemm
* split the tests by data types where possible
* filter examples by DTYPES
* split few remaining examples by DTYPES
* filter most instances by DTYPES
* add new lines at end of headers, fix grouped_gemm profiler
* fix syntax
* split the ckprofiler instances by DTYPES
* split the conv2d and quantization DL and XDL instances
* fix the splitting of conv2d DL instances
* split softmax and pool_fwd tests for fp16 and fp32 types
* fix syntax
* fix the dl_int8 quantization instances isolation
* Grouped gemm + Gelu instances.
* Device Instance Factory for GroupedGemm+Gelu
* Client example
* Rangify fill helper functions.
* Fix name clash.
* Profiler for grouped_gemm+gelu
* No need to use full namespace name.
* Add check for MRaw divisible by vector load.
* Ugly fix for big errors.
* Add grouped_gemm+gelu to profiler CMakelists.
* Store in argument additional info.
* Information about Mraw, Nraw, Kraw values.
* Use FastGelu instead of Gelu.
* Change client ex to use FastGelu
* Remove relaxed error precision.
* Remove duplicate output elementwise-op
---------
Co-authored-by: Adam Osewski <aosewski@amd.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>
* Re-structure ckProfiler source files
* Rename profiler.cpp to main.cpp
* Modularize ckProfiler operations
* Add description for profiler operations
* Use longer name to avoid name collision
* Use macro to delay expansion
* Use std::move() to avoid object copying
* Prohibit users from calling dtor
* Use macro to eliminate redundant code
* Make friend function hidden
* Add missing include directive <iostream>
* Fix wrong include directives
* Remove int8 from batchnorm-forward instances since it is not needed for forward training and could fail test
Co-authored-by: Qianfeng Zhang <Qianfeng.Zhang@amd.com>
* add intrin_mfma_f64_16x16x4f64
* add example
* gemm reference add double data type
* chang init data
* fix M N PerXdlops
* fix ifdef
* add comparsion config
* add conv fwd example
* format log out
* change rc matrix egister layout
* reorganize example
* reorganize example 2
* format,because merge develop
* fix call impl adding acc data type
* lost ;
* add compiler warning
* change example tunning parameters
* add test for fp64
* add instance
* add test/gemm/gemm_fp64.cpp
* fix get name issue
* remove some tunning parameter
* fix conflict
* format
* use integer value for GEMM test
* add acc data type
* remove typeid because fp16
* fix streamconfig etc bug from merging develop
* format
* remove test_gemm_xdl_fp64
* add AccDataType
* AccDataType problem
Co-authored-by: qinletao <letaoqin@amd.com>
Co-authored-by: Chao Liu <chao.liu2@amd.com>
* modify ckProfiler_gemm output
* fix syntax
* change ckProfiler output and return 0
* fix syntax
* output datatype
* fix syntax
* output datatype in another way
* fix syntax
* fix syntax
* test return values of ckProfiler
* add layout info and tests, make sure ckprofiler returns 0
* fix syntax
* change layout output
* fix syntax
* fix syntax again
* update script to process perf results
* rearrange jenkins stages
* fix typo
* add python packages to Docker file
* adding setuptools-rust package
* modify parsing for new test parameters
* test db credentials on jenkins
* fix syntax
* update python script to handle incomplete lines
* ungrade python to 3.8 and write the gemm_params table
* add sqlalchemy package to docker
* move perf data processing to master node
* move the master node inside a steps region
* add new stage for result processing
* move results processing to separate stage
* reduce number of tests to speedup debugging
* pass config to processPerfResults stage
* run script on master in a docker container
* replace show_node_info
* try loading docker on master node again
* use ansible node instead of master
* get rid of pymysql package
* try ssh connection using paramiko
* put back pymysql
* put the perf data processing back on the gpu node
* put back artifact definition
* archive the perf_log before parsing
* clean up jenkinsfile, fix parsing
* fix typo
* enable all perf tests
* put all stages in original order, finalize script
* fix gpu_arch version
* update parsing script
* remove obsolete file causing merge conflict
* init of grouped_gemm
* 2 gemm test
* perf test
* clean
* wrap desc into a struct
* test cast static_arr to pointer
* add ptr to GemmDesc
* add grouped gemm profiler
* fixed mem issue with unique_ptr
* clean
* clean
* finished ckprofiler
* Update README.md
* readme
* fixed readme
* add example
* improve code
* fixed comments: reserve, seperate ptr and gemm_shapes
* merge group and non-group
* fixed comments: replace push_back with emplace_back to avoid copy constructor
* fixed comments: unified blk2ctile; add test
* ci fix
* fixed ci
* fixed ci
* fixed ci