* [CK] Add command options instance_index and param_mask to run partial CK tests
Many CK tests are instance tests: they loop over every instance in the instance library. This often makes the tests time out when they run on a simulator/emulator.
This PR adds the options instance_index and param_mask to reduce the workload of instance tests.
instance_index: run the test on only the one available instance with the specified index.
param_mask: filter the embedded parameters with the specified mask.
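The two options described above can be pictured with the sketch below. It is not the code from this PR: the helper names `select_instances` and `select_params` are hypothetical, and it assumes that a negative `instance_index` means "run every instance" and that bit i of `param_mask` enables the i-th embedded parameter set.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: indices of the instances a test should run.
// Assumption: instance_index < 0 means "no filtering, run them all".
std::vector<std::size_t> select_instances(std::size_t num_instances, int instance_index)
{
    std::vector<std::size_t> selected;
    for(std::size_t i = 0; i < num_instances; ++i)
    {
        if(instance_index < 0 || static_cast<std::size_t>(instance_index) == i)
            selected.push_back(i);
    }
    return selected;
}

// Hypothetical helper: indices of the embedded parameter sets to run.
// Assumption: bit i of param_mask enables parameter set i.
std::vector<std::size_t> select_params(std::size_t num_params, std::uint64_t param_mask)
{
    assert(num_params <= 64 && "the mask only has 64 bits");
    std::vector<std::size_t> selected;
    for(std::size_t i = 0; i < num_params; ++i)
    {
        if((param_mask >> i) & 1u)
            selected.push_back(i);
    }
    return selected;
}
```

With four instances, `select_instances(4, 2)` keeps only instance 2, and `select_params(4, 0x5)` keeps parameter sets 0 and 2, so a simulator run touches a small fraction of the full test matrix.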
* fix CI error
* fix clang format
---------
Co-authored-by: illsilin_amdeng <Illia.Silin@amd.com>
* experiment with config file
* experiment with version.h config
* add more info to version.h
* minor updates
* minor updates
* fix case where DTYPE is not used
* large amount of files but minor changes
* remove white space
* minor changes to add more MACROs
* fix cmakedefine01
* fix issue with CK internal conflict
* fix define and define value
* fix clang-format
* fix formatting issue
* experiment with cmake
* clang format v12 to be consistent with miopen
* avoid clang-format for config file
* properly split conv_nd_bwd_data instances
* split conv2d_fwd instance data types
* split the gemm, conv2d_fwd and batched_gemm_softmax_gemm
* split the tests by data types where possible
* filter examples by DTYPES
* split the few remaining examples by DTYPES
* filter most instances by DTYPES
* add new lines at end of headers, fix grouped_gemm profiler
* fix syntax
* split the ckprofiler instances by DTYPES
* split the conv2d and quantization DL and XDL instances
* fix the splitting of conv2d DL instances
* split softmax and pool_fwd tests for fp16 and fp32 types
* fix syntax
* fix the dl_int8 quantization instances isolation
* Add NumReduceDim template parameter to DeviceSoftmax and Softmax client API to simplify instances collecting
* Move the generic kernel instance to be the first of the instance list for elementwise op of normalization
* Add GetGenericInstance() interface for DeviceOperationInstanceFactory class of DeviceSoftmax
* Add testing of GetGenericInstance() in client_example of Softmax
* Revert "Add testing of GetGenericInstance() in client_example of Softmax"
This reverts commit f629cd9a93.
* Revert "Add GetGenericInstance() interface for DeviceOperationInstanceFactory class of DeviceSoftmax"
This reverts commit a9f0d000eb.
* Support generic kernel instance to be the first instance returned by GetInstances() for GroupNorm
* Move generic kernel instance to separate tuple for elementwise op of normalization
* Remove unused files for softmax instance
* Store generic kernel instance to separate tuple for softmax
* Add IsSupported checking for generic instance to client example of softmax
* Replace get_device_normalize_from_mean_meansquare_instances() with the DeviceOperationInstanceFactory class for elementwise-normalization
* clang-format fix
* Remove int8 from softmax instances
---------
Co-authored-by: zjing14 <zhangjing14@gmail.com>
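The generic-instance commits above suggest a fallback pattern: prefer a tuned instance when it supports the problem, otherwise use the generic one. The sketch below illustrates that idea with stand-in types only. `Op`, `VectorizedOp`, `GenericOp`, `get_instances` and `pick_instance` are hypothetical and are not the CK classes or the DeviceOperationInstanceFactory API.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a device operation; not the real CK base class.
struct Op
{
    virtual ~Op() = default;
    virtual bool IsSupported(int length) const = 0;
    virtual std::string Name() const = 0;
};

// Tuned instance: only handles lengths divisible by its vector width.
struct VectorizedOp : Op
{
    explicit VectorizedOp(int w) : width(w) { assert(w > 0); }
    bool IsSupported(int length) const override { return length % width == 0; }
    std::string Name() const override { return "vectorized_x" + std::to_string(width); }
    int width;
};

// Generic instance: accepts any length, typically at lower performance.
struct GenericOp : Op
{
    bool IsSupported(int) const override { return true; }
    std::string Name() const override { return "generic"; }
};

// Assumed ordering: the generic instance comes first, tuned instances follow.
std::vector<std::unique_ptr<Op>> get_instances()
{
    std::vector<std::unique_ptr<Op>> ops;
    ops.push_back(std::make_unique<GenericOp>());
    ops.push_back(std::make_unique<VectorizedOp>(8));
    ops.push_back(std::make_unique<VectorizedOp>(4));
    return ops;
}

// Prefer the first supported tuned instance; fall back to the generic one.
std::string pick_instance(int length)
{
    const auto ops = get_instances();
    for(std::size_t i = 1; i < ops.size(); ++i)
    {
        if(ops[i]->IsSupported(length))
            return ops[i]->Name();
    }
    assert(ops[0]->IsSupported(length)); // the generic instance should always apply
    return ops[0]->Name();
}
```

Keeping the generic instance at a known position lets a client reach it without scanning the whole list, which is presumably why the commits place it first or in its own tuple.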
* Re-structure ckProfiler source files
* Rename profiler.cpp to main.cpp
* Modularize ckProfiler operations
* Add description for profiler operations
* Use longer name to avoid name collision
* Use macro to delay expansion
* Use std::move() to avoid object copying
* Prohibit users from calling dtor
* Use macro to eliminate redundant code
* Make friend function hidden
* Add missing include directive <iostream>
* Fix wrong include directives
* Remove int8 from batchnorm-forward instances since it is not needed for forward training and could fail test
Co-authored-by: Qianfeng Zhang <Qianfeng.Zhang@amd.com>
* Add reduction across all dims cases.
* host softmax: handle all reduce
* Test cases when reduced dim is not innermost axis.
* Fix syntax.
* Test non innermost dim for fp32 and int8
* Group test suites wrt NumReduceDim.
* Additionally test failing cases.
* Throw error when Rank or NumReduceDims doesn't match arguments.
* Check reducedDims has correct values
* Don't reuse the DeviceReduceMultiblock IsSupportedArgument method.
Implement our own instead (in fact, just drop one check to enable
reduction across inner dimensions).
* Reorganize unit tests to better cover use scenarios.
* Test input validation
* Test reduction of inner dimensions with custom op instances.
* Refactor fp32 and int8 unit tests.
* Fix FP32 instance template parameters.
* Add more instances.
* Instances with InSrcVectorDim=0.
* Do not initialize and copy data when arg not supported.
* ckProfiler Softmax use instance factory.
* Refactor device softmax IsSupported.
* Additionally add non-polymorphic api functions
* Split softmax instances into multiple files.
* Fix profiler.
* Reorganize tests to reuse profiler and cover edge cases.
* Clang-format
* I8 Softmax instances along with UT.
* Reuse type alias definitions from instance factory header.
* Clean included headers
* Fix variable names.
* Add missing checks in Argument constructor.
Co-authored-by: Adam Osewski <aosewski@amd.com>
Co-authored-by: Anthony Chang <ac.chang@outlook.com>
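The input-validation commits in this group (throwing when Rank or NumReduceDim does not match the arguments, and checking the reduce-dim values) might look roughly like the sketch below. The function name `validate_reduce_dims`, its signature, and the error messages are hypothetical. Rejecting duplicate dimensions is an added assumption about what "correct values" means.

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

// Hypothetical validator; in the real code Rank and NumReduceDim are template parameters.
void validate_reduce_dims(int rank,
                          int num_reduce_dim,
                          const std::vector<int>& lengths,
                          const std::vector<int>& reduce_dims)
{
    // Programming errors rather than user input errors.
    assert(rank > 0 && num_reduce_dim > 0 && num_reduce_dim <= rank);

    if(static_cast<int>(lengths.size()) != rank)
        throw std::runtime_error("tensor rank does not match Rank");
    if(static_cast<int>(reduce_dims.size()) != num_reduce_dim)
        throw std::runtime_error("number of reduce dims does not match NumReduceDim");

    std::vector<bool> seen(rank, false);
    for(int d : reduce_dims)
    {
        if(d < 0 || d >= rank)
            throw std::runtime_error("reduce dim out of range");
        if(seen[d])
            throw std::runtime_error("duplicate reduce dim"); // assumed requirement
        seen[d] = true;
    }
}
```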
* use 'sweep once' softmax kernel where applicable
* threadwise copy's dst buffer can specify invalid element value
* add int8 in/out float compute softmax support
give a bit of leeway for the int absolute tolerance, as a single data point across all test cases shows an off-by-1 error
* format
* softmax inherits DeviceNormalization
* softmax profiler stub
* tighten up reference softmax interface
* example prints tensor dimension
* add fp32 to softmax profiler
* rename header
* hook with ckProfiler
* format
* resolve merge conflict
* resolve merge conflicts
* update normalization profiler help string
* resolve conflict
* typo
* remove residual
* softmax profiler: address feedback
* test for mixed precision input/output
* fully qualify ck::math::isnan
* add comment for device normalization interface
* revise wording
* constness for alpha/beta scalar pointer
* initial stub for standalone softmax
* start device_softmax_mk_to_mk as a wrapper to device_reduce_mk_to_m
* host softmax validates
* compiles; to implement beta scaling
* use NaN trick to efficiently ignore OOB values during sum of exponentials
* freeload device_reduce's utility functions
* clean up interface
* adding prior value (beta scaling)
* remove restriction related to perf considerations
* apply clang-format
* clean; disable diagnostics
* resolve conflicts
* add exp wrapper
* honor HostTensorDesc interface; allow implicit cast from different vector<T> type
* test softmax for fp16/fp32
* update readme
* amend commit NaN trick
* remove redundant param added during development
* format
* replace ScalarDataType with AccDataType
* separate out test programs by precision type
* move softmax sample code to its own folder
* format
* keep up with recent changes in reduction API
* remove extra header
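One way to read the "NaN trick" and "beta scaling" commits above is shown in the host-side sketch below. It is not the device kernel: the function `softmax_row` is hypothetical, it assumes out-of-bounds padding slots are filled with NaN, and it assumes the scaling is `out = alpha * softmax(in) + beta * out`.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Sketch: softmax over one row whose out-of-bounds (padding) slots hold NaN.
// Assumed semantics: out = alpha * softmax(in) + beta * out (prior value).
void softmax_row(const std::vector<float>& in, std::vector<float>& out, float alpha, float beta)
{
    assert(in.size() == out.size());

    // Row maximum over valid elements only, for numerical stability.
    float max_val = -std::numeric_limits<float>::infinity();
    for(float x : in)
    {
        if(!std::isnan(x))
            max_val = std::max(max_val, x);
    }

    // Sum of exponentials; NaN marks a slot that must not contribute.
    float sum = 0.f;
    for(float x : in)
    {
        if(!std::isnan(x))
            sum += std::exp(x - max_val);
    }

    for(std::size_t i = 0; i < in.size(); ++i)
    {
        if(std::isnan(in[i]))
            continue; // padding slot: leave the output untouched
        out[i] = alpha * std::exp(in[i] - max_val) / sum + beta * out[i];
    }
}
```

Marking padding with NaN means a single `isnan` test decides whether a slot takes part in the reduction, so no separate bounds bookkeeping is needed while accumulating. Note that with `beta == 0` a NaN already present in `out` would still propagate through the multiplication in this sketch.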