Commit Graph

5 Commits

Author SHA1 Message Date
Aviral Goel
004784ef98 chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)
* chore(copyright) update library wide CMakeLists.txt files copyright header template

* Fix build

---------

Co-authored-by: Sami Remes <samremes@amd.com>
2025-11-28 13:49:54 -08:00
Adam Osewski
6d8614ee50 Softmax unit-test reduction across all and non innermost dims cases. (#406)
* Add reduction across all dims cases.

* host softmax: handle all reduce

* Test cases when reduced dim is not innermost axis.

* Fix syntax.

* Test non innermost dim for fp32 and int8

* Group test suites wrt NumReduceDim.

* Additionally test failing cases.

* Throw error when Rank or NumReduceDims doesn't match arguments.

* Check reducedDims has correct values

* Move don't reuse DeviceReduceMultiblock IsSupportedArgument method.
Instead implement own. (in fact just get rid of one check to enable
reduction across inner dimensions).

* Reorganize unit tests to better cover use scenarios.

* Test input validation
* Test reduction of inner dimensions with custom op instances.

* Refactor fp32 and int8 unit tests.

* Fix FP32 instance template parameters.

* Add more instances.

* Instances with InSrcVectorDim=0.

* Do not initialize and copy data when arg not supported.

* ckProfiler Softmax use instance factory.

* Refactor device softmax IsSupported.

* Additionally add non-polymorphic api functions

* Split softmax instances into multiple files.

* Fix profiler.

* Reorganize tests to reuse profiler and cover edge cases.

* Clang-format

* I8 Softmax instances along with UT.

* Reuse type alias definitions from instance factory header.

* Clean included headers

* Fix variable names.

* Add missing checks in Argument constructor.

Co-authored-by: Adam Osewski <aosewski@amd.com>
Co-authored-by: Anthony Chang <ac.chang@outlook.com>
2022-11-02 16:46:08 -06:00
Chao Liu
500fa99512 Clean up conv example, Instances, profiler and test (#324)
* convnd_fwd fp16 example

* update example

* update example

* update instance

* updating refernce conv

* update reference conv

* update conv fwd profiler

* update conv 1d and 3d instance

* update include path

* clean

* update profiler for conv bwd data and weight

* update conv bwd weight

* clean

* update conv example

* update profiler for conv bwd weight

* update ckprofiler for conv bwd data

* fix reference conv bwd data bug; update conv bwd data test

* update examples

* fix initialization issue

* update test for conv fwd

* clean

* clean

* remove test case too sensitive to error threshhold

* fix test

* clean

* fix build

* adding conv multiple d

* adding conv multiple D

* add matrix padder

* add gemm padding to convnd

* adding group conv

* update gemm multi-d

* refactor

* refactor

* refactor

* clean

* clean

* refactor

* refactor

* reorg

* add ds

* add bias

* clean

* add G

* adding group

* adding group

* adding group

* update Tensor

* clean

* update example

* update DeviceGemmMultipleD_Xdl_CShuffle

* update conv bwd-data and bwd-weight

* upate contraction example

* update gemm and batch gemm with e permute

* fix example build

* instance for grouped conv1d

* update example

* adding group conv instance

* update gemm bilinear instance

* update gemm+add+add+fastgelu instance

* update profiler

* update profiler

* update test

* update test and client example

* clean

* add grouped conv into profiler

* update profiler

* clean

* add test grouped conv, update all conv test to gtest

* update test
2022-07-29 18:19:25 -05:00
Anthony Chang
93c99f3d87 Standalone sweep once softmax kernel w/ ckProfiler (#295)
* use 'sweep once' softmax kernel where applicable

* threadwise copy's dst buffer can specify invalid element value

* add int8 in/out float compute softmax support

give a bit of leeway for int absolute tolerance as there's a single data point of all test cases showing off-by-1 error

* format

* softmax inherits DeviceNormalization

* softmax profiler stub

* tighten up reference softmax interface

* example prints tensor dimension

* add fp32 to softmax profiler

* rename header

* hook with ckProfiler

* format

* resolve merge conflict

* resolve merge conflicts

* update normalization profiler help string

* resolve conflict

* typo

* remove residual

* softmax profiler: address feedback

* test for mixed precision input/output

* fully qualify ck::math::isnan

* add comment for device normalization interface

* revise wording

* constness for alpha/beta scaler pointer
2022-06-30 12:08:50 -05:00
Anthony Chang
15c89e81f0 Standalone softmax kernel (#284)
* initial stub for standalone softmax

* start device_softmax_mk_to_mk as a wrapper to device_reduce_mk_to_m

* host softmax validates

* compiles; to implement beta scaling

* use NaN trick to efficiently ignore OOB values during sum of exponentials

* freeload device_reduce's utility functions

* clean up interface

* adding prior value (beta scaling)

* remove restriction related to perf considerations

* apply clang-format

* clean; disable diagnostics

* resolve conflicts

* add exp wrapper

* honor HostTensorDesc interface; allow implicit cast from different vector<T> type

* test softmax for fp16/fp32

* update readme

* amend commit NaN trick

* remove redundant param added during development

* format

* replace ScalarDataType with AccDataType

* separate out test programs by precision type

* move softmax sample code to its own folder

* format

* keep up with recent changes in reduction API

* remove extra header
2022-06-21 14:59:19 -05:00