Files
composable_kernel/example/22_cgemm
myamlak 7b1e2c379e Multi-kernel CGEMM (#230)
* Reference CGEMM + test stub

* Format.

* Incomplete simple implementation

* Library instances

* Sketch of tests

* Test fixes.

* Example added

* Cosmetics

* Add elementwise operation kernel and example

* Add comment

* Add template argument of dim . Prepare to support multiple dimension

* Rename example

* Support 1 dimension

* Add static assert

* Add comment

* Second auxiliary buffer added

* Extract pad

* Remove redundant argument

* Support any dimension for elementwise operation

* Remove line

* Let it be the multiple number of CU

* Move thread per block to the parameter of constructor

* Consuming binary ops to do A+B / A-B

* Fix + cosmetics + bf16 test commented out temporarily

* Format

* Enabling bf16 test

* Revert "Enabling bf16 test"

This reverts commit f497e2ba44.

* Fix + test reenabled

* fix build

* Revert "fix build"

This reverts commit d73102384b.

* post PR #235 merge fix

* amend

* Single workspace for cgemm + helper

* Perf calc fix

* Review remarks: static_cast

* Review remarks: binary ops templated

* Cleaning

* Removal of instances and their tests

* Review remarks from aosew addressed

* Review remark: unnecessary attribute

* Post-merge fixes

* Restrict 4gemm to PassThrough + bug fix

* Review remarks

* update licence

* change cgemm example to fp16

Co-authored-by: rocking <chunylai@amd.com>
Co-authored-by: Chao Liu <chao.liu2@amd.com>
Co-authored-by: Anthony Chang <ac.chang@outlook.com>
2022-05-31 10:20:55 -05:00
..
2022-05-31 10:20:55 -05:00
2022-05-31 10:20:55 -05:00