Commit Graph

2 Commits

Author SHA1 Message Date
Rostyslav Geyyer
aa932445ea Add a gpu gemm reference kernel (#1528)
* Add a gpu gemm reference kernel

* Switch to gpu reference in gemm examples

* Remove redundant arguments

* Update all related examples

* Update more examples

* Try less threads per block

* Try even less threads per block

* Add support for all matrix layouts

* Increase block size

* Clean up

* Remove hardcoded strides

* Clean up

* Try a column-major case

* Revert back to row-major

* Run both CPU and GPU veriffication

---------

Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
2024-10-08 11:05:28 -05:00
zjing14
38ada109ea add an example of customized type convert - bfp16_rtn (#869)
* add an example of customized bfp16_rtn

* fixed threadwise_copy

---------

Co-authored-by: Jing Zhang <jizha@amd.com>
2023-08-29 12:31:24 -05:00