Chao Liu b8b2d0a6d1 DL GEMM fp32/fp16/int8 (#41)
* add threadwise copy that copies a tensor in one copy; add kpack to DL GEMM

* add kpack into fwd v4r5 nchw fp32
2021-07-04 22:50:29 -05:00
Description
[DEPRECATED] Moved to ROCm/rocm-libraries repo. NOTE: develop branch is maintained as a read-only mirror
MIT License, 234 MiB
Languages
C++ 93.1%
Python 4.5%
CMake 1.5%
Shell 0.5%
Pawn 0.2%