2 Commits

Author SHA1 Message Date
zjing14
35a48e1bc6 add args for packed gemm (#54)
[ROCm/composable_kernel commit: 567f5e9c5f]
2021-11-24 12:33:55 -06:00
Chao Liu
b827099a27 FP16 data in-register transpose (#41)
* start fixing 16bit data packing

* adding StaticTensor

* adding StaticTensor

* adding StaticTensor

* add missing constexpr

* adding static tensor

* adding static tensor

* adding transpose

* add inline asm for transpose 2x2 of half_t

* add general transpose_vectors(), but have unnecessary register initialization using v_mov

* fix unnecessary register initialization in transpose_vector by using more pass-by-reference

* add hardcoded logic for NHWC wrw

* improve asm for v_pack

* make ThreadwiseTensorSliceTransfer_v3r2 support any tensor

* tweak

* reorganize file

[ROCm/composable_kernel commit: b491ebf384]
2021-11-15 10:05:58 -06:00