Chao Liu
b491ebf384
FP16 data in-register transpose (#41)
* start fixing 16bit data packing
* adding StaticTensor
* add missing constexpr
* adding static tensor
* adding transpose
* add inline asm for transpose 2x2 of half_t
* add general transpose_vectors(); it still emits unnecessary register initialization via v_mov
* remove the unnecessary register initialization in transpose_vectors() by passing more arguments by reference
* add hardcoded logic for NHWC wrw
* improve asm for v_pack
* make ThreadwiseTensorSliceTransfer_v3r2 support any tensor
* tweak
* reorganize file
2021-11-15 10:05:58 -06:00
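
The "transpose 2x2 of half_t" item above refers to swapping 16-bit lanes between two 32-bit registers. As a rough illustration only, here is a portable C++ sketch of that data movement using shifts and masks. It is not the inline asm from this commit; the names `Packed2x2`, `transpose_2x2_16bit`, and `check_transpose_2x2` are hypothetical, and the FP16 values are treated as opaque 16-bit payloads, assuming element 0 sits in the low half of each word.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical container: two 32-bit words, each packing two 16-bit elements
// (low 16 bits = element 0, high 16 bits = element 1).
struct Packed2x2
{
    std::uint32_t row0;
    std::uint32_t row1;
};

// Portable 2x2 transpose of 16-bit elements.
// in.row0 = {a0, a1}, in.row1 = {b0, b1}
// result.row0 = {a0, b0}, result.row1 = {a1, b1}
constexpr Packed2x2 transpose_2x2_16bit(Packed2x2 in)
{
    return Packed2x2{
        (in.row0 & 0x0000FFFFu) | (in.row1 << 16),  // low halves of both rows
        (in.row0 >> 16) | (in.row1 & 0xFFFF0000u)}; // high halves of both rows
}

// Small self-check on raw bit patterns.
inline void check_transpose_2x2()
{
    constexpr Packed2x2 in{0xA1A1A0A0u, 0xB1B1B0B0u};
    constexpr Packed2x2 out = transpose_2x2_16bit(in);
    static_assert(out.row0 == 0xB0B0A0A0u, "row0 should hold a0 and b0");
    static_assert(out.row1 == 0xB1B1A1A1u, "row1 should hold a1 and b1");
    // Transposing twice restores the original layout.
    assert(transpose_2x2_16bit(out).row0 == in.row0);
    assert(transpose_2x2_16bit(out).row1 == in.row1);
}
```

On a GPU the same lane shuffle would normally be a single pack or permute instruction per output register, which is presumably why the commit uses inline asm; the sketch only shows which 16-bit lanes end up where.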