Commit Graph

2203 Commits

Author SHA1 Message Date
Ville Pietilä
9d5e5f7188 Remove obsolete packed cast tensor slice transfers. 2025-09-03 11:00:41 +00:00
Ville Pietilä
70d57ca8b9 Remove separate packed cast step. 2025-09-03 10:57:16 +00:00
Ville Pietilä
d56d7bc821 Use fused packed casts. 2025-09-02 12:00:21 +00:00
Ville Pietilä
d22ec6633a Fused packed cast improvements. 2025-09-01 14:57:42 +00:00
Ville Pietilä
c69539fe3c Optimize LDS write order for packed cast. 2025-08-28 15:01:55 +00:00
Ville Pietilä
9f66d9fbca Packed cast improvement. 2025-08-27 14:32:00 +00:00
Ville Pietilä
b82e68c45f Bug fix. 2025-08-27 14:30:13 +00:00
Ville Pietilä
f1a7cfba26 Packed cast improvement. 2025-08-27 14:23:05 +00:00
Ville Pietilä
90b90dff08 Vectorized packed cast. 2025-08-27 13:29:38 +00:00
Ville Pietilä
54302c6f77 Remove obsolete version of the packed cast. 2025-08-27 11:28:55 +00:00
Ville Pietilä
481df169f2 Add packed cast to gridwise gemm multi d. 2025-08-27 11:27:03 +00:00
Ville Pietilä
6092643e9b Performant packed cast implementation. 2025-08-27 10:10:18 +00:00
Ville Pietilä
cfbd669455 WIP: Vectorized access. 2025-08-26 12:43:38 +00:00
Ville Pietilä
2302ea9bc6 Add the vectorized option for packed cast. 2025-08-26 12:31:01 +00:00
Ville Pietilä
905cfb6623 Use thread scratch buffer in bf16 conversion. 2025-08-26 09:33:50 +00:00
Ville Pietilä
a26f66171b Add more unit tests. 2025-08-26 09:15:50 +00:00
Ville Pietilä
1b858936cb WIP: Packed cast using threadwise scratch. 2025-08-25 15:31:50 +00:00
Ville Pietilä
4e7f9f7908 Fix packed cast tensor slice transfer. 2025-08-22 10:26:59 +00:00
Ville Pietilä
de93a48b04 Add back the separate packed cast step. 2025-08-20 11:31:07 +00:00
Ville Pietilä
6fbe1895f1 Code clean-up. 2025-08-20 08:19:13 +00:00
Ville Pietilä
b9694086f6 Remove obsolete test. 2025-08-20 08:03:25 +00:00
Ville Pietilä
0d34572f20 Code clean-up. 2025-08-19 18:37:01 +00:00
Ville Pietilä
79fbb63d57 Improve tensor slice transfer tests. 2025-08-19 13:53:44 +00:00
Ville Pietilä
19439dc88a Consolidate tensor slice transfer tests. 2025-08-18 12:10:29 +00:00
Ville Pietilä
b48ae7e447 Add perf test. Fix packed bf16 cast implementation. 2025-08-18 11:37:14 +00:00
Ville Pietilä
6b2b5e7c7c Small optimization to the packed cast. 2025-08-18 06:39:26 +00:00
Ville Pietilä
d7c681f2f2 Add more tests for packed cast. 2025-08-18 06:39:04 +00:00
Ville Pietilä
c0b8f66674 Add packed cast pipeline into gridwise gemm xdlops bwd weight. 2025-08-18 06:10:14 +00:00
Ville Pietilä
00a3ce734a Integrate new packed cast threadwise tensor slice transfer into gridwise gemm pipelines. 2025-08-15 12:06:44 +00:00
Ville Pietilä
6374e16a43 Improve tensor slice transfer test. 2025-08-15 11:18:10 +00:00
Ville Pietilä
51af3d7bac Fix a bug in the packed cast threadwise transfer. 2025-08-15 10:41:06 +00:00
Ville Pietilä
8bf579a191 Improve sequence test. 2025-08-15 07:51:40 +00:00
Ville Pietilä
62c66a7d9c WIP: packed bf16 cast v3. 2025-08-14 12:39:18 +00:00
Ville Pietilä
938ff298b4 Add more unit tests. 2025-08-14 11:33:02 +00:00
Ville Pietilä
3ecc8aae74 Add unit test for vectorized packed cast. 2025-08-14 08:41:35 +00:00
Ville Pietilä
bf47c623b3 Add unit tests for vector_type. 2025-08-14 06:20:36 +00:00
Ville Pietilä
ade741dd45 WIP: PackedCast v3. 2025-08-13 15:13:35 +00:00
Ville Pietilä
11baf3de0c Added an integration test for a tensor slice transfer. 2025-08-13 13:35:07 +00:00
Ville Pietilä
50e318e072 Fix logging. 2025-08-12 15:53:00 +00:00
Ville Pietilä
ae4c727bc5 Add packed bf16 cast for universal GEMM. 2025-08-12 15:52:49 +00:00
Ville Pietilä
cee7644c85 Working version 2 of the packed cast. 2025-08-12 12:46:01 +00:00
Ville Pietilä
6148d1c75f WIP: Packed cast v2. 2025-08-11 15:18:30 +00:00
Ville Pietilä
e701f8fac1 Time kernels in testing. 2025-08-11 10:16:37 +00:00
Ville Pietilä
b9a8dbc720 Fix analysis script. 2025-08-11 10:08:32 +00:00
Ville Pietilä
39e7ae88e3 Performance analysis script. 2025-08-11 09:54:16 +00:00
Ville Pietilä
c675563468 Add logging and specific unit tests for bf16 and gfx950. 2025-08-08 08:59:12 +00:00
Ville Pietilä
c47b80580d Fix build issues when __gfx950__ macro is enabled. 2025-08-08 08:01:42 +00:00
Ville Pietilä
4b8a559da9 Fixed packed_cast implementation for slice access. 2025-08-06 11:04:29 +00:00
Ville Pietilä
44202b9d32 WIP: Integration of packed cast into gridwise_gemm_xdl_cshuffle_conv_v3. 2025-08-05 15:12:36 +00:00
Ville Pietilä
e92c0bf68e Initial integration of packed cast. 2025-08-04 15:34:35 +00:00