Ville Pietilä
|
9d5e5f7188
|
Remove obsolete packed cast tensor slice transfers.
|
2025-09-03 11:00:41 +00:00 |
|
Ville Pietilä
|
70d57ca8b9
|
Remove separate packed cast step.
|
2025-09-03 10:57:16 +00:00 |
|
Ville Pietilä
|
d56d7bc821
|
Use fused packed cast.
|
2025-09-02 12:00:21 +00:00 |
|
Ville Pietilä
|
d22ec6633a
|
Fused packed cast improvements.
|
2025-09-01 14:57:42 +00:00 |
|
Ville Pietilä
|
c69539fe3c
|
Optimize LDS write order for packed cast.
|
2025-08-28 15:01:55 +00:00 |
|
Ville Pietilä
|
9f66d9fbca
|
Packed cast improvement.
|
2025-08-27 14:32:00 +00:00 |
|
Ville Pietilä
|
b82e68c45f
|
Bug fix.
|
2025-08-27 14:30:13 +00:00 |
|
Ville Pietilä
|
f1a7cfba26
|
Packed cast improvement.
|
2025-08-27 14:23:05 +00:00 |
|
Ville Pietilä
|
90b90dff08
|
Vectorized packed cast.
|
2025-08-27 13:29:38 +00:00 |
|
Ville Pietilä
|
54302c6f77
|
Remove obsolete version of the packed cast.
|
2025-08-27 11:28:55 +00:00 |
|
Ville Pietilä
|
481df169f2
|
Add packed cast to gridwise gemm multi d.
|
2025-08-27 11:27:03 +00:00 |
|
Ville Pietilä
|
6092643e9b
|
Performant packed cast implementation.
|
2025-08-27 10:10:18 +00:00 |
|
Ville Pietilä
|
cfbd669455
|
WIP: Vectorized access.
|
2025-08-26 12:43:38 +00:00 |
|
Ville Pietilä
|
2302ea9bc6
|
Add the vectorized option for packed cast.
|
2025-08-26 12:31:01 +00:00 |
|
Ville Pietilä
|
905cfb6623
|
Use thread scratch buffer in bf16 conversion.
|
2025-08-26 09:33:50 +00:00 |
|
Ville Pietilä
|
a26f66171b
|
Add more unit tests.
|
2025-08-26 09:15:50 +00:00 |
|
Ville Pietilä
|
1b858936cb
|
WIP: Packed cast using threadwise scratch.
|
2025-08-25 15:31:50 +00:00 |
|
Ville Pietilä
|
4e7f9f7908
|
Fix packed cast tensor slice transfer.
|
2025-08-22 10:26:59 +00:00 |
|
Ville Pietilä
|
de93a48b04
|
Add back the separate packed cast step.
|
2025-08-20 11:31:07 +00:00 |
|
Ville Pietilä
|
6fbe1895f1
|
Code clean-up.
|
2025-08-20 08:19:13 +00:00 |
|
Ville Pietilä
|
b9694086f6
|
Remove obsolete test.
|
2025-08-20 08:03:25 +00:00 |
|
Ville Pietilä
|
0d34572f20
|
Code clean-up.
|
2025-08-19 18:37:01 +00:00 |
|
Ville Pietilä
|
79fbb63d57
|
Improve tensor slice transfer tests.
|
2025-08-19 13:53:44 +00:00 |
|
Ville Pietilä
|
19439dc88a
|
Consolidate tensor slice transfer tests.
|
2025-08-18 12:10:29 +00:00 |
|
Ville Pietilä
|
b48ae7e447
|
Add perf test. Fix packed bf16 cast implementation.
|
2025-08-18 11:37:14 +00:00 |
|
Ville Pietilä
|
6b2b5e7c7c
|
Small optimization to the packed cast.
|
2025-08-18 06:39:26 +00:00 |
|
Ville Pietilä
|
d7c681f2f2
|
Add more tests for packed cast.
|
2025-08-18 06:39:04 +00:00 |
|
Ville Pietilä
|
c0b8f66674
|
Add packed cast pipeline into gridwise gemm xdlops bwd weight.
|
2025-08-18 06:10:14 +00:00 |
|
Ville Pietilä
|
00a3ce734a
|
Integrate new packed cast threadwise tensor slice transfer into gridwise gemm pipelines.
|
2025-08-15 12:06:44 +00:00 |
|
Ville Pietilä
|
6374e16a43
|
Improve tensor slice transfer test.
|
2025-08-15 11:18:10 +00:00 |
|
Ville Pietilä
|
51af3d7bac
|
Fix a bug in the packed cast threadwise transfer.
|
2025-08-15 10:41:06 +00:00 |
|
Ville Pietilä
|
8bf579a191
|
Improve sequence test.
|
2025-08-15 07:51:40 +00:00 |
|
Ville Pietilä
|
62c66a7d9c
|
WIP: packed bf16 cast v3.
|
2025-08-14 12:39:18 +00:00 |
|
Ville Pietilä
|
938ff298b4
|
Add more unit tests.
|
2025-08-14 11:33:02 +00:00 |
|
Ville Pietilä
|
3ecc8aae74
|
Add unit test for vectorized packed cast.
|
2025-08-14 08:41:35 +00:00 |
|
Ville Pietilä
|
bf47c623b3
|
Add unit tests for vector_type.
|
2025-08-14 06:20:36 +00:00 |
|
Ville Pietilä
|
ade741dd45
|
WIP: PackedCast v3.
|
2025-08-13 15:13:35 +00:00 |
|
Ville Pietilä
|
11baf3de0c
|
Added an integration test for a tensor slice transfer.
|
2025-08-13 13:35:07 +00:00 |
|
Ville Pietilä
|
50e318e072
|
Fix logging.
|
2025-08-12 15:53:00 +00:00 |
|
Ville Pietilä
|
ae4c727bc5
|
Add packed bf16 cast for universal GEMM.
|
2025-08-12 15:52:49 +00:00 |
|
Ville Pietilä
|
cee7644c85
|
Working version 2 of the packed cast.
|
2025-08-12 12:46:01 +00:00 |
|
Ville Pietilä
|
6148d1c75f
|
WIP: Packed cast v2.
|
2025-08-11 15:18:30 +00:00 |
|
Ville Pietilä
|
e701f8fac1
|
Time kernels in testing.
|
2025-08-11 10:16:37 +00:00 |
|
Ville Pietilä
|
b9a8dbc720
|
Fix analysis script.
|
2025-08-11 10:08:32 +00:00 |
|
Ville Pietilä
|
39e7ae88e3
|
Performance analysis script.
|
2025-08-11 09:54:16 +00:00 |
|
Ville Pietilä
|
c675563468
|
Add logging and specific unit tests for bf16 and gfx950.
|
2025-08-08 08:59:12 +00:00 |
|
Ville Pietilä
|
c47b80580d
|
Fix build issues when __gfx950__ macro is enabled.
|
2025-08-08 08:01:42 +00:00 |
|
Ville Pietilä
|
4b8a559da9
|
Fixed packed_cast implementation for slice access.
|
2025-08-06 11:04:29 +00:00 |
|
Ville Pietilä
|
44202b9d32
|
WIP: Integration of packed cast into gridwise_gemm_xdl_cshuffle_conv_v3.
|
2025-08-05 15:12:36 +00:00 |
|
Ville Pietilä
|
e92c0bf68e
|
Initial integration of packed cast.
|
2025-08-04 15:34:35 +00:00 |
|