Bartłomiej Kocot
cbc8335964
Improve XDL to WMMA porting for grouped conv fwd (#3456)
Refactors how the number of XDL (matrix multiply-accumulate) instructions per wave is calculated and used in the grouped convolution forward implementations, to better support WMMA (Wave Matrix Multiply-Accumulate) instructions and 16x16 tiles.
The change uses MXdlPerWave instead of NXdlPerWave to increase the number of waves along the M dimension.
2025-12-19 15:58:51 -07:00
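For illustration only, here is a minimal sketch of the tile arithmetic this change touches. The number of waves along M is MPerBlock / (MPerInstr * MXdlPerWave), and the same holds for N. The struct TileConfig, its field names MPerInstr/NPerInstr, and the example tile sizes are hypothetical. The exact split the commit chooses is not reproduced here.

```cpp
// Hypothetical block-tile configuration; field names loosely follow CK
// template parameters, but this struct is illustrative, not CK's API.
struct TileConfig {
    int MPerBlock;    // block tile size along M
    int NPerBlock;    // block tile size along N
    int MPerInstr;    // M extent of one matrix instruction (assumed 32 for XDL, 16 for WMMA)
    int NPerInstr;    // N extent of one matrix instruction
    int MXdlPerWave;  // instruction repeats per wave along M
    int NXdlPerWave;  // instruction repeats per wave along N
};

// Waves along M: the block tile divided by the M extent one wave covers.
// Assumes exact division, as a valid tile configuration requires.
constexpr int MWaves(const TileConfig& c) {
    return c.MPerBlock / (c.MPerInstr * c.MXdlPerWave);
}

// Waves along N, computed the same way.
constexpr int NWaves(const TileConfig& c) {
    return c.NPerBlock / (c.NPerInstr * c.NXdlPerWave);
}

// A 128x128 tile with 32x32 instructions and 2x2 repeats per wave (2x2 waves).
constexpr TileConfig kXdl{128, 128, 32, 32, 2, 2};

// The same tile with 16x16 instructions. MXdlPerWave stays at 2, so the halved
// instruction size doubles the waves along M (4x2 waves).
constexpr TileConfig kWmma{128, 128, 16, 16, 2, 4};
```

Both configurations cover the same 128x128 tile. Which repeat count is held fixed when the instruction tile shrinks determines whether the extra waves land on M or on N. This sketch only shows that trade-off. It does not claim to match the commit's actual formula.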