composable_kernel/example
joyeamd 94d47b1680 fmha hdim256 vectorize improve (#2086)
For hdim 256, buffer loads are not vectorized when seqlen % 256 != 0 and hdim % 256 == 0; this commit tries to address that case.
2025-04-16 09:21:04 +08:00