Mirror of https://github.com/ROCm/composable_kernel.git (synced 2026-06-29 19:28:33 +00:00)
We used memcpy to implement a bitcast of the opaque type amdgcn_buffer_rsrc. However, HIP's implementation of memcpy did not allow the compiler to infer that copying a uniform value also produces a uniform result. The compiler therefore emitted a waterfall loop over every value that the copy could take, which cost performance. When we use __builtin_memcpy instead, the optimizer handles the uniform copy correctly. Solves SWDEV-537500.
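
For illustration only, the pattern described above might look roughly like the following host-side sketch. The names bit_cast_sketch, OpaqueRsrc and RawWords are hypothetical and are not taken from the composable_kernel sources; OpaqueRsrc is an assumed stand-in for the compiler-provided buffer resource type. The only point shown is that the copy goes through __builtin_memcpy (a GCC/Clang builtin) rather than a library memcpy.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for an opaque 128-bit resource descriptor.
// The real amdgcn_buffer_rsrc type is provided by the compiler; this
// struct is an assumption made so the sketch compiles on a host.
struct OpaqueRsrc
{
    std::uint32_t words[4];
};

// Hypothetical plain source type holding the same 128 bits.
struct RawWords
{
    std::uint32_t w0, w1, w2, w3;
};

// Bitcast sketch: reinterpret the bytes of `from` as a value of type To.
// __builtin_memcpy is expanded by the compiler itself, so the optimizer
// sees a plain byte copy instead of a call into a library routine.
template <typename To, typename From>
inline To bit_cast_sketch(const From& from)
{
    static_assert(sizeof(To) == sizeof(From), "bitcast requires equal sizes");
    static_assert(std::is_trivially_copyable<To>::value &&
                      std::is_trivially_copyable<From>::value,
                  "bitcast requires trivially copyable types");
    To to;
    __builtin_memcpy(&to, &from, sizeof(To));
    return to;
}
```

A call such as bit_cast_sketch<OpaqueRsrc>(raw) copies the four words unchanged. Whether the result is then treated as uniform on the device depends on the compiler, as described above; this sketch does not demonstrate that part.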