mirror of
https://github.com/ROCm/composable_kernel.git
synced 2026-06-29 11:16:59 +00:00
[buffer store/load] Replace memcpy with __builtin_memcpy in cast_to_amdgpu_buffer_rsrc_t
We used memcpy to implement a bitcast to the opaque type __amdgpu_buffer_rsrc_t. However, HIP's implementation of memcpy did not let the compiler infer that copying a uniform value also yields a uniform value. As a result, the compiler emitted a waterfall loop over every value the copied resource could take, which cost performance. With __builtin_memcpy, the optimizer handles the uniform copy correctly.
Solves SWDEV-537500.
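The bitcast pattern described in this message can be sketched on the host as follows. This is a minimal illustration, not the code from the repository: `Words4` and `OpaqueRsrc` are hypothetical stand-ins for `int32x4_t` and `__amdgpu_buffer_rsrc_t`, and `bit_cast_builtin` and `round_trips` are names made up for this example. It assumes a GCC or Clang compatible compiler, which provides `__builtin_memcpy`. The uniformity and waterfall-loop behavior is specific to AMDGPU code generation and cannot be observed from this host-only sketch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for int32x4_t: four 32-bit words.
struct Words4
{
    std::int32_t w[4];
};

// Hypothetical stand-in for the opaque __amdgpu_buffer_rsrc_t: 16 raw bytes.
struct OpaqueRsrc
{
    unsigned char bytes[16];
};

// Bitcast through the compiler builtin instead of a library memcpy,
// so the copy is always visible to the optimizer.
template <typename To, typename From>
inline To bit_cast_builtin(const From& from)
{
    static_assert(sizeof(To) == sizeof(From), "Size of source and destination should match");
    To to;
    __builtin_memcpy(&to, &from, sizeof(To));
    return to;
}

// Casts to the opaque type and back, then checks that every word survived.
inline bool round_trips(const Words4& in)
{
    const OpaqueRsrc rsrc = bit_cast_builtin<OpaqueRsrc>(in);
    const Words4 out      = bit_cast_builtin<Words4>(rsrc);
    for(int i = 0; i < 4; ++i)
    {
        if(in.w[i] != out.w[i])
            return false;
    }
    return true;
}

// Small self-check that exercises the helper with arbitrary values.
inline void self_check()
{
    const Words4 sample{{1, -2, 3, 0x7fffffff}};
    assert(round_trips(sample));
}
```

The size check sits in a `static_assert` with a separate message argument, so a mismatch between the two types fails at compile time rather than corrupting memory at run time.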
@@ -92,7 +92,7 @@ CK_TILE_DEVICE __amdgpu_buffer_rsrc_t cast_to_amdgpu_buffer_rsrc_t(int32x4_t res
 {
     __amdgpu_buffer_rsrc_t as_rsrc;
     static_assert(sizeof(res) == sizeof(as_rsrc) && "Size of buffer resource should match");
-    memcpy(&as_rsrc, &res, sizeof(res));
+    __builtin_memcpy(&as_rsrc, &res, sizeof(res));
     return as_rsrc;
 }
 #endif