[buffer store/load] Replace memcpy with __builtin_memcpy in cast_to_amdgpu_buffer_rsrc_t

We used memcpy to implement a bitcast to the opaque type
amdgcn_buffer_rsrc. However, HIP's implementation of memcpy did not
allow the compiler to infer that copying a uniform value yields a
uniform result.

As a result, the compiler emitted a waterfall loop over every value that
the copied resource could take, which cost performance.
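
For readers unfamiliar with the term, the cost of a waterfall loop can be
modeled on the host roughly as follows. This is only an illustrative
simulation, not the code the compiler emits: the function name and the
representation of lanes as a std::vector are assumptions made for the sketch.
Each iteration picks the value held by the first still-active lane, serves
every lane holding that same value, and repeats until no lane is left.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side model: returns how many iterations a waterfall loop
// would need for the given per-lane values (one iteration per distinct value).
inline std::size_t waterfall_iterations_sketch(const std::vector<std::uint32_t>& lane_values)
{
    std::vector<bool> active(lane_values.size(), true);
    std::size_t remaining  = lane_values.size();
    std::size_t iterations = 0;

    while(remaining > 0)
    {
        // Pick the value of the first still-active lane.
        std::size_t first = 0;
        while(!active[first])
            ++first;
        const std::uint32_t scalar_value = lane_values[first];

        // Serve every active lane that holds the same value, then retire it.
        for(std::size_t i = 0; i < lane_values.size(); ++i)
        {
            if(active[i] && lane_values[i] == scalar_value)
            {
                active[i] = false;
                --remaining;
            }
        }
        ++iterations;
    }
    return iterations;
}
```

In this model a value known to be uniform needs no loop at all, while a value
the compiler cannot prove uniform pays for the loop even when every lane
happens to hold the same value at run time.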

When we use __builtin_memcpy instead, the optimizer correctly treats the
copy of a uniform value as uniform.
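
The bitcast pattern can be sketched on the host as below. The two struct
types are hypothetical stand-ins for int32x4_t and the opaque resource type,
and same_bits_sketch is a helper added only for checking. The sketch shows the
shape of the cast, not the uniformity behavior, which can only be observed in
device code.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical stand-ins for illustration; these are not the CK or compiler types.
struct Int32x4Sketch
{
    std::int32_t v[4];
};
struct OpaqueRsrcSketch
{
    unsigned char bytes[16];
};

// Bitcast through the compiler builtin, which GCC and Clang handle directly
// instead of resolving to whichever memcpy declaration is in scope.
inline OpaqueRsrcSketch cast_to_rsrc_sketch(Int32x4Sketch res)
{
    OpaqueRsrcSketch as_rsrc;
    static_assert(sizeof(res) == sizeof(as_rsrc), "Size of buffer resource should match");
    __builtin_memcpy(&as_rsrc, &res, sizeof(res));
    return as_rsrc;
}

// Helper for checking only: true when both objects hold identical bytes.
inline bool same_bits_sketch(const Int32x4Sketch& res, const OpaqueRsrcSketch& rsrc)
{
    return std::memcmp(&res, &rsrc, sizeof(res)) == 0;
}
```

A caller would pass the four 32-bit resource words by value and receive the
opaque object with the same 16 bytes.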

Solves SWDEV-537500
Juan Manuel Martinez Caamaño
2025-06-16 17:32:50 +02:00
parent a0f4db8d9c
commit 1810074bd7

@@ -92,7 +92,7 @@ CK_TILE_DEVICE __amdgpu_buffer_rsrc_t cast_to_amdgpu_buffer_rsrc_t(int32x4_t res
 {
     __amdgpu_buffer_rsrc_t as_rsrc;
     static_assert(sizeof(res) == sizeof(as_rsrc) && "Size of buffer resource should match");
-    memcpy(&as_rsrc, &res, sizeof(res));
+    __builtin_memcpy(&as_rsrc, &res, sizeof(res));
     return as_rsrc;
 }
 #endif