gpuCalloc*()
Previously, `gpuCalloc*()` created a new stream for each allocation, which cluttered the timeline in profiler traces. `GpuStreamPool` now allows these temporary streams to be reused.
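
To illustrate the reuse idea, here is a minimal host-only sketch of a stream pool. It is not the actual `GpuStreamPool` code: `StreamPoolSketch`, `FakeStream`, `acquire()`, `release()`, and `created()` are hypothetical names, and `FakeStream` stands in for a real stream handle so the example runs without a GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <vector>

// Stand-in for a real stream handle (e.g. cudaStream_t).
// Hypothetical: used only so the sketch runs without a GPU.
struct FakeStream {
    int id;
};

// Hypothetical pool; the real GpuStreamPool interface may differ.
class StreamPoolSketch {
public:
    // Hand out an idle stream if one exists, otherwise create a new one.
    FakeStream* acquire() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (!idle_.empty()) {
            FakeStream* s = idle_.back();
            idle_.pop_back();
            return s;
        }
        // With a real runtime, a call such as cudaStreamCreate() would go here.
        all_.push_back(std::make_unique<FakeStream>(
            FakeStream{static_cast<int>(all_.size())}));
        return all_.back().get();
    }

    // Return a stream to the pool instead of destroying it.
    void release(FakeStream* s) {
        assert(s != nullptr);
        std::lock_guard<std::mutex> lock(mutex_);
        idle_.push_back(s);
    }

    // Number of streams created so far (idle or in use).
    std::size_t created() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return all_.size();
    }

private:
    mutable std::mutex mutex_;
    std::vector<std::unique_ptr<FakeStream>> all_;  // owns every stream
    std::vector<FakeStream*> idle_;                 // streams ready for reuse
};
```

With a pool like this, repeated allocations that each acquire and then release a stream would show up on a small, fixed set of streams in the profiler rather than on a new stream per call.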