- prepare_workspace_async: allocate pinned host staging before enqueuing
the dq_acc memset. If pinned_host_alloc throws, no stream work has
been issued yet, so the workspace is left cleanly un-prepared rather
than half-initialized.
- pack_workspace_host catch: note that if the catch fires, the H2D copy
queued after the callback still runs and copies indeterminate metadata,
so the kernel will produce wrong results. This is unlikely in practice,
since pack only throws on precondition violations.
- schedule_pin_staging_release: std::move pin_staging_ into the heap
shared_ptr instead of copying it. The next line in
prepare_workspace_async overwrites pin_staging_ anyway, so the atomic
inc/dec from a copy would be wasted.
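
For reference, a minimal sketch of the copy-versus-move difference behind the schedule_pin_staging_release change. PinnedBuffer, Workspace, and the release_by_* helpers are hypothetical stand-ins rather than the real types in this change; only the pin_staging_ member name comes from the code being modified.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-in for the pinned host staging buffer.
struct PinnedBuffer {
    int bytes = 0;
};

// Hypothetical owner that mirrors the pin_staging_ member.
struct Workspace {
    std::shared_ptr<PinnedBuffer> pin_staging_;

    // Copy variant: the heap holder and the member both own the buffer,
    // which costs one atomic increment now and one decrement later.
    std::unique_ptr<std::shared_ptr<PinnedBuffer>> release_by_copy() {
        return std::make_unique<std::shared_ptr<PinnedBuffer>>(pin_staging_);
    }

    // Move variant: ownership is transferred without touching the
    // reference count, and the member is left empty.
    std::unique_ptr<std::shared_ptr<PinnedBuffer>> release_by_move() {
        return std::make_unique<std::shared_ptr<PinnedBuffer>>(
            std::move(pin_staging_));
    }
};

// Reference count seen by the heap holder after a copy.
long use_count_after_copy() {
    Workspace ws;
    ws.pin_staging_ = std::make_shared<PinnedBuffer>();
    auto held = ws.release_by_copy();
    return held->use_count();
}

// Reference count seen by the heap holder after a move.
long use_count_after_move() {
    Workspace ws;
    ws.pin_staging_ = std::make_shared<PinnedBuffer>();
    auto held = ws.release_by_move();
    return held->use_count();
}

// Whether the member is empty once it has been moved from.
bool member_empty_after_move() {
    Workspace ws;
    ws.pin_staging_ = std::make_shared<PinnedBuffer>();
    auto held = ws.release_by_move();
    return ws.pin_staging_ == nullptr;
}
```

In the copy variant the holder reports a use count of 2 until the member is reassigned; in the move variant it reports 1 and the member is already empty, so the later overwrite has nothing to release.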