[rocm-libraries] ROCm/rocm-libraries#6379 (commit b38b056)

[ck] Clamp negative kernel execution elapsed time to zero (#6379)

## Motivation

hipEventElapsedTime can return a small negative value on Windows when
timing a very fast kernel launch on the null stream. Consumers of
launch_and_time_kernel then receive a negative elapsed time, which they
reasonably treat as an error, so otherwise-correct kernel executions
are reported as failures.

## Technical Details

launch_and_time_kernel now clamps the value returned by
hipEventElapsedTime to zero before returning it, so it never reports a
physically impossible negative elapsed time.

The negative value from hipEventElapsedTime has been observed on
Windows. For kernels that complete in well under a millisecond, the HIP
event timestamps can alias such that the computed difference is a small
negative number (observed: approximately -1.78 ms). No HIP error is
reported by any surrounding call (hipEventRecord, hipEventSynchronize,
hipGetLastError), which indicates that the kernel itself executed
successfully.
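
As a minimal sketch of the clamp described above, the snippet below isolates
the logic in a standalone function. The name clamp_elapsed_ms is hypothetical
and is not part of CK; the actual change applies the same check inline to
total_time inside launch_and_time_kernel. The sketch assumes the elapsed time
is a float in milliseconds, as hipEventElapsedTime reports it.

```cpp
#include <algorithm>

// Hypothetical helper, not part of CK: mirrors the clamp that the patch
// applies inline to total_time after hipEventElapsedTime returns.
inline float clamp_elapsed_ms(float elapsed_ms)
{
    // A negative elapsed time is not physical, so report zero instead.
    // Non-negative values pass through unchanged.
    return std::max(elapsed_ms, 0.0f);
}
```

With the observed value, clamp_elapsed_ms(-1.78f) returns 0.0f, while a
legitimate measurement such as 0.25f is returned as-is.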

## Test Plan

- Recompile CK and verify that no kernel execution reports a negative
elapsed time during hipTensor tests.
- Pass the CI/CD pre-checking tests for CK.

## Test Result

- All tests passing

## Submission Checklist

- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
Estevan Vedovelli
2026-04-14 16:15:17 +00:00
committed by assistant-librarian[bot]
parent 14f7834a23
commit f6eb5f0a6a

@@ -70,6 +70,11 @@ float launch_and_time_kernel(const StreamConfig& stream_config,
hip_check_error(hipEventElapsedTime(&total_time, start, stop));
// hipEventElapsedTime can return a small negative value on Windows for a
// very fast kernel. Clamp to zero, as negative elapsed time is never physical.
if(total_time < 0)
    total_time = 0;
hip_check_error(hipEventDestroy(start));
hip_check_error(hipEventDestroy(stop));