mirror of https://github.com/microsoft/mscclpp.git (synced 2026-05-11 17:00:22 +00:00)
tests/ep: LL bench combine uses recv_tokens×hidden for payload bytes
Each local expert sends one copy per dispatched token back to its owner, so the bytes actually on the wire during combine match dispatch. The previous num_tokens×hidden under-counted by ~num_topk×, making combine BW look artificially low next to dispatch.
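A rough sanity check on the accounting (a minimal sketch with made-up sizes; the variable names mirror the bench script, and recv_tokens ≈ num_tokens * num_topk only holds under roughly uniform routing):

# Hypothetical sizes for illustration only; not taken from the benchmark.
num_tokens = 4096    # tokens owned by this rank
num_topk = 8         # experts selected per token
hidden = 7168        # hidden dimension
# Under roughly uniform routing, each rank receives about num_tokens * num_topk
# dispatched token copies for its local experts.
recv_tokens = num_tokens * num_topk

disp_bytes = recv_tokens * hidden * 2       # bf16 payload received during dispatch
comb_bytes_old = num_tokens * hidden * 2    # previous accounting
comb_bytes_new = recv_tokens * hidden * 2   # corrected accounting; matches dispatch

print(comb_bytes_new / comb_bytes_old)      # ~num_topk, the size of the under-count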
@@ -256,10 +256,12 @@ def main():
     comb_us = start_ev.elapsed_time(end_ev) * 1e3 / iters

     # Dispatch payload: recv_tokens × hidden × bf16 (received on this rank).
-    # Combine payload: num_tokens × hidden × bf16 (sent from each local expert
-    # back to the owning rank; one token's worth of bytes per reduction).
+    # Combine payload: recv_tokens × hidden × bf16 as well -- each local expert
+    # sends one copy per dispatched token back to its owner, so the bytes on
+    # the wire match dispatch. Using num_tokens × hidden here would under-count
+    # the actual send payload by ~num_topk×.
     disp_bytes = recv_tokens * hidden * 2
-    comb_bytes = num_tokens * hidden * 2
+    comb_bytes = recv_tokens * hidden * 2
     disp_bw = disp_bytes / (disp_us * 1e-6) / 1e9
     comb_bw = comb_bytes / (comb_us * 1e-6) / 1e9
