See the new thrust/sort/basic.cu benchmark for usage.
Other notable changes:
- Updated summary column names:
  - Cold GPU -> GPU Time
  - Cold CPU -> CPU Time
  - Hot GPU -> Batch GPU
- Removed CPU timings from `measure_hot`
  - They had been hidden for a while and aren't really useful.
- Moved the throughput calculations to `measure_cold`
  - `timer` will disable `hot` timings, but we still want throughput.
  - `cold` timings make more sense for throughput: global bandwidth
    numbers are meaningless if the data is sitting in L2.
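
For illustration, a minimal sketch of the kind of throughput calculation that a cold measurement could feed. This is not the actual code moved by this change: `throughput_summary`, `compute_throughput`, and all parameter names are hypothetical, and the sketch assumes the benchmark knows its item count, its global memory byte count, and the mean cold GPU time in seconds.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type; not part of the benchmark library's API.
struct throughput_summary
{
  double items_per_second; // element throughput
  double bytes_per_second; // global memory bandwidth
};

// Hypothetical helper: derives throughput from a cold measurement.
// Cold runs start with the data out of L2, so the byte count divided
// by the elapsed time reflects traffic to global memory rather than
// cache hits.
inline throughput_summary compute_throughput(std::int64_t item_count,
                                             std::int64_t global_bytes,
                                             double mean_cold_gpu_seconds)
{
  // A non-positive time would make the ratios meaningless.
  assert(mean_cold_gpu_seconds > 0.0);

  return {static_cast<double>(item_count) / mean_cold_gpu_seconds,
          static_cast<double>(global_bytes) / mean_cold_gpu_seconds};
}
```

Basing the ratio on the cold time rather than the batched (hot) time keeps the bandwidth figure available even when hot timings are disabled.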