This example demonstrates using cuda.bench and cuda.bench.results
to implement simple auto-tuning, demonstrated on selecting of
tile shape hyperparameter for naive stencil kernel implemented
in numba-cuda.
- Add license header to each example file.
- Fixed broken runs caused by type declarations.
- Fixed hang in throughput.py when --run-once by doing a
manual warm-up step, like in auto_throughput.py