* json_printer.cu changed to use write-out buffer of 4KB
The json_printer::do_process_bulk_data_float64 used to write
out one float32 value at a time. This PR introduces a buffer of 4KB
that is being filled with values until full, and then written out.
The 4KB value aligns with system memory page size and seems
appropriate for relatively small datasizes of duration measurements.
* Add explicit static cast from std::size_t to std::streamsize
The explcit cast avoids narrowing error.
* Factor out writing array out to binary file into standalone function
This function is templated based on buffer-size. The function can be
reused to also write-out frequence samples in the future.
Provided more context to the command stated in the readme, and
changed so as to not hard-code installation paths of NVBench,
and checkout path of pybind11.
1. Download and include CPM.cmake, version 0.42.0
2. Use CPM.make to get Pybind11
3. Update to use pybind11=3.0.0
4. Also use CPM to configure/build nvbench
Add nvbench.State methods to get Python dictionary representing
axis values of benchmark configuration state represents.
get_axis_values_as_string gives a string of space-separated
name=values pairs.
Make it explicit in README that we build and locally install NVBench first,
and then build Python package use the library as a dependency.
The nvbench library is installed into Python layout alongside the native
extension.
Add comments stating the need to keep implementation and Python stub
file in sync to both files. In the stub file to comment documents
use of mypy's stubgen to generate stubs and calls to compare that against
current stubs. It also calls out the need to keep docstrings and
doctring examples in sync with implementation.
- Add license header to each example file.
- Fixed broken runs caused by type declarations.
- Fixed hang in throughput.py when --run-once by doing a
manual warm-up step, like in auto_throughput.py
Removed use of __all__ per PR feedback. Emit warnings.warn if
version information could not be retrieved from the package metadata,
e.g., package has been renamed by source code was not updated.
Introduce get_int64_or_default method, and counterparts for
float64 and string.
Provided names for Python arguments.
Tried generating Python stubs automatically with
```
stubgen -m cuda.nvbench._nvbench
```
Gave up on this, since it does not include doc-strings.
It would be nice to compare auto-generated _nvbench.pyi with
__init__.pyi for discrepancies though.
Edit wheel.packages metadata to include namespace package "cuda".
Updated README to remove the work-around of setting PYTHONPATH,
as it is no longer necessary.
Change example to illustrate timing CPU work.
First example does only CPU work (sleeps), use CPU-only timer.
Second examples does both CPU and GPU work (sleeps in either case).
Use cold-run timer with/without sync tag to measure both CPU and GPU times.