Qianfeng
a1b2441f8d
Batchnorm inference instances, external API, client examples and gtests (#531)
* File renaming and class renaming for device element-wise operation
* Add batchnorm-infer instances, external API and client example
* Add batchnorm-infer profiler module and gtests
* Remove file device_elementwise_extension.hpp and move NormalizeInInfer operation to element_wise_operation.hpp
* Remove the use of class aliasing for DeviceElementwiseForBatchNormInfer
* Rename class and file due to conflict from device_elementwise_2d.hpp
* Fix namespace in batchnorm_infer_nhwc client example
2023-01-25 17:09:04 -06:00
arai713
0e5c264c3e
Gridwise elementwise 2d (#466)
* added 2d gridwise elementwise
* added 2d version of device elementwise
* added example file with updated device elementwise call
* added CMake file
* changed NumDim into 2D
* fixed compiler issues
* fixed indexing for loop step
* fixed NumDim dimension error
* changed blockID to 2D
* updated Grid Desc
* updated kernel call
* fixed 2d thread indexing
* added dimensions for example file
* commented out unused code
* changed vector load
* removed extra code
* temporarily removing vector load on 2nd dim
* changed vector load back, still causing errors
* altered indexing
* changed isSupportedArgument for 2D
* changed indexing + do/while
* fixed isSupportedArgument
* changed dimension for debugging
* fixed
* added testing printouts
* testing change
* added variables to distribute threads through both dimensions
* testing changes
* integrated variable for thread distribution into device elementwise and added as parameter for gridwise elementwise
* removed most of the extraneous code, testing with different dimensions
* testing
* removed debugging print statements
* moved 2d elementwise permute into elementwise permute directory
* fixed formatting
* removed debugging comments from threadwise transfer
Co-authored-by: Jing Zhang <jizhan@amd.com>
Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
2022-12-12 09:18:10 -06:00
Po Yen Chen
dc663fae29
Rangify STL algorithms (#438)
* Rangify STL algorithms
This commit adopts rangified versions of std::copy(), std::fill() & std::transform()
* Re-write more std::copy() calls
* Re-write std::copy() calls in profiler
2022-11-14 15:17:28 -06:00
arai713
685860c2a9
Tensor permutation (#479)
2022-10-18 23:24:19 -05:00