* A few small fixes.
* New GroupedGemm instances (BF16)
* Unify and refactor GroupedGEMM device API.
* Adapt changes to new API.
* Adapt grouped gemm profiler.
* Accept multiple kbatches for grouped gemm profiler.
* Delete the obsolete two-stage kernel, as it is now covered by grouped GEMM.
* Update unit test for grouped gemm.
* Fix thresholds for BF16 and F8. Unblock tests.
* Fix a few instances.
* Multiple small fixes.
* Adapt to new API, check dynamic casting.
* Uncomment a few data types in grouped gemm profiler.
* Fix call to SetDeviceArgs.
* Fix profile grouped gemm multiply tile loop.
* Fix grouped gemm tile loop kernel args in client examples.
* Address review comments.
* add bf16 gemms for gfx11/gfx12
* reduce the input values in test_gemm
* add int8 wmma gemm instances for gfx11/gfx12
* add example gemm_wmma_int8
* fix bug in gemm_wmma_int8 test
* increase bf16 gemm test tolerance
* update the dates and clean up commented-out instances
* Calculate generic relative threshold for pool3dfwd
* Calculate absolute error threshold for pool3d fwd
* Generic threshold calculation: take max input for relative error in pool3dfwd
* Remove max possible value for error calculation at runtime
* Remove debug print in pool3dfwd
* Pool3d fwd: adjust types in generic threshold calculation
* Generic threshold calculation: take into account the number of accumulations and AccDataType
* Generic threshold calculation: fix final error formula
* Generic threshold calculation: fix number of accumulations
* Generic threshold calculation: adjust absolute error
* Generic threshold calculation: use OutDataType in absolute error
* Add non_native_vector_type
* Add a test
* Add non-native vector type
* Fix CTOR
* Fix non-native vector type of 1
* Fix CTORs
* Use vector_type to cover non-native implementation as well
* Update the test
* Format
* Format
* Fix copyright years
* Remove BoolVecT for now
* Add AsType test cases
* Update assert error message
* Remove redundant type
* Update naming
* Add complex half type with tests
* Add tests for vector reshaping
* Add missing alignas
* Update test/data_type/test_custom_type.cpp
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
* Compare custom types to built-in types
* Add default constructor test
* Add an alignment test
---------
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
Without this change, the following diagnostic is generated:
a template argument list is expected after a name prefixed by the template
keyword [-Wmissing-template-arg-list-after-template-kw].
See the C++17 spec, [temp.names] p5.
* Legacy support: customized filesystem
* Update cmakefile for python alternative path
* fix build issues
* CK has no boost dependency
* More fixes to issues found on legacy systems
* fix clang format issue
* Check if blob is correctly generated in cmake
* fix the python issues
* add a compiler flag for codegen when using alternative python
* use target_link_options instead of target_compile_options
---------
Co-authored-by: illsilin <Illia.Silin@amd.com>
* revert ckprofiler change
* temp save
* Add test; test passes
* Tests pass
* Fix bug inside rotating buffer when tensor is not packed
* bug fix
* clang format
---------
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
* Enable CMakePresets build
* Verify Convolution, Scaling and ReLU algorithms.
* Add tensor element-wise scale and type cast operation.
* Reduction implemented but does not work.
* Exploration of Reduction functionality.
* Completed example for Convolution scaled with ReLU activation and AMAX reduction.
* WIP: Add required instances for convolution.
* WIP: Create client example. Implement convolution stage.
* Add elementwise instances.
* Add elementwise scale + convert example.
* Add reduction instances.
* WIP: Client example for AMAX reduction.
* WIP: Add instances for multistage reduction.
* WIP: Implementation of multistage reduction.
* Refactoring.
* Clean up.
* Add CMakePresets.json
* Guard FP8 instances when the data type is not available.
* Add example for Scaled FP8 Convolution with AMAX reduction.
* Refactor CombConvScaleRelu instances.
* Add CombConvScale instances.
* Add client example for Scaled FP8 Convolution with AMAX reduction.
* Cleanup.
* Set RNE fp8 conversion as a default
* Update f8 tests
* Disable failing test on gfx11
* Update bf8 tests
* Add a flag
* Fix the flag
* Raise flag for gfx10 as well
* Temp commit for tolerance testing
* Update tolerances
* Support 64 bit indexing
* Add new grouped conv fwd kernel for large tensors
* Add instances for large tensors
* Fixes for the conv-to-GEMM transform
* Fixes
* fixes
* Remove unneeded instances
* Fix examples
* Remove unneeded Ds arrays
* Fix tests
* Add 2GB check in gridwise dl
* Fixes