composable_kernel

mirror of https://github.com/ROCm/composable_kernel.git synced 2026-04-19 22:39:03 +00:00

Files

Andriy Roshchenko c3515f277c Adding Instances and Examples for FP8-based Scaled Convolution and AMAX Reduction. (#1473)

* Enable CMakePresets build

* Verify Convolution, Scaling and ReLU algorithms.

* Add tensor element-wise scale and type cast operation.

* Reduction implemented but does not work.

* Exploration of Reduction functionality.

* Completed example for Convolution scaled with ReLU activation and AMAX reduction.

* WIP: Add required instances for convolution.

* WIP: Create client example. Implement convolution stage.

* Add elementwise instances.

* Add elementwise scale + convert example.

* Add reduction instances.

* WIP: Client example for AMAX reduction.

* WIP: Add instances for multistage reduction.

* WIP: Implementation of multistage reduction.

* Refactoring.

* Clean up.

* Add CMakePresets.json

* Guard off FP8 instances when the data type is not available.

* Add example for Scaled FP8 Convolution with AMAX reduction.

* Refactor CombConvScaleRelu instances.

* Add CombConvScale instances.

* Add client example for Scaled FP8 Convolution with AMAX reduction.

* Cleanup.

2024-08-21 15:22:41 -07:00

binary_element_wise_operation.hpp

Adding more instances of grouped convolution 3d forward for FP8 with ConvScale+Bias element-wise operation. (#1412)

2024-07-24 15:49:55 -05:00

combined_element_wise_operation.hpp

Adding Instances and Examples for FP8-based Scaled Convolution and AMAX Reduction. (#1473)

2024-08-21 15:22:41 -07:00

element_wise_operation.hpp

[GEMM] F8 GEMM, performance optimized. (#1384)

2024-07-19 22:06:52 +08:00

quantization_operation.hpp

Conv + quantization + tanh (#645)

2023-03-29 14:50:23 -05:00

unary_element_wise_operation.hpp

Replace the use of __expf with __ocml_exp_f32 to work around the test_softmax_rank4 failure (#1394)

2024-07-17 09:15:05 -07:00