Our copyright headers were mostly correct, but we inconsistently used (C) instead of (c) like the rest of the CK code. This PR fixes that (using lowercase c) and adds a missing copyright header to one file.
[ROCm/composable_kernel commit: f38c3de9f9]
Grouping transfer operations per tensor makes it easier to
constrain on and operate with the transfer operations. As an
example, we can now deduplicate the logic for translating
the transfer operations from the ck-builder interface to the old
ck interface for the A and B tensors.
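A minimal sketch of the deduplication idea, not the actual ck-builder code: every struct, field, and function name below is hypothetical and chosen only for illustration. Once the transfer settings of a tensor live in one struct, a single translation function can serve both A and B.

```cpp
// Hypothetical builder-side description: all transfer settings of one
// tensor are grouped in a single struct (names are illustrative only).
struct TensorTransfer
{
    int src_vector_dim;
    int src_scalar_per_vector;
    int lds_dst_scalar_per_vector;
    bool lds_padding;
};

// The A and B tensors share the same per-tensor type.
struct TransferOps
{
    TensorTransfer a;
    TensorTransfer b;
};

// Hypothetical flat parameter set standing in for the old ck interface.
struct LegacyTransferParams
{
    int src_vector_dim;
    int src_scalar_per_vector;
    int dst_scalar_per_vector;
    int add_extra; // assumption: the old interface takes the padding flag as an integer
};

// One translation function replaces separate, duplicated logic for A and B.
constexpr LegacyTransferParams to_legacy(const TensorTransfer& t)
{
    return LegacyTransferParams{t.src_vector_dim,
                                t.src_scalar_per_vector,
                                t.lds_dst_scalar_per_vector,
                                t.lds_padding ? 1 : 0};
}

// Usage: the same function translates both tensors.
constexpr TransferOps ops{{2, 8, 8, true}, {1, 4, 4, false}};
constexpr LegacyTransferParams a_params = to_legacy(ops.a);
constexpr LegacyTransferParams b_params = to_legacy(ops.b);

static_assert(a_params.src_scalar_per_vector == 8, "A tensor translated");
static_assert(b_params.add_extra == 0, "B tensor translated");
```

Because both tensors share one type, a constraint on the transfer operations only needs to be stated once against TensorTransfer.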
[ROCm/composable_kernel commit: 245c6011cf]
Proposed changes
Improve the forward convolution builder implementation and address leftover feedback from PR #3138. Main changes:
Refactored tests so that they better reflect the builder pattern. The templates and types for the convolution algorithm concepts are created via a factory that facilitates programmatic creation of the device op instances.
Moved tests into anonymous namespace.
The convolution factory had a lot of if-else constructs where CK Builder types were converted into CK library types. I initially had trouble using static_assert in the default branch of a switch, because the static_assert was evaluated at compile time even for valid types. However, if we replace the static_assert with throw "<error message>", it results in a compile-time error only if the default branch is actually hit. This assumes that the function is consteval. Hence, changed all conversions in the convolution factory to use switch, which is more intuitive.
Removed the explicit device op definition from the convolution signature and the corresponding predicate file. The device ops are defined by the corresponding concepts. This allowed us to remove a lot of boilerplate code from the convolution factory.
Added inheritance and convolution algorithm specialization to handle device ops that are specializations of more generic ones. The large tensor support is more naturally expressed by this pattern.
Added support for the FP8 data type.
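The switch-based conversion with throw in the default branch can be sketched as follows. This is an illustration under assumptions, not the factory code: the enums, namespaces, and the to_library function are hypothetical, and the SKETCH_CONSTEVAL macro exists only so the sketch also compiles where consteval (C++20) is unavailable.

```cpp
// consteval requires C++20; fall back to constexpr so the sketch also builds
// with older standards. The static_assert below forces compile-time
// evaluation in either case.
#if defined(__cpp_consteval)
#define SKETCH_CONSTEVAL consteval
#else
#define SKETCH_CONSTEVAL constexpr
#endif

// Hypothetical builder-side and library-side enums (not the real CK names).
namespace builder_sketch {
enum class PipelineVersion { V1, V2, V5 };
}
namespace library_sketch {
enum class PipelineVersion { v1, v2, v5 };
}

SKETCH_CONSTEVAL library_sketch::PipelineVersion
to_library(builder_sketch::PipelineVersion version)
{
    switch(version)
    {
    case builder_sketch::PipelineVersion::V1: return library_sketch::PipelineVersion::v1;
    case builder_sketch::PipelineVersion::V2: return library_sketch::PipelineVersion::v2;
    case builder_sketch::PipelineVersion::V5: return library_sketch::PipelineVersion::v5;
    // A throw expression is an error only when constant evaluation actually
    // reaches it, so valid inputs compile cleanly.
    default: throw "Unsupported pipeline version";
    }
}

// Valid values convert at compile time.
static_assert(to_library(builder_sketch::PipelineVersion::V2) ==
                  library_sketch::PipelineVersion::v2,
              "V2 maps to v2");

// Uncommenting the next line is expected to fail to compile, because the
// default branch is reached during constant evaluation:
// constexpr auto bad = to_library(static_cast<builder_sketch::PipelineVersion>(42));
```

The error message in the throw expression typically shows up in the compiler diagnostic, which keeps the failure as readable as a static_assert would be.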
* WIP: Builder for expected test results.
* Improve ckb fwd conv instance tests.
* clang-format
* Change if-else statements into switch in conv factory.
* Fix clang-formatting.
* Removed unnecessary includes.
* Added missing copyright.
* Remove explicit device op flag from convolution signature.
* Add missing concept.
* Fix build.
* clang-format
* Add test for building conv fwd FP8 instances.
* Add missing header to instance traits.
* Clean-up recently added instances.
* Introduce inheritance and specialization.
* Use builder to build conv algorithm templates and types.
* clang-format
* Fix conv description tests.
---------
Co-authored-by: John Shumway <john.shumwayjr@gmail.com>
[ROCm/composable_kernel commit: 7d57bc169f]
* Add device operation to conv signature. Use unions to hold conv layouts and device operations.
* Add predicates for all device op instances.
* Use the device op signature for validation.
* Fix ckb CMakeLists.txt file for tests.
* Fix building CK Builder instance traits after the introduction of direct load template parameter in CK.
* Fix clang-formatting.
* add device_grouped_conv_fwd_dl_multiple_d_nhwc_kyxc_nhwk
* Add full DL configurability with Option A implementation
- Added 5 DL descriptor structs (39 configurable parameters)
- Added 10 C++20 concepts for type-safe validation
- Updated factory to read all parameters from descriptors
- Updated test helper to populate all descriptors
- All tests passing (13/13 including 3 new DL tests)
* Add factory and test support for DeviceGroupedConvFwdMultipleD_Xdl_CShuffle_Large_Tensor
- Add factory specialization for Large_Tensor device operation (conv_factory.hpp lines 1145-1265)
- Add macro collision workaround using pragma push/pop (conv_factory.hpp lines 43-51)
- Add test helper function run_test_DeviceGroupedConvFwdMultipleD_Xdl_CShuffle_Large_Tensor
- Add builder test file test_ckb_conv_fwd_2d_large_tensor_fp16.cpp with 2 test cases
- Update CMakeLists.txt to include new test file
- Reuse existing ConvAlgorithm_DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle descriptor
- Map all 42 template parameters identical to regular XDL CShuffle
- All 15 builder tests passing including 2 new Large_Tensor tests
Completes Task 350: All 4 forward convolution device operations now supported in CK Builder.
* Update copyright headers to new format
- Change copyright format to: Copyright (C) Advanced Micro Devices, Inc., or its affiliates.
- Reorder headers: Copyright first, then SPDX-License-Identifier
- Updated files:
* experimental/builder/test/conv/test_ckb_conv_fwd_2d_dl_fp16.cpp
* experimental/builder/test/conv/test_ckb_conv_fwd_2d_large_tensor_fp16.cpp
* experimental/builder/include/ck_tile/builder/device_op_types.hpp
* fix c++ 18 format
* Fix clang-format-18 error in device_op_types.hpp
---------
Co-authored-by: Ville Pietilä <ville.pietila@amd.com>
Co-authored-by: Ville Pietilä <188998872+vpietila-amd@users.noreply.github.com>
[ROCm/composable_kernel commit: 5f3cae3e28]
* Update copyright messages.
Copyright messages should no longer include a year. This PR updates all 38 source files to the new format.
* Switch to (C) from unicode copyright symbol.
The unicode symbol in comments was causing compilation errors.
[ROCm/composable_kernel commit: 0be0288f58]
* Add device operation to conv signature. Use unions to hold conv layouts and device operations.
* Add predicates for all device op instances.
* Use the device op signature for validation.
* Fix ckb CMakeLists.txt file for tests.
* Fix building CK Builder instance traits after the introduction of direct load template parameter in CK.
* Fix clang-formatting.
* Add factory for DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle device op.
* Add conv factory for DeviceGroupedConvFwdMultipleD_Wmma_CShuffle
* Rename elements per wave per shuffle member in the epilogue concept.
* clang-format
* Add concepts and types for optional device op template parameters.
* Add optional compute, direct load, and loop scheduler arguments to conv factory.
* Add number of groups to merge template parameter.
* clang-format.
[ROCm/composable_kernel commit: 3ae3992c18]
Generalize the current convolution factory in CK Builder to be able to build instances of any relevant convolution device operation. The main changes are:
* Added new enums FwdGroupConvDeviceOperation, BwdDataGroupConvDeviceOperation, and BwdWeightGroupConvDeviceOperation that contain the device operations for which the builder should be able to build instances.
* Created a union structure GroupConvDeviceOp that can represent a single value of the fwd, bwd weight, or bwd data device operations. This would be more naturally represented by a std::variant object, but we cannot use std::variant in NTTPs because it is not a structural type.
* Introduced a new member device_operation in the ConvSignatureDescriptor concept that holds a GroupConvDeviceOp value.
* Added predicates to be used when creating ConvFactory specializations for the different device operations. When we add support for a new device operation, we'll just create a new ConvFactory specialization with appropriate predicates.
* Changed handling of the convolution layouts (GroupConvLayout1D, GroupConvLayout2D, GroupConvLayout3D) to use union-based handling, i.e., there's now a GroupConvLayout union struct that can hold a single value of the 1D, 2D, or 3D layouts. This simplifies the handling of the different layouts as we get rid of the templatized convolution signature.
These code changes allow developers to work more easily in parallel when adding new device operations.
* Fix building CK Builder instance traits after the introduction of direct load template parameter in CK.
* Fix clang-formatting.
[ROCm/composable_kernel commit: b387249fd9]