Mirror of https://github.com/ROCm/composable_kernel.git (synced 2026-05-03 05:01:25 +00:00)
[CK_TILE] Epilogue chaining (Lwpck 3373) (#2773)
* Epilogue chainer
* epilogue chainer with context to share state in between epilogues
* chain-able epilogues for cshuffle
* clang-format
* rebase related changes
  - Added separate chainer test
  - clang format
* comment resolutions
* clang-format
* Policy based chaining
  - basic Policy structure to control blanket looping and barrier placement.
  - to be extended for fine grained control
  - to be modified to move possible auto-compute values and SFC access count to policy
* Refactoring as per spec
  - Introduced epilogue schedule, graph
  - modified chainer to function with graph and schedule
* minor_changes
  - made functions to overload in the epilogue_graph file
* clang-format
* Documentation and Comments
  - Added comments to files
  - Noted changes in changelog
  - Added README to explain the chainer and current status, exact use steps to be added
* Comment resolutions
  - README modified with the suggested changes
  - Comment fixed accordingly
* major refactoring
  - modified the chainer files to match the new design
  - updated comments
  - updated readme
  - multi-d example showcases use of the chainer
* minor cleanup
* tensor and rowcol quant chainer epilogue
  - added scalarepilogue for tensor quant
  - added schedule for tensorquant
  - modified quant example to use chainer and appropriate schedules
* Refactor epilogue chainer: generalize ops and standardize context interface

  Address review comments.

  Changes:
  - Rename CastToLdsOp to CastAndStoreToLdsOp for clarity
  - Standardize context member names (working_tile, out_tile, aux_windows)
  - Update README documentation with correct operation names
  - Clean up parameter naming in epilogue_chainer.hpp (OutWindow, AccTile, AuxWindows)
  - common_epilogue_ops.hpp: General-purpose ops (ScaleScalarOp, CastAndStoreToLdsOp, LoadFromLdsOp, ElementwiseOp, StoreOp, MoveWindowsOp)
  - cshuffle_epilogue_chainer_ops.hpp: CShuffle-specific context and slice operations
  - epilogue_chainer.hpp: Cleaned up parameter naming for generality
  - Removed test files that are no longer needed. These were added for intermediate use
* update cshuffle chainer ops file w.r.t cshuffle_epilogue.hpp updates & add chainer to quant gemm example
* fix compile errors
  - CI uses c++17 while the code had c++20 features

---------

Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
Committed by GitHub. Parent: bfac64953f. Commit: 15e81397a4.
include/ck_tile/ops/epilogue/chainer/README.md (normal file, 61 lines added)
# CK Tile Epilogue Chainer

## Overview

The Epilogue Chainer provides a modular epilogue processing framework through scheduler-defined operation graphs.
## Architecture

### Core Design Principle
The chainer follows a **Scheduler-Graph-Node** architecture with a shared context:
- **Scheduler**: Defines operation graphs and creates the shared context
- **Graph**: Composes multiple operations into sequential processing units
- **Node**: Wraps an individual epilogue operation together with its arguments
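To make the division of roles concrete, here is a minimal host-side analogue of the Scheduler-Graph-Node pattern in C++17 (the commit notes that CI builds with C++17). It is a sketch under stated assumptions, not the CK Tile implementation: `SketchContext`, `SketchNode`, `SketchGraph`, `SketchScheduler` and `run_sketch` are hypothetical names, `std::function` and `std::vector<float>` stand in for CK Tile's device-side types, and the two nodes (scale, then store) are illustrative.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Shared context that every node reads from and writes to (hypothetical).
struct SketchContext
{
    std::vector<float> working_tile;
    std::vector<float> out_tile;
    std::vector<std::string> trace; // names of the nodes that ran, in order
};

// Node: one epilogue operation with its arguments already bound.
using SketchNode = std::function<void(SketchContext&)>;

// Graph: an ordered sequence of nodes processed as one unit.
using SketchGraph = std::vector<SketchNode>;

// Scheduler: creates the shared context and the graph that operates on it.
struct SketchScheduler
{
    static SketchContext make_context(std::vector<float> acc)
    {
        return SketchContext{std::move(acc), {}, {}};
    }

    static SketchGraph make_graph(float scale)
    {
        return {
            // Scale the working tile; `scale` is captured when the node is built.
            [scale](SketchContext& ctx) {
                for(float& v : ctx.working_tile)
                    v *= scale;
                ctx.trace.push_back("scale");
            },
            // Store: copy the working tile into the output tile.
            [](SketchContext& ctx) {
                ctx.out_tile = ctx.working_tile;
                ctx.trace.push_back("store");
            }};
    }
};

// Runs every node of the scheduler's graph, in order, on one shared context.
inline SketchContext run_sketch(std::vector<float> acc, float scale)
{
    assert(!acc.empty()); // an empty tile has nothing to process
    SketchContext ctx = SketchScheduler::make_context(std::move(acc));
    for(const SketchNode& node : SketchScheduler::make_graph(scale))
        node(ctx);
    return ctx;
}
```

For example, `run_sketch({1.0f, 2.0f}, 2.0f)` returns a context whose `out_tile` holds `{2.0f, 4.0f}` and whose `trace` lists `scale` then `store`. Binding `scale` into the first node when the graph is built mirrors how a node carries its own arguments, while the context is the only state the nodes share.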
### EpilogueChainer
The `EpilogueChainer` struct is the entry point for modular epilogue processing. It delegates context creation and schedule generation to a scheduler, then processes the resulting operation graphs.
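The delegation described above can be sketched as a chainer with no epilogue logic of its own: it asks the scheduler for a context and a schedule (here a tuple of graphs), then runs each graph against that context. `ToyScheduler`, `make_context`, `make_schedule` and `ChainerSketch` are hypothetical names chosen for illustration; they are not the signatures used in `epilogue_chainer.hpp`.

```cpp
#include <cassert>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical scheduler: it alone knows how to build the context and which
// graphs make up the schedule.
struct ToyScheduler
{
    struct Context
    {
        std::vector<int> working_tile;
        std::vector<int> out_tile;
    };

    // Graph 1: add one to every element of the working tile.
    struct AddOneGraph
    {
        void operator()(Context& ctx) const
        {
            for(int& v : ctx.working_tile)
                ++v;
        }
    };

    // Graph 2: copy the working tile into the output tile.
    struct StoreGraph
    {
        void operator()(Context& ctx) const { ctx.out_tile = ctx.working_tile; }
    };

    static Context make_context(const std::vector<int>& acc) { return Context{acc, {}}; }

    // The schedule is a tuple of graphs, run front to back.
    static auto make_schedule() { return std::make_tuple(AddOneGraph{}, StoreGraph{}); }
};

// Chainer sketch: owns no epilogue logic, it only delegates to the scheduler.
template <typename Scheduler>
struct ChainerSketch
{
    template <typename... Args>
    static auto run(Args&&... args)
    {
        auto ctx      = Scheduler::make_context(std::forward<Args>(args)...);
        auto schedule = Scheduler::make_schedule();
        // Run each graph of the schedule, in order, on the same context.
        std::apply([&](const auto&... graph) { (graph(ctx), ...); }, schedule);
        return ctx;
    }
};
```

With this shape, swapping the scheduler type is the only change needed to run a different schedule; `ChainerSketch<ToyScheduler>::run(std::vector<int>{1, 2, 3})` yields `out_tile == {2, 3, 4}`.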
### EpilogueNode
Each epilogue operation is wrapped in an `EpilogueNode`, which captures the operation's required arguments at construction time and forwards them automatically when the node is processed. Both parameterized and parameter-free operations are supported.
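A minimal C++17 sketch of this capture-and-forward behaviour: the node stores its arguments in a `std::tuple` when it is built and unpacks them with `std::apply` every time it runs, so a node built without arguments simply calls `op(ctx)`. `NodeSketch`, `make_node`, `ScaleBy` and `Negate` are hypothetical, and a plain `float` stands in for the shared context.

```cpp
#include <cassert>
#include <tuple>
#include <type_traits>
#include <utility>

// Hypothetical node: keeps the op plus the arguments captured at construction
// and forwards them as op(ctx, args...) each time the node runs. With no
// captured arguments this reduces to op(ctx).
template <typename Op, typename... Args>
struct NodeSketch
{
    Op op;
    std::tuple<Args...> args;

    template <typename Context>
    void operator()(Context& ctx) const
    {
        std::apply([&](const Args&... a) { op(ctx, a...); }, args);
    }
};

// Deduces the node type; arguments are stored by value.
template <typename Op, typename... Args>
NodeSketch<Op, std::decay_t<Args>...> make_node(Op op, Args&&... args)
{
    return {op, std::tuple<std::decay_t<Args>...>(std::forward<Args>(args)...)};
}

// Example ops: one parameterized, one parameter-free.
struct ScaleBy
{
    void operator()(float& x, float s) const { x *= s; }
};

struct Negate
{
    void operator()(float& x) const { x = -x; }
};
```

Starting from `x = 2.0f`, running `make_node(ScaleBy{}, 3.0f)` and then `make_node(Negate{})` leaves `x == -6.0f`. Storing arguments by value (via `std::decay_t`) keeps the node independent of the caller's argument lifetimes; this sketch does not claim to match the capture policy of the real `EpilogueNode`.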
### EpilogueGraph
An `EpilogueGraph` composes multiple nodes into a sequential processing unit. When the epilogue needs more than one access, the graph iterates over the accesses and runs all of its operations in order on each iteration.
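The iteration scheme can be sketched as a graph templated on the number of accesses: for each access it runs all nodes in order, using a C++17 fold over the comma operator to preserve that order. `GraphSketch`, `TraceContext` and the three example nodes are hypothetical; passing the access index to every node is an assumption made so that a node such as `LoadNode` can pick its slice.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <vector>

// Hypothetical graph: for each of NumAccess accesses, runs every node in order.
// Each node receives the shared context and the current access index.
template <std::size_t NumAccess, typename... Nodes>
struct GraphSketch
{
    std::tuple<Nodes...> nodes;

    template <typename Context>
    void operator()(Context& ctx) const
    {
        for(std::size_t access = 0; access < NumAccess; ++access)
        {
            // A fold over the comma operator evaluates the nodes left to right.
            std::apply([&](const Nodes&... node) { (node(ctx, access), ...); }, nodes);
        }
    }
};

// Example context and nodes: each access loads one input element,
// doubles it, and appends it to the output.
struct TraceContext
{
    std::vector<int> in;
    int working = 0;
    std::vector<int> out;
};

struct LoadNode
{
    void operator()(TraceContext& ctx, std::size_t access) const
    {
        assert(access < ctx.in.size());
        ctx.working = ctx.in[access];
    }
};

struct DoubleNode
{
    void operator()(TraceContext& ctx, std::size_t) const { ctx.working *= 2; }
};

struct StoreNode
{
    void operator()(TraceContext& ctx, std::size_t) const { ctx.out.push_back(ctx.working); }
};
```

A `GraphSketch<3, LoadNode, DoubleNode, StoreNode>` run on a context with `in = {1, 2, 3}` produces `out = {2, 4, 6}`: each access loads one element, doubles it and stores it before the next access starts.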
## Files

### Core Infrastructure
- `epilogue_chainer.hpp` - General chainer, node, and graph infrastructure
- `common_epilogue_ops.hpp` - Epilogue operations usable with any epilogue type

### CShuffle Implementation
- `cshuffle_epilogue_chainer_ops.hpp` - CShuffle-specific problem, context, and slice operations
- `cshuffle_epilogue_schedule.hpp` - CShuffle scheduler with pre-built schedules
## Usage

### Common Operations (common_epilogue_ops.hpp)
These operations work with any context that provides the standardized interface described under Context Interface:
- `ScaleScalarOp` - Scale the working tile by scalar values
- `CastAndStoreToLdsOp<DstType>` - Cast the working tile and store it to LDS
- `LoadFromLdsOp<Pattern>` - Load the output tile from LDS with synchronization
- `ElementwiseOp<Func, NumAux>` - Apply an elementwise operation with auxiliary tensors
- `StoreOp<MemOp>` - Store the output tile to global memory
- `MoveWindowsOp<SFC, NumAux>` - Advance the windows to the next position
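Because these operations touch only the standardized context members, they can be written as templates over the context type. The sketch below is a host-side analogue of a scalar-scaling op and of an elementwise op with `NumAux` auxiliary tensors; `HostContext`, `ScaleScalarSketch` and `ElementwiseSketch` are hypothetical, and `std::vector<float>` replaces CK Tile tiles and windows.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <utility>
#include <vector>

// Host-side stand-in for a context with the standardized members; plain vectors
// replace CK Tile tiles and windows (assumption for illustration only).
struct HostContext
{
    std::vector<float> working_tile;
    std::vector<float> out_tile;
    std::tuple<std::vector<float>, std::vector<float>> aux_windows; // e.g. two D tensors
};

// Scalar-scaling analogue: multiplies the working tile in place.
struct ScaleScalarSketch
{
    float scale;

    template <typename Context>
    void operator()(Context& ctx) const
    {
        for(float& v : ctx.working_tile)
            v *= scale;
    }
};

// Elementwise analogue:
// out_tile[i] = func(working_tile[i], aux_0[i], ..., aux_{NumAux-1}[i]).
template <typename Func, std::size_t NumAux>
struct ElementwiseSketch
{
    Func func;

    template <typename Context>
    void operator()(Context& ctx) const
    {
        apply_impl(ctx, std::make_index_sequence<NumAux>{});
    }

  private:
    template <typename Context, std::size_t... I>
    void apply_impl(Context& ctx, std::index_sequence<I...>) const
    {
        const std::size_t n = ctx.working_tile.size();
        // Every auxiliary tensor must match the working tile in size.
        [[maybe_unused]] const bool sizes_match =
            ((std::get<I>(ctx.aux_windows).size() == n) && ...);
        assert(sizes_match);
        ctx.out_tile.resize(n);
        for(std::size_t i = 0; i < n; ++i)
            ctx.out_tile[i] = func(ctx.working_tile[i], std::get<I>(ctx.aux_windows)[i]...);
    }
};
```

Scaling `working_tile = {1, 2}` by 2 and then adding two auxiliary tensors `{10, 20}` and `{100, 200}` element by element gives `out_tile = {112, 224}`.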
### CShuffle-Specific Operations (cshuffle_epilogue_chainer_ops.hpp)
These operations are specific to the CShuffle epilogue:
- `CShuffleSliceOp` - Slice the accumulator tile based on its distribution
- `CShuffleScaleWindowOp` - Scale using tensor windows with the shuffle distribution
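To illustrate what these two steps compute, the sketch below uses a row-major host matrix: `slice_rows` copies the rows that belong to one access, and `scale_row_col` multiplies each element by a per-row and a per-column scale, as in row/column quantization. The layout (consecutive row blocks per access), the scale vectors, and all names are assumptions; the real ops work on distributed tiles and tensor windows.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major host matrix standing in for a distributed tile (assumption).
struct HostTile
{
    std::size_t rows = 0;
    std::size_t cols = 0;
    std::vector<float> data; // rows * cols elements

    float& at(std::size_t r, std::size_t c) { return data[r * cols + c]; }
    float at(std::size_t r, std::size_t c) const { return data[r * cols + c]; }
};

// Slice analogue: copy the rows owned by `access`, assuming each access
// covers `rows_per_access` consecutive rows of the accumulator.
inline HostTile slice_rows(const HostTile& acc, std::size_t access, std::size_t rows_per_access)
{
    assert((access + 1) * rows_per_access <= acc.rows);
    HostTile out{rows_per_access, acc.cols, std::vector<float>(rows_per_access * acc.cols)};
    for(std::size_t r = 0; r < rows_per_access; ++r)
        for(std::size_t c = 0; c < acc.cols; ++c)
            out.at(r, c) = acc.at(access * rows_per_access + r, c);
    return out;
}

// Row/column scaling analogue: tile(r, c) *= row_scale[r] * col_scale[c].
inline void scale_row_col(HostTile& tile,
                          const std::vector<float>& row_scale,
                          const std::vector<float>& col_scale)
{
    assert(row_scale.size() == tile.rows && col_scale.size() == tile.cols);
    for(std::size_t r = 0; r < tile.rows; ++r)
        for(std::size_t c = 0; c < tile.cols; ++c)
            tile.at(r, c) *= row_scale[r] * col_scale[c];
}
```

For a 4x2 accumulator holding 1 to 8, slicing access 1 with two rows per access returns rows `{5, 6}` and `{7, 8}`; scaling those with row scales `{2, 3}` and column scales `{1, 10}` gives `{10, 120}` and `{21, 240}`.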
### Context Interface
Operations communicate through a shared context with standardized members:
- `working_tile`: Tile for intermediate computations
- `out_tile`: Output tile
- `aux_windows`: Tuple of auxiliary tensor windows
- `lds_write_window`: Window for writing to LDS
- `lds_read_window`: Window for reading from LDS
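One way to check this contract at compile time in C++17 is a `std::void_t` detection trait over the five member names. `has_standard_context_members`, `ToyContext` and `IncompleteContext` are hypothetical, and the member types are placeholders; the trait only checks that the names exist, not what types they have.

```cpp
#include <cassert>
#include <tuple>
#include <type_traits>
#include <utility>

// Detects whether Ctx exposes the standardized context members.
// Illustrative only; this trait is not part of CK Tile.
template <typename Ctx, typename = void>
struct has_standard_context_members : std::false_type
{
};

template <typename Ctx>
struct has_standard_context_members<
    Ctx,
    std::void_t<decltype(std::declval<Ctx&>().working_tile),
                decltype(std::declval<Ctx&>().out_tile),
                decltype(std::declval<Ctx&>().aux_windows),
                decltype(std::declval<Ctx&>().lds_write_window),
                decltype(std::declval<Ctx&>().lds_read_window)>> : std::true_type
{
};

template <typename Ctx>
inline constexpr bool has_standard_context_members_v = has_standard_context_members<Ctx>::value;

// A toy context with every standardized member (placeholder member types).
struct ToyContext
{
    float working_tile[4];
    float out_tile[4];
    std::tuple<> aux_windows; // no auxiliary tensors in this example
    float* lds_write_window;
    const float* lds_read_window;
};

// Lacks the LDS windows, so it does not satisfy the interface.
struct IncompleteContext
{
    float working_tile[4];
    float out_tile[4];
};

static_assert(has_standard_context_members_v<ToyContext>);
static_assert(!has_standard_context_members_v<IncompleteContext>);
```

An op template could `static_assert(has_standard_context_members_v<Context>)` to fail early, with a readable error, when it is paired with a context that lacks, for example, the LDS windows.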
### Schedule Tags
- `DefaultScheduleTag` - Standard: Slice → CastStore → Load → ApplyD → Store → Move
- `RowColQuantScheduleTag` - Row/column quantization, with scaling through tensor windows
- `TensorQuantScheduleTag` - Tensor quantization, with scalar scaling
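Schedule tags lend themselves to tag dispatch, where one overload per tag selects the schedule. The sketch below returns the step names as strings. The default order is taken from the list above; placing the scaling step right after the slice (so it acts on the working tile before the cast) is an assumption, as are the `sketch` namespace, `schedule_for`, and the step names `ScaleWindow` and `ScaleScalar`.

```cpp
#include <cassert>
#include <string>
#include <vector>

namespace sketch {

// Hypothetical tag types named after the schedule tags.
struct DefaultScheduleTag {};
struct RowColQuantScheduleTag {};
struct TensorQuantScheduleTag {};

// Tag dispatch: the overload chosen by the tag returns that schedule's steps.
inline std::vector<std::string> schedule_for(DefaultScheduleTag)
{
    // Order as listed for DefaultScheduleTag.
    return {"Slice", "CastStore", "Load", "ApplyD", "Store", "Move"};
}

inline std::vector<std::string> schedule_for(RowColQuantScheduleTag)
{
    // Assumed: window scaling inserted after the slice, before the cast.
    return {"Slice", "ScaleWindow", "CastStore", "Load", "ApplyD", "Store", "Move"};
}

inline std::vector<std::string> schedule_for(TensorQuantScheduleTag)
{
    // Assumed: scalar scaling inserted after the slice, before the cast.
    return {"Slice", "ScaleScalar", "CastStore", "Load", "ApplyD", "Store", "Move"};
}

} // namespace sketch
```

Adding another schedule then amounts to defining a new tag type and one more overload, without touching the existing schedules.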