[CK_TILE] Epilogue chaining (Lwpck 3373) (#2773)

* Epilogue chainer

* epilogue chainer with context to share state in between epilogues
* chain-able epilogues for cshuffle

* clang-format

* rebase related changes

- Added separate chainer test
- clang-format

* comment resolutions

* clang-format

* Policy based chaining

- basic Policy structure to control blanket looping and barrier
placement.

- to be extended for fine-grained control

- to be modified to move possible auto-computed values and the SFC access
count to the policy

* Refactoring as per spec

- Introduced epilogue schedule, graph
- modified chainer to function with graph and schedule

* minor_changes

- made functions to overload in the epilogue_graph file

* clang-format

* Documentation and Comments

- Added comments to files
- Noted changes in changelog
- Added README to explain the chainer and its current status; exact usage
steps to be added

* Comment resolutions

- README modified with the suggested changes
- Comment fixed accordingly

* major refactoring

- modified the chainer files to match the new design
- updated comments
- updated readme
- multi-d example showcases use of the chainer

* minor cleanup

* tensor and rowcol quant chainer epilogue

- added scalarepilogue for tensor quant
- added schedule for tensorquant
- modified quant example to use chainer and appropriate schedules

* Refactor epilogue chainer: generalize ops and standardize context interface

Address review comments.

Changes:
- Rename CastToLdsOp to CastAndStoreToLdsOp for clarity
- Standardize context member names (working_tile, out_tile, aux_windows)
- Update README documentation with correct operation names
- Clean up parameter naming in epilogue_chainer.hpp (OutWindow, AccTile,
AuxWindows)
- common_epilogue_ops.hpp: General-purpose ops (ScaleScalarOp,
CastAndStoreToLdsOp,
  LoadFromLdsOp, ElementwiseOp, StoreOp, MoveWindowsOp)
- cshuffle_epilogue_chainer_ops.hpp: CShuffle-specific context and slice
operations
- epilogue_chainer.hpp: Cleaned up parameter naming for generality
- Removed test files that are no longer needed. These were added for
intermediate use

* update cshuffle chainer ops file w.r.t cshuffle_epilogue.hpp updates & add chainer to quant gemm example

* fix compile errors

- CI uses C++17, while the code used C++20 features

---------

Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
Yashvardhan Agarwal
2025-12-18 11:02:02 +02:00
committed by GitHub
parent bfac64953f
commit 15e81397a4
9 changed files with 1244 additions and 42 deletions


@@ -0,0 +1,61 @@
# CK Tile Epilogue Chainer
## Overview
The Epilogue Chainer is a modular framework for epilogue processing: a scheduler defines a graph of operations, and the chainer runs that graph.
## Architecture
### Core Design Principle
The chainer follows a **Scheduler-Graph-Node** architecture with shared context:
- **Scheduler**: Defines operation graphs and creates a shared context
- **Graph**: Composes multiple operations into sequential processing units
- **Node**: Wraps individual epilogue operations with their arguments
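The division of responsibilities above can be illustrated with a minimal host-side sketch. It is not the CK Tile implementation: `SketchContext`, `FirstOp`, `SecondOp`, `SketchGraph`, `SketchScheduler`, and `run_chain` are hypothetical names, and the context is a plain `std::vector` rather than tiles and tensor windows. It only shows the delegation pattern, in which the scheduler decides how the context is created and which graph runs.

```cpp
#include <cassert>
#include <tuple>
#include <vector>

// Hypothetical host-side context: records the order in which operations ran.
struct SketchContext {
    std::vector<int> trace;
};

// Two stand-in operations that only touch the shared context.
struct FirstOp {
    void operator()(SketchContext& ctx) const { ctx.trace.push_back(1); }
};
struct SecondOp {
    void operator()(SketchContext& ctx) const { ctx.trace.push_back(2); }
};

// Graph: runs its operations in declaration order on one context.
template <typename... Ops>
struct SketchGraph {
    std::tuple<Ops...> ops;

    void operator()(SketchContext& ctx) const {
        std::apply([&ctx](const auto&... op) { (op(ctx), ...); }, ops);
    }
};

// Scheduler: owns the two decisions the chainer delegates,
// namely how the context is created and which graph is run.
struct SketchScheduler {
    static SketchContext create_context() { return SketchContext{}; }
    static auto make_schedule() { return SketchGraph<FirstOp, SecondOp>{}; }
};

// Chainer: asks the scheduler for a context and a schedule, then runs it.
template <typename Scheduler>
SketchContext run_chain() {
    auto ctx         = Scheduler::create_context();
    const auto graph = Scheduler::make_schedule();
    graph(ctx);
    assert(!ctx.trace.empty()); // at least one operation must have run
    return ctx;
}
```

Swapping in a different scheduler type changes both the context and the operation order without touching `run_chain`, which is the property the architecture is built around.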
### EpilogueChainer
The `EpilogueChainer` struct is the entry point for epilogue processing. It delegates context creation and schedule generation to the scheduler, then processes the resulting operation graphs.
### EpilogueNode
Individual epilogue operations are wrapped in `EpilogueNode` structures that capture the required arguments at construction time and automatically forward them during processing. Nodes support both parameterized and parameter-free operations.
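The capture-and-forward idea can be sketched in plain C++17 as follows. This is a simplified illustration under stated assumptions, not the actual `EpilogueNode`: `ToyContext`, `ToyScaleOp`, `ToyIncrementOp`, `ToyNode`, `make_toy_node`, and `run_node_demo` are hypothetical, and the context holds a single `float` instead of a tile.

```cpp
#include <cassert>
#include <tuple>

// Hypothetical context with a single value standing in for a tile.
struct ToyContext {
    float working = 0.0f;
};

// Parameterized operation: needs a scale factor in addition to the context.
struct ToyScaleOp {
    void operator()(ToyContext& ctx, float scale) const { ctx.working *= scale; }
};

// Parameter-free operation: needs only the context.
struct ToyIncrementOp {
    void operator()(ToyContext& ctx) const { ctx.working += 1.0f; }
};

// Node: stores the operation and its arguments, then forwards them on each call.
template <typename Op, typename... Args>
struct ToyNode {
    Op op;
    std::tuple<Args...> args;

    void operator()(ToyContext& ctx) const {
        // With an empty argument pack this reduces to op(ctx).
        std::apply([&](const auto&... a) { op(ctx, a...); }, args);
    }
};

template <typename Op, typename... Args>
auto make_toy_node(Op op, Args... args) {
    return ToyNode<Op, Args...>{op, std::make_tuple(args...)};
}

inline float run_node_demo() {
    ToyContext ctx{2.0f};
    const auto scale = make_toy_node(ToyScaleOp{}, 3.0f); // argument captured here
    const auto inc   = make_toy_node(ToyIncrementOp{});   // no arguments to capture
    scale(ctx);
    assert(ctx.working == 6.0f); // 2 * 3
    inc(ctx);
    return ctx.working;
}
```

Because both nodes expose the same `operator()(ToyContext&)`, a caller can run them uniformly without knowing which ones carry arguments.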
### EpilogueGraph
The `EpilogueGraph` composes multiple nodes into sequential processing units that iterate over multiple accesses if needed, running all operations in order for each iteration.
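A rough host-side sketch of this behavior is shown below. The names `LogContext`, `LogSlice`, `LogStore`, `LogMove`, `MiniGraph`, and `run_graph_demo` are hypothetical, and the compile-time `NumAccess` parameter is an assumption made for illustration; how the real graph obtains its access count is not described here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <tuple>

// Hypothetical context: a log string that shows the execution order.
struct LogContext {
    std::string log;
};

struct LogSlice {
    void operator()(LogContext& c) const { c.log += 'S'; }
};
struct LogStore {
    void operator()(LogContext& c) const { c.log += 'W'; }
};
struct LogMove {
    void operator()(LogContext& c) const { c.log += 'M'; }
};

// Graph: for each access, run every node in declaration order.
template <std::size_t NumAccess, typename... Nodes>
struct MiniGraph {
    std::tuple<Nodes...> nodes;

    void operator()(LogContext& ctx) const {
        for (std::size_t access = 0; access < NumAccess; ++access) {
            std::apply([&ctx](const auto&... n) { (n(ctx), ...); }, nodes);
        }
    }
};

inline std::string run_graph_demo() {
    LogContext ctx;
    const MiniGraph<2, LogSlice, LogStore, LogMove> graph{};
    graph(ctx);
    assert(ctx.log.size() == 6); // 3 nodes x 2 accesses
    return ctx.log;
}
```

With two accesses the log reads `SWMSWM`: the full node sequence completes for one access before the next access starts.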
## Files
### Core Infrastructure
- `epilogue_chainer.hpp` - General chainer, node, and graph infrastructure
- `common_epilogue_ops.hpp` - Epilogue operations usable with any epilogue type
### CShuffle Implementation
- `cshuffle_epilogue_chainer_ops.hpp` - CShuffle-specific problem, context, and slice operations
- `cshuffle_epilogue_schedule.hpp` - CShuffle scheduler with pre-built schedules
## Usage
### Common Operations (common_epilogue_ops.hpp)
These operations work with any context that provides the standardized interface:
- `ScaleScalarOp` - Scale working tile by scalar values
- `CastAndStoreToLdsOp<DstType>` - Cast working tile and store to LDS
- `LoadFromLdsOp<Pattern>` - Load output tile from LDS with sync
- `ElementwiseOp<Func, NumAux>` - Apply elementwise operation with auxiliary tensors
- `StoreOp<MemOp>` - Store output tile to global memory
- `MoveWindowsOp<SFC, NumAux>` - Advance windows to next position
### CShuffle-Specific Operations (cshuffle_epilogue_chainer_ops.hpp)
These operations are specific to CShuffle epilogue:
- `CShuffleSliceOp` - Slice accumulator tile based on distribution
- `CShuffleScaleWindowOp` - Scale using tensor windows with shuffle distribution
### Context Interface
Operations communicate through a shared context with standardized members:
- `working_tile`: Tile for intermediate computations
- `out_tile`: Output tile
- `aux_windows`: Tuple of auxiliary tensor windows
- `lds_write_window`: Window for writing to LDS
- `lds_read_window`: Window for reading from LDS
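To show why standardized member names matter, here is a host-side sketch in which operations are templated on the context type and rely only on the member names listed above. Everything else is an assumption for illustration: `HostContext`, `ToyStoreToLds`, `ToyLoadFromLds`, `ToyAddAux`, and `run_context_demo` are hypothetical, the tiles are `std::array<float, 4>`, and the LDS windows are raw pointers into an ordinary buffer rather than CK Tile windows.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <tuple>

using Tile = std::array<float, 4>;

// Hypothetical host-side context exposing the standardized member names.
struct HostContext {
    Tile working_tile{};
    Tile out_tile{};
    std::tuple<Tile> aux_windows{};
    float* lds_write_window      = nullptr; // stand-in for the LDS write window
    const float* lds_read_window = nullptr; // stand-in for the LDS read window
};

// Each operation is generic over the context and uses only the shared names.
struct ToyStoreToLds {
    template <typename Context>
    void operator()(Context& ctx) const {
        assert(ctx.lds_write_window != nullptr);
        std::copy(ctx.working_tile.begin(), ctx.working_tile.end(), ctx.lds_write_window);
    }
};

struct ToyLoadFromLds {
    template <typename Context>
    void operator()(Context& ctx) const {
        assert(ctx.lds_read_window != nullptr);
        std::copy(ctx.lds_read_window,
                  ctx.lds_read_window + ctx.out_tile.size(),
                  ctx.out_tile.begin());
    }
};

struct ToyAddAux {
    template <typename Context>
    void operator()(Context& ctx) const {
        const auto& d0 = std::get<0>(ctx.aux_windows);
        for (std::size_t i = 0; i < ctx.out_tile.size(); ++i) {
            ctx.out_tile[i] += d0[i];
        }
    }
};

inline Tile run_context_demo() {
    Tile lds{}; // ordinary memory standing in for LDS
    HostContext ctx;
    ctx.working_tile             = Tile{1.0f, 2.0f, 3.0f, 4.0f};
    std::get<0>(ctx.aux_windows) = Tile{10.0f, 20.0f, 30.0f, 40.0f};
    ctx.lds_write_window         = lds.data();
    ctx.lds_read_window          = lds.data();

    ToyStoreToLds{}(ctx);  // working_tile -> LDS
    ToyLoadFromLds{}(ctx); // LDS -> out_tile
    ToyAddAux{}(ctx);      // out_tile += aux_windows[0]
    return ctx.out_tile;
}
```

Any other context type that provides the same five members would work with these operations unchanged, which is the sense in which the common operations are independent of the epilogue type.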
### Schedule Tags
- `DefaultScheduleTag` - Standard: Slice → CastStore → Load → ApplyD → Store → Move
- `RowColQuantScheduleTag` - With window scaling
- `TensorQuantScheduleTag` - With scalar scaling
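Tag-based schedule selection can be sketched with overload resolution on empty tag types. The tag names below mirror the list above, but they are redeclared locally as stand-ins. `StepList`, `make_steps`, and `first_step` are hypothetical, and the step names are plain strings. Only the default order comes from the list above. The position of the scaling step in the two quant schedules, and the assumption that their remaining steps match the default order, are guesses made for illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Local stand-ins for the schedule tags (empty types used only for dispatch).
struct DefaultScheduleTag {};
struct RowColQuantScheduleTag {};
struct TensorQuantScheduleTag {};

using StepList = std::vector<std::string>;

// Default order as documented: Slice, CastStore, Load, ApplyD, Store, Move.
inline StepList make_steps(DefaultScheduleTag) {
    return {"Slice", "CastStore", "Load", "ApplyD", "Store", "Move"};
}

// Assumption: window scaling is placed directly after Slice.
inline StepList make_steps(RowColQuantScheduleTag) {
    return {"Slice", "ScaleWindow", "CastStore", "Load", "ApplyD", "Store", "Move"};
}

// Assumption: scalar scaling is placed directly after Slice.
inline StepList make_steps(TensorQuantScheduleTag) {
    return {"Slice", "ScaleScalar", "CastStore", "Load", "ApplyD", "Store", "Move"};
}

// The overload is chosen at compile time from the tag type alone.
template <typename Tag>
std::string first_step(Tag tag) {
    const StepList steps = make_steps(tag);
    assert(!steps.empty()); // every schedule has at least one step
    return steps.front();
}
```

Adding a new schedule in this style means adding one tag type and one overload, with no changes to existing callers.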