composable_kernel/include/ck_tile/ops/epilogue/chainer
Yashvardhan Agarwal 15e81397a4 [CK_TILE] Epilogue chaining (Lwpck 3373) (#2773)
* Epilogue chainer

* epilogue chainer with context to share state in between epilogues
* chain-able epilogues for cshuffle

* clang-format

* rebase related changes

- Added separate chainer test
- clang-format

* comment resolutions

* clang-format

* Policy based chaining

- basic Policy structure to control blanket looping and barrier
placement.

- to be extended for fine-grained control

- to be modified to move possible auto-compute values and SFC access
count to policy

* Refactoring as per spec

- Introduced epilogue schedule, graph
- modified chainer to function with graph and schedule

* minor_changes

- made functions to overload in the epilogue_graph file

* clang-format

* Documentation and Comments

- Added comments to files
- Noted changes in changelog
- Added README to explain the chainer and current status, exact use
steps to be added

* Comment resolutions

- README modified with the suggested changes
- Comment fixed accordingly

* major refactoring

- modified the chainer files to match the new design
- updated comments
- updated readme
- multi-d example showcases use of the chainer

* minor cleanup

* tensor and rowcol quant chainer epilogue

- added scalarepilogue for tensor quant
- added schedule for tensorquant
- modified quant example to use chainer and appropriate schedules

* Refactor epilogue chainer: generalize ops and standardize context interface

Address review comments.

Changes:
- Rename CastToLdsOp to CastAndStoreToLdsOp for clarity
- Standardize context member names (working_tile, out_tile, aux_windows)
- Update README documentation with correct operation names
- Clean up parameter naming in epilogue_chainer.hpp (OutWindow, AccTile,
AuxWindows)
- common_epilogue_ops.hpp: General-purpose ops (ScaleScalarOp,
CastAndStoreToLdsOp,
  LoadFromLdsOp, ElementwiseOp, StoreOp, MoveWindowsOp)
- cshuffle_epilogue_chainer_ops.hpp: CShuffle-specific context and slice
operations
- epilogue_chainer.hpp: Cleaned up parameter naming for generality
- Removed test files that are no longer needed. These were added for
intermediate use

* update cshuffle chainer ops file w.r.t cshuffle_epilogue.hpp updates & add chainer to quant gemm example

* fix compile errors

- CI uses c++17 while the code had c++20 features

---------

Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
2025-12-18 10:02:02 +01:00

CK Tile Epilogue Chainer

Overview

The Epilogue Chainer is a modular framework for epilogue processing: a scheduler defines a graph of operations, and the chainer runs that graph.

Architecture

Core Design Principle

The chainer follows a Scheduler-Graph-Node architecture with shared context:

  • Scheduler: Defines operation graphs and creates a shared context
  • Graph: Composes multiple operations into sequential processing units
  • Node: Wraps individual epilogue operations with their arguments

EpilogueChainer

The EpilogueChainer struct is the entry point for epilogue processing. It delegates context creation and schedule generation to a scheduler, then processes the resulting operation graphs.
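
The delegation described above can be sketched in plain C++17. This is a minimal host-side illustration, not the CK Tile implementation: ToyContext, ToyScheduler, ToyChainer, and the two ops are hypothetical names, and std::vector stands in for real tiles and windows.

```cpp
#include <cassert>
#include <tuple>
#include <vector>

// Hypothetical shared state; a real context would hold tiles and windows.
struct ToyContext
{
    std::vector<float> working_tile;
    std::vector<float> out_tile;
};

// Two toy operations that communicate only through the context.
struct ScaleOp
{
    float factor;
    void operator()(ToyContext& ctx) const
    {
        for(float& v : ctx.working_tile)
            v *= factor;
    }
};

struct CopyToOutOp
{
    void operator()(ToyContext& ctx) const { ctx.out_tile = ctx.working_tile; }
};

// Hypothetical scheduler: creates the context and decides which ops run.
struct ToyScheduler
{
    static ToyContext create_context(const std::vector<float>& acc)
    {
        return ToyContext{acc, {}};
    }
    static auto make_schedule() { return std::make_tuple(ScaleOp{2.0f}, CopyToOutOp{}); }
};

// The chainer holds no epilogue logic of its own; it only drives what the
// scheduler hands back.
template <typename Scheduler>
struct ToyChainer
{
    std::vector<float> operator()(const std::vector<float>& acc) const
    {
        assert(!acc.empty());
        auto ctx      = Scheduler::create_context(acc);
        auto schedule = Scheduler::make_schedule();
        // Run every op in declaration order against the shared context.
        std::apply([&ctx](const auto&... op) { (op(ctx), ...); }, schedule);
        return ctx.out_tile;
    }
};

inline std::vector<float> run_toy_chainer()
{
    return ToyChainer<ToyScheduler>{}({1.0f, 2.0f, 3.0f});
}
```

Swapping in a different scheduler type changes both the context and the operation sequence without touching ToyChainer, which is the property the design aims for.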

EpilogueNode

Individual epilogue operations are wrapped in EpilogueNode structures that capture the required arguments at construction time and forward them automatically during processing. A node can wrap both parameterized and parameter-free operations.
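
One way to capture arguments at construction and forward them later is a tuple plus std::apply. The sketch below assumes that approach; ToyNode, make_toy_node, NodeContext, and both ops are hypothetical and do not reflect the actual EpilogueNode signature.

```cpp
#include <cassert>
#include <tuple>
#include <utility>

struct NodeContext
{
    int value = 0;
};

// Parameter-free operation: needs only the context.
struct IncrementOp
{
    void operator()(NodeContext& ctx) const { ctx.value += 1; }
};

// Parameterized operation: needs extra arguments at call time.
struct MulAddOp
{
    void operator()(NodeContext& ctx, int mul, int add) const
    {
        ctx.value = ctx.value * mul + add;
    }
};

// Hypothetical node: stores the op and its arguments, then replays them.
template <typename Op, typename... Args>
struct ToyNode
{
    Op op;
    std::tuple<Args...> args;

    explicit ToyNode(Op op_, Args... args_) : op(std::move(op_)), args(std::move(args_)...) {}

    template <typename Context>
    void operator()(Context& ctx) const
    {
        // Expands to op(ctx) when no arguments were captured.
        std::apply([&](const auto&... a) { op(ctx, a...); }, args);
    }
};

template <typename Op, typename... Args>
auto make_toy_node(Op op, Args... args)
{
    return ToyNode<Op, Args...>(std::move(op), std::move(args)...);
}

inline int run_toy_nodes()
{
    NodeContext ctx;
    auto inc = make_toy_node(IncrementOp{});    // no captured arguments
    auto fma = make_toy_node(MulAddOp{}, 3, 4); // captures mul = 3, add = 4
    inc(ctx);                                   // 0 + 1 = 1
    assert(ctx.value == 1);
    fma(ctx);                                   // 1 * 3 + 4 = 7
    return ctx.value;
}
```

Both nodes expose the same call shape, node(ctx), so whatever runs them does not need to know which operations take arguments.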

EpilogueGraph

The EpilogueGraph composes multiple nodes into sequential processing units that iterate over multiple accesses if needed, running all operations in order for each iteration.
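
The iteration order (all operations, in order, once per access) can be illustrated with a small trace. This is a sketch under stated assumptions: ToyGraph, its NumAccess parameter, and the three step types are hypothetical, and the access count is fixed at compile time here for simplicity.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <tuple>

struct TraceContext
{
    std::string trace; // records the order in which steps ran
};

struct SliceStep
{
    void operator()(TraceContext& c) const { c.trace += 'S'; }
};
struct StoreStep
{
    void operator()(TraceContext& c) const { c.trace += 'W'; }
};
struct MoveStep
{
    void operator()(TraceContext& c) const { c.trace += 'M'; }
};

// Hypothetical graph: runs every node in order, once per access.
template <std::size_t NumAccess, typename... Nodes>
struct ToyGraph
{
    static_assert(NumAccess > 0, "a graph must run at least once");
    std::tuple<Nodes...> nodes;

    template <typename Context>
    void operator()(Context& ctx) const
    {
        for(std::size_t i = 0; i < NumAccess; ++i)
        {
            // The comma fold guarantees left-to-right execution.
            std::apply([&ctx](const auto&... node) { (node(ctx), ...); }, nodes);
        }
    }
};

inline std::string run_toy_graph()
{
    TraceContext ctx;
    ToyGraph<2, SliceStep, StoreStep, MoveStep> graph{};
    graph(ctx);
    assert(ctx.trace.size() == 6); // 3 steps x 2 accesses
    return ctx.trace;
}
```

With two accesses the trace reads slice, store, move, then slice, store, move again, rather than running each step twice before moving on.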

Files

Core Infrastructure

  • epilogue_chainer.hpp - General chainer, node, and graph infrastructure
  • common_epilogue_ops.hpp - Epilogue operations usable with any epilogue type

CShuffle Implementation

  • cshuffle_epilogue_chainer_ops.hpp - CShuffle-specific problem, context, and slice operations
  • cshuffle_epilogue_schedule.hpp - CShuffle scheduler with pre-built schedules

Usage

Common Operations (common_epilogue_ops.hpp)

These operations work with any context that provides the standardized interface:

  • ScaleScalarOp - Scale working_tile by scalar values
  • CastAndStoreToLdsOp<DstType> - Cast working_tile and store to LDS
  • LoadFromLdsOp<Pattern> - Load output tile from LDS with sync
  • ElementwiseOp<Func, NumAux> - Apply elementwise operation with auxiliary tensors
  • StoreOp<MemOp> - Store output tile to global memory
  • MoveWindowsOp<SFC, NumAux> - Advance windows to next position

CShuffle-Specific Operations (cshuffle_epilogue_chainer_ops.hpp)

These operations are specific to the CShuffle epilogue:

  • CShuffleSliceOp - Slice accumulator tile based on distribution
  • CShuffleScaleWindowOp - Scale using tensor windows with shuffle distribution

Context Interface

Operations communicate through a shared context with standardized members:

  • working_tile: Tile for intermediate computations
  • out_tile: Output tile
  • aux_windows: Tuple of auxiliary tensor windows
  • lds_write_window: Window for writing to LDS
  • lds_read_window: Window for reading from LDS
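
Because operations depend only on member names, any context type that exposes those names can be used with them. The sketch below shows that idea with two unrelated context types. It keeps only working_tile and out_tile, uses std::array as a stand-in for tiles, and omits the aux and LDS windows; ToyScaleOp, ToyCastOp, ContextA, and ContextB are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical op: scales working_tile in place. It works with any context
// type that has a member with that name.
struct ToyScaleOp
{
    template <typename Context>
    void operator()(Context& ctx, float s) const
    {
        for(auto& v : ctx.working_tile)
            v *= s;
    }
};

// Hypothetical op: casts working_tile element-wise into out_tile.
struct ToyCastOp
{
    template <typename Context>
    void operator()(Context& ctx) const
    {
        using Out = typename decltype(ctx.out_tile)::value_type;
        assert(ctx.working_tile.size() == ctx.out_tile.size());
        for(std::size_t i = 0; i < ctx.out_tile.size(); ++i)
            ctx.out_tile[i] = static_cast<Out>(ctx.working_tile[i]);
    }
};

// Two unrelated context types that share only the member names.
struct ContextA
{
    std::array<float, 4> working_tile;
    std::array<int, 4> out_tile;
};

struct ContextB
{
    std::array<float, 2> working_tile;
    std::array<int, 2> out_tile;
    int extra_state; // epilogue-specific state that the ops never touch
};

inline int run_toy_context_ops()
{
    ContextA a{{1.5f, 2.0f, 2.5f, 3.0f}, {}};
    ToyScaleOp{}(a, 2.0f); // working_tile becomes 3, 4, 5, 6
    ToyCastOp{}(a);

    ContextB b{{0.5f, 4.0f}, {}, 0};
    ToyScaleOp{}(b, 4.0f); // working_tile becomes 2, 16
    ToyCastOp{}(b);

    return a.out_tile[3] * 100 + b.out_tile[1];
}
```

The same two ops run unchanged on both contexts, which is the practical benefit of standardizing the member names.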

Schedule Tags

  • DefaultScheduleTag - Standard: Slice → CastStore → Load → ApplyD → Store → Move
  • RowColQuantScheduleTag - With window scaling
  • TensorQuantScheduleTag - With scalar scaling
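
Tags like these are usually consumed through tag dispatch or a traits class. The sketch below assumes a traits-based approach and defines local stand-in tag types with the same names as those listed above; ScalingKind, ScheduleTraits, scaling_name, and describe_schedule are hypothetical and are not part of the CK Tile headers.

```cpp
#include <cassert>
#include <string>

// Local stand-ins for the tag types named above (empty marker structs).
struct DefaultScheduleTag {};
struct RowColQuantScheduleTag {};
struct TensorQuantScheduleTag {};

// Hypothetical classification of the scaling each schedule applies.
enum class ScalingKind { None, Window, Scalar };

// Primary template left undefined so that an unknown tag fails to compile.
template <typename Tag>
struct ScheduleTraits;

template <>
struct ScheduleTraits<DefaultScheduleTag>
{
    static constexpr ScalingKind scaling = ScalingKind::None;
};

template <>
struct ScheduleTraits<RowColQuantScheduleTag>
{
    static constexpr ScalingKind scaling = ScalingKind::Window;
};

template <>
struct ScheduleTraits<TensorQuantScheduleTag>
{
    static constexpr ScalingKind scaling = ScalingKind::Scalar;
};

inline std::string scaling_name(ScalingKind kind)
{
    switch(kind)
    {
    case ScalingKind::None: return "none";
    case ScalingKind::Window: return "window";
    case ScalingKind::Scalar: return "scalar";
    }
    assert(false && "unhandled ScalingKind");
    return {};
}

// The tag is resolved at compile time; no runtime branching on the tag itself.
template <typename Tag>
std::string describe_schedule()
{
    return scaling_name(ScheduleTraits<Tag>::scaling);
}

static_assert(ScheduleTraits<TensorQuantScheduleTag>::scaling == ScalingKind::Scalar,
              "tensor quant is assumed to map to scalar scaling");
```

A scheduler built this way could select a different operation sequence per tag while keeping a single entry point; where the scaling step sits in each sequence is not shown here because the text above does not specify it.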