From 788890ac837c77295702c0362db23012dc06bf92 Mon Sep 17 00:00:00 2001
From: Tianxing Wu
Date: Fri, 2 Jan 2026 14:23:15 +0200
Subject: [PATCH] Update example/ck_tile/42_unified_attention/README.md

Co-authored-by: spolifroni-amd
---
 example/ck_tile/42_unified_attention/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/example/ck_tile/42_unified_attention/README.md b/example/ck_tile/42_unified_attention/README.md
index 03674f4d8e..27cfdea23b 100644
--- a/example/ck_tile/42_unified_attention/README.md
+++ b/example/ck_tile/42_unified_attention/README.md
@@ -17,7 +17,7 @@ The kernel template is `unified_attention.hpp`, this is the grid-wise op in old
 There are 2 template parameters for this kernel template.
 * `FmhaPipeline` is one of the block_tile_pipeline(under `include/ck_tile/tile_program/block_tile_pipeline`) which is a performance critical component. Indeed, we did a lot of optimization and trials to optimize the pipeline and may still workout more performance pipeline and update into that folder. People only need to replace this pipeline type and would be able to enjoy the benefit of different performant implementations (stay tuned for updated pipeline(s)).
-* `EpiloguePipeline` will modify and store out the result in the last phase. People usually will do lot of post-fusion at this stage, so we also abstract this concept. Currently we didn't do much thing at the epilogue stage but leave the room for future possible support.
+* `EpiloguePipeline` is the last stage of the pipeline. It modifies and stores the result. Post-fusion can be done at this stage though the example only returns the result.
 
 ## codegen
 To speed up compile time, we instantiate the kernels into separate file. In this way we can benefit from parallel building from CMake/Make system. This is achieved by `generate.py` script.
 Besides, you can look into this script to learn how to instantiate a kernel instance step by step, which is described in `FMHA_FWD_KERNEL_BODY` variable.