Gino Lu
fb75da2467
sparse_attn: wire -mask and -attention_sink (block-map prune + attn mask)
2026-05-19 23:22:00 -04:00
Gino Lu
d939c3b4fc
sparse_attn: split-launch dispatch + 3-mode PV-skip
- Per-head pv_threshold via head_remap LUT (CLI: -pv_threshold_per_head);
  sentinel 1e30 routes to kEnablePVSkip=false bucket
- kEnablePVSkip bool → PVSkipMode enum {kNone, kPerWarp, kPerBlock};
  new kPerBlock matches upstream sm80 (LDS vote, V loads unconditional).
  CLI: -pv_mode={none,warp,block}, default warp
- README: PV-skip modes section + MI300X 3-curve sparsity chart
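
The sentinel routing and the mode enum described above can be illustrated with a small host-side sketch. This is not the code in the commit: only the enumerator names, the 1e30 sentinel, and the -pv_mode values come from the message, while `HeadBuckets`, `split_heads`, and `parse_pv_mode` are hypothetical names, and treating each bucket as a launch-local remap table is an assumption about how the split launch might be organized.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Enumerator names follow the commit message; everything else is illustrative.
enum class PVSkipMode { kNone, kPerWarp, kPerBlock };

// Sentinel from the commit message: a threshold this large means "never skip".
constexpr float kPVThresholdDisabled = 1e30f;

// Hypothetical result of the split. Each vector doubles as a remap table
// (launch-local head index -> original head index).
struct HeadBuckets {
    std::vector<int> skip_heads;    // would be launched with PV-skip enabled
    std::vector<int> noskip_heads;  // would be launched with PV-skip disabled
};

// Route every head to one of the two buckets based on its threshold.
inline HeadBuckets split_heads(const std::vector<float>& pv_threshold_per_head)
{
    HeadBuckets b;
    for (std::size_t h = 0; h < pv_threshold_per_head.size(); ++h) {
        if (pv_threshold_per_head[h] >= kPVThresholdDisabled)
            b.noskip_heads.push_back(static_cast<int>(h));
        else
            b.skip_heads.push_back(static_cast<int>(h));
    }
    return b;
}

// Map a -pv_mode value to the enum; returns false for an unknown string.
inline bool parse_pv_mode(const std::string& s, PVSkipMode& out)
{
    if (s == "none")  { out = PVSkipMode::kNone;     return true; }
    if (s == "warp")  { out = PVSkipMode::kPerWarp;  return true; }
    if (s == "block") { out = PVSkipMode::kPerBlock; return true; }
    return false;
}
```

Under these assumptions, a caller would issue one launch per non-empty bucket, so heads carrying the sentinel never pay for the skip check.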
Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
2026-05-19 21:45:23 -04:00
Gino Lu
0f8b58ac88
sparse_attn: R25 Step 1 A1 — per-warp PV-skip (paper Algorithm 1) + V0 instantiation
Preserve the R25 Step 1 "A1 / redesign D" state before redesigning toward "B"
(per-CTA PV-skip matching upstream shipped reference). This snapshot lets us
restore A1 if the B redesign fails.
A1 redesign D pipeline (per-warp, arithmetic-only PV-skip, wrapped in
`if constexpr (kEnablePVSkip)`):
- include/ck_tile/ops/sparse_attn/pipeline/block_fmha_pipeline_qr_ks_vs_async_sparge.hpp
- include/ck_tile/ops/sparse_attn/kernel/fmha_fwd_sparge_kernel.hpp
V0 instantiation wiring (per gino_tmp/R25/programmer/v0_instance/REPORT.md):
- example/ck_tile/50_sparse_attn/codegen/ops/fmha_fwd_sparge.py
- example/ck_tile/50_sparse_attn/fmha_fwd_trek.hpp
- example/ck_tile/50_sparse_attn/sparge_blockmap_trek.hpp
- example/ck_tile/50_sparse_attn/sparge_blockmap_inst.cpp
- example/ck_tile/50_sparse_attn/codegen/cpp_symbol_map.py
- example/ck_tile/50_sparse_attn/CMakeLists.txt
- example/ck_tile/01_fmha/CMakeLists.txt
- example/ck_tile/50_sparse_attn/test_sparge.cpp (-pv_skip_compile=0|1 CLI)
This commit excludes all *_REVIEW.{hpp,cpp} mirror files (left untracked) and
all build artefacts. _vsa.hpp / _jenga.hpp are not modified.
Tag: R25-step1-A1-paper-aligned points at this commit.
2026-05-18 06:13:38 -04:00
Gino Lu
7103eacc99
refactor(sparse_attn): caller-owned workspace + dtype-aware sizing
Replace process-lifetime lazy hipMalloc K-stats workspace with a caller-owned
buffer; expose sparge_blockmap_get_workspace_size() / compute_workspace_layout()
host helpers. Split the combined sparge_blockmap_fwd into stage launchers
(sparge_kstats_fwd_oneshot + sparge_blockmap_only_fwd_oneshot) so the chained
launch is timed end-to-end.
Make pooled_k storage dtype follow KDataType (fp16/bf16) instead of fp32 to halve
workspace footprint and match dense-FMHA precision. Tighten per-head superparam
pointers to required (non-null) and assert N_k <= 256 in jenga MakeKargs to
document the 256-bool LDS staging cap. Drop the obsolete VSA extra-LDS staging.
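
A rough sketch of dtype-aware sizing for a caller-owned workspace follows. It does not reproduce sparge_blockmap_get_workspace_size() or compute_workspace_layout(), whose signatures are not given here. `WorkspaceLayout`, `sketch_workspace_layout`, the two-region layout (pooled K plus one flag byte per K block), and the 256-byte alignment are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical layout: region offsets and sizes, all in bytes.
struct WorkspaceLayout {
    std::size_t pooled_k_offset;
    std::size_t pooled_k_bytes;
    std::size_t flag_offset;
    std::size_t flag_bytes;
    std::size_t total_bytes;
};

// Round n up to the next multiple of a.
inline std::size_t align_up(std::size_t n, std::size_t a)
{
    return (n + a - 1) / a * a;
}

// k_elem_bytes is sizeof(KDataType): 2 for fp16/bf16, 4 for fp32.
inline WorkspaceLayout sketch_workspace_layout(std::size_t batch,
                                               std::size_t nhead_k,
                                               std::size_t seqlen_k,
                                               std::size_t block_n,
                                               std::size_t hdim,
                                               std::size_t k_elem_bytes)
{
    const std::size_t kAlign       = 256; // assumed alignment between regions
    const std::size_t num_k_blocks = (seqlen_k + block_n - 1) / block_n;
    const std::size_t blocks       = batch * nhead_k * num_k_blocks;

    WorkspaceLayout w;
    w.pooled_k_offset = 0;
    w.pooled_k_bytes  = blocks * hdim * k_elem_bytes; // one pooled row per K block
    w.flag_offset     = align_up(w.pooled_k_bytes, kAlign);
    w.flag_bytes      = blocks;                       // one byte per K block (assumed)
    w.total_bytes     = align_up(w.flag_offset + w.flag_bytes, kAlign);
    return w;
}
```

With a layout like this, the caller queries total_bytes, allocates the buffer once (for example with hipMalloc), and passes the pointer to both stage launchers. Storing pooled K in a 2-byte type instead of fp32 halves the pooled-K region.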
Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
2026-05-17 02:34:23 -04:00
Gino Lu
b00e5449c8
sparse_attn: split KStats kernel, add README + perf charts
- Split SpargeKStatsKernel/Pipeline out of BlockMap (Kernel A produces
  per-block K stats workspace consumed by Kernel B), removing redundant
  K-stat recomputation across Q-blocks.
- Add example/ck_tile/50_sparse_attn/README.md (status vs upstream pinned
  to ae5b629, unported items, usage, references).
- Add example/ck_tile/50_sparse_attn/docs/{speedup_vs_sparsity,kernel_breakdown}.png
  + reusable plot_sparge_perf.py (b=2 h=32 s=16384 d=128 fp16 perf snapshot).
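
The Kernel A / Kernel B split in the first item can be pictured with a CPU-side sketch: stage A computes per-block K statistics once, and stage B reuses them for every Q-block instead of recomputing them. This is only an illustration. Mean pooling as the statistic, the dot-product score, and the names `pool_k_blocks` and `score_blocks` are assumptions, not the kernels in this commit.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Stage A (assumed statistic: mean over the rows of each K block).
// k is row-major [seqlen, hdim]; the result is row-major [num_k_blocks, hdim].
inline std::vector<float> pool_k_blocks(const std::vector<float>& k,
                                        std::size_t seqlen,
                                        std::size_t hdim,
                                        std::size_t block_n)
{
    const std::size_t num_blocks = (seqlen + block_n - 1) / block_n;
    std::vector<float> pooled(num_blocks * hdim, 0.0f);
    for (std::size_t b = 0; b < num_blocks; ++b) {
        const std::size_t begin = b * block_n;
        const std::size_t rows  = std::min(block_n, seqlen - begin); // tail block
        for (std::size_t r = 0; r < rows; ++r)
            for (std::size_t d = 0; d < hdim; ++d)
                pooled[b * hdim + d] += k[(begin + r) * hdim + d];
        for (std::size_t d = 0; d < hdim; ++d)
            pooled[b * hdim + d] /= static_cast<float>(rows);
    }
    return pooled;
}

// Stage B: every Q-block reads the same pooled K rows.
// Returns row-major [num_q_blocks, num_k_blocks] dot-product scores.
inline std::vector<float> score_blocks(const std::vector<float>& pooled_q,
                                       const std::vector<float>& pooled_k,
                                       std::size_t hdim)
{
    const std::size_t nq = pooled_q.size() / hdim;
    const std::size_t nk = pooled_k.size() / hdim;
    std::vector<float> scores(nq * nk, 0.0f);
    for (std::size_t i = 0; i < nq; ++i)
        for (std::size_t j = 0; j < nk; ++j)
            for (std::size_t d = 0; d < hdim; ++d)
                scores[i * nk + j] += pooled_q[i * hdim + d] * pooled_k[j * hdim + d];
    return scores;
}
```

In this sketch the pooling cost is paid once per K block, while a fused version would repeat it for each Q-block that touches that K block.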
Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
2026-05-05 03:13:24 -04:00
Gino Lu
d1d457b82a
Add sparge gpu pipeline in tile_example_sparge_vsa_sparse_attn
2026-04-13 03:34:08 -04:00
jiangyon.ren
4d2f8c111e
[CK_TILE][FMHA] Add sparse attention VSA (#3341)
* add sparse attention VSA
* fix the pre-commit
* Add jenga test and pre-commit
* add bf16 for vsa
* add jenga support bf16
* remove lse arg
* split kernel code to block & kernel
* fix the pre-commit
* fix the pre-commit
* fix the copyrights
* fix the copyright
* fix the copyright & rename block to pipeline
* fix the copyright and pipeline
* remove lse & dropout & add fmt
* fix the jenga&VSA code review
* remove the useless code & resolved the comments
* remove useless code
* remove useless code
* Clean up code
* Remove more unused code
* Re-format .hpp
* Refactor codegen scripts
---------
Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
Co-authored-by: asleepzzz <hanwen.chang@amd.com>
2026-01-31 00:59:47 +08:00