composable_kernel/example
juuso-oskari 397febf42c CK-UA int32 overflow protection: add explicit template parameter for large caches
Add CachePtrInt32OverflowPossible template parameter (default false) to all
unified attention kernel traits. This enables dual kernel variants:
- Small cache (false): compile-time elimination of overflow checks for <100K blocks
- Large cache (true): runtime overflow checking with pointer rebasing for >=100K blocks
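Below is a minimal C++ sketch of how one template flag can yield both variants. It assumes a cache made of contiguous blocks of block_stride elements. RebasedOffset and split_offset are hypothetical names, not part of CK, and the real pipeline works on its own tensor views. With false, the if constexpr branch compiles to plain 32-bit arithmetic with no check. With true, the offset is computed in 64 bits. If it exceeds INT32_MAX, the block start is folded into the base pointer (the rebase), so the remaining offset fits in int32.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical split of a KV-cache element offset (not a CK type):
// `base` is applied to the pointer with 64-bit arithmetic (the rebase),
// `local` goes through ordinary 32-bit addressing.
struct RebasedOffset
{
    int64_t base;
    int32_t local;
};

// Hypothetical helper: offset of element `elem` inside cache block `block_idx`,
// assuming contiguous blocks of `block_stride` elements each.
template <bool CachePtrInt32OverflowPossible>
RebasedOffset split_offset(int32_t block_idx, int32_t block_stride, int32_t elem)
{
    if constexpr (CachePtrInt32OverflowPossible)
    {
        // Large cache: compute the full offset in 64 bits and check it at run time.
        const int64_t full = static_cast<int64_t>(block_idx) * block_stride + elem;
        if (full > std::numeric_limits<int32_t>::max())
        {
            // Rebase onto the start of the block so the remaining offset fits in int32.
            return {static_cast<int64_t>(block_idx) * block_stride, elem};
        }
        return {0, static_cast<int32_t>(full)};
    }
    else
    {
        // Small cache: the host guarantees the offset fits in int32,
        // so this instantiation contains no overflow check at all.
        return {0, block_idx * block_stride + elem};
    }
}
```

For example, split_offset<true>(200000, 16384, 7) yields base 3276800000 and local 7. The same call with false would overflow int32, so the small-cache variant is only valid when the host has ruled overflow out.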

Key changes:
- Add CachePtrInt32OverflowPossible as the 14th template parameter to UnifiedAttentionPipelineProblem
- Pass the parameter through all kernel traits: decode, decode_small, decode_tiny, decode_bs32
- Implement overflow checking in the pipeline with if constexpr, so it has zero overhead when disabled
- Add _SMALL_CACHE and _LARGE_CACHE variants to the dispatch macros
- Create instance files for both small and large cache variants (narrow, _s, _m tiers)
- Remove the old MAX_NUM_BLOCKS inference logic (num_kv_heads is a runtime value, so overflow cannot be inferred from MAX_NUM_BLOCKS alone)
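The host resolves the small/large split: a runtime flag selects one of two precompiled instantiations. The sketch below shows that pattern as plain C++ functions rather than the actual dispatch macros. launch_unified_attention and dispatch_unified_attention are hypothetical names, and the returned tag stands in for a kernel launch.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for one compiled kernel instance; a real launcher would
// enqueue the kernel built with this template argument instead of returning a tag.
template <bool CachePtrInt32OverflowPossible>
std::string launch_unified_attention()
{
    return CachePtrInt32OverflowPossible ? "large_cache" : "small_cache";
}

// Runtime flag -> compile-time template argument. Both instantiations exist
// ahead of time (as with separate small/large instance files), and the host
// picks one per call.
std::string dispatch_unified_attention(bool cache_ptr_int32_overflow_possible)
{
    if (cache_ptr_int32_overflow_possible)
    {
        return launch_unified_attention<true>();
    }
    return launch_unified_attention<false>();
}
```

Keeping the flag out of the kernel's runtime arguments is what lets the small-cache instantiation drop the check entirely. The cost is one extra compiled variant per tier.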

The Python side calculates whether overflow is possible from the actual cache size and passes the result
explicitly via the cache_ptr_int32_overflow_possible parameter.
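This listing does not include the Python-side calculation. The following is a hypothetical C++ rendering of the same decision, assuming a paged cache shaped [num_blocks, block_size, num_kv_heads, head_dim]. Whether offsets count elements or bytes is also an assumption, exposed here as elem_size. Reusing the parameter name for the function is illustrative only.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical host-side check (the commit does this in Python).
// Assumes a cache shaped [num_blocks, block_size, num_kv_heads, head_dim];
// elem_size is 1 for element offsets, or the element size in bytes for byte offsets.
bool cache_ptr_int32_overflow_possible(int64_t num_blocks, int64_t block_size,
                                       int64_t num_kv_heads, int64_t head_dim,
                                       int64_t elem_size = 1)
{
    const int64_t total = num_blocks * block_size * num_kv_heads * head_dim * elem_size;
    // The largest offset into the cache is total - 1; it must fit in int32.
    return total - 1 > std::numeric_limits<int32_t>::max();
}
```

With block_size 16, 8 KV heads and head_dim 128, 131072 blocks is the largest element-addressed cache that still fits: its last offset is exactly INT32_MAX. Because num_kv_heads enters the product, the answer depends on runtime values.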

Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
2026-05-08 10:15:38 +00:00