Qianfeng Zhang
|
be3b27edd3
|
Add support for softmax in hstu attention
|
2026-06-23 09:27:59 +00:00 |
|
Qianfeng Zhang
|
ec4f174ac4
|
Add a template parameter to gemm_0 MakeCBlockTile(), needed for defining PcompBlockTileType
|
2026-06-23 09:27:59 +00:00 |
|
Qianfeng Zhang
|
da50eea674
|
Move scaling by attn_scale inside the main loop
|
2026-06-23 09:27:59 +00:00 |
|
Qianfeng Zhang
|
3639fb8e38
|
Let IsTokenPairInsideMask() return bool type
|
2026-06-23 09:27:59 +00:00 |
|
Qianfeng Zhang
|
a71c996049
|
Add instances to consider for adding softmax support
|
2026-06-23 09:27:59 +00:00 |
|
Qianfeng Zhang
|
0abb52004a
|
Remove K0 from tile setting since it is not used
|
2026-06-23 09:27:58 +00:00 |
|
Qianfeng Zhang
|
de198549ad
|
Change the pipeline so that it is easier to add softmax support
|
2026-06-23 09:27:58 +00:00 |
|
Qianfeng Zhang
|
7e79736df7
|
Stop using the IGLP method for instruction scheduling in the kUseLocal == true path
|
2026-06-23 09:27:58 +00:00 |
|
Qianfeng Zhang
|
08f50c2c51
|
Fix in GetQKBlockGemm()
|
2026-06-23 09:27:58 +00:00 |
|
Qianfeng Zhang
|
5199dbd027
|
Simplify the warp_gemm definitions in GetQKBlockGemm and GetKVBlockGemm
|
2026-06-23 09:27:58 +00:00 |
|
Qianfeng Zhang
|
b0710e8871
|
Remove useless constant statement in the kernel
|
2026-06-23 09:27:58 +00:00 |
|
Qianfeng Zhang
|
2c861d541f
|
Remove unnecessary HSTU_CHECK() calls
|
2026-06-23 09:27:58 +00:00 |
|
Qianfeng Zhang
|
bc5616f1dc
|
Add HSTU_CHECK() and use it in the example code
|
2026-06-23 09:27:58 +00:00 |
|
Qianfeng Zhang
|
f7f90c539e
|
Small update to the reference hstu attention
|
2026-06-23 09:27:58 +00:00 |
|
Qianfeng Zhang
|
9c4d76d96b
|
Detach HstuBlockMask from pipeline definition and construct the HstuBlockMask type in the kernel according to window_size
|
2026-06-23 09:27:58 +00:00 |
|
Qianfeng Zhang
|
8313b34543
|
Unify the license statements on all the source files
|
2026-06-23 09:27:58 +00:00 |
|
Qianfeng Zhang
|
2668bb3aee
|
Stop using MakeKargsImpl() to simplify the hstu kernel
|
2026-06-23 09:27:58 +00:00 |
|
Qianfeng Zhang
|
a8c62920bf
|
Clarify the use of kSubQKHeaddim and kQKHeaddim so that less regular hdim values (e.g. 96, 160) can be supported efficiently
|
2026-06-23 09:27:58 +00:00 |
|
Qianfeng Zhang
|
7a948eee1d
|
Use set_slice_tile() instead of direct thread_buffer assignment
|
2026-06-23 09:27:58 +00:00 |
|
Qianfeng Zhang
|
260ef2fdf2
|
Use xor transform to implement Q/K Lds descriptor for kKpack == 8 cases
|
2026-06-23 09:27:58 +00:00 |
|
Qianfeng Zhang
|
56dab1298f
|
Remove selectable VLayout to simplify the code, since hdim is always the fastest dimension
|
2026-06-23 09:27:58 +00:00 |
|
Qianfeng Zhang
|
e1ebd780c1
|
Use separate settings for gfx942 and gfx950
|
2026-06-23 09:27:58 +00:00 |
|
Qianfeng Zhang
|
40091814c6
|
Tiny change in the pipeline BlockGemm definition to adapt to the latest merge with the develop branch
|
2026-06-23 09:27:58 +00:00 |
|
Qianfeng Zhang
|
ee06d0b4fc
|
Add norm_dist parameter to the hstu example to select either a normal or a uniform distribution for initializing data
|
2026-06-23 09:27:58 +00:00 |
|
Qianfeng Zhang
|
da5db1773d
|
Tiny fix in HstuBlockMaskWithLocal::GetTileRangeAlongX()
|
2026-06-23 09:27:58 +00:00 |
|
Qianfeng Zhang
|
446b62ad82
|
Adjust the atol and rtol and fix the use of check_err() in example_hstu_attention.cpp
|
2026-06-23 09:27:58 +00:00 |
|
Qianfeng Zhang
|
dde0729405
|
Add simple handling for situations where max_atten_seqlen is bigger than max_uih_len
|
2026-06-23 09:27:58 +00:00 |
|
Qianfeng Zhang
|
c649b9b049
|
Tiny fix and comments in HstuBlockMaskWithLocal::IsFullTileInsideMask()
|
2026-06-23 09:27:58 +00:00 |
|
Qianfeng Zhang
|
de47bfe752
|
Update HstuBlockMaskWithLocal::GetTileRangeAlongX, add comments and test cases for causal == false
|
2026-06-23 09:27:58 +00:00 |
|
Qianfeng Zhang
|
4ad55eab4d
|
Update to support min_full_attn_seqlen being bigger than max_uih_len
|
2026-06-23 09:27:58 +00:00 |
|
Qianfeng Zhang
|
0b54f1f43d
|
Add support for the attn_scale MakeKargs() parameter and update the example and reference code
|
2026-06-23 09:20:58 +00:00 |
|
Qianfeng Zhang
|
6364b641b8
|
Replace the integer max_seqlen with float scale_p as a kernel/pipeline parameter
|
2026-06-23 09:20:58 +00:00 |
|
Qianfeng Zhang
|
6784c0be5a
|
Use __builtin_amdgcn_sched_barrier(0x1) to prevent the compiler from unexpectedly rearranging code
|
2026-06-23 09:20:58 +00:00 |
|
Qianfeng Zhang
|
492c724b14
|
Fix added case in test_hstu_attention.sh
|
2026-06-23 09:20:58 +00:00 |
|
Qianfeng Zhang
|
d3ed6ac473
|
Update GetTileRangeAlongX to account for the non-causal + local_size > 0 situation and add a test case to test_hstu_attention.sh
|
2026-06-23 09:20:58 +00:00 |
|
Qianfeng Zhang
|
7d698c2b78
|
Add three scripts for verification of jagged causal cases
|
2026-06-23 09:20:58 +00:00 |
|
Qianfeng Zhang
|
14c955aade
|
Fix in GetTileRangeAlongX() and IsFullTileInsideMask() of HstuBlockMaskWithLocal
|
2026-06-23 09:20:58 +00:00 |
|
Qianfeng Zhang
|
27019a61a0
|
Adjust the code that calculates i_m0 in the kernel
|
2026-06-23 09:20:58 +00:00 |
|
Qianfeng Zhang
|
056166bbeb
|
[Performance] Use separate workgroups to handle the seqlen range [max_uih_len - min_full_attn_seqlen, seqlen]
|
2026-06-23 09:20:58 +00:00 |
|
Qianfeng Zhang
|
b57939ff64
|
Fix comments in the test_pytorch_hstu_mask.py script
|
2026-06-23 09:20:58 +00:00 |
|
Qianfeng Zhang
|
47c4a0c2ec
|
Change the seqlen_q dim padding setting for o_dram and bias_dram
|
2026-06-23 09:20:58 +00:00 |
|
Qianfeng Zhang
|
7efc0e226a
|
Correct some comments
|
2026-06-23 09:20:58 +00:00 |
|
Qianfeng Zhang
|
5b295efe1e
|
Rearrange the code section that uses sched_group_barrier
|
2026-06-23 09:20:58 +00:00 |
|
Qianfeng Zhang
|
34edc4391c
|
Fix in using sched_group_barrier()
|
2026-06-23 09:20:58 +00:00 |
|
Qianfeng Zhang
|
1caef1fb89
|
Move the store_tile() call to before the current iteration
|
2026-06-23 09:20:58 +00:00 |
|
Qianfeng Zhang
|
45a189d73d
|
Revert "Disable support of hdim64 and hdim256 for quick compiling and testing"
This reverts commit ed062f93e2.
|
2026-06-23 09:20:58 +00:00 |
|
Qianfeng Zhang
|
09aa41ba9c
|
Disable support of hdim64 and hdim256 for quick compiling and testing
|
2026-06-23 09:20:58 +00:00 |
|
Qianfeng Zhang
|
140af31e86
|
Fix bug in generate_instances.py and re-generate the instances
|
2026-06-23 09:20:58 +00:00 |
|
Qianfeng Zhang
|
71b0641d75
|
Re-org the kernel parameters in HstuAttentionFwdBatchModeBaseKargs and HstuAttentionFwdJaggModeBaseKargs
|
2026-06-23 09:20:58 +00:00 |
|
Qianfeng Zhang
|
d131327aff
|
Remove num_target from the HstuBlockMask class members since its meaning overlaps with max_uih_len
|
2026-06-23 09:20:58 +00:00 |
|