Commit Graph

3749 Commits

Author SHA1 Message Date
Qianfeng Zhang
6cf17bc827 Miscellaneous small updates and corrections 2026-06-28 13:59:44 +00:00
Qianfeng Zhang
52eff34d21 Add scripts for testing hstu attention fwd with drop-out 2026-06-28 13:59:44 +00:00
Qianfeng Zhang
35234a492d Fix in kernel and HstuBlockMask interface for kHasDropout == true 2026-06-28 13:59:43 +00:00
Qianfeng Zhang
83676066af Fix in GetSmemSizeDropout() 2026-06-28 13:59:43 +00:00
Qianfeng Zhang
95d3ecfb14 Update for considering kHasDropout used with minfull_attn_seqlen > 0 2026-06-28 13:59:43 +00:00
Qianfeng Zhang
4250341a92 Fix identified issues for implementing kHasDropout == true 2026-06-28 13:59:43 +00:00
Qianfeng Zhang
c3df01a519 Add instances and enable dispatching for kHasDropout == true 2026-06-28 13:59:43 +00:00
Qianfeng Zhang
124f072158 Add p_drop parameter to example and drop-out logic to the reference code 2026-06-28 13:59:43 +00:00
Qianfeng Zhang
58fbfe766e Update to kernel and host interface for generating random numbers 2026-06-28 13:59:43 +00:00
Qianfeng Zhang
8a4ec6382d Add kernel and host interface for generating random numbers 2026-06-28 13:59:43 +00:00
Qianfeng Zhang
d5c872f504 Removing the class derivation to simplify struct HstuAttentionFwdCommonDropoutKargs 2026-06-24 09:03:39 +00:00
Qianfeng Zhang
0720bc48be Fix using multiplies 2026-06-23 10:20:17 +00:00
Qianfeng Zhang
ad769baea4 Fix make_kernel() template parameters 2026-06-23 09:38:37 +00:00
Qianfeng Zhang
0b0684aff2 Revert "[ck_tile] Add get_partition_index_v2 which uses warp_id in vgpr and to be used by tile_windows on lds-based tensor_view" (This reverts commit 2b0f3791a6.) 2026-06-23 09:31:31 +00:00
Qianfeng Zhang
1805c985a0 Update to example_hstu_attention_fwd.cpp 2026-06-23 09:28:00 +00:00
Qianfeng Zhang
dc1d433351 Rename generate_instances.py to generate_fwd_instances.py 2026-06-23 09:28:00 +00:00
Qianfeng Zhang
503a017c42 Remove the using of kSubQKHeaddim 2026-06-23 09:28:00 +00:00
Qianfeng Zhang
a81f32331f Add restriction on the relationship between HstuAttention<xxx>Problem and HstuAttention<xxx>TileSettingClass 2026-06-23 09:28:00 +00:00
Qianfeng Zhang
79ebc7479d Add static_assert() in HstuAttentionFwdTileSettingClass 2026-06-23 09:28:00 +00:00
Qianfeng Zhang
7a67ae4dd3 Update to the comments in reference_hstu_attention_bwd.hpp 2026-06-23 09:28:00 +00:00
Qianfeng Zhang
7d2e575fed Fix the comments in reference_hstu_attention_fwd.hpp 2026-06-23 09:28:00 +00:00
Qianfeng Zhang
d07b37c097 Remove un-used element-wise functions passed through pipelines' operator() interfaces 2026-06-23 09:28:00 +00:00
Qianfeng Zhang
cc7e216fa6 Rename GetKVBlockGemm to GetPVTBlockGemm 2026-06-23 09:28:00 +00:00
Qianfeng Zhang
06547efd90 Remove the kHasBias==true instances to save building time 2026-06-23 09:28:00 +00:00
Qianfeng Zhang
f41777fbd2 Renaming BUILD_HSTU_FOR_GFX95_ONLY to BUILD_HSTU_FOR_GFX95 2026-06-23 09:28:00 +00:00
Qianfeng Zhang
295136e48b Update the README.md according to the summary by claude code 2026-06-23 09:28:00 +00:00
Qianfeng Zhang
a1ad9fc312 Fix the using of num_targets[] in run_group_hstu_attention 2026-06-23 09:28:00 +00:00
Qianfeng Zhang
62627db768 Update to the comments in reference_hstu_attention_bwd.hpp 2026-06-23 09:28:00 +00:00
Qianfeng Zhang
08873f0d50 Renaming in the dispatching codes and generate_instances.py scripts 2026-06-23 09:28:00 +00:00
Qianfeng Zhang
c7de3af246 Split hstu_attention_util.hpp into host_util.hpp and kernel_util.hpp 2026-06-23 09:28:00 +00:00
Qianfeng Zhang
89d6f5aa92 Remove un-needed includings from some hpp and cpp files 2026-06-23 09:28:00 +00:00
Qianfeng Zhang
6f4f3eac48 Tiny update in generate_instances.py 2026-06-23 09:28:00 +00:00
Qianfeng Zhang
673207ce59 Fix header file mapping bug in generate_instances.py 2026-06-23 09:28:00 +00:00
Qianfeng Zhang
f73341de37 Some renaming in kernel and pipeline 2026-06-23 09:28:00 +00:00
Qianfeng Zhang
42a3bfbab7 Update and fix for leeked changes and make the scripts be able to test/benchmark kStoreLSE cases 2026-06-23 09:28:00 +00:00
Qianfeng Zhang
b17b41a1e6 Enable the kernel dispatching path from is_training & use_softmax to kStoreLSE 2026-06-23 09:28:00 +00:00
Qianfeng Zhang
4414019296 Add instances and kStoreLSE template in dispatcher class to support outputting lse for fwd training 2026-06-23 09:28:00 +00:00
Qianfeng Zhang
8f83a2841f Set lse tensor dim strides in example 2026-06-23 09:28:00 +00:00
Qianfeng Zhang
02e5c23f9c Replace template kUseSoftmax/kStoreLSE by boolean parameters in reference fwd codes to save compiling time 2026-06-23 09:28:00 +00:00
Qianfeng Zhang
bd126618d1 Add support for outputing lse in the example and reference hstu attention forward implementation 2026-06-23 09:28:00 +00:00
Qianfeng Zhang
fa0a9c1656 Add support for preparing lse_dram_window in hstu fwd kernel 2026-06-23 09:28:00 +00:00
Qianfeng Zhang
fd9af72c9f Kernel use types declared in the problem rather than the pipeline 2026-06-23 09:28:00 +00:00
Qianfeng Zhang
ee5bd0ebba Tiny simplification with defining the Bias related Kargs 2026-06-23 09:28:00 +00:00
Qianfeng Zhang
f41b0176d3 Add parameters used by storing lse in the fwd and fwd_splitkv_combine kernel to prepare for supporting training 2026-06-23 09:28:00 +00:00
Qianfeng Zhang
270d073c88 Move num_splits/o_acc_ptr/l_acc_ptr out from HstuAttention<xxx>FwdParams struct 2026-06-23 09:28:00 +00:00
Qianfeng Zhang
b2561b88e4 Add kStoreLSE template parameter to the problems 2026-06-23 09:28:00 +00:00
Qianfeng Zhang
2a86bfb6f5 Implement host reference operator for hstu attention backward 2026-06-23 09:28:00 +00:00
Qianfeng Zhang
ec58f92f05 Rename the reference interfaces and the files 2026-06-23 09:27:59 +00:00
Qianfeng Zhang
7d317adf37 Update to MakeLSEaccDramTileDistribution trying to assign more threads to MThreadPerWarp so that block_tile_reduce_sync() work on less KThreadPerWarp 2026-06-23 09:27:59 +00:00
Qianfeng Zhang
30b5d7bd01 Use buffer_view to create lse_acc_dram_naive so that out_of_boundary loading value can be specified (be -inf) 2026-06-23 09:27:59 +00:00