Commit Graph

1351 Commits

Author SHA1 Message Date
Qianfeng Zhang
cc7e216fa6 Rename GetKVBlockGemm to GetPVTBlockGemm 2026-06-23 09:28:00 +00:00
Qianfeng Zhang
06547efd90 Remove the kHasBias==true instances to save building time 2026-06-23 09:28:00 +00:00
Qianfeng Zhang
f41777fbd2 Renaming BUILD_HSTU_FOR_GFX95_ONLY to BUILD_HSTU_FOR_GFX95 2026-06-23 09:28:00 +00:00
Qianfeng Zhang
295136e48b Update the README.md according to the summary by claude code 2026-06-23 09:28:00 +00:00
Qianfeng Zhang
a1ad9fc312 Fix the using of num_targets[] in run_group_hstu_attention 2026-06-23 09:28:00 +00:00
Qianfeng Zhang
62627db768 Update to the comments in reference_hstu_attention_bwd.hpp 2026-06-23 09:28:00 +00:00
Qianfeng Zhang
08873f0d50 Renaming in the dispatching codes and generate_instances.py scripts 2026-06-23 09:28:00 +00:00
Qianfeng Zhang
c7de3af246 Split hstu_attention_util.hpp into host_util.hpp and kernel_util.hpp 2026-06-23 09:28:00 +00:00
Qianfeng Zhang
89d6f5aa92 Remove un-needed includings from some hpp and cpp files 2026-06-23 09:28:00 +00:00
Qianfeng Zhang
6f4f3eac48 Tiny update in generate_instances.py 2026-06-23 09:28:00 +00:00
Qianfeng Zhang
673207ce59 Fix header file mapping bug in generate_instances.py 2026-06-23 09:28:00 +00:00
Qianfeng Zhang
f73341de37 Some renaming in kernel and pipeline 2026-06-23 09:28:00 +00:00
Qianfeng Zhang
42a3bfbab7 Update and fix for leaked changes and make the scripts be able to test/benchmark kStoreLSE cases 2026-06-23 09:28:00 +00:00
Qianfeng Zhang
b17b41a1e6 Enable the kernel dispatching path from is_training & use_softmax to kStoreLSE 2026-06-23 09:28:00 +00:00
Qianfeng Zhang
4414019296 Add instances and kStoreLSE template in dispatcher class to support outputting lse for fwd training 2026-06-23 09:28:00 +00:00
Qianfeng Zhang
8f83a2841f Set lse tensor dim strides in example 2026-06-23 09:28:00 +00:00
Qianfeng Zhang
02e5c23f9c Replace template kUseSoftmax/kStoreLSE by boolean parameters in reference fwd codes to save compiling time 2026-06-23 09:28:00 +00:00
Qianfeng Zhang
bd126618d1 Add support for outputting lse in the example and reference hstu attention forward implementation 2026-06-23 09:28:00 +00:00
Qianfeng Zhang
fa0a9c1656 Add support for preparing lse_dram_window in hstu fwd kernel 2026-06-23 09:28:00 +00:00
Qianfeng Zhang
fd9af72c9f Kernel use types declared in the problem rather than the pipeline 2026-06-23 09:28:00 +00:00
Qianfeng Zhang
ee5bd0ebba Tiny simplification with defining the Bias related Kargs 2026-06-23 09:28:00 +00:00
Qianfeng Zhang
f41b0176d3 Add parameters used by storing lse in the fwd and fwd_splitkv_combine kernel to prepare for supporting training 2026-06-23 09:28:00 +00:00
Qianfeng Zhang
270d073c88 Move num_splits/o_acc_ptr/l_acc_ptr out from HstuAttention<xxx>FwdParams struct 2026-06-23 09:28:00 +00:00
Qianfeng Zhang
b2561b88e4 Add kStoreLSE template parameter to the problems 2026-06-23 09:28:00 +00:00
Qianfeng Zhang
2a86bfb6f5 Implement host reference operator for hstu attention backward 2026-06-23 09:28:00 +00:00
Qianfeng Zhang
ec58f92f05 Rename the reference interfaces and the files 2026-06-23 09:27:59 +00:00
Qianfeng Zhang
7d317adf37 Update to MakeLSEaccDramTileDistribution trying to assign more threads to MThreadPerWarp so that block_tile_reduce_sync() work on less KThreadPerWarp 2026-06-23 09:27:59 +00:00
Qianfeng Zhang
30b5d7bd01 Use buffer_view to create lse_acc_dram_naive so that out_of_boundary loading value can be specified (be -inf) 2026-06-23 09:27:59 +00:00
Qianfeng Zhang
9a7cc5b4a3 Use partition_index parameter for all get_x_indices_from_distributed_indices() calls 2026-06-23 09:27:59 +00:00
Qianfeng Zhang
9fbe96ab76 Update to the cross_attention test/bench scripts 2026-06-23 09:27:59 +00:00
Qianfeng Zhang
24329f15d1 Add implementation of hstu fwd splitkv for softmax path 2026-06-23 09:27:59 +00:00
Qianfeng Zhang
d2bc927242 Fix the calling context for type_context in scale_tile_in_scalar()/scale_tile_in_pack 2026-06-23 09:27:59 +00:00
Qianfeng Zhang
ea5af27b62 Re-format the .hpp/.cpp files using clang-format-18 2026-06-23 09:27:59 +00:00
Qianfeng Zhang
bd8a87301b Fix potential bug in kernel host interface BlockSize() 2026-06-23 09:27:59 +00:00
Qianfeng Zhang
f73d7d2f8a More consideration in MakeOaccDramTileDistribution() in splitkv_combine pipeline policy 2026-06-23 09:27:59 +00:00
Qianfeng Zhang
e48bcff488 Use inline-assembly based v_pk_mul_f32 to scale tile pcomp_tile in non-softmax pipeline on gfx950 2026-06-23 09:27:59 +00:00
Qianfeng Zhang
c6dfe030d0 Add -fno-slp-vectorize option for building hstu kernels on gfx950 2026-06-23 09:27:59 +00:00
Qianfeng Zhang
1f4319ce91 Use include <...> format to refer to header files from ck_tile 2026-06-23 09:27:59 +00:00
Qianfeng Zhang
a97c7a75ce Mark low probability branch as unlikely in the softmax pipelines 2026-06-23 09:27:59 +00:00
Qianfeng Zhang
68bbcac775 Use type_convert to convert float constant to CompDataType 2026-06-23 09:27:59 +00:00
Qianfeng Zhang
d243b275da Implement conditional softmax rescale in trload with_softmax pipeline 2026-06-23 09:27:59 +00:00
Qianfeng Zhang
671c65e178 Implement conditional softmax rescale in non-trload with_softmax pipeline 2026-06-23 09:27:59 +00:00
Qianfeng Zhang
cfbd8a342a Renaming the test_hstu_attention_seqlen_kv.sh to test_hstu_cross_attention.sh 2026-06-23 09:27:59 +00:00
Qianfeng Zhang
62bf2296c6 Remove exposing kUseTrLoad as template parameter of pipeline problem 2026-06-23 09:27:59 +00:00
Qianfeng Zhang
67f9461b42 Simplification in the cross_attention testing/benchmarking scripts 2026-06-23 09:27:59 +00:00
Qianfeng Zhang
bd46155431 Remove max_target 3200 cases from cross_attention testing and benchmarking 2026-06-23 09:27:59 +00:00
Qianfeng Zhang
f99ed6225b Clarify the using the max_seqlen and max_seqlen_q 2026-06-23 09:27:59 +00:00
Qianfeng Zhang
6f2a73b17d Add scripts for testing/benchmarking cross_attention cases 2026-06-23 09:27:59 +00:00
Qianfeng Zhang
8a7f0a8e99 Clarify the using of group_max_seqlens[] and group_input_max_uih_seqlens[] parameters for group attention example 2026-06-23 09:27:59 +00:00
Qianfeng Zhang
c0922a6cb8 Add implementation of fwd splitkv on no_softmax path 2026-06-23 09:27:59 +00:00