Support fp8 dynamic quantization for fmha (#3206)

* Support qscale for dynamic quant, remove static quant

* Support hdim=256

* Remove bias test case for fp8

---------

Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
Co-authored-by: asleepzzz <hanwen.chang@amd.com>

[ROCm/composable_kernel commit: 5948dbffe4]
Author: rocking
Date: 2025-11-24 16:28:25 +08:00
Committed by GitHub
parent dd7a2d199f
commit ca1a0da0c3
17 changed files with 369 additions and 280 deletions

@@ -598,6 +598,8 @@ struct HostTensor
typename Data::size_type size() const { return mData.size(); }
T max() const { return *std::max_element(mData.begin(), mData.end()); }
// return a slice of this tensor
// for simplicity we just copy the data and return a new tensor
auto slice(std::vector<size_t> s_begin, std::vector<size_t> s_end) const