GEMM Blockscale ABQuant Optimization (#3620)

* GEMM Blockscale ABQuant Optimization

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* fix precommit error

* clean

* Fix

---------

Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Ding, Yi <yi.ding@amd.com>
kensclin
2026-01-23 01:39:38 +08:00
committed by GitHub
parent 9e049a32a1
commit 31a35ecab4
7 changed files with 161 additions and 51 deletions
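In the `sweep_tile_span` hunk of this commit, the span's `Impl` is checked for `size() == 0`. For a 0-dim span, `f` is called once with `detail::make_tile_distributed_index(sequence<>{})` instead of going through `static_ford`. Presumably this guarantees that a 0-dim span still visits its single, empty index.

The standalone C++17 sketch below mirrors only that control flow. `span_sketch`, `ford_sketch`, `sweep_span_sketch` and `count_visits` are hypothetical names, not ck_tile APIs. `std::array` indices stand in for ck_tile's compile-time tile-distributed indices. It assumes the general nested loop is only meant for spans with at least one dimension.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a tile-distributed span: only a list of
// compile-time extents. This is NOT ck_tile's tile_distributed_span.
template <std::size_t... Lengths>
struct span_sketch
{
    static constexpr std::size_t size() { return sizeof...(Lengths); }
    static constexpr std::array<std::size_t, sizeof...(Lengths)> lengths{Lengths...};
};

// Hypothetical odometer loop over every multi-index of a span with N >= 1
// dimensions, in row-major order (last dimension varies fastest).
// It plays the role that static_ford plays in the diff.
template <std::size_t N, typename F>
void ford_sketch(const std::array<std::size_t, N>& lengths, F&& f)
{
    static_assert(N >= 1, "0-dim spans must be handled by the caller");
    for (std::size_t d = 0; d < N; ++d)
        if (lengths[d] == 0)
            return; // a zero-length extent means there is nothing to visit

    std::array<std::size_t, N> idx{};
    while (true)
    {
        f(idx);
        // Increment the innermost dimension and carry outward.
        std::size_t d = N;
        while (d > 0)
        {
            --d;
            if (++idx[d] < lengths[d])
                break;  // no carry needed: go visit the next index
            idx[d] = 0; // wrap this dimension and carry into the next outer one
            if (d == 0)
                return; // the outermost dimension wrapped: every index visited
        }
    }
}

// Mirrors the patched control flow: a 0-dim span invokes f exactly once with
// an empty index; otherwise the general loop runs. Because the branch is
// `if constexpr`, ford_sketch<0> (and its static_assert) is never instantiated.
template <typename Span, typename F>
void sweep_span_sketch(Span, F&& f)
{
    if constexpr (Span::size() == 0)
        f(std::array<std::size_t, 0>{});
    else
        ford_sketch<Span::size()>(Span::lengths, f);
}

// Counts how many times the functor runs for a given span type.
template <typename Span>
int count_visits()
{
    int visits = 0;
    sweep_span_sketch(Span{}, [&](const auto&) { ++visits; });
    return visits;
}
```

With these definitions, `count_visits<span_sketch<>>()` returns 1 (the single empty index) and `count_visits<span_sketch<2, 3>>()` returns 6. A span with a zero-length extent, such as `span_sketch<2, 0>`, returns 0.

Dispatching the 0-dim case at compile time keeps the general loop free of a special case that could never apply to it.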


@@ -19,13 +19,13 @@ template <typename TileDistributedSpan_, // tile_distributed_span<...>
>
CK_TILE_DEVICE void sweep_tile_span(TileDistributedSpan_, const F& f)
{
-    using DstrSpan = remove_cvref_t<TileDistributedSpan_>;
+    using DstrSpanImpl = typename remove_cvref_t<TileDistributedSpan_>::Impl;
-    static_ford<typename DstrSpan::Impl>{}([&](auto dstr_idx_impl) {
-        constexpr auto dstr_idx = detail::make_tile_distributed_index(dstr_idx_impl);
-        f(dstr_idx);
-    });
+    if constexpr(DstrSpanImpl::size() == 0) // handle the 0-dim span case
+        f(detail::make_tile_distributed_index(sequence<>{}));
+    else
+        static_ford<DstrSpanImpl>{}(
+            [&](auto dstr_idx_impl) { f(detail::make_tile_distributed_index(dstr_idx_impl)); });
}
// unpacked span, this version support span with unpack(multi-arg) functor