Add v3 support for Grouped fwd conv+bias+clamp & ckProfiler (#2463)

* Add logging to IsSupported.

* Less casting in AddClamp

* Conv+bias+clamp instances & profiler BF16

* Fix 3D instances & run just 1x for verification.

* Run just once for verification in conv fwd.

* ckProfiler conv fwd clamp

* Remove exec bit & formatting

* Add support for MultiD for grouped conv fwd v3.

* Enable 2Lds.

* clean

* align instances

* align instances

* profiler fixes

* Fixes

* fix

* fix

---------

Co-authored-by: Adam Osewski <root@quanta-ccs-aus-f01-19.cs-aus.dcgpu>
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>
Adam Osewski
2025-07-25 10:34:31 +02:00
committed by GitHub
parent b01a27ff22
commit c8eb2f995c
6 changed files with 1098 additions and 301 deletions


@@ -379,10 +379,10 @@ struct AddClamp
     __host__ __device__ constexpr void
     operator()<half_t, half_t, half_t>(half_t& y, const half_t& x0, const half_t& x1) const
     {
-        const half_t a = x0 + x1;
-        y = a > type_convert<half_t>(floor_)
-                ? (a < type_convert<half_t>(ceil_) ? a : type_convert<half_t>(ceil_))
-                : type_convert<half_t>(floor_);
+        const half_t floor = type_convert<half_t>(floor_);
+        const half_t ceil = type_convert<half_t>(ceil_);
+        const half_t a = x0 + x1;
+        y = a > floor ? (a < ceil ? a : ceil) : floor;
     };
     template <>
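The half_t specialization of AddClamp computes y = clamp(x0 + x1, floor_, ceil_). After this change, each bound is converted to half_t once into a local, instead of calling type_convert inside every branch of the ternary. The sketch below reproduces that add-then-clamp logic on the host. It is a minimal illustration under stated assumptions: `AddClampSketch` is a hypothetical stand-in, not the CK type, and `static_cast` plus float/double replace `ck::type_convert` and `ck::half_t` so it builds without Composable Kernel headers.

```cpp
// Hypothetical host-only stand-in for CK's AddClamp element-wise functor.
// float/double replace ck::half_t, and static_cast replaces ck::type_convert
// (both are assumptions made so the sketch compiles on its own).
struct AddClampSketch
{
    float floor_; // lower clamp bound
    float ceil_;  // upper clamp bound

    // y = clamp(x0 + x1, floor_, ceil_), structured like the patched operator():
    // convert each bound once into a local, then reuse it in the ternary.
    template <typename T>
    void operator()(T& y, const T& x0, const T& x1) const
    {
        const T floor = static_cast<T>(floor_);
        const T ceil = static_cast<T>(ceil_);
        const T a = x0 + x1;
        y = a > floor ? (a < ceil ? a : ceil) : floor;
    }
};
```

For example, with `floor_ = 0` and `ceil_ = 6`, adding 4 and 5 gives 9, which clamps to 6. Adding -4 and 1 gives -3, which clamps to 0. In a conv+bias+clamp epilogue, x0 would play the role of the convolution output and x1 the bias value.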