Shuffle fix for gfx950 (#3491)

* solve compiler issue

* solve the gfx950 mfma shuffle regression

* refactor jenkinsfile to handle arch name better

* [CK TILE] set divisor to the count of threads along the K dimension

* fix the compiler error

* solve degradation

* Finish the multiplies fix

* fix the scales

* solve compilation error

* solve the composes

* solve the error of tile sweeper

* fix the test and example

* fix for gfx950

---------

Co-authored-by: Max Podkorytov <4273004+tenpercent@users.noreply.github.com>
Co-authored-by: illsilin_amdeng <Illia.Silin@amd.com>
Co-authored-by: Cong Ma <congma13@amd.com>
Thomas Ning
2026-01-14 01:21:29 +08:00
committed by GitHub
parent 9908a87c31
commit 00c46785a8
33 changed files with 161 additions and 152 deletions
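
In the functional-operator header shown in this diff, the `scales` deduction guide loses its `CK_TILE_HOST_DEVICE_EXTERN` prefix, and the zero-argument guides such as `plus() -> plus<void, void>` are removed. The commit does not state the reasoning. The sketch below only illustrates that standard C++17 class template argument deduction still resolves both forms without them. The `plus` and `scales` types here are simplified stand-ins, not the CK Tile definitions, and `demo_sum` is a hypothetical helper added for illustration.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Simplified stand-in for a binary functor with defaulted template arguments.
template <typename Left = void, typename Right = Left>
struct plus
{
    constexpr auto operator()(const Left& lhs, const Right& rhs) const { return lhs + rhs; }
};

// "Transparent" specialization selected when both arguments default to void.
template <>
struct plus<void, void>
{
    template <typename L, typename R>
    constexpr auto operator()(L&& lhs, R&& rhs) const
    {
        return std::forward<L>(lhs) + std::forward<R>(rhs);
    }
};

// Simplified stand-in for a unary functor that multiplies by a stored scale.
template <typename Scale>
struct scales
{
    constexpr scales(Scale s) : scale_(s) {}

    template <typename T>
    constexpr auto operator()(const T& x) const
    {
        return scale_ * x;
    }

    Scale scale_;
};

// Plain deduction guide, written without any attribute macro in front of it.
template <typename Scale>
scales(Scale) -> scales<Scale>;

// No explicit `plus() -> plus<void, void>` guide is declared: the default
// template arguments already make `plus{}` deduce plus<void, void>.
static_assert(std::is_same_v<decltype(plus{}), plus<void, void>>);
static_assert(std::is_same_v<decltype(scales{2.0f}), scales<float>>);

// Hypothetical helper that exercises the deduced functor at run time.
inline int demo_sum()
{
    auto add = plus{};
    const int result = add(1, 2);
    assert(result == 3);
    return result;
}
```

The standard library relies on the same mechanism when code writes `std::plus{}` and gets `std::plus<void>`.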


@@ -42,7 +42,7 @@ struct scales
};
template <typename Scale>
-CK_TILE_HOST_DEVICE_EXTERN scales(Scale) -> scales<Scale>;
+scales(Scale) -> scales<Scale>;
template <typename Left = void, typename Right = Left>
struct plus
@@ -65,8 +65,6 @@ struct plus<void, void>
}
};
-CK_TILE_HOST_DEVICE_EXTERN plus() -> plus<void, void>;
template <typename Left = void, typename Right = Left>
struct minus
{
@@ -88,8 +86,6 @@ struct minus<void, void>
}
};
-CK_TILE_HOST_DEVICE_EXTERN minus() -> minus<void, void>;
template <typename Left = void, typename Right = Left>
struct multiplies
{
@@ -111,8 +107,6 @@ struct multiplies<void, void>
}
};
-CK_TILE_HOST_DEVICE_EXTERN multiplies() -> multiplies<void, void>;
template <typename T>
struct maximize
{
@@ -341,8 +335,6 @@ struct equal<void, void>
}
};
-CK_TILE_HOST_DEVICE_EXTERN equal() -> equal<void, void>;
template <>
struct equal<float, float>
{
@@ -382,8 +374,6 @@ struct less<void, void>
}
};
-CK_TILE_HOST_DEVICE_EXTERN less() -> less<void, void>;
template <typename Left = void, typename Right = Left>
struct less_equal
{
@@ -405,8 +395,6 @@ struct less_equal<void, void>
}
};
-CK_TILE_HOST_DEVICE_EXTERN less_equal() -> less_equal<void, void>;
template <>
struct less_equal<float, float>
{