mirror of
https://github.com/ROCm/composable_kernel.git
synced 2026-04-19 22:39:03 +00:00
introducing ck_tile! (#1216)
* enable gfx940
* switch between intrinsic mfma routines on mi100/200 and mi300
* fix mfma_int8 on MI300
* disable 2 int8 examples on MI300
* Update cmake-ck-dev.sh
* restore gitignore file
* modify Jenkinsfile to the internal repo
* Bump rocm-docs-core from 0.24.0 to 0.29.0 in /docs/sphinx
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.24.0 to 0.29.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.24.0...v0.29.0)
---
updated-dependencies:
- dependency-name: rocm-docs-core
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
* initial enablement of gfx950
* fix clang format
* disable examples 31 and 41 int8 on gfx950
* add code
* fix build wip
* fix xx
* now can build
* naming
* minor fix
* wip fix
* fix macro for exp2; fix warpgemm a/b in transposedC
* unify as tuple_array
* Update the required Python version to 3.9
* Update executable name in test scripts
* re-structure tuple/array to avoid spill
* Merge function templates
* Fix format
* Add constraint to array<> ctor
* Re-use function
* Some minor changes
* remove wrong code in store_raw()
* fix compile issue in transpose
* Rename enum
Rename 'cood_transform_enum' to 'coord_transform_enum'
* let more integral_constant->constant, and formatting
* make sure thread_buffer can be tuple/array
* temp fix buffer_store spill
* do not use custom data type by default; now we can have the same ISA-level code as opt_padding
* fix compile error, fp8 not ready now
* fix fp8 duplicated move/shift/and/or problem
* Default use CK_TILE_FLOAT_TO_FP8_STOCHASTIC rounding mode
* fix scratch in fp8 kernel
* update some readme
* fix merge from upstream
* sync with upstream
* sync upstream again
* sync 22
* remove unused
* fix clang-format
* update README of ck_tile example
* fix several issues
* set the minimal Python version to 3.8
* remove ck_tile example from default cmake target like all/install/check
* remove mistake
* 1) support recipe in generate.py 2) use simplified mask type 3) change left/right to pass into karg
* fix some bugs in group-mode masking and codegen; update README
* F8 quantization for FMHA forward (#1224)
* Add SAccElementFunction, PComputeElementFunction, OAccElementFunction in pipeline
* Add element function to fmha api
* Adjust P elementwise function
* Fix bug in elementwise op; our elementwise op is not in-out
* Add some elementwise op, prepare to quantization
* Let generate.py generate different elementwise functions
* To prevent compiler issue, remove the elementwise function we have not used.
* Remove f8 pipeline, we should share the same pipeline even in f8
* Remove remove_cvref_t
* Avoid warning
* Fix wrong fp8 QK/KV block gemm setting
* Check fp8 rounding error in check_err()
* Set fp8 rounding error for check_err()
* Use CK_TILE_FLOAT_TO_FP8_STANDARD as default fp8 rounding mode
* 1. codegen the fp8 api and kernel
2. f8 host code
* prevent warning in filter mode
* Remove not-in-use elementwise function kargs
* Remove more not-in-use elementwise function kargs
* Small refinements in C++ source files
* Use conditional_t<> to simplify code
* Support heterogeneous argument for binary function types
* Re-use already-existing scales<> functor template
* Fix wrong value produced by saturating
* Generalize the composes<> template
* Unify saturates<> implementation
* Fix type errors in composes<>
* Extend less_equal<>
* Reuse the existing template less_equal<> in check_err()
* Add equal<float> & equal<double>
* Rename check_err() parameter
* Rename check_err() parameter
* Add FIXME comment for adding new macro in future
* Remove unnecessary cast to void
* Eliminate duplicated code
* Avoid dividing api pool into more than 2 groups
* Use more clear variable names
* Use affirmative condition in if stmt
* Remove blank lines
* Do not use perfect forwarding in composes<>
* To fix compile error, revert generate.py back to 4439cc107d
* Fix bug of p element function
* Add compute element op to host softmax
* Remove element function in api interface
* Extract user parameter
* Rename pscale and oscale variable
* rename f8 to fp8
* rename more f8 to fp8
* Add pipeline::operator() without element_functor
* 1. Remove deprecated pipeline enum
2. Refine host code parameter
* Use quantization range as input
* 1. Rename max_dtype to dtype_max.
2. Rename scale to scale_s
3. Add init description
* Refine description
* prevent early return
* unify _squant kernel name in cpp, update README
* Adjust the default range.
* Refine error message and bias range
* Add fp8 benchmark and smoke test
* fix fp8 swizzle_factor=4 case
---------
Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
Co-authored-by: carlushuang <carlus.huang@amd.com>
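Several of the fp8 commits above ("Use quantization range as input", "Rename scale to scale_s", "Rename max_dtype to dtype_max") revolve around choosing a per-tensor scale that maps the observed value range onto the fp8 dynamic range. A minimal sketch of that idea, assuming the OCP E4M3 representable max of 448 and a hypothetical helper name (illustrative only, not the CK kernel code):

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Hypothetical helper, not part of CK: compute a per-tensor scale so that
// the largest-magnitude element lands on dtype_max. The name dtype_max
// follows the commit messages above; 448.0f is the OCP fp8 E4M3 maximum.
inline float compute_scale(const std::vector<float>& x, float dtype_max = 448.0f)
{
    float amax = 0.0f;
    for(float v : x)
        amax = std::max(amax, std::fabs(v));
    // Degenerate all-zero tensor: any scale works; use 1.
    return amax > 0.0f ? dtype_max / amax : 1.0f;
}
```

With this convention a value v is stored as v * scale in fp8 and the result is divided back after the GEMM; that is the common per-tensor quantization scheme, and the actual CK `_squant` kernels may differ in detail.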
---------
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: illsilin <Illia.Silin@amd.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
Co-authored-by: Jing Zhang <jizha@amd.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Po-Yen, Chen <PoYen.Chen@amd.com>
Co-authored-by: rocking <ChunYu.Lai@amd.com>
This commit is contained in:

include/ck_tile/ops/common/README.md (new file, 4 lines)
@@ -0,0 +1,4 @@
## common

this folder is designed not to be included directly by users; e.g. if you include `ck_tile/ops/fmha.hpp`, then everything under `common` is also included.

to achieve this, remod.py duplicates the header include paths under `common` into the other modules under `ops/*`. internal developers can also include `ck_tile/ops/common.hpp` for convenience (and so can external users).
include/ck_tile/ops/common/tensor_layout.hpp (new file, 412 lines)
@@ -0,0 +1,412 @@
// SPDX-License-Identifier: MIT
// Copyright (c) 2018-2023, Advanced Micro Devices, Inc. All rights reserved.

#pragma once

#include <ostream>
#include <type_traits>

// TODO: this folder does not match the single namespace rule. need to refactor in the future
namespace ck_tile {
namespace tensor_layout {

struct BaseTensorLayout
{
};

namespace gemm {

struct RowMajor : public BaseTensorLayout
{
    static constexpr const char* name = "RowMajor";
};

struct ColumnMajor : public BaseTensorLayout
{
    static constexpr const char* name = "ColumnMajor";
};
} // namespace gemm

namespace convolution {

// input tensor
// packed NCW/NCHW/NCDHW
struct NCW : public BaseTensorLayout
{
    static constexpr const char* name = "NCW";
};

struct NCHW : public BaseTensorLayout
{
    static constexpr const char* name = "NCHW";
};

struct NCDHW : public BaseTensorLayout
{
    static constexpr const char* name = "NCDHW";
};

// packed GNCW/GNCHW/GNCDHW
struct GNCW : public BaseTensorLayout
{
    static constexpr const char* name = "GNCW";
};

struct GNCHW : public BaseTensorLayout
{
    static constexpr const char* name = "GNCHW";
};

struct GNCDHW : public BaseTensorLayout
{
    static constexpr const char* name = "GNCDHW";
};

// input tensor
// packed NWC/NHWC/NDHWC
struct NWC : public BaseTensorLayout
{
    static constexpr const char* name = "NWC";
};

struct NHWC : public BaseTensorLayout
{
    static constexpr const char* name = "NHWC";
};

struct NDHWC : public BaseTensorLayout
{
    static constexpr const char* name = "NDHWC";
};

// input tensor
// packed GNWC/GNHWC/GNDHWC
struct GNWC : public BaseTensorLayout
{
    static constexpr const char* name = "GNWC";
};

struct GNHWC : public BaseTensorLayout
{
    static constexpr const char* name = "GNHWC";
};

struct GNDHWC : public BaseTensorLayout
{
    static constexpr const char* name = "GNDHWC";
};

// for input bias
struct GC : public BaseTensorLayout
{
    static constexpr const char* name = "GC";
};

// input tensor
// packed NWGC/NHWGC/NDHWGC
struct NWGC : public BaseTensorLayout
{
    static constexpr const char* name = "NWGC";
};

struct NHWGC : public BaseTensorLayout
{
    static constexpr const char* name = "NHWGC";
};

struct NDHWGC : public BaseTensorLayout
{
    static constexpr const char* name = "NDHWGC";
};

// input tensor
// strided layout
struct G_NW_C : public BaseTensorLayout
{
    static constexpr const char* name = "G_NW_C";
};

struct G_NHW_C : public BaseTensorLayout
{
    static constexpr const char* name = "G_NHW_C";
};

struct G_NDHW_C : public BaseTensorLayout
{
    static constexpr const char* name = "G_NDHW_C";
};

// for input bias
struct G_C : public BaseTensorLayout
{
    static constexpr const char* name = "G_C";
};

// weight tensor
// packed KCX/KCYX/KCZYX
struct KCX : public BaseTensorLayout
{
    static constexpr const char* name = "KCX";
};

struct KCYX : public BaseTensorLayout
{
    static constexpr const char* name = "KCYX";
};

struct KCZYX : public BaseTensorLayout
{
    static constexpr const char* name = "KCZYX";
};

// weight tensor
// packed GKCX/GKCYX/GKCZYX
struct GKCX : public BaseTensorLayout
{
    static constexpr const char* name = "GKCX";
};

struct GKCYX : public BaseTensorLayout
{
    static constexpr const char* name = "GKCYX";
};

struct GKCZYX : public BaseTensorLayout
{
    static constexpr const char* name = "GKCZYX";
};

// weight tensor
// packed KXC/KYXC/KZYXC
struct KXC : public BaseTensorLayout
{
    static constexpr const char* name = "KXC";
};

struct KYXC : public BaseTensorLayout
{
    static constexpr const char* name = "KYXC";
};

struct KZYXC : public BaseTensorLayout
{
    static constexpr const char* name = "KZYXC";
};

// weight tensor
// packed GKXC/GKYXC/GKZYXC
struct GKXC : public BaseTensorLayout
{
    static constexpr const char* name = "GKXC";
};

struct GKYXC : public BaseTensorLayout
{
    static constexpr const char* name = "GKYXC";
};

struct GKZYXC : public BaseTensorLayout
{
    static constexpr const char* name = "GKZYXC";
};

// weight tensor
// packed KXGC/KYXGC/KZYXGC
struct KXGC : public BaseTensorLayout
{
    static constexpr const char* name = "KXGC";
};

struct KYXGC : public BaseTensorLayout
{
    static constexpr const char* name = "KYXGC";
};

struct KZYXGC : public BaseTensorLayout
{
    static constexpr const char* name = "KZYXGC";
};

// weight tensor
// strided
struct G_K_X_C : public BaseTensorLayout
{
    static constexpr const char* name = "G_K_X_C";
};

struct G_K_YX_C : public BaseTensorLayout
{
    static constexpr const char* name = "G_K_YX_C";
};

struct G_K_ZYX_C : public BaseTensorLayout
{
    static constexpr const char* name = "G_K_ZYX_C";
};

// output tensor
// packed NKW/NKHW/NKDHW
struct NKW : public BaseTensorLayout
{
    static constexpr const char* name = "NKW";
};

struct NKHW : public BaseTensorLayout
{
    static constexpr const char* name = "NKHW";
};

struct NKDHW : public BaseTensorLayout
{
    static constexpr const char* name = "NKDHW";
};

// output tensor
// packed GNKW/GNKHW/GNKDHW
struct GNKW : public BaseTensorLayout
{
    static constexpr const char* name = "GNKW";
};

struct GNKHW : public BaseTensorLayout
{
    static constexpr const char* name = "GNKHW";
};

struct GNKDHW : public BaseTensorLayout
{
    static constexpr const char* name = "GNKDHW";
};

// output tensor
// packed NWK/NHWK/NDHWK
struct NWK : public BaseTensorLayout
{
    static constexpr const char* name = "NWK";
};

struct NHWK : public BaseTensorLayout
{
    static constexpr const char* name = "NHWK";
};

struct NDHWK : public BaseTensorLayout
{
    static constexpr const char* name = "NDHWK";
};

// output tensor
// packed GNWK/GNHWK/GNDHWK
struct GNWK : public BaseTensorLayout
{
    static constexpr const char* name = "GNWK";
};

struct GNHWK : public BaseTensorLayout
{
    static constexpr const char* name = "GNHWK";
};

struct GNDHWK : public BaseTensorLayout
{
    static constexpr const char* name = "GNDHWK";
};

// output tensor
// packed NWGK/NHWGK/NDHWGK
struct NWGK : public BaseTensorLayout
{
    static constexpr const char* name = "NWGK";
};

struct NHWGK : public BaseTensorLayout
{
    static constexpr const char* name = "NHWGK";
};

struct NDHWGK : public BaseTensorLayout
{
    static constexpr const char* name = "NDHWGK";
};

// output tensor
// strided layout
struct G_NW_K : public BaseTensorLayout
{
    static constexpr const char* name = "G_NW_K";
};

struct G_NHW_K : public BaseTensorLayout
{
    static constexpr const char* name = "G_NHW_K";
};

struct G_NDHW_K : public BaseTensorLayout
{
    static constexpr const char* name = "G_NDHW_K";
};

// for output bias
struct G_K : public BaseTensorLayout
{
    static constexpr const char* name = "G_K";
};

// K-reduced output tensor (packed)
struct GNW : public BaseTensorLayout
{
    static constexpr const char* name = "GNW";
};

struct GNHW : public BaseTensorLayout
{
    static constexpr const char* name = "GNHW";
};

struct GNDHW : public BaseTensorLayout
{
    static constexpr const char* name = "GNDHW";
};

// K-reduced output tensor (packed)
struct NWG : public BaseTensorLayout
{
    static constexpr const char* name = "NWG";
};

struct NHWG : public BaseTensorLayout
{
    static constexpr const char* name = "NHWG";
};

struct NDHWG : public BaseTensorLayout
{
    static constexpr const char* name = "NDHWG";
};

// K-reduced output tensor (strided)
struct G_NW : public BaseTensorLayout
{
    static constexpr const char* name = "G_NW";
};

struct G_NHW : public BaseTensorLayout
{
    static constexpr const char* name = "G_NHW";
};

struct G_NDHW : public BaseTensorLayout
{
    static constexpr const char* name = "G_NDHW";
};

} // namespace convolution

template <
    typename Layout,
    typename std::enable_if<std::is_base_of<BaseTensorLayout, Layout>::value, bool>::type = false>
std::ostream& operator<<(std::ostream& os, const Layout&)
{
    os << Layout::name;
    return os;
}

} // namespace tensor_layout
} // namespace ck_tile
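The header above boils down to one pattern: empty tag structs that carry a layout purely in the type system, each with a compile-time name, plus a stream operator constrained via SFINAE so it only applies to those tags. A condensed, standalone sketch of that pattern (reusing two of the layout names; not the full header):

```cpp
#include <ostream>
#include <sstream>
#include <type_traits>

// Empty tag types: the layout is encoded in the type itself; the only data
// is a compile-time name used for printing/logging.
struct BaseTensorLayout
{
};

struct RowMajor : public BaseTensorLayout
{
    static constexpr const char* name = "RowMajor";
};

struct ColumnMajor : public BaseTensorLayout
{
    static constexpr const char* name = "ColumnMajor";
};

// enable_if restricts this operator<< to types derived from BaseTensorLayout,
// so it cannot hijack streaming of unrelated types.
template <typename Layout,
          typename std::enable_if<std::is_base_of<BaseTensorLayout, Layout>::value,
                                  bool>::type = false>
std::ostream& operator<<(std::ostream& os, const Layout&)
{
    return os << Layout::name;
}
```

For example, `std::ostringstream oss; oss << RowMajor{};` leaves `oss.str()` equal to `"RowMajor"`. Because the tags are distinct types, kernel templates can dispatch on layout at compile time with zero runtime cost.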