Commit Graph

102 Commits

Author SHA1 Message Date
Illia Silin
a45e7f93e0 only build tests and examples if user sets GPU_TARGETS (#1565)
[ROCm/composable_kernel commit: f46a9eee9d]
2024-10-10 15:31:56 -07:00
Illia Silin
5ee327fd89 fix the target selection logic (#1561)
[ROCm/composable_kernel commit: 2e1165c1a7]
2024-10-09 15:21:57 -07:00
Illia Silin
ba9783ece1 add a CK_USE_CODEGEN build argument to enable codegen (#1552)
* add a CK_USE_CODEGEN build argument to enable codegen

* fix cmake codegen logic

[ROCm/composable_kernel commit: 7733ae167b]
2024-10-07 15:45:19 -07:00
Illia Silin
ee93500dad Fix build logic using GPU_ARCHS. (#1536)
* update build logic with GPU_ARCHS

* fix the GPU_ARCHS build for codegen

* unset GPU_TARGETS when GPU_ARCHS are set

[ROCm/composable_kernel commit: 7d8ea5f08b]
2024-10-07 08:18:23 -07:00
arai713
a1d3ec4b36 Codegen build (#1526)
* updating codegen build for MIOpen access: adding .cmake for codegen component

(cherry picked from commit 652a7c0463)

* updating CMake

(cherry picked from commit a685822e36)

[ROCm/composable_kernel commit: b545de175a]
2024-10-04 10:51:50 -07:00
Jun Liu
3739cf9f74 Customize filesystem in CK for legacy systems (#1509)
* Legacy support: customized filesystem

* Update cmakefile for python alternative path

* fix build issues

* CK has no boost dependency

* More fixes to issues found on legacy systems

* fix clang format issue

* Check if blob is correctly generated in cmake

* fix the python issues

* add a compiler flag for codegen when using alternative python

* use target_link_options instead of target_compile_options

---------

Co-authored-by: illsilin <Illia.Silin@amd.com>

[ROCm/composable_kernel commit: 81bc1496b2]
2024-09-13 07:51:07 -07:00
Illia Silin
234bc58d2d Add an option to select an alternative python version during build. (#1496)
* locate a newer version of python when -DRHEL=ON flag is set

* allow setting python version on cmake command line

[ROCm/composable_kernel commit: 841009c5ee]
2024-09-04 07:36:27 -07:00
Illia Silin
1ffa80536c fix codegen rtc lib build issue (#1485)
[ROCm/composable_kernel commit: 25935b57a0]
2024-08-23 15:11:47 -07:00
arai713
5e2632e486 Codegen INSTANCES_ONLY build (#1468)
* initial push - altering codegen build

* fix the codegen cmake

* enable codegen build for gfx908 and gfx90a

* enable building codegen with INSTANCES_ONLY=ON

* updating ck_rtc

* remove gpu targets for codegen and rename tests

* make codegen tests dependencies of tests and check targets

---------

Co-authored-by: illsilin <Illia.Silin@amd.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>

[ROCm/composable_kernel commit: 967b1f0fda]
2024-08-22 07:24:55 -07:00
Illia Silin
ad65d8d5b0 Re-enable fp8 types for all architectures. (#1470)
* re-enable fp8 and bf8 for all targets

* restore the fp8 gemm instances

* re-enable conv_3d fp8 on all architectures

* disable several fp8 gemm instances on all architectures except gfx94

* clang format fix

[ROCm/composable_kernel commit: c8b6b64240]
2024-08-16 16:07:52 -06:00
trixirt
1eeb32a64d Check compiler flags before using (#1403)
* Check compiler flags before using

The user's compiler may not support these flags, so check.
Resolves failures on Fedora.

Signed-off-by: Tom Rix <trix@redhat.com>

* fix syntax CMakeLists.txt

Fix syntax in the check_cxx_compiler_flag.

---------

Signed-off-by: Tom Rix <trix@redhat.com>
Co-authored-by: Tom Rix <trix@redhat.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>

[ROCm/composable_kernel commit: 49769ec889]
2024-08-14 20:43:10 -07:00
Haocong WANG
65d6442b4c [GEMM] gemm_universal related optimization (#1453)
* replace buffer_atomic with global_atomic

* fixed global_atomic_add

* added bf16 atomic_add

* format

* clang-format-12

* clean

* clean

* add guards

* Update gtest.cmake

* enabled splitk_gemm_multi_d

* format

* add ckProfiler

* format

* fixed naming

* format

* clean

* clean

* add guards

* fix clang format

* format

* add kbatch printout

* clean

* Add rocm6.2 related gemm optimization

* Limit bf16 atomic usage

* remove redundant RCR gemm_universal instance

* Add RRR fp8 gemm universal instance

* Bug fix

* Add GPU_TARGET guard to FP8/BF8 target

* bug fix

* update cmake

* remove all fp8/bf8 example if arch not support

* Enable fp8 RRR support in ckProfiler

* limit greedy-reverse flag to gemm_universal in ckProfiler

---------

Co-authored-by: Jing Zhang <jizhan@fb.com>
Co-authored-by: Jing Zhang <jizhan@meta.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
Co-authored-by: illsilin <Illia.Silin@amd.com>

[ROCm/composable_kernel commit: 3049b5467c]
2024-08-14 10:42:30 +08:00
arai713
ab0829d8bd Codegen build w/CK (#1428)
* initial push

* cleaned up compiler errors

* removed commented code

* build codegen folder only for gfx9 targets

* remove separate stage for codegen tests from CI

* removed commented code from CMake

---------

Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
Co-authored-by: illsilin <Illia.Silin@amd.com>

[ROCm/composable_kernel commit: da214a5a58]
2024-08-09 08:15:06 -07:00
Illia Silin
9e9b3d563b check if the coerce-illegal-types flag is supported (#1451)
[ROCm/composable_kernel commit: ae3b8ff86c]
2024-08-08 07:29:29 -07:00
Jun Liu
37921efb24 Fix ROCm 6.2 compiler not fully supporting gfx12 when building CK with INSTANCES_ONLY (#1446)
[ROCm/composable_kernel commit: afbf6350f3]
2024-08-06 13:06:53 -07:00
Illia Silin
3d3819e0b3 Add compiler flags for ROCm versions 6.2+ (#1429)
* add compiler flags to fix compiler issues

* fix typo.

* disable test_smfmac_op on all devices except gfx942

* specify full path to compiler in CI

[ROCm/composable_kernel commit: d311c95396]
2024-08-01 08:27:52 -07:00
trixirt
9348857732 Introduce cmake USE_GLIBCXX_ASSERTIONS option (#1404)
A standard option in Fedora packaging that is used to check
the correctness of C++ code's use of the standard C++ library.

Signed-off-by: Tom Rix <trix@redhat.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>

[ROCm/composable_kernel commit: 733f33af78]
2024-07-25 19:28:17 -07:00
Mateusz Ozga
2db188a709 An option whether to colorize output during build (#1390)
[ROCm/composable_kernel commit: 9cac282793]
2024-07-16 09:52:44 -07:00
Illia Silin
32620bf884 [ASAN builds] Modify the list of default targets for ASAN builds. (#1389)
* add a build parameter to build only XNACK targets

* use ENABLE_ASAN_PACKAGING flag to set targets for ASAN builds

---------

Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>

[ROCm/composable_kernel commit: 4c3107fdcb]
2024-07-16 09:19:23 -07:00
Illia Silin
1223a0ab72 [gfx12] add gfx12 to the default target list (#1379)
[ROCm/composable_kernel commit: a8eb872055]
2024-07-10 14:54:04 -07:00
Illia Silin
cd1e33cce4 Merging the gfx12 code into public repo. (#1362)
[ROCm/composable_kernel commit: 941d1f7ce0]
2024-06-27 00:33:34 -07:00
zjing14
05d0077378 Remove gfx900 and gfx906 from default target device to reduce package size (#1351)
[ROCm/composable_kernel commit: 8db331a511]
2024-06-19 11:47:18 -07:00
Illia Silin
6cf9f7f72c Select appropriate GPU targets for instances, tests, and examples. (#1304)
* set individual gpu targets for instances, examples, tests

* fix path to hip compiler

* fix path to hip compiler once more

* aggregate device macros in ck_tile config header

* fix the cmake logic for instances

* fix clang format

* add gfx900 and gfx906 to default set of targets

[ROCm/composable_kernel commit: 7b027d5643]
2024-05-22 11:45:27 -07:00
Illia Silin
a90f0099fc Code clean-up (#1285)
* code clean-up

* remove the profiling output samples

[ROCm/composable_kernel commit: 566b6480a2]
2024-05-10 09:41:39 -07:00
Illia Silin
d89deae29c Downgrade minimum required python version to 3.6 (#1274)
[ROCm/composable_kernel commit: 7797f7c7a1]
2024-05-01 15:34:56 -07:00
Illia Silin
12b1947344 Upgrade to ROCm6.1 and turn on the -enable-post-misched=0 compiler flag. (#1250)
* add rocm6.1 docker and make it default for CI

* fix typo

* move the rocm6.1 image into public dockerhub repo

[ROCm/composable_kernel commit: caae537d8e]
2024-04-18 11:10:23 -05:00
carlushuang
db614b49eb introducing ck_tile! (#1216)
* enable gfx940

* switch between intrinsic mfma routines on mi100/200 and mi300

* fix mfma_int8 on MI300

* disable 2 int8 examples on MI300

* Update cmake-ck-dev.sh

* restore gitignore file

* modify Jenkinsfile to the internal repo

* Bump rocm-docs-core from 0.24.0 to 0.29.0 in /docs/sphinx

Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.24.0 to 0.29.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.24.0...v0.29.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* initial enablement of gfx950

* fix clang format

* disable examples 31 and 41 int8 on gfx950

* add code

* fix build wip

* fix xx

* now can build

* naming

* minor fix

* wip fix

* fix macro for exp2; fix warpgemm a/b in transposedC

* unify as tuple_array

* Update the required Python version to 3.9

* Update executable name in test scripts

* re-structure tuple/array to avoid spill

* Merge function templates

* Fix format

* Add constraint to array<> ctor

* Re-use function

* Some minor changes

* remove wrong code in store_raw()

* fix compile issue in transpose

* Rename enum
Rename 'cood_transform_enum' to 'coord_transform_enum'

* change more integral_constant->constant, and formatting

* make sure thread_buffer can be tuple/array

* temp fix buffer_store spill

* not using custom data type by default, now we can have ISA-level same code as opt_padding

* fix compile error, fp8 not ready now

* fix fp8 duplicated move/shift/and/or problem

* Default use CK_TILE_FLOAT_TO_FP8_STOCHASTIC rounding mode

* fix scratch in fp8 kernel

* update some readme

* fix merge from upstream

* sync with upstream

* sync upstream again

* sync 22

* remove unused

* fix clang-format

* update README of ck_tile example

* fix several issue

* set the minimum Python version to 3.8

* remove ck_tile example from default cmake target like all/install/check

* remove mistake

* 1).support recipe in generate.py 2).use simplified mask type 3).change left/right to pass into karg

* fix some bug in group-mode masking and codegen. update README

* F8 quantization for FMHA forward (#1224)

* Add SAccElementFunction, PComputeElementFunction, OAccElementFunction in pipeline

* Add element function to fmha api

* Adjust P elementwise function

* Fix bug of elementwise op, our elementwise op is not inout

* Add some elementwise op, prepare to quantization

* Let generate.py generate different elementwise functions

* To prevent compiler issue, remove the elementwise function we have not used.

* Remove f8 pipeline, we should share the same pipeline even in f8

* Remove remove_cvref_t

* Avoid warning

* Fix wrong fp8 QK/KV block gemm setting

* Check fp8 rounding error in check_err()

* Set fp8 rounding error for check_err()

* Use CK_TILE_FLOAT_TO_FP8_STANDARD as default fp8 rounding mode

* 1. codegen the f8 api and kernel
2. f8 host code

* prevent warning in filter mode

* Remove not-in-use elementwise function kargs

* Remove more not-in-use elementwise function kargs

* Small refinements in C++ source files

* Use conditional_t<> to simplify code

* Support heterogeneous argument for binary function types

* Re-use already-existing scales<> functor template

* Fix wrong value produced by saturating

* Generalize the composes<> template

* Unify saturates<> implementation

* Fix type errors in composes<>

* Extend less_equal<>

* Reuse the existing template less_equal<> in check_err()

* Add equal<float> & equal<double>

* Rename check_err() parameter

* Rename check_err() parameter

* Add FIXME comment for adding new macro in future

* Remove unnecessary cast to void

* Eliminate duplicated code

* Avoid dividing api pool into more than 2 groups

* Use more clear variable names

* Use affirmative condition in if stmt

* Remove blank lines

* Do not use perfect forwarding in composes<>

* To fix compile error, revert generate.py back to 4439cc107d

* Fix bug of p element function

* Add compute element op to host softmax

* Remove element function in api interface

* Extract user parameter

* Rename pscale and oscale variable

* rename f8 to fp8

* rename more f8 to fp8

* Add pipeline::operator() without element_functor

* 1. Remove deprecated pipeline enum
2. Refine host code parameter

* Use quantization range as input

* 1. Rename max_dtype to dtype_max.
2. Rename scale to scale_s
3. Add init description

* Refine description

* prevent early return

* unify _squant kernel name in cpp, update README

* Adjust the default range.

* Refine error message and bias range

* Add fp8 benchmark and smoke test

* fix fp8 swizzle_factor=4 case

---------

Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
Co-authored-by: carlushuang <carlus.huang@amd.com>

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: illsilin <Illia.Silin@amd.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
Co-authored-by: Jing Zhang <jizha@amd.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Po-Yen, Chen <PoYen.Chen@amd.com>
Co-authored-by: rocking <ChunYu.Lai@amd.com>

[ROCm/composable_kernel commit: db376dd8a4]
2024-04-15 19:27:12 -05:00
Illia Silin
5f77ef0991 Update the config.h after the CK_USE_XDL/WMMA are set. (#1236)
* pass XDL and WMMA macros to libs that use CK

* update config.h after XDL and WMMA macros get set

[ROCm/composable_kernel commit: 7cdf5a96d2]
2024-04-12 10:55:02 -07:00
Illia Silin
f559597c46 Split the instances by architecture. (#1223)
* parse examples inside the add_example_executable function

* fix the example 64 cmake file

* add xdl flag to the gemm_bias_softmax_gemm_permute example

* add filtering of tests based on architecture type

* enable test_grouped_gemm for gfx9 only

* enable test_transpose only for gfx9

* only link test_transpose if it gets built

* split the gemm instances by architectures

* split gemm_bilinear,grouped_conv_bwd_weight instances by targets

* split instances by architecture

* split grouped_conv instances by architecture

* fix clang format

* fix the if-else logic in group_conv headers

* small fix for grouped convolution instances

* fix the grouped conv bwd weight dl instances

* fix client examples

* only enable client examples 3 and 4 on gfx9

* set the gfx9 macro

* make sure the architecture macros are set by cmake

* use separate set of xdl/wmma flags for host code

* simplify the main cmake file

* add conv_fwd_bf8 instance declaration

[ROCm/composable_kernel commit: ae57e5938e]
2024-04-02 09:42:17 -07:00
Illia Silin
006e903d91 fix the cmake option syntax (#1117)
[ROCm/composable_kernel commit: fbf31a2ea3]
2024-01-03 07:56:44 -08:00
Illia Silin
b048f22fb2 adding -Wno-switch-default compiler flag (#1115)
[ROCm/composable_kernel commit: b268f273de]
2024-01-02 14:01:12 -08:00
Artur Wojcik
a4bd3ff6db enable compilation of INSTANCES_ONLY for Windows (#1082)
* enable compilation of INSTANCES_ONLY for Windows

* suppress ROCMChecks warnings on GoogleTests

* suppress -Wfloat-equal warning on GoogleTests

---------

Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>

[ROCm/composable_kernel commit: fb5bd51b42]
2023-12-20 14:34:53 -08:00
Jun Liu
d29b1436fb ROCm 6.0 replaces all __HIP_PLATFORM_HCC__ with __HIP_PLATFORM_AMD__ (#1106)
* ROCm 6.0 replaces all __HIP_PLATFORM_HCC__ with __HIP_PLATFORM_AMD__

* make it backward compatible

* Update .clang-tidy

* Update ClangTidy.cmake

[ROCm/composable_kernel commit: 3ab1838fb0]
2023-12-19 07:16:49 -08:00
Illia Silin
d7a9aeeab6 add -Wno-pass-failed compiler flag (#1105)
[ROCm/composable_kernel commit: 3726a1730e]
2023-12-19 07:15:24 -08:00
trixirt
04d5dfa382 cmake: Add CK_PARALLEL_LINK_JOBS and CK_PARALLEL_COMPILE_JOBS options (#1063)
Copied from the llvm-project LLVM_PARALLEL_*_JOBS

Concurrent linking can break the build, as can running more compile
jobs than the available memory supports.  These options allow the user
to fine tune the build to fit within their machine's memory
constraints.

An example use on Linux is:
COMPILE_JOBS=`cat /proc/cpuinfo | grep -m 1 'cpu cores' | awk '{ print $4 }'`
if [ ${COMPILE_JOBS}x = x ]; then
  COMPILE_JOBS=1
fi
BUILD_MEM=4
MEM_KB=0
MEM_KB=`cat /proc/meminfo | grep MemTotal | awk '{ print $2 }'`
MEM_MB=`eval "expr ${MEM_KB} / 1024"`
MEM_GB=`eval "expr ${MEM_MB} / 1024"`
COMPILE_JOBS_MEM=`eval "expr 1 + ${MEM_GB} / ${BUILD_MEM}"`
if [ "$COMPILE_JOBS_MEM" -lt "$COMPILE_JOBS" ]; then
  COMPILE_JOBS=$COMPILE_JOBS_MEM
fi
LINK_MEM=32
LINK_JOBS=`eval "expr 1 + ${MEM_GB} / ${LINK_MEM}"`

cmake -G Ninja -DCK_PARALLEL_LINK_JOBS=$LINK_JOBS \
               -DCK_PARALLEL_COMPILE_JOBS=$COMPILE_JOBS
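
The job-count arithmetic in the script above can be pulled out into a small function that is easier to test. This is a sketch, not code from CK: the function name `ck_parallel_jobs` is hypothetical, and it assumes the script's own budgets of about 4 GB per compile job and 32 GB per link job, with compile jobs capped at the core count.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of CK): derive compile/link job counts
# from the CPU core count and total RAM in GB, mirroring the script's
# heuristic of ~4 GB per compile job and ~32 GB per link job.
ck_parallel_jobs() {
  local cores=$1 mem_gb=$2
  local build_mem=4 link_mem=32
  # Fall back to a single compile job if the core count is unknown.
  local compile_jobs=${cores:-1}
  if [ "$compile_jobs" -lt 1 ]; then
    compile_jobs=1
  fi
  # Memory-limited compile jobs; use whichever limit is smaller.
  local compile_jobs_mem=$(( 1 + mem_gb / build_mem ))
  if [ "$compile_jobs_mem" -lt "$compile_jobs" ]; then
    compile_jobs=$compile_jobs_mem
  fi
  # Link jobs are limited by memory only.
  local link_jobs=$(( 1 + mem_gb / link_mem ))
  echo "$compile_jobs $link_jobs"
}
```

For example, `read -r COMPILE_JOBS LINK_JOBS <<< "$(ck_parallel_jobs 16 64)"` yields 16 compile jobs and 3 link jobs, which can then be passed to cmake as in the script above.
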

Signed-off-by: Tom Rix <trix@redhat.com>

[ROCm/composable_kernel commit: efaf31061a]
2023-12-14 17:26:41 -08:00
Illia Silin
e1a51c96fc Add daily run with mainline compiler. (#1075)
* add daily build with mainline compiler

* fix the compiler paths for ci

* remove the -flto flag

* build with clang by default

[ROCm/composable_kernel commit: afe4622014]
2023-12-04 19:04:52 -08:00
Illia Silin
1a389f2d2e Enable sccache in the default docker and CI. (#1009)
* replace ccache with sccache, pin package versions

* put ccache back temporarily to avoid breaking other CI jobs

* add sccache_wrapper.sh script

* fix the package version syntax

* fix the pymysql package issue

* run sccache_wrapper before build if ccache server found

* set the paths before calling the sccache_wrapper

* use /tmp instead of /usr/local for cache

* try using sccache --start-server instead of wrapper

* try using redis server with sccache

* define SCCACHE_REDIS

* add redis and ping packages, and redis port

* use the new sccache redis server

* do not use sccache with staging compiler

* fix the condition syntax

* add stunnel to redis

* add tunnel verification

* separate caches for different architectures

* fix syntax for the cache tag

* use double brackets for conditions

* add bash line to the script

* add a switch for sccache and only use it in build stage

* run check_host function when enabling sccache

* fix the invocation tags for sccache

* fix groovy syntax

* set the invocation tag in groovy

* disable sccache in clang-format stage

* try another syntax for invocation tags

* use local sccache server if can't connect to redis

* fix script syntax

* update README

* refresh readme

* readme updates

* remove the timing and verification caveat from readme

---------

Co-authored-by: Lisa Delaney <lisa.delaney@amd.com>

[ROCm/composable_kernel commit: 4e44a9e8da]
2023-10-30 13:16:29 -07:00
zjing14
dc94c20258 Clean DTYPES conditions in CMake (#974)
* Add a condition to build fp8 instances

* simplified buffer_load/store

* add bfp8/fp8

* fixed

* remove all f8/bf8 condition include folder

* fixed cmake conditions

* fixed DTYPES=fp16/bfp16

* fix

* fixed buffer_load

* fixed buffer_store

* fix

* clean example cmake files

* fixed ci

* fixed cit

---------

Co-authored-by: Rostyslav Geyyer <rosty.geyyer@amd.com>
Co-authored-by: Jing Zhang <jizha@amd.com>

[ROCm/composable_kernel commit: bf435140dc]
2023-10-18 11:14:14 -05:00
Illia Silin
6a8658812a get rid of gfx900/906, set rocm5.7 as default (#958)
[ROCm/composable_kernel commit: 59dbb01fd1]
2023-10-02 12:01:11 -07:00
Illia Silin
f9ce51a187 Use lower case for ckprofiler package. (#948)
* split ckProfiler gfx9 package into gfx90 and gfx94

* use lower case for package names

[ROCm/composable_kernel commit: 420b5a0382]
2023-09-26 17:43:09 -07:00
Illia Silin
37f4626e3e split ckProfiler gfx9 package into gfx90 and gfx94 (#946)
[ROCm/composable_kernel commit: 0b296a2722]
2023-09-26 11:22:31 -07:00
Illia Silin
99024ff371 Resolve some data type issues and cmake policy. (#940)
* split the types in gemm_bilinear instances, add condition to cmake policy

* fix syntax

* split the data types in batchnorm examples

* fix the batchnorm_bwd test

* fix types in the batchnorm_bwd test

[ROCm/composable_kernel commit: 2ea75bd6d7]
2023-09-26 08:39:11 -07:00
Illia Silin
3609ff10f7 Refactoring cmake files to build data types separately. (#932)
* refactor cmake files for the tests

* refactor cmake files for examples

* fix cmake for gemm example

* fix the cmake file for all examples

* add splitting by data types in gemm_splitk instance header

* rename test to reflect only dl instances are used

* clean up CI workspace, update cmake for instances

* change the jenkinsfile syntax

* build all instances except DL on gfx11

* move workspace cleanup after stages

* clean up workspace after every stage

* isolate data types in grouped_conv_fwd header

* isolate dl instances for grouped_conv2d_fwd

* fix syntax

* fix cmake and batchnorm instances

* fix typo

* fix reduction instances

* fix grouped_conv headers

* fix syntax

* replace parsing logic for instances, replace bfp16 with bf16

* fix the client examples build

* clean up DTYPES from instances cmake files

* update the parsing logic in cmake files

* make an exception for reduction kernels

* update few remaining cmake files to handle DTYPES

* fix syntax

* fix cmake conflicts

* replace f8 with fp8 test name

* resolve conflicts for dpp instances

[ROCm/composable_kernel commit: bba085d2b5]
2023-09-20 22:15:56 -07:00
Illia Silin
91cb870871 fix the ckprofiler package build in a loop (#926)
[ROCm/composable_kernel commit: 5a4416c8a7]
2023-09-19 09:17:39 -07:00
Jun Liu
f9e7629556 [Cmake] Set cmake default build type Release and path to /opt/rocm (#914)
[ROCm/composable_kernel commit: 5fe687fa27]
2023-09-13 14:38:12 -07:00
Rostyslav Geyyer
0752117077 Refactor f8_t, add bf8_t (#792)
* Refactor f8_t to add bf8_t

* Add check_err impl for f8_t

* Update fp8 test

* Format

* Revert the fix

* Update vector_type implementation

* Add bf8 test

* Add bf8, use BitInt types

* Add bf8 conversion methods

* Update type_convert for fp8/bf8

* Add check_err fp8/bf8 support

* Add subnorm fp8 tests

* Add subnorm bf8 tests

* Fix conversion

* Add bf8 cmake bindings

* Add macros to enable build with disabled fp8/bf8

* Remove is_native method

* Update flag combination for mixed precision instances

* Add more flag checks

* Add another flag to a client example

* Add type traits, decouple f8/bf8 casting

* Clean up

* Decouple fp8 and bf8 flags

* Remove more redundant flags

* Remove leftover comments

[ROCm/composable_kernel commit: 62d4af7449]
2023-09-12 17:04:27 -05:00
Lauren Wrubleski
766b5dc9d7 Fix config header installation (#880)
[ROCm/composable_kernel commit: bd8024b84a]
2023-09-04 09:49:40 -07:00
Jun Liu
2fb9a37881 [HotFix] add config and version files to pass on build info (#856)
* experiment with config file

* experiment with version.h config

* add more info to version.h

* minor updates

* minor updates

* fix case where DTYPE is not used

* large amount of files but minor changes

* remove white space

* minor changes to add more MACROs

* fix cmakedefine01

* fix issue with CK internal conflict

* fix define and define value

* fix clang-format

* fix formatting issue

* experiment with cmake

* clang format v12 to be consistent with miopen

* avoid clang-format for config file

[ROCm/composable_kernel commit: c8a8385fdd]
2023-08-23 11:36:17 -07:00
Illia Silin
33edc7449a Update the rocm version threshold to apply the -fno-offload-uniform-block flag. (#839)
* add fno-offload-uniform-block flag for rocm5.7 and up

* add a comment and compiler ticket number

* update the threshold rocm version

[ROCm/composable_kernel commit: cbbd172fd6]
2023-08-09 13:50:04 -07:00
Illia Silin
62bf177e98 add no-offload-uniform-block flag for rocm5.7 and up (#838)
* add -fno-offload-uniform-block flag for rocm5.7 and up

* add a comment and compiler ticket number

[ROCm/composable_kernel commit: 6802611334]
2023-08-08 17:58:31 -07:00