Commit Graph

523 Commits

Author SHA1 Message Date
aska-0096
e8ca3daf4e update instances 2024-12-13 03:29:15 +00:00
aska-0096
26d5174e15 update instance and lds layout strategy 2024-11-26 07:29:38 +00:00
aska-0096
ea90b01fc9 fix bug in enable f8 gemm inside ckProfiler 2024-11-20 09:33:39 +00:00
aska-0096
c99e3d595e Merge branch 'mem_gemm_opt' of https://github.com/ROCm/composable_kernel into update_cka8w8 2024-11-20 05:41:33 +00:00
aska-0096
ec6b000c77 Merge branch 'develop' of https://github.com/ROCm/composable_kernel into update_cka8w8 2024-11-19 08:50:37 +00:00
Illia Silin
8aba2724cc Add bf16 and int8 wmma gemms for Navi3x and Navi4x. (#1671)
* add bf16 gemms for gfx11/gfx12

* reduce the input values in test_gemm

* add int8 wmma gemm instances for gfx11/gfx12

* add example gemm_wmma_int8

* fix bug in gemm_wmma_int8 test

* increase bf16 gemm test tolerance

* update the dates and clean-up commented-out instances
2024-11-18 14:07:04 -08:00
Bartłomiej Kocot
754adc70e3 Batched GEMM Multiple D based on Universal GEMM (#1655)
* Batched GEMM Multiple D based on Universal GEMM

Co-authored-by: Jing Zhang <jizhan@fb.com>

* CI fixes

Co-authored-by: Jing Zhang <jizhan@fb.com>

---------

Co-authored-by: Jing Zhang <jizhan@fb.com>
2024-11-18 14:03:45 +01:00
aska-0096
f3bbfe3efe Merge branch 'develop' of https://github.com/ROCm/composable_kernel into update_cka8w8 2024-11-18 07:32:39 +00:00
aska-0096
2b840f5a85 reduce prefetch stage in blockwisepipev4 2024-11-18 07:32:30 +00:00
Illia Silin
efd9261545 fix clang format (#1662) 2024-11-13 09:20:18 -08:00
Taylor Ding
73f02a1083 Move checks for compatibility from Argument() to IsSupportedArgument() (#1653) 2024-11-13 11:20:38 -05:00
Illia Silin
75c5bfa364 enable compilation for generic navi targets (#1645) 2024-11-07 14:14:42 -08:00
darren-amd
d0e3a70a2e Statically Cast Pointer Offset (#1631)
* explicit cast ptr offset

* formatting change
2024-11-05 09:59:08 -08:00
aska-0096
55cb3bdee5 clean the flush_cache api 2024-11-05 10:10:11 +00:00
aska-0096
f20e48f1f4 Merge branch 'develop' of https://github.com/ROCm/composable_kernel into update_cka8w8 2024-11-05 07:03:42 +00:00
aska-0096
b97c68764e update ck_a8w8 library, update flush cache timing api 2024-11-05 06:57:48 +00:00
Bartłomiej Kocot
9a8a52130d Remove virtual destructors from unary ops (#1610)
* Remove virtual destructors from unary ops

* Fixes

* Fixes

* clang format fixes
2024-10-30 17:42:50 +01:00
aska-0096
b3e5048f12 tempsave 2024-10-30 07:38:59 +00:00
Illia Silin
922e42a039 fix compilation errors for gfx12 with clang20 (#1606) 2024-10-28 19:02:48 -07:00
Bartłomiej Kocot
31bf253aeb Add dynamic elementwise op (#1426)
* Add dynamic elementwise op

Co-authored-by: ThruptiRajLakshmanaGowda <thruptiraj.lakshmanagowda@amd.com>

* CI issues fix

* Custom parameter value for dynamic functions - Comments addressed

---------

Co-authored-by: ThruptiRajLakshmanaGowda <thruptiraj.lakshmanagowda@amd.com>
Co-authored-by: ThruptiRajLakshmanaGowda <tlakshma@amd.com>
2024-10-26 15:22:37 +02:00
valarLip
37f7afed1e add int8 gemm multiply multiply a8w8 (#1591)
* add int8 gemm multiply multiply a8w8

* uncomment

* clang-format-12

* Add example_gemm_multiply_multiply_xdl_int8

* Remove shell scripts

* update preprocess number for mi308; bring back printout in ckprofiler

* format

---------

Co-authored-by: chenjun <junchen2@amd.com>
Co-authored-by: Haocong WANG <haocwang@amd.com>
Co-authored-by: carlushuang <carlus.huang@amd.com>
2024-10-26 16:39:34 +08:00
aledudek
9385caa306 Generic threshold calculation (#1546)
* Calculate generic relative threshold pool3dfwd

* Calculate absolute error threshold pool3d fwd

* Generic threshold calculation take max input for relative error pool3dfwd

* Remove max possible value for error calculation at runtime

* Remove debug print in pool3dfwd

* Pool3d fwd adjusted types in generic threshold calculation

* Generic threshold calculation take into account number of accumulations and accdatatype

* Generic threshold fix final error formula

* Generic threshold calculation - num of accs fix

* Generic threshold calculation - adjust absolute error

* Generic threshold calculation - OutDataType in absolute error
2024-10-25 12:46:24 +02:00
aska-0096
e8c19535f7 update preprocess number for mi308; bring back printout in ckprofiler 2024-10-25 04:29:34 +00:00
Haocong WANG
47294b4b22 Merge branch 'develop' into gemm_multiply_multiply_int8a8w8 2024-10-23 11:28:40 +08:00
Jatin Chaudhary
4d5248e2d1 Explicit cast values to half (#1593)
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
2024-10-22 11:17:32 -07:00
chenjun
1670bba95f clang-format-12 2024-10-21 23:16:04 +08:00
chenjun
7fb0b3223c add int8 gemm multiply multiply a8w8 2024-10-21 21:57:41 +08:00
Rostyslav Geyyer
4cf70b36c1 Add custom type vector support (#1333)
* Add non_native_vector_type

* Add a test

* Add non-native vector type

* Fix CTOR

* Fix non-native vector type of 1

* Fix CTORs

* Use vector_type to cover non-native implementation as well

* Update the test

* Format

* Format

* Fix copyright years

* Remove BoolVecT so far

* Add AsType test cases

* Update assert error message

* Remove redundant type

* Update naming

* Add complex half type with tests

* Add tests for vector reshaping

* Add missing alignas

* Update test/data_type/test_custom_type.cpp

Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>

* Compare custom types to built-in types

* Add default constructor test

* Add an alignment test

---------

Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
2024-10-14 11:56:45 -05:00
Bartłomiej Kocot
f21cda2536 Add transpose scale amax example (#1547)
* Add transpose scale amax example

* fixes

* Tune reduce instance
2024-10-14 17:39:38 +02:00
Adam Osewski
29d384d0b2 Implement GetWorkSpaceSize from BaseOperator. (#1564) 2024-10-12 14:05:11 +08:00
Christopher Millette
ceaed8e097 Fixes small memory leak from missing hipEventDestroy (#1554) 2024-10-09 09:41:35 +02:00
Illia Silin
7d8ea5f08b Fix build logic using GPU_ARCHS. (#1536)
* update build logic with GPU_ARCHS

* fix the GPU_ARCHS build for codegen

* unset GPU_TARGETS when GPU_ARCHS are set
2024-10-07 08:18:23 -07:00
Bartłomiej Kocot
6b54d2faf8 Fix grouped gemm check to avoid overflow (#1545) 2024-10-04 17:32:43 +02:00
macurtis-amd
aeb7c91f48 Fix compilation errors generated by forthcoming Clang changes (#1544)
Without this change, the following diagnostic is generated:
  a template argument list is expected after a name prefixed by the template
  keyword [-Wmissing-template-arg-list-after-template-kw]

See C++17 spec [temp.names] p5.
2024-10-02 13:56:22 -07:00
Illia Silin
42e6dceacc Fix compilation errors with Clang20.0. (#1533)
* fix clang20 compilation errors for gfx90a

* fix clang20 compilation errors for gfx11 targets
2024-09-25 13:45:38 -07:00
Bartłomiej Kocot
4ba52b35dc Add support for NGCHW in grouped conv fwd (#1499)
* Support NGCHW in grouped conv fwd

* Remove not needed variable

* Fixes
2024-09-20 10:45:46 +02:00
Adam Osewski
0c39954da9 Remove unsupported (fp8) type from Add memory operation. (#1521)
The dynamic buffer doesn't support fp8 in the `Update` operation, so fp8 does not support `InMemoryDataOperation::Add`
2024-09-20 09:40:45 +02:00
Jun Liu
81bc1496b2 Customize filesystem in CK for legacy systems (#1509)
* Legacy support: customized filesystem

* Update cmakefile for python alternative path

* fix build issues

* CK has no boost dependency

* More fixes to issues found on legacy systems

* fix clang format issue

* Check if blob is correctly generated in cmake

* fix the python issues

* add a compiler flag for codegen when using alternative python

* use target_link_options instead of target_compile_options

---------

Co-authored-by: illsilin <Illia.Silin@amd.com>
2024-09-13 07:51:07 -07:00
Mateusz Ozga
448c0f56d8 Pool2d max/avg kernel in the BWD version (#1494)
* Add pool2d instance BWD AVG

* Add pool2d instance BWD MAX

* Fix: avg review

* Fix review: part2

* Fix - enable test when type is compiled

* Fix review part3
2024-09-12 11:47:52 +02:00
jakpiase
e8d2887cb2 Rewrite pool2d fwd (#1462)
* added pool2d fwd

* add tests

* add reviewers changes

* Revert "Merge remote-tracking branch 'origin/develop' into jakpiase/pool2d_fwd_new"

This reverts commit 6b2ba7ff89, reversing
changes made to 22c82bea0c.

* Revert "add reviewers changes"

This reverts commit 22c82bea0c.

* added reviewers comments

* revert some old files

* add reviewers requests

---------

Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
2024-09-11 15:21:00 +02:00
jakpiase
2a261afcdf Added structural sparsity blockwise gemm (#1435)
* Implemented smfmac xdlops

* Added smfmac blockwise xdlops

* fixes

* add reviewers suggestions

---------

Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
2024-09-11 15:19:42 +02:00
Haocong WANG
0b3a409d4f Merge branch 'develop' of https://github.com/ROCm/composable_kernel into mem_gemm_opt 2024-09-06 03:22:06 +00:00
M.Emin Ozturk
8378855361 Modification to fix this issue "threadwise_tensor_slice_transfer_v5r1 issue #1279" (#1492)
* issue fix, one line changed for tmp

* clang

---------

Co-authored-by: Emin Ozturk <emin.ozturk@utah.edu>
Co-authored-by: Harisankar Sadasivan <135730918+hsadasiv@users.noreply.github.com>
2024-09-04 21:52:55 -07:00
Haocong WANG
5b10dae6a4 Add gemm universal bf16 instances (#1484)
* revert ckprofiler change

* temp save

* Add test and test pass

* test pass

* Fix bug inside rotating buffer when tensor is not packed

* bug fix

* clang format

---------

Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
2024-09-04 20:58:54 -07:00
aska-0096
dbfcb380cd temp save 2024-09-05 03:04:31 +00:00
aska-0096
cc404d1190 Merge branch 'develop' of https://github.com/ROCm/composable_kernel into mem_gemm_opt 2024-09-04 15:18:52 +00:00
aska-0096
41fcfbc64e clang format 2024-09-04 15:11:51 +00:00
aska-0096
6df91708a6 temp save 2024-09-04 14:32:02 +00:00
Bartłomiej Kocot
73b67f290f Add support for NGCHW in grouped conv bwd wei (#1491)
* Add support for NGCHW in grouped conv bwd wei

* Comments fixes

* navi fixes

* Update function names
2024-09-03 10:52:03 +02:00
aska-0096
4885c38aa4 Merge branch 'transpose_opt' of https://github.com/ROCm/composable_kernel into rowwise_opt 2024-09-03 08:37:45 +00:00