Adam Osewski
87b69930e7
Change block gemm pipeline local prefill loop order. ( #1692 )
...
* Fix loop order.
* Fix loop order in pipeline v4
[ROCm/composable_kernel commit: bfe983a151 ]
2024-11-26 17:36:53 +01:00
jakpiase
50ee0ac283
Add check for bf16 splitk support for grouped gemm splitk ( #1673 )
...
* add check for bf16 splitk support for grouped gemm splitk
* Update if condition
---------
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com >
[ROCm/composable_kernel commit: b70f367f80 ]
2024-11-26 13:56:32 +01:00
Harisankar Sadasivan
0d34db594d
universal streamk fp8 changes ( #1665 )
...
* universal streamk fp8 changes & ckprofiler instances
* revert strides to -1 and verification options
* fp8 exclusion on pre-gfx94 for universal_streamk
* PR review based revisions: permissions reverted, removed hip err checks
---------
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com >
[ROCm/composable_kernel commit: d6d4c2788b ]
2024-11-21 08:21:37 -08:00
Illia Silin
dc227604bc
Add bf16 and int8 wmma gemms for Navi3x and Navi4x. ( #1671 )
...
* add bf16 gemms for gfx11/gfx12
* reduce the input values in test_gemm
* add int8 wmma gemm instances for gfx11/gfx12
* add example gemm_wmma_int8
* fix bug in gemm_wmma_int8 test
* increase bf16 gemm test tolerance
* update the dates and clean-up commented-out instances
[ROCm/composable_kernel commit: 8aba2724cc ]
2024-11-18 14:07:04 -08:00
Bartłomiej Kocot
b89a44ea33
Batched GEMM Multiple D based on Universal GEMM ( #1655 )
...
* Batched GEMM Multiple D based on Universal GEMM
Co-authored-by: Jing Zhang <jizhan@fb.com >
* CI fixes
Co-authored-by: Jing Zhang <jizhan@fb.com >
---------
Co-authored-by: Jing Zhang <jizhan@fb.com >
[ROCm/composable_kernel commit: 754adc70e3 ]
2024-11-18 14:03:45 +01:00
Illia Silin
d80f50d5e9
fix clang format ( #1662 )
...
[ROCm/composable_kernel commit: efd9261545 ]
2024-11-13 09:20:18 -08:00
Taylor Ding
7c9257128d
Move checks for compatibility from Argument() to IsSupportedArgument() ( #1653 )
...
[ROCm/composable_kernel commit: 73f02a1083 ]
2024-11-13 11:20:38 -05:00
Illia Silin
ddfcce82ab
enable compilation for generic navi targets ( #1645 )
...
[ROCm/composable_kernel commit: 75c5bfa364 ]
2024-11-07 14:14:42 -08:00
darren-amd
bee5289f56
Statically Cast Pointer Offset ( #1631 )
...
* explicit cast ptr offset
* formatting change
[ROCm/composable_kernel commit: d0e3a70a2e ]
2024-11-05 09:59:08 -08:00
Bartłomiej Kocot
724312aea3
Remove virtual destructors from unary ops ( #1610 )
...
* Remove virtual destructors from unary ops
* Fixes
* Fixes
* clang format fixes
[ROCm/composable_kernel commit: 9a8a52130d ]
2024-10-30 17:42:50 +01:00
Illia Silin
75b1a7a6fe
fix compilation errors for gfx12 with clang20 ( #1606 )
...
[ROCm/composable_kernel commit: 922e42a039 ]
2024-10-28 19:02:48 -07:00
Bartłomiej Kocot
930195c384
Add dynamic elementwise op ( #1426 )
...
* Add dynamic elementwise op
Co-authored-by: ThruptiRajLakshmanaGowda <thruptiraj.lakshmanagowda@amd.com >
* CI issues fix
* Custom parameter value for dynamic functions - Comments addressed
---------
Co-authored-by: ThruptiRajLakshmanaGowda <thruptiraj.lakshmanagowda@amd.com >
Co-authored-by: ThruptiRajLakshmanaGowda <tlakshma@amd.com >
[ROCm/composable_kernel commit: 31bf253aeb ]
2024-10-26 15:22:37 +02:00
valarLip
85cf31cf40
add int8 gemm multiply multiply a8w8 ( #1591 )
...
* add int8 gemm multiply multiply a8w8
* uncomment
* clang-format-12
* Add example_gemm_multiply_multiply_xdl_int8
* Remove shell scripts
* update preprocess number for mi308; bring back printout in ckprofiler
* format
---------
Co-authored-by: chenjun <junchen2@amd.com >
Co-authored-by: Haocong WANG <haocwang@amd.com >
Co-authored-by: carlushuang <carlus.huang@amd.com >
[ROCm/composable_kernel commit: 37f7afed1e ]
2024-10-26 16:39:34 +08:00
aledudek
2c3399d29b
Generic threshold calculation ( #1546 )
...
* Calculate generic relative threshold pool3dfwd
* Calculate absolute error threshold pool3d fwd
* Generic threshold calculation take max input for relative error pool3dfwd
* Remove max possible value for error calculation at runtime
* Remove debug print in pool3dfwd
* Pool3d fwd adjusted types in generic threshold calculation
* Generic threshold calculation take into account number of accumulations and accdatatype
* Generic threshold fix final error formula
* Generic threshold calculation - num of accs fix
* Generic threshold calculation - adjust absolute error
* Generic threshold calculation - OutDataType in absolute error
[ROCm/composable_kernel commit: 9385caa306 ]
2024-10-25 12:46:24 +02:00
Jatin Chaudhary
2a074cd391
Explicit cast values to half ( #1593 )
...
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com >
[ROCm/composable_kernel commit: 4d5248e2d1 ]
2024-10-22 11:17:32 -07:00
Rostyslav Geyyer
63c478e14f
Add custom type vector support ( #1333 )
...
* Add non_native_vector_type
* Add a test
* Add non-native vector type
* Fix CTOR
* Fix non-native vector type of 1
* Fix CTORs
* Use vector_type to cover non-native implementation as well
* Update the test
* Format
* Format
* Fix copyright years
* Remove BoolVecT so far
* Add AsType test cases
* Update assert error message
* Remove redundant type
* Update naming
* Add complex half type with tests
* Add tests for vector reshaping
* Add missing alignas
* Update test/data_type/test_custom_type.cpp
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com >
* Compare custom types to built-in types
* Add default constructor test
* Add an alignment test
---------
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com >
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com >
Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com >
[ROCm/composable_kernel commit: 4cf70b36c1 ]
2024-10-14 11:56:45 -05:00
Bartłomiej Kocot
f51ed2ad28
Add transpose scale amax example ( #1547 )
...
* Add transpose scale amax example
* fixes
* Tune reduce instance
[ROCm/composable_kernel commit: f21cda2536 ]
2024-10-14 17:39:38 +02:00
Adam Osewski
ec25e56259
Implement GetWorkSpaceSize from BaseOperator. ( #1564 )
...
[ROCm/composable_kernel commit: 29d384d0b2 ]
2024-10-12 14:05:11 +08:00
Christopher Millette
f76d476983
Fixes small memory leak from missing hipEventDestroy ( #1554 )
...
[ROCm/composable_kernel commit: ceaed8e097 ]
2024-10-09 09:41:35 +02:00
Illia Silin
ee93500dad
Fix build logic using GPU_ARCHS. ( #1536 )
...
* update build logic with GPU_ARCHS
* fix the GPU_ARCHS build for codegen
* unset GPU_TARGETS when GPU_ARCHS are set
[ROCm/composable_kernel commit: 7d8ea5f08b ]
2024-10-07 08:18:23 -07:00
Bartłomiej Kocot
58d4e01645
Fix grouped gemm check to avoid overflow ( #1545 )
...
[ROCm/composable_kernel commit: 6b54d2faf8 ]
2024-10-04 17:32:43 +02:00
macurtis-amd
72f57d6c42
Fix compilation errors generated by forthcoming Clang changes ( #1544 )
...
Without this change, the following diagnostic is generated:
a template argument list is expected after a name prefixed by the template
keyword [-Wmissing-template-arg-list-after-template-kw]
See C++17 spec [temp.names] p5.
[ROCm/composable_kernel commit: aeb7c91f48 ]
2024-10-02 13:56:22 -07:00
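The diagnostic above can be reproduced with a minimal snippet (hypothetical code, not taken from the CK sources): under the rule the commit cites, a name prefixed by the `template` keyword must be followed by a template argument list.

```cpp
// Hypothetical snippet (not from the CK sources) reproducing the diagnostic:
// newer Clang rejects a `template`-prefixed name with no argument list.
template <typename T>
struct Holder
{
    template <typename U>
    static U cast(T t) { return static_cast<U>(t); }
};

template <typename T>
int use(T t)
{
    // Ill-formed once Clang enforces [temp.names] p5:
    //   return Holder<T>::template cast(t); // -Wmissing-template-arg-list-after-template-kw
    return Holder<T>::template cast<int>(t); // OK: argument list follows `template`
}
```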
Illia Silin
fb225a93d0
Fix compilation errors with Clang20.0. ( #1533 )
...
* fix clang20 compilation errors for gfx90a
* fix clang20 compilation errors for gfx11 targets
[ROCm/composable_kernel commit: 42e6dceacc ]
2024-09-25 13:45:38 -07:00
Bartłomiej Kocot
9d56280a0a
Add support for NGCHW in grouped conv fwd ( #1499 )
...
* Support NGCHW in grouped conv fwd
* Remove not needed variable
* Fixes
[ROCm/composable_kernel commit: 4ba52b35dc ]
2024-09-20 10:45:46 +02:00
Adam Osewski
bb7364d3c9
Remove unsupported (fp8) type from Add memory operation. ( #1521 )
...
The dynamic buffer doesn't support fp8 in the `Update` operation, so fp8 cannot use `InMemoryDataOperation::Add`.
[ROCm/composable_kernel commit: 0c39954da9 ]
2024-09-20 09:40:45 +02:00
Jun Liu
3739cf9f74
Customize filesystem in CK for legacy systems ( #1509 )
...
* Legacy support: customized filesystem
* Update cmakefile for python alternative path
* fix build issues
* CK has no boost dependency
* More fixes to issues found on legacy systems
* fix clang format issue
* Check if blob is correctly generated in cmake
* fix the python issues
* add a compiler flag for codegen when using alternative python
* use target_link_options instead of target_compile_options
---------
Co-authored-by: illsilin <Illia.Silin@amd.com >
[ROCm/composable_kernel commit: 81bc1496b2 ]
2024-09-13 07:51:07 -07:00
Mateusz Ozga
92d1b386b2
Pool2d max/avg kernel in the BWD version ( #1494 )
...
* Add pool2d instance BWD AVG
* Add pool2d instance BWD MAX
* Fix: avg review
* Fix review: part2
* Fix - enable test when type is compiled
* Fix review part3
[ROCm/composable_kernel commit: 448c0f56d8 ]
2024-09-12 11:47:52 +02:00
jakpiase
cb4975cf70
Rewrite pool2d fwd ( #1462 )
...
* added pool2d fwd
* add tests
* add reviewers changes
* Revert "Merge remote-tracking branch 'origin/develop' into jakpiase/pool2d_fwd_new"
This reverts commit 6b2ba7ff89 , reversing
changes made to 22c82bea0c .
* Revert "add reviewers changes"
This reverts commit 22c82bea0c .
* added reviewers comments
* revert some old files
* add reviewers requests
---------
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com >
[ROCm/composable_kernel commit: e8d2887cb2 ]
2024-09-11 15:21:00 +02:00
jakpiase
bf3518b45a
Added structural sparsity blockwise gemm ( #1435 )
...
* Implemented smfmac xdlops
* Added smfmac blockwise xdlops
* fixes
* add reviewers suggestions
---------
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com >
[ROCm/composable_kernel commit: 2a261afcdf ]
2024-09-11 15:19:42 +02:00
M.Emin Ozturk
6c1bd4d47c
Modification to fix the "threadwise_tensor_slice_transfer_v5r1" issue #1279 ( #1492 )
...
* issue fix, one line changed for tmp
* clang
---------
Co-authored-by: Emin Ozturk <emin.ozturk@utah.edu >
Co-authored-by: Harisankar Sadasivan <135730918+hsadasiv@users.noreply.github.com >
[ROCm/composable_kernel commit: 8378855361 ]
2024-09-04 21:52:55 -07:00
Haocong WANG
4e4514caa8
Add gemm universal bf16 instances ( #1484 )
...
* revert ckprofiler change
* temp save
* Add test and test pass
* test pass
* Fix bug inside rotating buffer when tensor is not packed
* bug fix
* clang format
---------
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com >
[ROCm/composable_kernel commit: 5b10dae6a4 ]
2024-09-04 20:58:54 -07:00
Bartłomiej Kocot
950165c6fb
Add support for NGCHW in grouped conv bwd wei ( #1491 )
...
* Add support for NGCHW in grouped conv bwd wei
* Comments fixes
* navi fixes
* Update function names
[ROCm/composable_kernel commit: 73b67f290f ]
2024-09-03 10:52:03 +02:00
Bartłomiej Kocot
9974926658
Revert "Revert "Revert Revert Support access per groups and filter2x3 in grouped conv fwd ( #1382 ) ( #1406 ) ( #1415 )" ( #1455 )" ( #1490 )
...
This reverts commit 725dd433cdc6435d481e806b5442a07b0097c94a.
[ROCm/composable_kernel commit: a9b170b541 ]
2024-09-02 10:39:49 +02:00
Andriy Roshchenko
f6c6819b47
Adding Instances and Examples for FP8-based Scaled Convolution and AMAX Reduction. ( #1473 )
...
* Enable CMakePresets build
* Verify Convolution, Scaling and ReLU algorithms.
* Add tensor element-wise scale and type cast operation.
* Reduction implemented but does not work.
* Exploration of Reduction functionality.
* Completed example for Convolution scaled with ReLu activation and AMAX reduction.
* WIP: Add required instances for convolution.
* WIP: Create client example. Implement convolution stage.
* Add elementwise instances.
* Add elementwise scale + convert example.
* Add reduction instances.
* WIP: Client example for AMAX reduction.
* WIP: Add instances for multistage reduction.
* WIP: Implementation of multistage reduction.
* Refactoring.
* Clean up.
* Add CMakePresets.json
* Guard off FP8 instances when the data type is not available.
* Add example for Scaled FP8 Convolution with AMAX reduction.
* Refactor CombConvScaleRelu instances.
* Add CombConvScale instances.
* Add client example for Scaled FP8 Convolution with AMAX reduction.
* Cleanup.
[ROCm/composable_kernel commit: c3515f277c ]
2024-08-21 15:22:41 -07:00
Rostyslav Geyyer
0ab95a332e
Set RNE fp8 conversion as a default ( #1458 )
...
* Set RNE fp8 conversion as a default
* Update f8 tests
* Disable failing test on gfx11
* Update bf8 tests
* Add a flag
* Fix the flag
* Raise flag for gfx10 as well
* Temp commit for tolerance testing
* Update tolerances
[ROCm/composable_kernel commit: e20f20efbf ]
2024-08-21 09:09:48 -07:00
Haocong WANG
65d6442b4c
[GEMM] gemm_universal related optimization ( #1453 )
...
* replace buffer_atomic with global_atomic
* fixed global_atomic_add
* added bf16 atomic_add
* format
* clang-format-12
* clean
* clean
* add guards
* Update gtest.cmake
* enabled splitk_gemm_multi_d
* format
* add ckProfiler
* format
* fixed naming
* format
* clean
* clean
* add guards
* fix clang format
* format
* add kbatch printout
* clean
* Add rocm6.2 related gemm optimization
* Limit bf16 atomic usage
* remove redundant RCR gemm_universal instance
* Add RRR fp8 gemm universal instance
* Bug fix
* Add GPU_TARGET guard to FP8/BF8 target
* bug fix
* update cmake
* remove all fp8/bf8 example if arch not support
* Enable fp8 RRR support in ckProfiler
* limit greedy-reverse flag to gemm_universal in ckProfiler
---------
Co-authored-by: Jing Zhang <jizhan@fb.com >
Co-authored-by: Jing Zhang <jizhan@meta.com >
Co-authored-by: zjing14 <zhangjing14@gmail.com >
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com >
Co-authored-by: illsilin <Illia.Silin@amd.com >
[ROCm/composable_kernel commit: 3049b5467c ]
2024-08-14 10:42:30 +08:00
Mateusz Ozga
7a4690b077
Support large 12D tensor size for reduction kernel ( #1465 )
...
[ROCm/composable_kernel commit: 0606e5498e ]
2024-08-13 16:15:47 +02:00
Bartłomiej Kocot
15ab8b0d5c
Fix bug with n block id calculation in DeviceGroupedConvXdlCShuffle ( #1457 )
...
* Fix typo in TransformConvFwdToGemm
* Fix bug in n offset calculation
[ROCm/composable_kernel commit: 4a870942e6 ]
2024-08-10 13:12:05 +02:00
Jun Liu
254a7dadb6
Revert "Revert Revert Support access per groups and filter2x3 in grouped conv fwd ( #1382 ) ( #1406 ) ( #1415 )" ( #1455 )
...
This reverts commit 0c367d5912486f4fcbae1dbb861a1fb8176ca308.
[ROCm/composable_kernel commit: 5ff8eeebf9 ]
2024-08-08 19:09:33 -07:00
Juan Manuel Martinez Caamaño
61ecdbc128
Remove reinterpret_cast uses that result in undefined behaviour. ( #1445 )
...
* Remove reinterpret_cast uses that result in undefined behaviour. Use a bitcast instead.
See https://en.cppreference.com/w/cpp/language/reinterpret_cast#Type_accessibility
Closes #1439
* fix clang format
---------
Co-authored-by: illsilin <Illia.Silin@amd.com >
[ROCm/composable_kernel commit: 901e5f1540 ]
2024-08-07 11:49:02 -07:00
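The undefined-behaviour pattern being removed can be illustrated with a small sketch (assumed example, not the actual CK code): type-punning through `reinterpret_cast` violates the type-accessibility rules linked above, while a `memcpy`-based bitcast is well defined.

```cpp
#include <cstdint>
#include <cstring>

// UB: reads a float object through an unrelated uint32_t lvalue:
//   uint32_t bits = *reinterpret_cast<const uint32_t*>(&f);

// Well defined: copy the object representation instead.
inline std::uint32_t float_bits(float f)
{
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof(u)); // compilers lower this to a plain register move
    return u;
}
```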
Juan Manuel Martinez Caamaño
e539c37e7d
Add missing constexpr to if conditions ( #1444 )
...
[ROCm/composable_kernel commit: fd9ef4e678 ]
2024-08-06 11:40:34 -07:00
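A minimal sketch of what such a fix matters for (hypothetical example, not the CK diff itself): with a plain `if`, both branches must compile for every instantiation; `if constexpr` discards the untaken branch at compile time.

```cpp
#include <cstddef>
#include <string>
#include <type_traits>

template <typename T>
std::size_t payload_size(const T& v)
{
    if constexpr (std::is_arithmetic_v<T>)
        return sizeof(v);
    else
        return v.size(); // with a plain `if`, this line would fail to compile for arithmetic T
}
```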
jakpiase
e8ee8856fa
Fix for beta!=0 in reduce ( #1440 )
...
* fix for beta!=0 in reduce
* add reviewers suggestions
[ROCm/composable_kernel commit: b74d4d4d54 ]
2024-08-06 09:10:39 -07:00
Bartłomiej Kocot
69a6b563f9
Add Grouped Conv Fwd Large Tensor kernel ( #1432 )
...
* Support 64 bit indexing
* Add new grouped conv fwd kernel for large tensors
* Add instances large tensor
* Fixes for transform conv to gemm
* Fixes
* fixes
* Remove not needed instances
* examples fixes
* Remove not need ds arrays
* Fix tests
* Add 2GB check in gridwise dl
* Fixes
[ROCm/composable_kernel commit: 4ec5c52a0c ]
2024-08-06 10:06:10 +02:00
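The 2GB concern above can be sketched as follows (hypothetical helper, not the CK implementation): a tensor's flat byte size easily exceeds what 32-bit offsets can address, which is when a large-tensor kernel with 64-bit indexing is needed.

```cpp
#include <cstdint>
#include <limits>

// Returns true when byte offsets into the tensor no longer fit in int32.
inline bool needs_64bit_indexing(std::int64_t num_elements, std::int64_t bytes_per_element)
{
    return num_elements * bytes_per_element >
           static_cast<std::int64_t>(std::numeric_limits<std::int32_t>::max());
}
```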
arai713
735984bb5a
Codegen: isSupportedArgument check ( #1417 )
...
* added isSupportedArgument check into codegen device op
* adding function call
* remove commented code
[ROCm/composable_kernel commit: d32997a792 ]
2024-07-31 07:12:15 -07:00
Bartłomiej Kocot
1567614d80
Revert Revert Support access per groups and filter2x3 in grouped conv fwd ( #1382 ) ( #1406 ) ( #1415 )
...
[ROCm/composable_kernel commit: 33b399cc15 ]
2024-07-30 18:36:04 +02:00
zjing14
a94e87d868
Add rotating buff for gemm_multi_d ( #1411 )
...
* add rotating_buff for gemm_multi_d
* format
* Update flush_cache.hpp
* Update gtest.cmake
---------
Co-authored-by: Jing Zhang <jizhan@fb.com >
Co-authored-by: Haocong WANG <haocwang@amd.com >
[ROCm/composable_kernel commit: 105bd708c7 ]
2024-07-25 23:21:21 +08:00
Andriy Roshchenko
e3b469a493
Adding more instances of grouped convolution 3d forward for FP8 with ConvScale+Bias element-wise operation. ( #1412 )
...
* Add CMakePresets configurations.
* Add binary elementwise ConvScaleAdd and an example.
* Numerical verification of results.
Observed significant irregularities in F8 to F32 type conversions:
```log
ConvScaleAdd: float=145.000000 f8_t=160.000000 e=144.000000
ConvScaleAdd: float=97.000000 f8_t=96.000000 e=104.000000
ConvScaleAdd: float=65.000000 f8_t=64.000000 e=72.000000
```
* Implemented ConvScaleAdd + Example.
* Add ConvScale+Bias Instances
* Add Client Example for ConvScale+Bias
* Fix number of bytes in an example.
* Cleanup.
[ROCm/composable_kernel commit: 4a8a1befd5 ]
2024-07-24 15:49:55 -05:00
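The "irregularities" in the log above are consistent with ordinary fp8 quantization rather than a conversion bug, assuming an e4m3-style format (3 stored mantissa bits): near 145 the representable values are 16 apart, so 145 can only round to 144 or 160, and near 97 they are 8 apart. A sketch of the spacing computation:

```cpp
#include <cmath>

// Spacing (ulp) between representable values of a float format with
// `mantissa_bits` stored fraction bits, at magnitude x (normal range only).
inline float ulp_at(float x, int mantissa_bits)
{
    int exp;
    std::frexp(std::fabs(x), &exp);                  // |x| = m * 2^exp, m in [0.5, 1)
    return std::ldexp(1.0f, exp - 1 - mantissa_bits); // ulp = 2^(e - mantissa_bits)
}
```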
Bartłomiej Kocot
1f93d3f961
Add support for half_t and bfloat to reduction operations ( #1395 )
...
* Add support for half_t and bfloat to reduction operations
* Fix bhalf convert
* Next fix bf16
[ROCm/composable_kernel commit: ffabd70a15 ]
2024-07-24 12:12:37 -05:00
Bartłomiej Kocot
b23a3fcf77
Revert Support access per groups and filter2x3 in grouped conv fwd ( #1382 ) ( #1406 )
...
[ROCm/composable_kernel commit: 5d8c3d8190 ]
2024-07-22 14:21:24 +02:00
Haocong WANG
a0e0f3cdcc
[GEMM] F8 GEMM, performance optimized. ( #1384 )
...
* add ab_scale init support
* enabled interwave
* add scale type; update isSupport
* adjust example
* clean
* enable f8 pure gemm rcr ckprofiler
* Add gemm_multiply_multiply instances
* clang format
* Optimize for ScaleBlockMNK=128
* enable abscale f8 gemm ck profiler
* Add pure f8 gemm test suite
* Reverting to the state of project at f60fd77
* update copyright
* clang format
* update copyright
---------
Co-authored-by: root <jizhan@amd.com >
[ROCm/composable_kernel commit: 8c90f25be3 ]
2024-07-19 22:06:52 +08:00