Rostyslav Geyyer
94e5175ba3
Clean up
2025-05-01 17:22:34 +00:00
Rostyslav Geyyer
0fc2f528e0
Fix conflicts
2025-04-30 20:03:27 +00:00
Rostyslav Geyyer
045a71bc14
Merge branch 'develop' into lwpck-2876
2025-04-30 19:52:03 +00:00
Rostyslav Geyyer
4ec936befd
Remove leftovers
2025-04-30 18:59:06 +00:00
Rostyslav Geyyer
b27154e689
Add fp4 dimensions
2025-04-30 16:33:58 +00:00
Rostyslav Geyyer
840527ec41
Enable tests
2025-04-30 16:33:36 +00:00
Bartłomiej Kocot
4094ad158a
Integrate universal gemm with conv bwd data and add SplitK (#1315)
* Integrate universal gemm with conv bwd data
* Fix multi d kernel
* Add splitK support
* instances refactor
* instances refactor
* refactor
* fixes
* fixes
* 16x16 instances
* Fixes
* Fix
* Fix
* Fix
* Fix
* Fix
* Fixes
* fix
* fix
2025-04-28 23:54:49 +02:00
Anton Gorenko
edd92fc546
DeviceGemm_Wmma_CShuffleV3 with BlockGemmPipelineVersion::v3 (#2096)
* Prepare files for DeviceGemm_Wmma_CShuffleV3
* Implement main part of CShuffleV3 with block pipeline v3 for WMMA
* Remove unused functions and template params for A/B descriptors
* Support both gfx11 and gfx12
* Enable SplitK for gfx12 and disable for gfx11
* Added RowColRow layout for DeviceGemmV2 fp16
* Added more instances for Row, Col, Row data layout
* Added instances for DeviceGemm_Wmma_CShuffleV3, Col, Row, Row data layout
* Added instances for DeviceGemm_Wmma_CShuffleV3, Col, Col, Row data layout
* Added more instances for DeviceGemm_Wmma_CShuffleV3, Row, Row, Row data layout
* Fix formatting
* Add documentation
Based on e5ad48a784
* Enable gemm_universal profiling for gfx11/12
* Add WMMA intrinsics for F8/BF8
* Support F8/BF8 DeviceGemm_Wmma_CShuffleV3, add basic instances
* Add BF16 instances and tests
* Fix test_gemm_universal_wmma_fp8 by adding CK_USE_WMMA_FP8
---------
Co-authored-by: Anca Hamuraru <anca@streamhpc.com>
2025-04-28 10:14:21 +05:00
Rostyslav Geyyer
416e851584
Temporarily disable MX FP4 device tests (#2112)
2025-04-22 16:08:48 -05:00
Muhammed Emin Ozturk
b092c18da7
MI308 fix for streamk 1-Tile floating point exception (#2101)
2025-04-21 11:44:07 -07:00
Andriy Roshchenko
213b203a3c
MX GEMM - Parameterized Test Template (#2088)
* Tests for MX FP8 GEMM
* Improve documentation
2025-04-16 19:56:00 -06:00
aledudek
7c32652e03
Add grouped conv fwd 3d GKCYX instances for f32, f16, bf16 (#2069)
* Part1
* Add grouped conv fwd 3d GKCYX instances for f32, f16, bf16
* Add missing comma
* Add missing cpp instance files
* Fix 3d layout
* Add missing closing bracket
* Add missing comp x2 and part2 instances
* Fix typo in instance name
* fix
* Fix
---------
Co-authored-by: Bartlomiej Kocot <barkocot@amd.com>
2025-04-16 11:00:55 +02:00
Andriy Roshchenko
7106976a72
MX GEMM - New GEMM pipeline for MX data types (#2059)
* Allow selection of mfma_scale instructions
* Read B tensor from LDS to VGPR in chunks of 16 in MFMA order
* Add constexpr and synchronize return type for `get_exponent_value`
* Pass scales by reference and add comments to `mfma_scale_f32_32x32x64`
* Add support for microscaling instructions in `XdlopsGemm`
* Fix `mfma_scale_f32_16x16x128f8f6f4` wrapper
* Remove software implementation of MX GEMM
* Make interface of `intrin_mfma_scale_f32_16x16x128f8f6f4<16, 16>` consistent with the other scale instruction
* Update README
* Updated CHANGELOG
* Remove unused static methods
2025-04-15 17:17:07 -06:00
Muhammed Emin Ozturk
74fda2e796
CkProfiler StreamK GemmUniversal Fix and Split Gemm_universal Test Redo PR #2044 (#2070)
* fix and split gemm_universal test
* Update test_gemm_universal_streamk_ut_cases_fp8.inc
2025-04-11 10:17:29 -07:00
Illia Silin
3e6d21adeb
enable gfx115x support (#2065)
2025-04-09 10:06:42 -07:00
Illia Silin
29f7266216
Revert "CkProfiler StreamK GemmUniversal Fix and Split Gemm_universal Test (…" (#2054)
This reverts commit 7142d8003c.
2025-04-07 06:49:36 -07:00
Muhammed Emin Ozturk
7142d8003c
CkProfiler StreamK GemmUniversal Fix and Split Gemm_universal Test (#2044)
* fix and split gemm_universal test
* clang
* Update test_gemm_universal_ut_cases_bf16.inc
* Update test_gemm_universal_xdl_bf16.cpp
* Update test_gemm_universal_ut_cases_fp16.inc
2025-04-03 14:22:43 -07:00
Rostyslav Geyyer
5d22aa296e
Disable some tests
2025-04-03 20:41:53 +00:00
Thomas Ning
50d1f8ff90
Add the MI355 support for CK TILE GEMM (#2046)
* Get the root cause of the ck tile gemm failing on mi355
* Fix the ck tile gemm on MI355
* delete the debug info
2025-04-03 11:48:54 -07:00
Rostyslav Geyyer
265af71a71
Add FP16/BF16<->FP8/BF8 conversions (#2035)
* Move conversion functions and add missing conversions
* Add tests
* Add missing conversions
* Add missing conversions
* Add bf8 tests
* Update clipping for vectors
* Add missing conversions
* Add bf16 fp8 tests
* Add bf16 bf8 tests
* Fix device conversion
* Fix conversions
* Fix vector use
* Minor fix
* Add a workaround flag
* Add a workaround flag for bf16 conversion
* Add another workaround
* Add a workaround for fp16 to bf8 conversion
* Update type alias
* Add docstrings and missing wrappers
* Fix if defined macros
* Fix more if defined macros
* Add comments
* Remove __host__ specifier
* Add a gfx950 guard
* Update function naming
2025-04-03 12:42:03 -05:00
Bartłomiej Kocot
2ccf914888
Add support for GKCYX grouped conv weight (#2023)
* Grouped conv bwd weight GKCYX support
* fix and changelog
* fix
* fix
* fixes
* comments
* fix
2025-04-02 23:59:49 +02:00
Bartłomiej Kocot
8c0ab61ece
Grouped conv backward data GKCYX support (#2029)
* Grouped conv backward data GKCYX support
* profiler
* Converter
* split instances
2025-04-01 13:24:38 -07:00
Bartłomiej Kocot
ec742908bd
Grouped conv fwd v3 fix for SplitN and G > 1 (#2038)
* Grouped conv fwd v3 fix for SplitN and G > 1
* Remove int8 large test
* Restore int8 test
2025-04-01 13:19:35 -07:00
Muhammed Emin Ozturk
dd4c12b155
f8/bf16 GEMM Stream-K (#1879)
2025-03-31 20:30:17 -06:00
Rostyslav Geyyer
e3386e4f66
Add more debug info
2025-03-27 16:29:39 +00:00
Rostyslav Geyyer
441343a23d
Add MX FP4 device conversion tests (#1889)
* Add conversion tests
* Fix ctor
* Fix nan logic
* Fix conversion logic
* Permute packed f4_t values
* Fix conversion to float, repack vector elements
* Fix device tests
* Permute elements in a vector
* Add a repro test
* Add a conversion for a repro test
* Update test vectors
* Update conversion
* Fix the test
* Update test vector generator
* Fix vector sr conversion
* Permute conversion args
* Update conversion
* Test
* Fix packing
* Simplify conversion function
* Pack conversion in a loop
* Pack conversion in a loop
* Pack another conversion in a loop
* Pack one more conversion in a loop
* Pack the last conversion in a loop
* Clean up
* Add printf to fix intrinsic
* Add a sw-based workaround
2025-03-26 19:23:01 -05:00
Bartłomiej Kocot
54c81a1fcf
Add support for GKCYX grouped conv fwd (#2015)
* Add support for GKCYX grouped conv fwd
* fixes
* fix
* changelog
* Fixes
2025-03-26 21:13:38 +01:00
Rostyslav Geyyer
2fd48e1231
Fix B stride
2025-03-25 18:34:07 +00:00
Rostyslav Geyyer
29be4e3787
Add more debugging
2025-03-25 18:07:33 +00:00
Rostyslav Geyyer
63e9a652a1
Update data layout
2025-03-25 15:41:30 +00:00
jakpiase
0e91d32c61
[CK_TILE] Switch to universal gemm for batched and grouped gemms (#1919)
* switch to universal gemm for batched and grouped gemms
* added reviewer comments
* fixed grouped gemm tests
2025-03-20 11:17:04 +01:00
Rostyslav Geyyer
5adf19ccb3
Update chunk size
2025-03-19 20:37:05 +00:00
Bartłomiej Kocot
c2e4898b4b
Grouped conv bwd data NGCHW (#1967)
* Grouped conv bwd data NGCHW
* fixes
* fix
* Improvements
* Fix
* Fix
* add client example
2025-03-17 13:32:00 +01:00
Illia Silin
d4a6d69643
disable tests that take too long to build for gfx90a (#1975)
2025-03-12 17:54:03 -07:00
Rostyslav Geyyer
7b410eefc6
Add non scaled tests
2025-03-12 16:05:01 +00:00
Haocong WANG
ba209b9dab
reduce test size to avoid timeout on specific silicon (#1966)
2025-03-11 09:15:26 -07:00
Rostyslav Geyyer
d466096e25
Update chunk size for f4x2
2025-03-10 21:09:38 +00:00
Rostyslav Geyyer
234cbcb7af
Add f4x2 init mode
2025-03-10 18:54:40 +00:00
Rostyslav Geyyer
39b93e4a20
Add tests
2025-03-10 15:46:05 +00:00
kylasa
66c5f5b0b6
Addressing (Post Merge) code review comments for PR 1845 (#1883)
* Addressing code review comments.
* Addressing code review comments.
* Reorganized code for better readability.
* add ck_tile gemms for new types in CI
* fix jenkins syntax
* fix script syntax
* Add the test cases back
* Address the review comments
* Address review comments
* clang format
* Solve the merging issues
* Addressed the comments
* clang format
---------
Co-authored-by: illsilin <Illia.Silin@amd.com>
Co-authored-by: ThomasNing <thomas.ning@amd.com>
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
2025-03-06 11:40:30 -08:00
Illia Silin
9b51c08bf7
remove support for gfx940 and gfx941 targets (#1944)
* remove support for gfx940 and gfx941 targets
* update changelog
2025-03-05 11:07:33 -08:00
Bartłomiej Kocot
1bf29478cd
[CK TILE] Fix double lds in ck tile gemm (#1924)
2025-02-28 08:07:53 -08:00
Rostyslav Geyyer
ddaa893d7b
Merge branch 'develop' into lwpck-2836
2025-02-21 14:59:53 -06:00
Rostyslav Geyyer
3fe8d318bf
Clean up
2025-02-21 20:51:49 +00:00
Andriy Roshchenko
ffa13455a2
MX FP GEMM - Test MX FP8 MFMA Instructions (#1902)
* Refactored `load_A_row_major` to follow scale mapping
* Refactored `load_A_col_major` to follow scale mapping
* Refactored `load_B_col_major` to follow scale mapping
* Verified non-scaled test
* Verified scaled tests
* Used ReferenceMXGemm for verification
* Updated license headers
2025-02-21 13:35:54 -07:00
Rostyslav Geyyer
50c1291317
Fix packing
2025-02-18 22:25:16 +00:00
Rostyslav Geyyer
7daf21081e
Fix vector sr conversion
2025-02-18 19:56:20 +00:00
Rostyslav Geyyer
e323d613ff
Update test vector generator
2025-02-18 19:47:36 +00:00
Rostyslav Geyyer
d90b50596b
Fix the test
2025-02-18 19:28:59 +00:00
Mateusz Ozga
c287418dcc
Apply universal gemm to bwd_weight_cshuffle operator (#1873)
* Universal gemm - initial commit
* Review part 1
* Fix tests
* Remove instances
* Remove comp instances
2025-02-18 10:10:22 +01:00