kylasa
5f1ad09b61
Code drop for 2 warp ping pong scheduler along K dimension. ( #2276 )
...
* Code drop for 2 warp ping pong scheduler along K dimension.
* Addressing code review comments.
* Addressing Clang formatting issues.
* Addressing build issues.
* Addressing build issues of other GEMM pipelines with ping pong scheduler code drop.
* Fix for LDS memory size for GEMM pipelines.
* Addressing code review feedback comments.
* Change log update.
* Addressing code review comments and build issues.
* Added new policy for pipeline specific logic about LDS needs.
* Clang Fix during build.
2025-06-12 18:24:02 -07:00
carlushuang
8aff45a8af
[CK_TILE] moe sorting optimization : refactor subtoken logic to let more kernel pickup mp kernel ( #2327 )
...
* refactor subtoken logic to let more kernel pickup mp kernel
* typo
2025-06-12 11:44:22 +08:00
Thomas Ning
06e0b8436c
Epilogue cshuffle Improvement ( #2312 )
...
* add cshuffle's mxdlperwavepershuffle support, not finished
* add epilogue functions
* add cshuffle's mxdlperwavepershuffle support, not finished
* add epilogue functions
* update cshuffle logic
* update cshuffle_logics
* add some change within review
* update some codes following the code review
* update epilogue logic
* remove from problem
* update codes following review.
* fix some issues
* solve the previous PR error, refine the code
* Update include/ck_tile/ops/epilogue/cshuffle_epilogue.hpp
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com >
* Comment addressed
* handling tile_engine failing case
* handling tile_engine failing case
---------
Co-authored-by: joyeamd <John.Ye@amd.com >
Co-authored-by: joye <joye@amd.com >
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com >
Co-authored-by: khushbu agarwal <khuagarw@amd.com >
2025-06-10 22:44:50 -07:00
Thomas Ning
14d229d6c8
fix on the typo ( #2326 )
2025-06-10 16:34:33 -07:00
Khushbu Agarwal
bd270fe4bc
fix flatmm kernel for bigger size for fp16 datatype ( #2302 )
2025-06-10 11:13:40 -07:00
Eisuke Kawashima
4e586ca958
chore: unset executable permission ( #2303 )
...
Co-authored-by: Eisuke Kawashima <e-kwsm@users.noreply.github.com >
2025-06-10 09:13:59 -07:00
MHYangAMD
9fcf21a4ec
Fix fmha fwd precision issue on MI3XX series ( #2285 )
...
* Fix fmha fwd precision issue on MI3XX series
For fmha fwd fp16 cases, we found that using
impl::cast_tile_pk_fp16_fp32 for casting P would lead to precision
issues, since it uses __builtin_amdgcn_cvt_pkrtz, which is round to zero.
For examaple, fixing K,V to be all 1, and Q is random, which outputs are
expected to be all 1. But we found that it would have some incorrect
outputs 0.9995, which are smaller than the atol 0.001. (1 - 0.9995 =
0.0005 < 0.001) Thus, ck do not report this error.
* Add option to switch rtn/rtz for fmha fwd
2025-06-10 15:03:23 +08:00
Sami Remes
1c6f83df6c
[CK_TILE] Tileloop persistent gemm - resubmit ( #2299 )
...
* Reapply "[CK_TILE] Tile loop persistent gemm kernel (#2191 )" (#2293 )
This reverts commit 233e274077 .
* Add missing header for kentry
---------
Co-authored-by: Thomas Ning <Thomas.Ning@amd.com >
2025-06-06 14:18:49 -07:00
Andriy Roshchenko
00247e3c29
Optimized GEMMs for MX FP4/8 ( #2294 )
...
Adds V3 GEMM pipeline for MX FP4 and MX FP8
Adds V3 GEMM pipeline for MX FP4 with preshuffling
Adds MXFP4 GEMM tests (#2275 )
Adds MXFP4 GEMM examples
Adds MXFP4 GEMMs to ckProfiler
Co-authored-by: Andriy Roshchenko <107577548+andriy-ca@users.noreply.github.com >
Co-authored-by: Andriy Roshchenko <andriy.roshchenko@amd.com >
Co-authored-by: aska-0096 <haocwang@amd.com >
Co-authored-by: lalala-sh <Jiaxing.Wen@amd.com >
Co-authored-by: OscarXu <huaiguxu@amd.com >
Co-authored-by: mtgu0705 <mtgu@amd.com >
Co-authored-by: Ding, Yi <yi.ding@amd.com >
Co-authored-by: feifei14119 <feiw@amd.com >
Co-authored-by: Lin, Qun <qlin@amd.com >
Co-authored-by: joye <joye@amd.com >
Co-authored-by: Rostyslav Geyyer <46627076+geyyer@users.noreply.github.com >
2025-06-05 13:54:15 -06:00
Illia Silin
233e274077
Revert "[CK_TILE] Tile loop persistent gemm kernel ( #2191 )" ( #2293 )
...
This reverts commit ffb52783d0 .
2025-06-05 09:24:00 -07:00
Sami Remes
7ea1508b59
[CK_TILE] Move GEMM pipeline tail handling logic to pipelines ( #2222 )
...
* Add TailHandler for V3, V4 and Mem pipelines
* Adapt examples and tests to use TailHandler
* move tail-handling logic to pipeline in persistent grouped gemm
* Fix Mem pipeline dispatching, add CompV4 dispatching
* Use a macro for handling the many tails of Mem pipeline
* Fix formatting again
* Use const-ref RunFunction, remove unnecessary try_run
2025-06-04 11:50:21 +03:00
Sami Remes
ffb52783d0
[CK_TILE] Tile loop persistent gemm kernel ( #2191 )
...
* Implement tile loop persistent gemm kernel
* Enable timing
* Add tests for persistent gemm
* Fix formatting
* Fix gemm_basic
* Rename True/False to Persistent/NonPersistent
* Use only one set of layouts for persistent tests
* Fix gemm example persistent template parameter
* Fix formatting
2025-06-04 11:46:28 +03:00
Khushbu Agarwal
59a85cb4bc
[CK_Tile] Fix gemm kernel for 4,64,16 and 64,4,16 warp tile sizes ( #2262 )
...
* debugging issue
* debugging issue
* debugging
* debugging
* reverting debugging code
* clang formatted
* updating default_config.json
* fix ci failure
* clang formatted
2025-06-03 20:16:10 -07:00
Illia Silin
4e561af18c
Revert "add CShuffleM/NXdlPerWavePerShuffle in cshuffle_epilogue ( #2185 )" ( #2260 )
...
This reverts commit fd6a859b44 .
2025-05-29 16:22:16 -07:00
joyeamd
fd6a859b44
add CShuffleM/NXdlPerWavePerShuffle in cshuffle_epilogue ( #2185 )
...
* add cshuffle's mxdlperwavepershuffle support, not finished
* add epilogue functions
* add cshuffle's mxdlperwavepershuffle support, not finished
* add epilogue functions
* update cshuffle logic
* update cshuffle_logics
* add some change within review
* update some codes following the code review
* update epilogue logic
* remove from problem
* update codes following review.
* fix some issues
2025-05-29 14:31:14 +02:00
Po Yen Chen
c42b957d65
[CK_TILE] For FMHA forward kernels, assign block indices reversely if using mask ( #2209 )
...
* Assign block indices reversely if kHasMask=true
* Assign block indices reversely for splitkv kernel
2025-05-27 10:58:58 +08:00
Zzz9990
ece38b9d7a
[VLLM V1] Add chunked prefill for FA to pass seq with small seqlen_q ( #2221 )
...
* fix splitkv compiler issue since lse is used to select kernel instances
* bypass seqlen == 1
* add chunked prefill into mha varlen
This reverts commit aa9847e42d .
* skip compile when receipt 2-4 and add comments
* fix
---------
Co-authored-by: fsx950223 <fsx950223@outlook.com >
2025-05-26 19:17:18 +08:00
Sami Remes
d1e6f0982d
[CK_TILE] Grouped GEMM tile loop ( #2146 )
...
* Add trait to use a persistent kernel and split the entrypoints in grouped gemm
* Some helper functions for persistent kernel case
* Get max occupancy grid using device properties
* Implement tile loop in main entry point to grouped gemm
* Enable GridSize() on device
* Handle offset tile index using real current block index
* Add persistent kernel choice to grouped gemm example
* Use a for-loop for iterating over the group
* Reduce VGPR spills by early-exit
* Enable persistent kernel choice in grouped_gemm example
* Add persistent kernel option to grouped_gemm test
* Fix formatting with remod.py
* Remove GridUpdateBlocks as blocks are now iteratively computed
* Add comment about VGPR spilling
* Fix formatting
* Use CK_TILE_HOST instead of __host__
* Enable all Row/Col combinations in grouped gemm unit test
* Add some KBatch=2 cases to grouped gemm tests
* Fix SplitK for grouped gemm
* Enable pipeline hotloop/tailnumber selection in-kernel for grouped gemm
* Add type traits
* Split examples to regular and tileloop
* Formatting
* Use hipExtStreamGetCUMask to get current active CUs for the given stream
* Align test and example kernel config, and disable validation for splitk repeats
* Remove debug options from CMakeLists.txt
* Separate the code paths for persistent/non-persistent in test
* Fix formatting
* Address review comments
---------
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com >
2025-05-20 17:18:57 +03:00
Po Yen Chen
791802b381
[CK_TILE] fMHA batch_prefill block index & logits soft-capping optimizations ( #2198 )
...
* Write soft-sign in inline asm
* Change tile idx computation
* Add macro to turn off soft-sign asm opt
* Use simple for loop to avoid register spill
* Only do block id transform for masking cases
2025-05-16 15:14:46 +08:00
Khushbu Agarwal
3d8d6e75e4
Adding validation for tile sizes in Tile Engine ( #2189 )
...
* Adding validation for tile sizes
* Add architecture in config, and shuffle lines of code in warp_gemm.hpp
* Enable MFMA for gfx950, and invalid tile handling
2025-05-15 10:28:31 -07:00
BingYuan.Zhou
41c17d0a95
fix moe sorting build fail ( #2190 )
...
* fix moe sorting build fail
* refile code
---------
Co-authored-by: solin <bingzhou@amd.com >
2025-05-14 09:31:26 +08:00
Po Yen Chen
2920604786
[CK_TILE] Add logits soft-capping & customization support to the FMHA forward kernel/pipelines ( #2163 )
...
* hack for cap logits
* fix bug
* Re-format files
* Allow specifying logits_soft_cap through APIs
* Support turn on/off logits_soft_cap in async pipeline
* Do not generate non-verified kernels
* Align receipt used in Aiter
* Sync logits soft-capping across pipelines
* Re-enable some hdim pipelines
* fix perf
* Add attention variant for logits_soft_cap
* Add newline at end-of-file
* Fix performance
* Add comment to explain logits_soft_cap pre-processing
* Unify code
* Unify floating-point literal style
* Use class data member to slience the compilation error
* [CK_TILE] Update attention customizaton interface: add LogitsMask() (#2133 )
* Send 'mask' along with variant params to the LogitsMask()
* Send block indices to the variant
* Add indices parameters in variant interface
* Fix fmha bwd codegen error
* Allow switch logits_soft_cap impl
* Eliminate register spills
* Fix compilation errors
* Fix wrong LSE
* Fix LSE for splitkv kernel
* Sync splitkv pipeline changes
* Add batch_prefill kernel/pipeline
* Fix codegen error
* Undo changes in CMakeLists.txt
* Merge pipeline filtering check
* Use different code path if kHasLogitsSoftCap=false
* Remove [[maybe_unused]] attribute
* Use pre-existing compile-time flag to instantiate templates
* Sync pipeline changes
* Update CHANGELOG.md
---------
Co-authored-by: Bernard <bernaliu@amd.com >
Co-authored-by: coderfeli <coderfeli@163.com >
2025-05-13 12:19:25 +08:00
Khushbu Agarwal
f05e45ba59
Disable SMFMA gfx90a ( #2184 )
...
* sparsity fix for gfx90a
* reverting tile_engine changes
2025-05-12 09:56:23 -07:00
Thomas Ning
9d1e44e56a
Vectorized Transpose for Batched Transpose CK Tile Operator ( #2131 )
...
* Shared Memory for single data point
* CKTile Transpose vectorize CP1
* CKTile Transpose vectorize CP2
* CKTile Transpose vectorize CP2.1
* fixed the compile error of the transpose tile 2d
* Have the correct result for the current test sample
* Changes to printing tensor
* fp8 support added
* Debugging for transpose
* solving the corner issue
* Changed padding flag
* Intermideate Debugging
* Intermidiate Debugging
* Intermediate Debugging
* Finished debugging of the transpose op
* Code Cleanup
* Adding edge case smoke tests
* Adding Transpose test to CI/CD
* Adding Transpose test to CI/CD
* Adding Transpose test to CI/CD
* Addressing Review Comment
* Addressing Comments
* Addressing Comments
* Measuring Perf Tests
* Code Cleanup
* Changlog
* Added the running iterations
* clang format
* Fix the changelog
* Fix the compilation error
* change the printing factor
---------
Co-authored-by: ThruptiRajLakshmanaGowda <tlakshma@amd.com >
2025-05-12 00:41:45 -07:00
Khushbu Agarwal
d8faf1c6a1
Support for swizzle and transpose for MFMA_16x16x32_F16/BF16 ( #2172 )
...
* Changes for updating tile distribution for shuffle and transpose
* Fixed swizzle and transpose, removed comments
* clang formatted
* Adding support for bf16 type
* Addressing review comments
2025-05-10 22:40:05 -07:00
Khushbu Agarwal
ef72a4b9bc
Disable SMFMA for gfx90a ( #2182 )
2025-05-09 00:18:07 -07:00
Thomas Ning
c757046d49
Revert "Disable the SMFMA instruction for gfx90a. ( #2174 )" ( #2175 )
...
This reverts commit a32d907771 .
2025-05-08 00:07:03 -07:00
Khushbu Agarwal
a32d907771
Disable the SMFMA instruction for gfx90a. ( #2174 )
...
* remove smfma for gfx90a
* clang formatted
2025-05-07 23:09:22 -07:00
BingYuan.Zhou
6a3960c1e1
Flatmm merge ( #2168 )
...
* sync with function interface of cshuffleepiloge,fix flatmm build fail
* move code from solin/flatmm which add mfma16*16*32fp8 and optimize flatmm
---------
Co-authored-by: solin <bingzhou@amd.com >
2025-05-08 12:59:57 +08:00
jakpiase
cb07ad84d5
fix for default epilogue ( #2167 )
2025-05-07 10:46:53 -07:00
Aviral Goel
769336b640
[CK_TILE] Add type traits to detect tile window types at compile time ( #2158 )
...
* added WindowType enum to tile_window_structs and static assert checks in computev4 pipeline
* added type traits instead of enum to tile_window() and tile_window_linear() with debug comments
* removed comments, added documentation and clang format
2025-05-07 00:00:39 -07:00
carlushuang
4e9b76f88c
[CK_TILE] optimize moe sorting kernel, boost large context case up to 20x ( #2153 )
...
* combine 2-3 as single stage
* support zeroing
* improve long tokens
* update specialization
* b16 ws
* 8bit topk optimize
* update 15 example
2025-05-06 17:32:07 +08:00
jakpiase
0bcb804ad0
[CK_TILE] Remove scratch usage from universal gemm ( #2001 )
...
* moves kbatch condition outside of kernel
* add reviewer comments
* fixes
* fix tests
* fixes after review
---------
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com >
2025-05-05 18:46:44 +02:00
Khushbu Agarwal
d58f2b8bd0
mfma_32x32x64_fp8/bf8 ( #2148 )
...
* support for mfma_32x32x64_fp8
* clang-formatted
* Fixing sparsity in codegen
2025-05-01 13:36:24 -07:00
Illia Silin
9a9f59ae69
Revert "Add ck tile examples to package ( #1880 )" ( #2150 )
2025-04-30 10:20:16 -07:00
Aviral Goel
65f182d617
Add Matrix A and Matrix B Swizzle for LDS in Computev4 policy ( #2136 )
...
* fixed computev4 policy bug for lds swizzle
* added swizzle for input matrix B
* Improved ComputeV4 policy and pipeline by swizzling A and B
* consolidated LDS descriptor functions in parent struct
2025-04-28 18:20:47 -07:00
Khushbu Agarwal
d107f3c3a5
Support for MFMA_16x16x128 for fp8/bf8 ( #2125 )
...
* Adding 16x16x128 support for gfx950
* Support for fp8 and bf8
* fix input arguments for MFMA scale instruction
* clang-formatted
* Fixes for lwpck-3145 (#2138 )
* Fix lds tile & cmake dep & default epilogue
* Fallback BTypeToUse to ADataType in WOQ cases
* reverting instance json file
* reverting instance json file
---------
Co-authored-by: Yi DING <yi.ding@amd.com >
2025-04-28 18:19:50 -07:00
jakpiase
434d19f696
Add ck tile examples to package ( #1880 )
...
* add ck tile examples to package
* Update jenkinsfile
* fix for jenkinsfile
* fix for building ck tile code on non gfx9
* compile ck tile examples only for gfx94
* include ck tile examples in all target
* fix for basic gemm UseStructuredSparsity
* Update CMakeLists.txt
* Update gemm_pipeline_problem.hpp
* add targets to rocm install
---------
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com >
2025-04-28 09:53:19 -07:00
Khushbu Agarwal
a2ed34a112
MFMA_32x32x16 for gfx950 ( #2121 )
...
* Enable MFMA_32x32x16 for fp16/BF16 for gfx950
* clang formatted
2025-04-24 10:20:22 -07:00
carlushuang
5487289fc4
[CK_TILE] support gfx950 matrix core in 01_fmha fwd ( #2110 )
...
* gfx950 01_fmha fwd
* fix comment
---------
Co-authored-by: Thomas Ning <Thomas.Ning@amd.com >
2025-04-23 12:40:18 -07:00
Gino Lu
504f563f78
[CK-Tile] warp-gemm support for using V_MFMA_F32_16x16x32_BF16 ( #2073 )
...
* draft v_mfma_f32_16x16x32_bf16
* fix error config and add debug code.
* Solve the CShuffle Problem
* draft v_mfma_f32_16x16x32_bf16
* fix error config and add debug code.
* Solve the CShuffle Problem
* fix error while testing new command
* Finished the feature of new mfma 16*16*32
* Addressed the comment
---------
Co-authored-by: ThomasNing <thomas.ning@amd.com >
2025-04-22 15:52:36 -07:00
Thomas Ning
a738e43445
MFMA 16x16x32fp8 ( #2103 )
...
* add mfma_16x16x32_fp8
* clang format code
* Finished the fix for gemm basic
* clang foramt
* rebuild CI
* recover gemm.hpp
* add MFMA 16*16*32bf8
---------
Co-authored-by: solin <bingzhou@amd.com >
2025-04-21 10:21:35 -07:00
solin
c318ec0778
fix CI build fail
2025-04-21 16:00:12 +08:00
BingYuan.Zhou
eaf1f0bf3b
[flatmm] implement basic fp16 flatmm ( #2089 )
...
* [flatmm] implement basic fp16 flatmm
* fix CI build fail
---------
Co-authored-by: root <root@hjbog-srdc-50.amd.com >
Co-authored-by: solin <bingzhou@amd.com >
2025-04-16 16:51:17 +08:00
Thomas Ning
269f4f6af5
Solve the Static Encoding Pattern compile error when the tile size is too small ( #2079 )
2025-04-13 20:09:30 -07:00
jakpiase
6c61f4d237
[CK_TILE] Add 2:4 structured sparsity support for fp16 gemm ( #1957 )
...
* add structured sparsity fp16 support for gemm
* added reviewer suggestions
* update changelog
* update changelog
* add reviewers suggestions
* Minor fix
* clang fix
* fix doxygen
2025-04-11 12:18:26 +02:00
slippedJim
5f885d2b7a
add fmha fwd splitkv receipt for aiter c++ api ( #2068 )
...
* add s_randval for c++ api
* Fix bug of bias in splitkv
---------
Co-authored-by: rocking <ChunYu.Lai@amd.com >
2025-04-10 23:21:13 +08:00
Illia Silin
572cd820ce
Split env.hpp header from the ck.hpp header. ( #2049 )
...
* split env.hpp out of main headers
* fix namespace logic
2025-04-03 15:30:21 -07:00
Adam Osewski
e5ad48a784
Basic docs for universal gemm & ck-tile gemm. ( #2014 )
...
* Basic docs for universal gemm & ck-tile gemm.
* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com >
* Update include/ck/tensor_operation/gpu/grid/gridwise_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com >
* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com >
* Update include/ck/tensor_operation/gpu/grid/gridwise_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com >
* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com >
* Update include/ck/tensor_operation/gpu/grid/gridwise_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com >
* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com >
* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com >
* Update include/ck/tensor_operation/gpu/grid/gridwise_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com >
* Update include/ck/tensor_operation/gpu/grid/gridwise_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com >
* Update include/ck/tensor_operation/gpu/grid/gridwise_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com >
* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com >
* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com >
* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com >
* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com >
* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com >
* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com >
* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com >
* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com >
* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com >
* Update include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle_v3.hpp
Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com >
* Reviewers suggestions.
* Align tparam names in doc with class tparams.
* More reviewers fine tuning ;)
---------
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com >
Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com >
2025-04-02 11:03:40 +02:00
rocking
8a20b62e91
Reduce redundant space in bias tensor ( #2024 )
...
Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com >
2025-03-28 21:58:06 +08:00