* Support bf16/fb8/bf8 datatypes for ck_tile/gemm
* remove commented out code.
* Addressing code review comments and enabling universal_gemm for all the supported data types.
* Merge conflict resolution.
* Solve the memory pipeline compilation error. Merge with the new change of CShuffle
* finish the feature, pass the tests
* Fix the pipeline and add the benchmark script for other data types
---------
Co-authored-by: ThomasNing <thomas.ning@amd.com>
* Add spatially local tile partitioner
* Use 1D Grid size & create partitioner object.
* Docs & use 1D partitioner in example.
* Clang format.
* Change kernel grid size
Now: X is the # of output C-tiles,
Y is the batch count
Z is the splitK
* Formatting & more doc.
* Clang format.
* Fix batched gemm test. Use 1d partitioner.
* Move condition.
* FIx ctor.
* clang-format.
* Refactor universal gemm policy.
* Adapt example to refactor changes.
* Introduce static encoding pattern
* Adding shuffled encoding patterns.
* Fix err in reverse tuple.
* Add transpose_tile2d
* Small refactoring + doc
* Enable reading on contiguous dimension in all layouts.
* Transpose A/B register tile if needed for comp v3 pipeline.
* Take contiguous dim size when calculating dram vector load size.
* A/B smem pack size taken from WarpGemm attributes
* Update B LDS layout and setup tile distribution pattern at class level.
* Fix static assert.
* Fix errors in examples.
* Formatting & fix IsTranspose
* Fix VectorSize & refactor.
* Add error loging messages.
* Fix VecLoadSize and TranspseC for mem pipeline.
* Update unit-tests & disable mem pipeline.
* Clang format
* Update include/ck_tile/core/tensor/tile_window.hpp
Co-authored-by: jakpiase <jakub.piasecki@amd.com>
* Fix compilation and reviewers comments.
* Refactor unit-test. Fallback to non-universal gemm.
Need to use GemmPipelineAGmemBGmemCRegV1 for now,
since GemmKernel is now supporting also non-K major vector reads.
---------
Co-authored-by: jakpiase <jakub.piasecki@amd.com>
* refactor the block_gemm_areg_breg_creg_v1 and add the v2 policy with 2x2 warp gemm
* Finished the 2x2 warp gemm policy and the block selection mechanism
* Clang format
* address poyen's comment
* Address feedbacks
* Fixed the compilation issue
* Change the function name