mirror of
https://github.com/ROCm/composable_kernel.git
synced 2026-05-14 02:02:46 +00:00
* initial poc
* factor out common parts in operator()
* cv4
* rest of the universal gemm pipelines
* fix test
* remove boilerplate from tile engine
* fix example
* fix example
* format
* fix tests build for gemm
* remove base pipeline codegen from gemm instance builder
* unify v3 logic with the rest of universal gemm pipelines
* fix build for multi abd test
* fix test gemm multi d
* fix build for weight preshuffle
* fix grouped gemm test
* fix grouped gemm multi d test
* fix grouped gemm preshuffle
* fix grouped gemm example except for quant
* fix gemm preshuffle
* fix splitk 2 stage example
* fix batched gemm example
* fix multid example
* fix multiabd example
* fix batched gemm test
* fixup
* fix examples build
* fix grouped gemm test build
* fix smoke builder
* hacky poc
* fix tile engine
* kill the lambda
* maybe fix test build
* more fixes
* clang-format
* save temp
* clang-format
* mostly fix examples
* clang-format
* remove dead code
* more cleanup
* fix fmha bwd build (default epilogue set/add appears to be broken)
* fix default epilogue tests but not correctness
* clang-format
* fix bquant
* clang-format
* cleanup dead code
* rearrange make windows for readability
* restore changes to IsSupportedArgument
* fix smoke-builder
* clang-format
* fixup rename class
* build fixes
* clang-format
* fix builder
* fixup
* remove set from builder tests
* fix test
* clang-format
* re-refactor the kernels
* clang-format
* fix header license
* remove memory operation from conv bwd test
* clang-format
* clang-format example,include
* clang-format test
* build fixes
* clang-format
* solve compilation error
* fix the CI
* solve compilation error
* clang format
* solve merge conflict
* solve merge conflict
* solve the gfx11 error
* solve test error
* moar build fixes
* remove AtomicAddRequiresKBatchGreaterThanOne test since the property is removed from the kernel scope
---------
Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>
[ROCm/composable_kernel commit: e339101e9c]
Stream-K GEMM
This folder contains examples of Stream-K GEMMs using the ck_tile tile-programming implementation.
build
# in the root of ck_tile
mkdir build && cd build
# you can replace <arch> with the appropriate architecture (for example gfx942) or leave it blank
../script/cmake-ck-dev.sh ../ <arch>
# Compile the Stream-K kernels
make tile_example_streamk_gemm_basic -j
This will result in an executable build/bin/tile_example_streamk_gemm_basic
example
args:
-m m dimension (default:512)
-n n dimension (default:512)
-k k dimension (default:512)
-a_layout tensor A data layout (default: R)
-b_layout tensor B data layout (default: C)
-c_layout tensor C data layout (default: R)
-reduction_strategy strategy for storing results in C tensor. atomic/reduction (default:atomic)
-persistent_dp persistent strategy for data-parallel section. Set to 0 for non-persistent or to 1 for persistent. (default:0)
-stride_a tensor A stride (default:0)
-stride_b tensor B stride (default:0)
-stride_c tensor C stride (default:0)
-v validation strategy. 0. No validation, 1. Validation on CPU, 2. Validation on GPU (default:1)
-prec data type. fp16/bf16/fp8/bf8 (default:fp16)
-warmup number of iterations before benchmarking the kernel (default:50)
-repeat number of iterations to benchmark the kernel (default:100)
-timer timing mode. gpu:gpu timer, cpu:cpu timer (default:gpu)
-init data initialization strategy. 0:random, 1:linear, 2:constant(1) (default:0)
-flush_cache flush the cache before running the kernel (default:true)