mirror of
https://github.com/ROCm/composable_kernel.git
synced 2026-05-19 12:30:16 +00:00
* Allow selection of mfma_scale instructions
* Read B tensor from LDS to VGPR in chunks of 16 in MFMA order
* Add constexpr and synchronize return type for `get_exponent_value`
* Pass scales by reference and add comments to `mfma_scale_f32_32x32x64`
* Add support for microscaling instructions in `XdlopsGemm`
* Fix `mfma_scale_f32_16x16x128f8f6f4` wrapper
* Remove software implementation of MX GEMM
* Make interface of `intrin_mfma_scale_f32_16x16x128f8f6f4<16, 16>` consistent with the other scale instruction
* Update README
* Updated CHANGELOG
* Remove unused static methods
[ROCm/composable_kernel commit: 7106976a72]
# GEMM Examples for Microscaling Formats

## example_gemm_mx_fp8

Custom verification parameters:

```bash
# arg1: verification (0=no, 1=CPU)
# arg2: initialization (0=constant values, 1=integer values, 2=decimal values)
# arg3: time kernel (0=no, 1=yes)
# arg4: verbosity (0=no info, 1=verbose info)
# arg5 to 10: M(128x), N(128x), K(64x), StrideA, StrideB, StrideC
# arg11: KBatch
./bin/example_gemm_mx_fp8 1 1 0 1
```

Custom tensor shapes:

```bash
./bin/example_gemm_mx_fp8 1 2 1 0 128 128 256 -1 -1 -1 1
```

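Before launching with a custom shape, it can help to check the sizes first. The sketch below assumes that `M(128x)`, `N(128x)` and `K(64x)` mean M and N must be multiples of 128 and K a multiple of 64; that reading is an interpretation of the argument list above. The helpers `check_mx_gemm_shape` and `mx_gemm_cmd` are hypothetical and not part of composable_kernel. The script only prints the command line; it does not run the example.

```bash
# Hypothetical helpers, not part of composable_kernel.
# Assumption: M and N must be multiples of 128, K a multiple of 64.

# Returns 0 if the shape ($1=M, $2=N, $3=K) satisfies the assumed constraints.
check_mx_gemm_shape() {
    [ "$1" -gt 0 ] && [ "$2" -gt 0 ] && [ "$3" -gt 0 ] || return 1
    [ $(( $1 % 128 )) -eq 0 ] && [ $(( $2 % 128 )) -eq 0 ] && [ $(( $3 % 64 )) -eq 0 ]
}

# Prints the command line for a given M, N, K, reusing the other
# argument values from the custom-shape invocation above
# (verification=1, initialization=2, time kernel=1, verbosity=0,
# strides=-1 -1 -1, KBatch=1).
mx_gemm_cmd() {
    if ! check_mx_gemm_shape "$1" "$2" "$3"; then
        echo "invalid shape: M=$1 N=$2 K=$3" >&2
        return 1
    fi
    echo "./bin/example_gemm_mx_fp8 1 2 1 0 $1 $2 $3 -1 -1 -1 1"
}

mx_gemm_cmd 128 128 256
```

To run the example rather than print the command, the printed line can be passed to `sh -c` once the binary has been built.
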
Default invocation:

```bash
# Implies: ./bin/example_gemm_mx_fp8 1 2 0 0
./bin/example_gemm_mx_fp8
```