Mirror of https://github.com/ROCm/composable_kernel.git (synced 2026-03-15 12:47:44 +00:00)
# Acronyms in Composable Kernel
The following acronyms are used in the Composable Kernel codebase:
| Acronym | Expansion | Explanation |
|---|---|---|
| BF16 | Brain Floating Point 16 | 1 sign bit, 8 exponent bits, 7 mantissa bits |
| BF8 | 8-bit Brain Floating Point | 1 sign bit, 5 exponent bits, 2 mantissa bits (E5M2) |
| DLA | Deep Learning Accelerator | Specialized hardware for deep learning workloads |
| DRAM | Dynamic Random-Access Memory | Main memory. Global memory on GPU |
| E2E | End-to-End | Complete pipeline or process from input to output |
| ELU | Exponential Linear Unit | Activation function: x if x>0 else \alpha(e^x-1) |
| FMHA | Fused Multi-Head Attention | Efficient transformer attention kernel, fusing softmax, masking, and matmul |
| FP16 | Half-Precision Floating Point | 16-bit IEEE floating point format |
| FP32 | Single-Precision Floating Point | 32-bit IEEE floating point format |
| FP64 | Double-Precision Floating Point | 64-bit IEEE floating point format |
| FP8 | 8-bit Floating Point | 1 sign bit, 4 exponent bits, 3 mantissa bits (E4M3); low-precision format used for inference |
| GEMM | General Matrix Multiply | Matrix multiplication operation: C = A \times B |
| GELU | Gaussian Error Linear Unit | Activation function: x \cdot \Phi(x) |
| GQA | Grouped Query Attention | Variant of multi-head attention with grouped queries/keys/values |
| HBM | High Bandwidth Memory | Fast memory used in modern GPUs |
| HIP | Heterogeneous-Compute Interface for Portability | AMD's CUDA-like GPU programming API |
| INT8 | 8-bit Integer | Quantized integer format for inference |
| KVS | Key-Value Store | Data structure for storing key-value pairs (context: QKV in transformers) |
| L2/L1 | Level 2/Level 1 Cache | On-chip memory hierarchy in CPUs/GPUs |
| LDS | Local Data Share | Shared memory on AMD GPUs (equivalent to CUDA's shared memory) |
| LLM | Large Language Model | Transformer-based model for NLP tasks |
| LSE | Log-Sum-Exp | Numerically stable softmax computation: \log(\sum \exp(x)) |
| MHA | Multi-Head Attention | Attention mechanism with multiple heads in transformers |
| MFMA | Matrix Fused Multiply-Add | AMD GPU hardware instruction for matrix-matrix multiplication |
| MoE | Mixture of Experts | Neural network architecture with multiple expert subnetworks |
| MQA | Multi-Query Attention | Variant of multi-head attention with shared keys/values across heads |
| RCCL | ROCm Collective Communications Library | AMD's library for multi-GPU collective communication |
| NCHW | Batch, Channel, Height, Width | Tensor layout: batch-major, channels-first |
| NHWC | Batch, Height, Width, Channel | Tensor layout: batch-major, channels-last |
| OOM | Out Of Memory | Error when memory allocation fails |
| QAT | Quantization Aware Training | Training technique for quantized inference |
| QKV | Query, Key, Value | Components of transformer attention mechanism |
| RDMA | Remote Direct Memory Access | High-speed network memory access |
| RDQuant | Rowwise Dynamic Quantization | Quantization technique with per-row scaling for int8 inference |
| ReLU | Rectified Linear Unit | Activation function: \max(0, x) |
| ROCm | Radeon Open Compute | AMD's open GPU computing stack |
| SGD | Stochastic Gradient Descent | Optimization algorithm for training neural networks |
| SM | Streaming Multiprocessor | GPU compute unit (NVIDIA terminology; the AMD equivalent is the Compute Unit, CU) |
| SWA | Sliding Window Attention | Attention mechanism with a limited window for each token |
| TLB | Translation Lookaside Buffer | Memory management unit cache for virtual-to-physical address translation |
| VGPR | Vector General Purpose Register | GPU register for vector operations |
| Warp | Group of threads | Smallest scheduling unit on NVIDIA GPUs (32 threads); the AMD equivalent is the wavefront, typically 64 threads |
| WMMA | Warp Matrix Multiply-Accumulate | NVIDIA's matrix-multiply hardware primitive |
| XLA | Accelerated Linear Algebra | Compiler for optimizing ML computations (Google) |
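Several entries above are defined by formulas. As a quick illustration of the LSE entry, here is a minimal Python sketch (not CK code) showing why the maximum is subtracted before exponentiating:

```python
import math

def logsumexp(xs):
    """Numerically stable log(sum(exp(x))): shift by the max so that
    exp() never overflows, then add the shift back outside the log."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

# Naive math.log(sum(math.exp(x) for x in xs)) overflows for large inputs;
# the shifted form gives the exact answer 1000 + log(2).
print(logsumexp([1000.0, 1000.0]))
```

The same shifted form underlies the softmax computation inside FMHA kernels, where per-row running maxima play the role of `m`.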
## Common Variable Acronyms in Code
| Symbol | Meaning | Context |
|---|---|---|
| M, N, K | Matrix dimensions | GEMM: A[M,K] \times B[K,N] = C[M,N] |
| Q, K, V | Query, Key, Value | Transformer attention |
| S | Sequence length | NLP, transformers |
| D | Dimension | Hidden size, feature dim |
| B | Batch size | ML batch processing |
| H | Head count | Multi-head attention |
| C | Channel | CNNs, tensor layouts |
| T | Token | NLP, sequence models |
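The M, N, K convention maps directly onto a reference GEMM. A plain, unoptimized Python sketch, purely to fix the index naming (this is illustrative only, not how CK implements GEMM):

```python
def gemm(A, B):
    """Reference GEMM: C[M][N] = sum_k A[M][K] * B[K][N].
    A is M x K, B is K x N, and the result C is M x N."""
    M, K = len(A), len(A[0])
    K2, N = len(B), len(B[0])
    assert K == K2, "inner dimensions of A and B must match"
    C = [[0.0] * N for _ in range(M)]
    for m in range(M):          # rows of A and C
        for n in range(N):      # columns of B and C
            for k in range(K):  # reduction dimension
                C[m][n] += A[m][k] * B[k][n]
    return C
```

Optimized kernels tile these three loops across thread blocks and registers, but the M/N/K roles are the same.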
If you find an acronym not listed here, please submit a pull request or issue!