Mirror of https://github.com/ROCm/composable_kernel.git (synced 2026-03-15 12:47:44 +00:00)
# Acronyms in Composable Kernel
The following acronyms are used in the Composable Kernel codebase:
| Acronym | Expansion | Explanation |
|---|---|---|
| BF16 | Brain Floating Point 16 | 1 sign bit, 8 exponent bits, 7 mantissa bits |
| BF8 | 8-bit Brain Floating Point | 1 sign bit, 5 exponent bits, 2 mantissa bits (E5M2) |
| DLA | Deep Learning Accelerator | Specialized hardware for deep learning workloads |
| DRAM | Dynamic Random-Access Memory | Main memory. Global memory on GPU |
| E2E | End-to-End | Complete pipeline or process from input to output |
| ELU | Exponential Linear Unit | Activation function: x if x>0 else \alpha(e^x-1) |
| FMHA | Fused Multi-Head Attention | Efficient transformer attention kernel, fusing softmax, masking, and matmul |
| FP16 | Half-Precision Floating Point | 16-bit IEEE floating point format |
| FP32 | Single-Precision Floating Point | 32-bit IEEE floating point format |
| FP64 | Double-Precision Floating Point | 64-bit IEEE floating point format |
| FP8 | 8-bit Floating Point | 1 sign bit, 4 exponent bits, 3 mantissa bits (E4M3); low-precision format used for inference |
| GEMM | General Matrix Multiply | Matrix multiplication operation: C = A \times B |
| GELU | Gaussian Error Linear Unit | Activation function: x \cdot \Phi(x) |
| GQA | Grouped Query Attention | Variant of multi-head attention with grouped queries/keys/values |
| HBM | High Bandwidth Memory | Fast memory used in modern GPUs |
| HIP | Heterogeneous-Compute Interface for Portability | AMD's CUDA-like GPU programming API |
| INT8 | 8-bit Integer | Quantized integer format for inference |
| KVS | Key-Value Store | Data structure for storing key-value pairs (context: QKV in transformers) |
| L2/L1 | Level 2/Level 1 Cache | On-chip memory hierarchy in CPUs/GPUs |
| LDS | Local Data Share | Shared memory on AMD GPUs (equivalent to CUDA's shared memory) |
| LLM | Large Language Model | Transformer-based model for NLP tasks |
| LSE | Log-Sum-Exp | Numerically stable softmax computation: \log(\sum \exp(x)) |
| MHA | Multi-Head Attention | Attention mechanism with multiple heads in transformers |
| MFMA | Matrix Fused Multiply-Add | AMD GPU hardware instruction for matrix-matrix multiplication |
| MoE | Mixture of Experts | Neural network architecture with multiple expert subnetworks |
| MQA | Multi-Query Attention | Variant of multi-head attention with shared keys/values across heads |
| RCCL | ROCm Collective Communications Library | AMD's library for multi-GPU collective communication |
| NCHW | Batch, Channel, Height, Width | Tensor layout: batch-major, channels-first |
| NHWC | Batch, Height, Width, Channel | Tensor layout: batch-major, channels-last |
| OOM | Out Of Memory | Error when memory allocation fails |
| QAT | Quantization Aware Training | Training technique for quantized inference |
| QKV | Query, Key, Value | Components of transformer attention mechanism |
| RDMA | Remote Direct Memory Access | High-speed network memory access |
| RDQuant | Rowwise Dynamic Quantization | Quantization technique with per-row scaling for int8 inference |
| ReLU | Rectified Linear Unit | Activation function: \max(0, x) |
| ROCm | Radeon Open Compute | AMD's open GPU computing stack |
| SGD | Stochastic Gradient Descent | Optimization algorithm for training neural networks |
| SM | Streaming Multiprocessor | GPU compute unit (NVIDIA terminology; the AMD equivalent is the Compute Unit, CU) |
| SWA | Sliding Window Attention | Attention mechanism with a limited window for each token |
| TLB | Translation Lookaside Buffer | Memory management unit cache for virtual-to-physical address translation |
| VGPR | Vector General Purpose Register | GPU register for vector operations |
| Warp | Group of threads | Smallest scheduling unit on NVIDIA GPUs (32 threads); the AMD equivalent is the wavefront, typically 64 threads |
| WMMA | Warp Matrix Multiply-Accumulate | NVIDIA's matrix-multiply hardware primitive |
| XLA | Accelerated Linear Algebra | Compiler for optimizing ML computations (Google) |
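Several entries above are defined by formulas. As a quick illustration of the LSE entry, here is a minimal Python sketch (not CK code) showing why the maximum is subtracted before exponentiating:

```python
import math

def logsumexp(xs):
    """Numerically stable log(sum(exp(x))): shift by the max so that
    exp() never overflows, then add the shift back outside the log."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

# Naive math.log(sum(math.exp(x) for x in xs)) overflows for large inputs;
# the shifted form gives the exact answer 1000 + log(2).
print(logsumexp([1000.0, 1000.0]))
```

The same shifted form underlies the softmax computation inside FMHA kernels, where per-row running maxima play the role of `m`.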
## Common Variable Acronyms in Code
| Symbol | Meaning | Context |
|---|---|---|
| M, N, K | Matrix dimensions | GEMM: A[M,K] \times B[K,N] = C[M,N] |
| Q, K, V | Query, Key, Value | Transformer attention |
| S | Sequence length | NLP, transformers |
| D | Dimension | Hidden size, feature dim |
| B | Batch size | ML batch processing |
| H | Head count | Multi-head attention |
| C | Channel | CNNs, tensor layouts |
| T | Token | NLP, sequence models |
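The M, N, K convention maps directly onto a reference GEMM. A plain, unoptimized Python sketch, purely to fix the index naming (this is illustrative only, not how CK implements GEMM):

```python
def gemm(A, B):
    """Reference GEMM: C[M][N] = sum_k A[M][K] * B[K][N].
    A is M x K, B is K x N, and the result C is M x N."""
    M, K = len(A), len(A[0])
    K2, N = len(B), len(B[0])
    assert K == K2, "inner dimensions of A and B must match"
    C = [[0.0] * N for _ in range(M)]
    for m in range(M):          # rows of A and C
        for n in range(N):      # columns of B and C
            for k in range(K):  # reduction dimension
                C[m][n] += A[m][k] * B[k][n]
    return C
```

Optimized kernels tile these three loops across thread blocks and registers, but the M/N/K roles are the same.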
If you find an acronym not listed here, please submit a pull request or issue!