Vidyasagar Ananthan 92c67a824f [DOCS] Documentation Addition (Readme updates) (#2495)
* GH-2368 Adding a basic glossary

GH-2368 Minor edits

GH-2368 Adding missing READMEs and standardization.

resolving readme updates

GH-2368 Minor improvements to documentation.

Improving some readmes.

Further improvement for readmes.

Cleaned up the documentation in 'client_example' (#2468)

Update for PR

Update ACRONYMS.md to remove trivial terms

Update ACRONYMS.md to provide detailed explanations for BF16 and BF8 formats

Apply suggestion from @spolifroni-amd

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

Apply suggestion from @spolifroni-amd

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

Update README.md to clarify CK Tile API description and remove outdated references to the Tile Engine.

revise 37_transpose readme

revise 36_copy readme

Remove references to the Tile Engine in README files for 19_gemm_multi_d and 35_batched_transpose, and update distribution links for clarity.

Remove references to the Tile Engine in multiple README files and update distribution links for consistency and clarity.

Remove references to the Tile Engine in README files across multiple examples

Refine README files by removing outdated references to the Tile Engine

* Updates based on PR feedback 1

* Updates based on PR feedback 2

* Updates based on PR feedback 3

* Updates based on PR feedback 4

* Updates based on PR feedback 5

* Updates based on PR feedback 6

* Updates based on PR feedback 7

* Updates based on PR feedback 8

* Content Modification of CK Tile Example

* Modify the ck_tile gemm config

---------

Co-authored-by: AviralGoelAMD <aviral.goel@amd.com>
Co-authored-by: ThomasNing <thomas.ning@amd.com>
2025-10-16 03:10:57 -07:00

Tensor Contraction with Multiple A, B, and D Tensors

This example demonstrates a tensor contraction operation with multiple A, B, and D tensors. This extends the basic tensor contraction to handle multiple input tensor pairs and auxiliary tensors simultaneously, enabling complex multi-input tensor network computations to be executed in a single kernel launch.

Mathematical Formulation

This operation performs multiple tensor contractions simultaneously and combines them with auxiliary tensors.

  1. Multiple Tensor Contractions: Compute contractions from multiple A and B tensor pairs using Einstein summation notation.

     $$C_{temp0} = \text{einsum}(\text{pattern}_0, A_0, B_0)$$
     $$C_{temp1} = \text{einsum}(\text{pattern}_1, A_1, B_1)$$
     $$\vdots$$
     $$C_{tempK} = \text{einsum}(\text{pattern}_K, A_K, B_K)$$

  2. Combination with Auxiliary Tensors: Apply a user-defined function that combines all contraction results with multiple D tensors.

     $$E = f(C_{temp0}, C_{temp1}, \ldots, C_{tempK}, D_0, D_1, \ldots, D_M)$$

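Each contraction can have its own Einstein summation pattern, allowing for complex tensor network computations. The key optimization is that the intermediate tensors C_{temp0}, ..., C_{tempK} are never written to global memory: they stay in accumulator registers until the combining function f consumes them.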
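The two steps above can be sketched as a host-side NumPy reference. This is a minimal illustration of the formula only, not the Composable Kernel API: the function name `multi_contraction_reference`, the tensor shapes, the einsum patterns, and the choice of f are all assumptions made for the example.

```python
import numpy as np

def multi_contraction_reference(patterns, a_tensors, b_tensors, d_tensors, combine):
    """Reference for E = f(C_temp0, ..., C_tempK, D_0, ..., D_M)."""
    # Step 1: one einsum per (pattern, A, B) triple.
    c_temps = [np.einsum(p, a, b) for p, a, b in zip(patterns, a_tensors, b_tensors)]
    # Step 2: combine every contraction result with the auxiliary D tensors.
    return combine(*c_temps, *d_tensors)

# Hypothetical problem sizes, chosen only to keep the example small.
rng = np.random.default_rng(0)
M0, M1, N0, N1, K0, K1 = 2, 3, 4, 5, 6, 7
a0 = rng.standard_normal((M0, M1, K0, K1))
b0 = rng.standard_normal((N0, N1, K0, K1))
a1 = rng.standard_normal((M0, M1, K0))
b1 = rng.standard_normal((N0, N1, K0))
d0 = rng.standard_normal((M0, M1, N0, N1))

e = multi_contraction_reference(
    patterns=["abkl,cdkl->abcd", "abk,cdk->abcd"],  # each pair has its own pattern
    a_tensors=[a0, a1],
    b_tensors=[b0, b1],
    d_tensors=[d0],
    combine=lambda c0, c1, d: c0 + c1 + 0.5 * d,  # assumed example of f
)
```

A reference like this materializes every C_temp as a full array, which is exactly the traffic the fused kernel is described as avoiding; it is useful mainly for checking results.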

Algorithmic Strategy: Multi-Input Contraction with Tensor-to-GEMM Mapping

This kernel extends the tensor contraction algorithm to handle multiple simultaneous contractions.

  1. Unified Tensor-to-GEMM Mapping: Each tensor contraction is mapped to a GEMM operation through tensor reshaping:

    • Multiple Reshaping Operations: For each contraction pair (A_i, B_i), the tensors are logically reshaped into 2D matrices based on their Einstein summation pattern.
    • Coordinated Memory Layout: The reshaping operations are coordinated to enable efficient memory access patterns across all contractions.
  2. Multi-Contraction Tile Computation: Within each thread block:

    • Parallel GEMM Execution: Multiple GEMM operations (representing the contractions) are computed simultaneously.
    • Complex Address Calculation: Each contraction requires its own address calculation logic for the tensor descriptor interpretation.
    • Register Management: Multiple accumulator arrays are maintained for the different contraction results.
  3. Tensor Fusion Epilogue: After computing all contractions:

    • Multi-Tensor Reshape: The GEMM results are logically reshaped back to their target tensor shapes.
    • Load Auxiliary Tensors: Read corresponding elements from all D tensors.
    • Apply Fusion Function: Execute the user-defined function f combining all results.
    • Store Final Tensor: Write the combined result to the output tensor.
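The strategy above can be sketched on the host in NumPy. This is a rough model under stated assumptions, not the kernel's implementation: the helper names `to_gemm_operand` and `fused_tiled_contractions`, the tile sizes, and the fusion function are hypothetical, and every contraction is assumed to share the same free (M and N) dimensions, laid out contiguously ahead of the contracted dimensions.

```python
import numpy as np

def to_gemm_operand(t, num_free_dims):
    """View t[free..., contracted...] as a 2D matrix [prod(free), prod(contracted)]."""
    rows = int(np.prod(t.shape[:num_free_dims]))
    return t.reshape(rows, -1)

def fused_tiled_contractions(pairs_2d, d_2d, combine, tile_m=4, tile_n=4, tile_k=8):
    """pairs_2d: list of (a2d[M, K_i], b2d[N, K_i]); d_2d: list of [M, N] arrays."""
    m, n = pairs_2d[0][0].shape[0], pairs_2d[0][1].shape[0]
    e2d = np.empty((m, n))
    for m0 in range(0, m, tile_m):
        for n0 in range(0, n, tile_n):
            ms, ns = slice(m0, m0 + tile_m), slice(n0, n0 + tile_n)
            # One accumulator per contraction; none is stored outside this tile.
            accs = []
            for a2d, b2d in pairs_2d:
                acc = np.zeros(e2d[ms, ns].shape)
                for k0 in range(0, a2d.shape[1], tile_k):
                    ks = slice(k0, k0 + tile_k)
                    acc += a2d[ms, ks] @ b2d[ns, ks].T
                accs.append(acc)
            # Epilogue: load the matching D tiles, apply f, store the E tile.
            e2d[ms, ns] = combine(*accs, *(d[ms, ns] for d in d_2d))
    return e2d

# Hypothetical shapes: A_i[M0, M1, K...], B_i[N0, N1, K...], D0[M0, M1, N0, N1].
rng = np.random.default_rng(0)
a0, b0 = rng.standard_normal((2, 3, 4, 5)), rng.standard_normal((3, 2, 4, 5))
a1, b1 = rng.standard_normal((2, 3, 6)), rng.standard_normal((3, 2, 6))
d0 = rng.standard_normal((2, 3, 3, 2))

pairs_2d = [(to_gemm_operand(a, 2), to_gemm_operand(b, 2)) for a, b in [(a0, b0), (a1, b1)]]
e2d = fused_tiled_contractions(pairs_2d, [d0.reshape(6, 6)], lambda c0, c1, d: c0 * c1 + d)
e = e2d.reshape(2, 3, 3, 2)  # reshape the GEMM-shaped result back to the tensor shape
```

The per-tile `accs` list stands in for the accumulator registers: each contraction result exists only for the lifetime of one output tile, and only the combined tile of E is stored.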

Source Code Organization

Build and Run

Prerequisites

Ensure the Composable Kernel library is built and installed.

cd /path/to/composable_kernel/build
make -j install

Build the Example

cd /path/to/composable_kernel/example/61_contraction_multi_ABD
mkdir build && cd build

cmake \
  -DCMAKE_CXX_COMPILER=/opt/rocm/bin/hipcc \
  -DCMAKE_PREFIX_PATH="/opt/rocm;${CK_INSTALL_PATH}" \
  ..

make -j

Run the Example

# Run the example with default settings
./contraction_multi_ABD_xdl

# Run with verification enabled (1), data initialization method 2, and timing enabled (1)
./contraction_multi_ABD_xdl 1 2 1

Applications

This kernel is valuable for complex tensor network computations found in advanced scientific and machine learning applications.

  • Tensor Network Methods: Computing multiple tensor contractions simultaneously in quantum physics simulations, such as DMRG (Density Matrix Renormalization Group) or PEPS (Projected Entangled Pair States).
  • Multi-Modal Tensor Analysis: Processing multiple tensor contractions for different data modalities in machine learning applications.
  • Higher-Order Statistics: Computing multiple statistical tensor operations simultaneously, such as different moments or correlation patterns.
  • Advanced Neural Network Layers: Implementing complex layers that require multiple tensor operations, such as tensor decomposition layers or high-dimensional convolutions.
  • Scientific Computing: Simulating physical systems that require multiple tensor contractions, such as in quantum chemistry or condensed matter physics.

Computational Complexity

The complexity depends on the specific contraction patterns used:

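  • Multiple Contractions: Each contraction has its own cost. Once mapped to a GEMM whose M, N, and K are the products of the free A dimensions, the free B dimensions, and the contracted dimensions, it takes roughly 2·M·N·K floating-point operations.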
  • Memory Access: Complex patterns due to multiple tensor descriptors and reshaping operations
  • Register Pressure: High due to multiple accumulator arrays and intermediate results
  • Instruction Diversity: Different contractions may have different computational patterns
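As a rough way to put numbers on these points, the sketch below counts floating-point operations per contraction and estimates the global-memory traffic saved by keeping intermediates on chip. The function names and problem sizes are hypothetical, and the traffic estimate assumes an unfused pipeline would write each intermediate once and read it back once.

```python
from math import prod

def contraction_flops(m_dims, n_dims, k_dims):
    """FLOPs of one contraction mapped to an M x N x K GEMM (one multiply plus one add per term)."""
    return 2 * prod(m_dims) * prod(n_dims) * prod(k_dims)

def intermediate_bytes_avoided(m_dims, n_dims, num_contractions, bytes_per_element):
    """Traffic saved if each C_temp is neither written to nor re-read from global memory."""
    return 2 * num_contractions * prod(m_dims) * prod(n_dims) * bytes_per_element

# Hypothetical sizes: two contractions that share the same output shape.
m_dims, n_dims = (4, 8), (16, 2)
total_flops = (
    contraction_flops(m_dims, n_dims, (32, 4))  # contracts two dimensions, K = 128
    + contraction_flops(m_dims, n_dims, (64,))  # contracts one dimension, K = 64
)
saved_bytes = intermediate_bytes_avoided(m_dims, n_dims, num_contractions=2, bytes_per_element=2)

print(total_flops, saved_bytes)
```

The estimate ignores the reads of the A, B, and D tensors and the write of E, which are needed with or without fusion.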

Comparison with Single Contraction

| Aspect | Single Contraction | Multi-Contraction |
|---|---|---|
| Input Complexity | Single tensor pair | Multiple tensor pairs |
| Memory Layout | Single reshaping pattern | Multiple coordinated patterns |
| Computation | Single GEMM operation | Multiple parallel GEMMs |
| Fusion Opportunity | Simple epilogue | Complex multi-input epilogue |
| Applications | Basic tensor operations | Complex tensor networks |

This kernel shows how several tensor contractions and their combination with auxiliary tensors can be fused into a single launch, which makes it useful for tensor network workloads in scientific computing and machine learning research.