mirror of
https://github.com/ROCm/composable_kernel.git
synced 2026-04-20 06:49:15 +00:00
[rocm-libraries] ROCm/rocm-libraries#4431 (commit ca33816)
[CK] updated github repo link

The location of the github repo has changed; the landing page of the docs needs to reflect this. Updated only the git repo links in the docs folder. Also added info to the install doc about how to do a sparse checkout. Updated some refs that were messed up while I was at it.
Committed by: assistant-librarian[bot]
Parent: 9059730caf
Commit: 6549c320fc
@@ -9,7 +9,7 @@ Overview
At the foundation of the CK Tile system lies BufferView, a compile-time abstraction that provides structured access to raw memory regions within GPU kernels. This serves as the bridge between the hardware's physical memory model and the higher-level abstractions that enable efficient GPU programming. BufferView encapsulates the complexity of GPU memory hierarchies while exposing a unified interface that works seamlessly across different memory address spaces including global memory shared across the entire device, local data share (LDS) memory shared within a workgroup, or the ultra-fast register files private to each thread.
-BufferView serves as the foundation for :ref:`ck_tile_tensor_views`, which add multi-dimensional structure on top of raw memory access. Understanding BufferView is essential before moving on to more complex abstractions like :ref:`ck_tile_distribution` and :ref:`ck_tile_tile_window`.
+BufferView serves as the foundation for :ref:`ck_tile_tensor_views`, which add multi-dimensional structure on top of raw memory access. Understanding BufferView is essential before moving on to more complex abstractions like :ref:`ck_tile_tile_distribution` and :ref:`ck_tile_tile_window`.
By providing compile-time knowledge of buffer properties through template metaprogramming, BufferView enables the compiler to generate optimal machine code for each specific use case. This zero-overhead abstraction ensures that the convenience of a high-level interface comes with no runtime performance penalty.
@@ -456,7 +456,7 @@ Address spaces are encoded in types so that common errors are reported at compil
BufferView supports configurable handling of invalid values, optional runtime bounds checks, and conditional access patterns. It also provides atomic operations for thread-safe updates. These features are intended to cover common edge cases without adding unnecessary overhead.
-By hiding the complexity of different memory spaces while exposing the operations needed for high-performance GPU computing, BufferView establishes a pattern that the rest of CK Tile follows: compile-time abstractions that enhance rather than compromise performance. The :ref:`ck_tile_tensor_views` and :ref:`ck_tile_distribution` add capability while maintaining the efficiency established at the base. For hardware-specific details about memory hierarchies, see :ref:`ck_tile_gpu_basics`.
+By hiding the complexity of different memory spaces while exposing the operations needed for high-performance GPU computing, BufferView establishes a pattern that the rest of CK Tile follows: compile-time abstractions that enhance rather than compromise performance. The :ref:`ck_tile_tensor_views` and :ref:`ck_tile_tile_distribution` add capability while maintaining the efficiency established at the base. For hardware-specific details about memory hierarchies, see :ref:`ck_tile_gpu_basics`.
Next Steps
----------
@@ -550,7 +550,7 @@ This example demonstrates how CK Tile transforms convolution from a memory-bound
- **Sliding windows** can be efficiently represented using tensor descriptors with appropriate strides
- **Im2col transformation** converts convolution to matrix multiplication without data copies
-- **Tile distribution** enables optimal work distribution across GPU threads (see :ref:`ck_tile_distribution`)
+- **Tile distribution** enables optimal work distribution across GPU threads (see :ref:`ck_tile_tile_distribution`)
- **Multi-channel support** extends naturally through higher-dimensional descriptors
- **Performance optimizations** like vectorization and shared memory are seamlessly integrated (see :ref:`ck_tile_gemm_optimization` for similar techniques)
@@ -317,7 +317,7 @@ Movement Through Adaptors
Advanced Movement Patterns
==========================
-Real-world applications use advanced movement patterns for optimal memory access. These patterns often relate to :ref:`ck_tile_tile_window` operations and :ref:`ck_tile_distribution` concepts:
+Real-world applications use advanced movement patterns for optimal memory access. These patterns often relate to :ref:`ck_tile_tile_window` operations and :ref:`ck_tile_tile_distribution` concepts:
Tiled Access Pattern
--------------------
@@ -45,7 +45,7 @@ Learning Path
How to work with multi-dimensional data structures and memory layouts.
-4. **Core API**: :ref:`ck_tile_distribution`
+4. **Core API**: :ref:`ck_tile_tile_distribution`
The tile distribution system that maps work to GPU threads.
@@ -105,4 +105,4 @@ Next Steps
To dive deeper, start with :ref:`ck_tile_introduction` to understand the motivation and core concepts behind CK Tile.
-For practical examples, see the `example/ck_tile <https://github.com/ROCm/composable_kernel/tree/develop/example/ck_tile>`_ directory in the Composable Kernel repository.
+For practical examples, see the `example/ck_tile <https://github.com/ROCm/rocm-libraries/tree/develop/projects/composablekernel/example/ck_tile>`_ directory in the Composable Kernel repository.
@@ -276,7 +276,7 @@ The foundation of the exploration begins with raw memory access through :ref:`ck
With these foundational concepts established, the documentation delves into the :ref:`ck_tile_coordinate_systems` that powers tile distribution. This engine implements the mathematical framework that have been introduced, providing compile-time transformations between P-space, Y-space, X-space, and D-space. Understanding these transformations at a deep level enables developers to reason about performance implications and design custom distribution strategies for novel algorithms. The :ref:`ck_tile_transforms` and :ref:`ck_tile_adaptors` provide the building blocks for these transformations.
-The high-level :ref:`ck_tile_distribution` APIs represent the culmination of these lower-level abstractions. These APIs provide an accessible interface for common patterns while exposing enough flexibility for advanced optimizations. Through concrete examples and detailed explanations, the documentation will demonstrate how to leverage these APIs to achieve near-optimal performance across a variety of computational patterns. The :ref:`ck_tile_tile_window` abstraction provides the gateway for efficient data access.
+The high-level :ref:`ck_tile_tile_distribution` APIs represent the culmination of these lower-level abstractions. These APIs provide an accessible interface for common patterns while exposing enough flexibility for advanced optimizations. Through concrete examples and detailed explanations, the documentation will demonstrate how to leverage these APIs to achieve near-optimal performance across a variety of computational patterns. The :ref:`ck_tile_tile_window` abstraction provides the gateway for efficient data access.
The exploration of coordinate systems goes beyond the basic P, Y, X, D framework to encompass advanced topics such as multi-level tiling, replication strategies, and specialized coordinate systems for specific algorithm classes. The :ref:`ck_tile_encoding_internals` reveals the mathematical foundations, while :ref:`ck_tile_thread_mapping` shows how these abstractions map to hardware. This comprehensive treatment ensures that developers can handle not just common cases but also novel algorithms that require custom distribution strategies.
@@ -23,7 +23,7 @@ The concept of a tile represents the fundamental unit of data organization in th
Distribution
~~~~~~~~~~~~
-The distribution pattern represents one of the most compile-time abstractions in the CK framework, defining the precise mapping between logical data elements and the physical processing resources that will operate on them. A distribution is far more than an assignment scheme—it embodies a strategy for achieving optimal performance on GPU hardware. The distribution determines which threads access which data elements, how those accesses are ordered to maximize memory bandwidth, and how intermediate results are shared between cooperating threads. By encoding these decisions at compile time, distributions enable the generation of highly optimized code that respects hardware constraints while maintaining algorithmic clarity. For a detailed exploration of distribution concepts, see :ref:`ck_tile_distribution`.
+The distribution pattern represents one of the most compile-time abstractions in the CK framework, defining the precise mapping between logical data elements and the physical processing resources that will operate on them. A distribution is far more than an assignment scheme—it embodies a strategy for achieving optimal performance on GPU hardware. The distribution determines which threads access which data elements, how those accesses are ordered to maximize memory bandwidth, and how intermediate results are shared between cooperating threads. By encoding these decisions at compile time, distributions enable the generation of highly optimized code that respects hardware constraints while maintaining algorithmic clarity. For a detailed exploration of distribution concepts, see :ref:`ck_tile_tile_distribution`.
**C++ Type**: ``tile_distribution<...>``
@@ -378,6 +378,6 @@ Related Documentation
- :ref:`ck_tile_introduction` - Introduction and motivation
- :ref:`ck_tile_buffer_views` - Raw memory access
-- :ref:`ck_tile_distribution` - Core distribution concepts
+- :ref:`ck_tile_tile_distribution` - Core distribution concepts