diff --git a/docs/Contributors_Guide.rst b/docs/Contributors_Guide.rst
index bd414c08d6..c5bdf28026 100644
--- a/docs/Contributors_Guide.rst
+++ b/docs/Contributors_Guide.rst
@@ -8,14 +8,14 @@
Contributing to Composable Kernel
********************************************************************
-Review the `Composable Kernel documentation `_ before contributing to the Composable Kernel project. This documentation provides information about core concepts and configurations, as well as providing :doc:`steps for building Composable Kernel `. Some of this information is also available in the `Composable Kernel README `_.
+Review the `Composable Kernel documentation `_ before contributing to the Composable Kernel project. This documentation provides information about core concepts and configurations, as well as providing :doc:`steps for building Composable Kernel `. Some of this information is also available in the `Composable Kernel README `_.
Consult the `AMD Developer Central portal `_ for more information about AMD products.
Reporting issues
=================
-Use `Github issues `_ to log and track issues and enhancement requests.
+Use `GitHub issues `_ to log and track issues and enhancement requests.
If you encounter an issue with the Composable Kernel library, search the existing GitHub issues to determine whether the problem has already been
reported. If it hasn't, submit a new issue that includes:
@@ -52,11 +52,11 @@ All external contributors to the Composable Kernel codebase must follow these gu
* Ensure a manageable pull request size: Pull requests should be limited to approximately one thousand lines. If your changes significantly exceed one thousand lines, break them into smaller pull requests that can be reviewed independently.
-* Use pre-commit hooks to adhere to the coding style: Composable Kernel's coding style is defined in `.clang-format `_. Use the provided pre-commit hooks to run clang formatting and linting. Instructions on installing pre-commit hooks are available in the `README file `_.
+* Use pre-commit hooks to adhere to the coding style: Composable Kernel's coding style is defined in `.clang-format `_. Use the provided pre-commit hooks to run clang formatting and linting. Instructions on installing pre-commit hooks are available in the `README file `_.
Forks require an approver from AMD to trigger continuous integration (CI) testing. This approval process is necessary for security and resource management.
Depending on the complexity of your changes, an AMD developer might need to pull your changes and perform additional fixes or modifications before merging. This collaborative approach ensures compatibility with internal systems and standards.
-You can see a complete list of pull requests on the `Composable Kernel GitHub page `_.
+You can see a complete list of pull requests on the `Composable Kernel GitHub page `_.
diff --git a/docs/conceptual/CK-Tile-intra-inter-wave.rst b/docs/conceptual/CK-Tile-intra-inter-wave.rst
index 1d9e634c0f..a92fcf4399 100644
--- a/docs/conceptual/CK-Tile-intra-inter-wave.rst
+++ b/docs/conceptual/CK-Tile-intra-inter-wave.rst
@@ -20,7 +20,7 @@ In intrawave scheduling, the full K dimension is loaded into each wave. Each wav
Because the CU has flexibility in scheduling operations, intrawave scheduling is best for compute-bound workloads.
-An example of both interwave and intrawave scheduling can be found in |gemm_utils.hpp|_, which is part of the `GEMM with CK Tile example `_.
+An example of both interwave and intrawave scheduling can be found in |gemm_utils.hpp|_, which is part of the `GEMM with CK Tile example `_.
.. |gemm_utils.hpp| replace:: ``gemm_utils.hpp``
-.. _gemm_utils.hpp: https://github.com/ROCm/composable_kernel/blob/develop/example/ck_tile/03_gemm/gemm_utils.hpp#L37
\ No newline at end of file
+.. _gemm_utils.hpp: https://github.com/ROCm/rocm-libraries/tree/develop/projects/composablekernel/example/ck_tile/03_gemm/gemm_utils.hpp#L37
\ No newline at end of file
diff --git a/docs/conceptual/ck_tile/buffer_views.rst b/docs/conceptual/ck_tile/buffer_views.rst
index 600aaed96f..ca574724ab 100644
--- a/docs/conceptual/ck_tile/buffer_views.rst
+++ b/docs/conceptual/ck_tile/buffer_views.rst
@@ -9,7 +9,7 @@ Overview
At the foundation of the CK Tile system lies BufferView, a compile-time abstraction that provides structured access to raw memory regions within GPU kernels. This serves as the bridge between the hardware's physical memory model and the higher-level abstractions that enable efficient GPU programming. BufferView encapsulates the complexity of GPU memory hierarchies while exposing a unified interface that works seamlessly across different memory address spaces including global memory shared across the entire device, local data share (LDS) memory shared within a workgroup, or the ultra-fast register files private to each thread.
-BufferView serves as the foundation for :ref:`ck_tile_tensor_views`, which add multi-dimensional structure on top of raw memory access. Understanding BufferView is essential before moving on to more complex abstractions like :ref:`ck_tile_distribution` and :ref:`ck_tile_tile_window`.
+BufferView serves as the foundation for :ref:`ck_tile_tensor_views`, which add multi-dimensional structure on top of raw memory access. Understanding BufferView is essential before moving on to more complex abstractions like :ref:`ck_tile_tile_distribution` and :ref:`ck_tile_tile_window`.
By providing compile-time knowledge of buffer properties through template metaprogramming, BufferView enables the compiler to generate optimal machine code for each specific use case. This zero-overhead abstraction ensures that the convenience of a high-level interface comes with no runtime performance penalty.
@@ -456,7 +456,7 @@ Address spaces are encoded in types so that common errors are reported at compil
BufferView supports configurable handling of invalid values, optional runtime bounds checks, and conditional access patterns. It also provides atomic operations for thread-safe updates. These features are intended to cover common edge cases without adding unnecessary overhead.
-By hiding the complexity of different memory spaces while exposing the operations needed for high-performance GPU computing, BufferView establishes a pattern that the rest of CK Tile follows: compile-time abstractions that enhance rather than compromise performance. The :ref:`ck_tile_tensor_views` and :ref:`ck_tile_distribution` add capability while maintaining the efficiency established at the base. For hardware-specific details about memory hierarchies, see :ref:`ck_tile_gpu_basics`.
+By hiding the complexity of different memory spaces while exposing the operations needed for high-performance GPU computing, BufferView establishes a pattern that the rest of CK Tile follows: compile-time abstractions that enhance rather than compromise performance. The :ref:`ck_tile_tensor_views` and :ref:`ck_tile_tile_distribution` add capability while maintaining the efficiency established at the base. For hardware-specific details about memory hierarchies, see :ref:`ck_tile_gpu_basics`.
Next Steps
----------
diff --git a/docs/conceptual/ck_tile/convolution_example.rst b/docs/conceptual/ck_tile/convolution_example.rst
index c2fe62bb22..a857f9ae9e 100644
--- a/docs/conceptual/ck_tile/convolution_example.rst
+++ b/docs/conceptual/ck_tile/convolution_example.rst
@@ -550,7 +550,7 @@ This example demonstrates how CK Tile transforms convolution from a memory-bound
- **Sliding windows** can be efficiently represented using tensor descriptors with appropriate strides
- **Im2col transformation** converts convolution to matrix multiplication without data copies
-- **Tile distribution** enables optimal work distribution across GPU threads (see :ref:`ck_tile_distribution`)
+- **Tile distribution** enables optimal work distribution across GPU threads (see :ref:`ck_tile_tile_distribution`)
- **Multi-channel support** extends naturally through higher-dimensional descriptors
- **Performance optimizations** like vectorization and shared memory are seamlessly integrated (see :ref:`ck_tile_gemm_optimization` for similar techniques)
diff --git a/docs/conceptual/ck_tile/coordinate_movement.rst b/docs/conceptual/ck_tile/coordinate_movement.rst
index 78d864bf75..73633afa88 100644
--- a/docs/conceptual/ck_tile/coordinate_movement.rst
+++ b/docs/conceptual/ck_tile/coordinate_movement.rst
@@ -317,7 +317,7 @@ Movement Through Adaptors
Advanced Movement Patterns
==========================
-Real-world applications use advanced movement patterns for optimal memory access. These patterns often relate to :ref:`ck_tile_tile_window` operations and :ref:`ck_tile_distribution` concepts:
+Real-world applications use advanced movement patterns for optimal memory access. These patterns often relate to :ref:`ck_tile_tile_window` operations and :ref:`ck_tile_tile_distribution` concepts:
Tiled Access Pattern
--------------------
diff --git a/docs/conceptual/ck_tile/index.rst b/docs/conceptual/ck_tile/index.rst
index 287143d6de..072c900560 100644
--- a/docs/conceptual/ck_tile/index.rst
+++ b/docs/conceptual/ck_tile/index.rst
@@ -45,7 +45,7 @@ Learning Path
How to work with multi-dimensional data structures and memory layouts.
-4. **Core API**: :ref:`ck_tile_distribution`
+4. **Core API**: :ref:`ck_tile_tile_distribution`
The tile distribution system that maps work to GPU threads.
@@ -105,4 +105,4 @@ Next Steps
To dive deeper, start with :ref:`ck_tile_introduction` to understand the motivation and core concepts behind CK Tile.
-For practical examples, see the `example/ck_tile `_ directory in the Composable Kernel repository.
+For practical examples, see the `example/ck_tile `_ directory in the Composable Kernel repository.
diff --git a/docs/conceptual/ck_tile/introduction_motivation.rst b/docs/conceptual/ck_tile/introduction_motivation.rst
index e6f2112311..a939aef4c8 100644
--- a/docs/conceptual/ck_tile/introduction_motivation.rst
+++ b/docs/conceptual/ck_tile/introduction_motivation.rst
@@ -276,7 +276,7 @@ The foundation of the exploration begins with raw memory access through :ref:`ck
With these foundational concepts established, the documentation delves into the :ref:`ck_tile_coordinate_systems` that powers tile distribution. This engine implements the mathematical framework that have been introduced, providing compile-time transformations between P-space, Y-space, X-space, and D-space. Understanding these transformations at a deep level enables developers to reason about performance implications and design custom distribution strategies for novel algorithms. The :ref:`ck_tile_transforms` and :ref:`ck_tile_adaptors` provide the building blocks for these transformations.
-The high-level :ref:`ck_tile_distribution` APIs represent the culmination of these lower-level abstractions. These APIs provide an accessible interface for common patterns while exposing enough flexibility for advanced optimizations. Through concrete examples and detailed explanations, the documentation will demonstrate how to leverage these APIs to achieve near-optimal performance across a variety of computational patterns. The :ref:`ck_tile_tile_window` abstraction provides the gateway for efficient data access.
+The high-level :ref:`ck_tile_tile_distribution` APIs represent the culmination of these lower-level abstractions. These APIs provide an accessible interface for common patterns while exposing enough flexibility for advanced optimizations. Through concrete examples and detailed explanations, the documentation will demonstrate how to leverage these APIs to achieve near-optimal performance across a variety of computational patterns. The :ref:`ck_tile_tile_window` abstraction provides the gateway for efficient data access.
The exploration of coordinate systems goes beyond the basic P, Y, X, D framework to encompass advanced topics such as multi-level tiling, replication strategies, and specialized coordinate systems for specific algorithm classes. The :ref:`ck_tile_encoding_internals` reveals the mathematical foundations, while :ref:`ck_tile_thread_mapping` shows how these abstractions map to hardware. This comprehensive treatment ensures that developers can handle not just common cases but also novel algorithms that require custom distribution strategies.
diff --git a/docs/conceptual/ck_tile/terminology.rst b/docs/conceptual/ck_tile/terminology.rst
index 7d5fc87fe9..77b20a6c8b 100644
--- a/docs/conceptual/ck_tile/terminology.rst
+++ b/docs/conceptual/ck_tile/terminology.rst
@@ -23,7 +23,7 @@ The concept of a tile represents the fundamental unit of data organization in th
Distribution
~~~~~~~~~~~~
-The distribution pattern represents one of the most compile-time abstractions in the CK framework, defining the precise mapping between logical data elements and the physical processing resources that will operate on them. A distribution is far more than an assignment scheme—it embodies a strategy for achieving optimal performance on GPU hardware. The distribution determines which threads access which data elements, how those accesses are ordered to maximize memory bandwidth, and how intermediate results are shared between cooperating threads. By encoding these decisions at compile time, distributions enable the generation of highly optimized code that respects hardware constraints while maintaining algorithmic clarity. For a detailed exploration of distribution concepts, see :ref:`ck_tile_distribution`.
+The distribution pattern represents one of the most compile-time abstractions in the CK framework, defining the precise mapping between logical data elements and the physical processing resources that will operate on them. A distribution is far more than an assignment scheme—it embodies a strategy for achieving optimal performance on GPU hardware. The distribution determines which threads access which data elements, how those accesses are ordered to maximize memory bandwidth, and how intermediate results are shared between cooperating threads. By encoding these decisions at compile time, distributions enable the generation of highly optimized code that respects hardware constraints while maintaining algorithmic clarity. For a detailed exploration of distribution concepts, see :ref:`ck_tile_tile_distribution`.
**C++ Type**: ``tile_distribution<...>``
@@ -378,6 +378,6 @@ Related Documentation
- :ref:`ck_tile_introduction` - Introduction and motivation
- :ref:`ck_tile_buffer_views` - Raw memory access
-- :ref:`ck_tile_distribution` - Core distribution concepts
+- :ref:`ck_tile_tile_distribution` - Core distribution concepts
diff --git a/docs/index.rst b/docs/index.rst
index 6744318e51..5eae912494 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -10,7 +10,7 @@ Composable Kernel User Guide
The Composable Kernel library provides a programming model for writing performance critical kernels for machine learning workloads across multiple architectures including GPUs and CPUs, through general purpose kernel languages such as `HIP C++ `_.
-The Composable Kernel repository is located at `https://github.com/ROCm/composable_kernel `_.
+The Composable Kernel project is located in https://github.com/ROCm/rocm-libraries/tree/develop/projects/composablekernel.
.. grid:: 2
:gutter: 3
diff --git a/docs/install/Composable-Kernel-install.rst b/docs/install/Composable-Kernel-install.rst
index 61b1fe0fcb..243f607b35 100644
--- a/docs/install/Composable-Kernel-install.rst
+++ b/docs/install/Composable-Kernel-install.rst
@@ -6,12 +6,28 @@
Building and installing Composable Kernel with CMake
******************************************************
-Before you begin, clone the `Composable Kernel GitHub repository `_ and create a ``build`` directory in its root:
+Before you begin, clone the `Composable Kernel project `_.
+
+Use sparse checkout when cloning the Composable Kernel project:
+
+.. code:: shell
+
+ git clone --no-checkout --filter=blob:none https://github.com/ROCm/rocm-libraries.git
+ cd rocm-libraries
+ git sparse-checkout init --cone
+ git sparse-checkout set projects/composablekernel
+
+Then use ``git checkout`` to check out the branch you need.
+
+The ``develop`` branch is intended for users who want to preview new features or contribute to the Composable Kernel codebase.
+
+If you don't intend to contribute to the codebase and won't be previewing features, use a branch that matches the version of ROCm installed on your system.
+
+Create the ``build`` directory under ``rocm-libraries/projects/composablekernel``:
.. code:: shell
- git clone https://github.com/ROCm/composable_kernel.git
- cd composable_kernel
+ cd projects/composablekernel
mkdir build
Change directory to the ``build`` directory and generate the makefile using the ``cmake`` command. Two build options are required:
@@ -19,7 +35,6 @@ Change directory to the ``build`` directory and generate the makefile using the
* ``CMAKE_PREFIX_PATH``: The ROCm installation path. ROCm is installed in ``/opt/rocm`` by default.
* ``CMAKE_CXX_COMPILER``: The path to the Clang compiler. Clang is found at ``/opt/rocm/llvm/bin/clang++`` by default.
-
.. code:: shell
cd build
@@ -65,8 +80,9 @@ After running ``make install``, the Composable Kernel files will be saved to the
* Header files: ``/opt/rocm/include/ck/`` and ``/opt/rocm/include/ck_tile/``
* Examples, tests, and ckProfiler: ``/opt/rocm/bin/``
-For information about ckProfiler, see `the ckProfiler readme file `_.
+For information about ckProfiler, see `the ckProfiler readme file `_.
For information about running the examples and tests, see :doc:`Composable Kernel examples and tests <../tutorial/Composable-Kernel-examples>`.
+
diff --git a/docs/reference/Composable-Kernel-wrapper.rst b/docs/reference/Composable-Kernel-wrapper.rst
index 67ed977245..9363a60a5b 100644
--- a/docs/reference/Composable-Kernel-wrapper.rst
+++ b/docs/reference/Composable-Kernel-wrapper.rst
@@ -47,10 +47,10 @@ Output::
Tutorials:
-* `GEMM tutorial `_
+* `GEMM tutorial `_
Advanced examples:
-* `Image to column `_
-* `Basic gemm `_
-* `Optimized gemm `_
+* `Image to column `_
+* `Basic gemm `_
+* `Optimized gemm `_
diff --git a/docs/tutorial/Composable-Kernel-examples.rst b/docs/tutorial/Composable-Kernel-examples.rst
index 62422d6f15..82e6318753 100644
--- a/docs/tutorial/Composable-Kernel-examples.rst
+++ b/docs/tutorial/Composable-Kernel-examples.rst
@@ -31,10 +31,10 @@ To run regression tests, use ``make regression``.
In general, tests that run for under thirty seconds are included in the smoke tests and tests that run for over thirty seconds are included in the regression tests.
.. |example| replace:: ``example``
-.. _example: https://github.com/ROCm/composable_kernel/tree/develop/example
+.. _example: https://github.com/ROCm/rocm-libraries/tree/develop/projects/composablekernel/example
.. |client_example| replace:: ``client_example``
-.. _client_example: https://github.com/ROCm/composable_kernel/tree/develop/client_example
+.. _client_example: https://github.com/ROCm/rocm-libraries/tree/develop/projects/composablekernel/client_example
.. |test| replace:: ``test``
-.. _test: https://github.com/ROCm/composable_kernel/tree/develop/test
\ No newline at end of file
+.. _test: https://github.com/ROCm/rocm-libraries/tree/develop/projects/composablekernel/test
\ No newline at end of file