mirror of
https://github.com/NVIDIA/cutlass.git
synced 2026-05-11 08:50:09 +00:00
Update README.md
@@ -8,9 +8,9 @@ It incorporates the same strategies for data movement and hierarchical decomposi
 that are used to implement cuBLAS. CUTLASS decomposes these “moving parts” into
 reusable, modular software components abstracted by C++ template classes. These
 thread-wide, warp-wide, block-wide, and device-wide primitives can be specialized
-and tuned via custom tiling sizes, data types, and other algorithmic policy.
-The resulting flexibility simplifies their use as building blocks within custom
-kernels and applications.
+and tuned via custom tiling sizes, data types, and other algorithmic policy. The
+resulting flexibility simplifies their use as building blocks within custom kernels
+and applications.

 To support a wide variety of applications, CUTLASS provides extensive support for
 mixed-precision computations, providing specialized data-movement and
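The README text says that primitives are specialized "via custom tiling sizes, data types, and other algorithmic policy" and that CUTLASS supports mixed-precision computation. The following is a minimal host-side C++ sketch of that idea, written under clearly stated assumptions. It is NOT the CUTLASS API.

`TiledGemm`, `SmallTileFloatGemm`, and `MixedPrecisionGemm` are hypothetical names invented for illustration. Nested CPU loops stand in for the block-, warp-, and thread-level decomposition that the real library performs on the GPU. The sketch only shows how tile shape, input type, and accumulator type can be fixed as template parameters, so that one template yields several specialized kernels.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch, not the CUTLASS API: a GEMM whose tile shape, input
// element type and accumulator type are compile-time template parameters.
template <typename Element, typename Accumulator, int TileM, int TileN, int TileK>
struct TiledGemm {
  static_assert(TileM > 0 && TileN > 0 && TileK > 0, "tile sizes must be positive");

  // Computes C = A * B for row-major A (M x K), B (K x N) and C (M x N).
  static void run(int M, int N, int K,
                  const std::vector<Element>& A,
                  const std::vector<Element>& B,
                  std::vector<Accumulator>& C) {
    assert(A.size() == static_cast<std::size_t>(M) * K);
    assert(B.size() == static_cast<std::size_t>(K) * N);
    C.assign(static_cast<std::size_t>(M) * N, Accumulator(0));

    // Outer loops walk the output one TileM x TileN tile at a time.
    for (int m0 = 0; m0 < M; m0 += TileM)
      for (int n0 = 0; n0 < N; n0 += TileN)
        // The K dimension is consumed one TileK slice at a time; partial
        // products are accumulated into C across slices.
        for (int k0 = 0; k0 < K; k0 += TileK)
          // Inner loops compute one tile, clamped at the matrix edges.
          for (int m = m0; m < m0 + TileM && m < M; ++m)
            for (int n = n0; n < n0 + TileN && n < N; ++n) {
              Accumulator acc = C[m * N + n];
              for (int k = k0; k < k0 + TileK && k < K; ++k)
                // Widen inputs to the accumulator type before multiplying,
                // which is what makes the mixed-precision variant safe.
                acc += static_cast<Accumulator>(A[m * K + k]) *
                       static_cast<Accumulator>(B[k * N + n]);
              C[m * N + n] = acc;
            }
  }
};

// Two specializations of the same template: different tiles, different types.
using SmallTileFloatGemm = TiledGemm<float, float, 2, 2, 2>;
// 8-bit integer inputs with 32-bit integer accumulation.
using MixedPrecisionGemm = TiledGemm<std::int8_t, std::int32_t, 4, 4, 8>;
```

As a usage example, `MixedPrecisionGemm::run(1, 1, 3, A, B, C)` with every input equal to 100 yields 30000. That value would overflow an 8-bit accumulator, which is why the accumulator type is a separate parameter from the input type. Making the tile sizes template arguments lets the compiler fully specialize the loop bounds for each configuration, which is the spirit of the "algorithmic policy" knobs described above.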