cutlass

Mirror of https://github.com/NVIDIA/cutlass.git, synced 2026-07-14 19:17:06 +00:00

Files

Gregory Meyer (gregjm) ecbd24566c Enable shared memory intrinsics and ldmatrix PTX on Clang. (#754)

* Enable shared memory intrinsics and ldmatrix PTX on Clang.

This commit adds preprocessor checks that enable the shared memory
intrinsics `__cvta_generic_to_shared` and `__nvvm_get_smem_pointer`, as
well as the `ldmatrix` PTX instructions, when compiling with Clang.
Leaving these intrinsics disabled on Clang causes a significant latency
regression.

* Refine the macro.

---------

Co-authored-by: Haicheng Wu <haichengw@nvidia.com>

2023-03-31 21:42:24 -04:00
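The kind of preprocessor gating the commit message describes can be sketched roughly as follows. This is a hypothetical illustration, not the code merged in #754: the macro `EXAMPLE_SMEM_INTRINSICS_ENABLED`, the helpers `smem_intrinsics_enabled` and `smem_ptr_to_u32`, and the exact version checks are assumptions. It also assumes the toolchain (nvcc 11 or later, or a Clang release with CUDA support for the intrinsic) declares `__cvta_generic_to_shared`. The idea is to take the intrinsic path in device code under either compiler instead of excluding Clang, and to fall back to an inline PTX address conversion otherwise.

```cpp
// Sketch only: EXAMPLE_SMEM_INTRINSICS_ENABLED, smem_intrinsics_enabled and
// smem_ptr_to_u32 are hypothetical names, not CUTLASS's actual macro/helpers.
#include <cstdint>

// Enable the intrinsic in a device compilation pass under nvcc 11+ (not Clang),
// or in any device pass under Clang's CUDA mode (Clang defines __CUDA__).
#if defined(__CUDA_ARCH__) && \
    ((defined(__CUDACC__) && !defined(__clang__) && __CUDACC_VER_MAJOR__ >= 11) || \
     (defined(__clang__) && defined(__CUDA__)))
#define EXAMPLE_SMEM_INTRINSICS_ENABLED 1
#else
#define EXAMPLE_SMEM_INTRINSICS_ENABLED 0
#endif

// Host-visible query so ordinary C++ code can inspect the decision.
constexpr bool smem_intrinsics_enabled() {
  return EXAMPLE_SMEM_INTRINSICS_ENABLED != 0;
}

#if defined(__CUDACC__) || defined(__CUDA__)
// Convert a generic pointer into a 32-bit shared-memory address, the form
// used as the address operand of PTX instructions such as ldmatrix.
__device__ inline std::uint32_t smem_ptr_to_u32(void const* ptr) {
#if EXAMPLE_SMEM_INTRINSICS_ENABLED
  // Intrinsic path: the compiler performs the address-space conversion itself.
  return static_cast<std::uint32_t>(__cvta_generic_to_shared(ptr));
#elif defined(__CUDA_ARCH__)
  // Fallback for device passes without the intrinsic: inline PTX conversion.
  std::uint32_t addr;
  asm("{ .reg .u64 a; cvta.to.shared.u64 a, %1; cvt.u32.u64 %0, a; }"
      : "=r"(addr)
      : "l"(ptr));
  return addr;
#else
  // Host compilation pass: a __device__ body is never executed here.
  (void)ptr;
  return 0;
#endif
}
#endif
```

Compiled as plain host C++, none of the CUDA macros are defined, so the flag evaluates to 0, `smem_intrinsics_enabled()` returns false, and the device helper is omitted entirely.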

cute

Enable shared memory intrinsics and ldmatrix PTX on Clang. (#754)

2023-03-31 21:42:24 -04:00

cutlass

Enable shared memory intrinsics and ldmatrix PTX on Clang. (#754)

2023-03-31 21:42:24 -04:00