CUTLASS 3.3.0 (#1167)

* Release 3.3.0

Adds support for mixed precision GEMMs on Hopper and Ampere
Adds support for < 16B aligned GEMMs on Hopper
Enhancements to EVT
Enhancements to Python interface
Enhancements to Sub-byte type handling in CuTe
Several other bug-fixes and performance improvements.

* minor doc update
Pradeep Ramani
2023-11-02 08:09:05 -07:00
committed by GitHub
parent 922fb5108b
commit c008b4aea8
263 changed files with 16214 additions and 5008 deletions

@@ -67,14 +67,13 @@ The CUTLASS Python interface currently supports the following operations:
 * Grouped GEMM (for pre-SM90 kernels)
 ### Getting started
-We recommend using the CUTLASS Python interface via one of the Docker images located in the [docker](/python/docker) directory.
+We recommend using the CUTLASS Python interface via an [NGC PyTorch Docker container](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch):
 ```bash
-docker build -t cutlass-cuda12.1:latest -f docker/Dockerfile-cuda12.1-pytorch .
-docker run --gpus all -it --rm cutlass-cuda12.1:latest
+docker run --gpus all -it --rm nvcr.io/nvidia/pytorch:23.08-py3
 ```
-The CUTLASS Python interface has been tested with CUDA 11.8, 12.0, and 12.1 on Python 3.8.10 and 3.9.7.
+The CUTLASS Python interface has been tested with CUDA 11.8, 12.0, and 12.1 on Python 3.8 and 3.9.
 #### Optional environment variables
 Prior to installing the CUTLASS Python interface, one may optionally set the following environment variables:
@@ -82,19 +81,21 @@ Prior to installing the CUTLASS Python interface, one may optionally set the fol
 * `CUDA_INSTALL_PATH`: the path to the installation of CUDA
 If these environment variables are not set, the installation process will infer them to be the following:
-* `CUTLASS_PATH`: one directory level above the current directory (i.e., `$(pwd)/..`)
+* `CUTLASS_PATH`: either one directory level above the current directory (i.e., `$(pwd)/..`) if installed locally or in the `source` directory of the location in which `cutlass_library` was installed
 * `CUDA_INSTALL_PATH`: the directory holding `/bin/nvcc` for the first version of `nvcc` on `$PATH` (i.e., `which nvcc | awk -F'/bin/nvcc' '{print $1}'`)
 **NOTE:** The version of `cuda-python` installed must match the CUDA version in `CUDA_INSTALL_PATH`.
 #### Installation
-The CUTLASS Python interface can currently be installed via:
+The CUTLASS Python interface can currently be installed by navigating to the root of the CUTLASS directory and performing:
 ```bash
-python setup.py develop --user
+pip install .
 ```
-This will allow changes to the Python interface source to be reflected when using the Python interface.
-We plan to add support for installing via `python setup.py install` in a future release.
+If you would like to be able to make changes to the CUTLASS Python interface and have them reflected when using the interface, perform:
+```bash
+pip install -e .
+```
 ### Examples
 Jupyter notebook examples of using the CUTLASS Python interface are located in [examples/python](/examples/python).
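The `CUDA_INSTALL_PATH` default described in the hunk above can be reproduced by hand before running `pip install .`. The following is a minimal sketch, not part of the commit: the helper name `cuda_root_from_nvcc` is hypothetical, and it only wraps the `awk` expression quoted in the README text.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of CUTLASS): strip the trailing /bin/nvcc
# from a path, using the same awk expression the README quotes.
cuda_root_from_nvcc() {
  printf '%s\n' "$1" | awk -F'/bin/nvcc' '{print $1}'
}

# Use the first nvcc on PATH, mirroring the documented default.
if nvcc_path="$(command -v nvcc)"; then
  CUDA_INSTALL_PATH="$(cuda_root_from_nvcc "$nvcc_path")"
  export CUDA_INSTALL_PATH
  echo "CUDA_INSTALL_PATH=${CUDA_INSTALL_PATH}"
else
  # Without nvcc on PATH the default cannot be inferred this way.
  echo "nvcc not found on PATH; set CUDA_INSTALL_PATH manually" >&2
fi
```

Exporting the variable explicitly makes it easier to confirm that the installed `cuda-python` version matches the CUDA toolkit under that path, as the NOTE in the README requires.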
@@ -135,10 +136,7 @@ python setup_library.py develop --user
 Alternatively, `cutlass_library` will automatically be installed if you install the CUTLASS Python interface package.
-You can also use the [generator.py](/python/cutlass_library/generator.py) script directly without installing the module via:
-```bash
-python -m cutlass_library.generator
-```
+You can also use the [generator.py](/python/cutlass_library/generator.py) script directly without installing the module.
 # Copyright