Commit Graph

4 Commits

Author SHA1 Message Date
dummycoderfe
561c221342 [Ck tile] layernorm2d fwd optimize (#1637)
* optimze small N case using vec io and using rcp div

* [Ck_tile] layernorm, add param to control fastdiv; change generate codes and test pass

* [Ck_tile] fix blockSize compute in Generic2dBlockShape

* [Ck_tile]fix kfastfdiv template style

* [Ck_tile] layernorm, fix stype in review

---------

Co-authored-by: dummycoderfe <noplydummmycoder@163.com>

[ROCm/composable_kernel commit: 686a58a912]
2024-11-08 12:28:23 +08:00
Juan Manuel Martinez Caamaño
6e74da9b87 [generate.py] Override blob list if it already exists (#1635)
Before, generate.py appended the list at the end of the output file.
When running the cmake configuration steps multiple times on the
examples, the blob list (such as fwd_blob_list.txt) would grow at every
configuration.
`library/src/tensor_operation_instance/gpu/mha/CMakeLists.txt` worked around
this issue by removing the output file if it exists.

Now, generate.py overrides the content of the output file.
There is no need for the workaround in the CMakeLists.txt;
and the issue is solved for the example projects too.

[ROCm/composable_kernel commit: 464abd235e]
2024-11-05 10:09:52 +01:00
carlushuang
537ff25c21 [CK_TILE] layernorm have more accurate residual (#1623)
* more accurate residual

* modify comment

* Fix literal case in README.md

---------

Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>

[ROCm/composable_kernel commit: cb6c5d39dc]
2024-11-02 13:30:16 +08:00
carlushuang
776c87ea7e [CK_TILE] layernorm support fused-quant/fused-add (#1604)
* add prenorm/postnorm support, refactor using generate.py

* update README

* update README

* fix format

* update some description and fix format

* update format

* format

* use non-raw for loading

* format and update n4096

* dynamic-quant ready

* update readme

* support fused dynamic-quant

* update fused-quant, with smooth

* update README

* update args

* update some based on comment

[ROCm/composable_kernel commit: c3a4800c5f]
2024-10-31 14:54:53 +08:00