4 Commits

Author SHA1 Message Date
dummycoderfe
7ba8518112 [Ck tile] layernorm2d fwd optimize (#1637)
* optimize small N case using vec io and using rcp div

* [Ck_tile] layernorm, add param to control fastdiv; change generate codes and test pass

* [Ck_tile] fix blockSize compute in Generic2dBlockShape

* [Ck_tile] fix kfastfdiv template style

* [Ck_tile] layernorm, fix style in review

---------

Co-authored-by: dummycoderfe <noplydummmycoder@163.com>

[ROCm/composable_kernel commit: 686a58a912]
2024-11-08 12:28:23 +08:00
Juan Manuel Martinez Caamaño
4bb95f18ed [generate.py] Override blob list if it already exists (#1635)
Before, generate.py appended the list at the end of the output file.
When running the cmake configuration steps multiple times on the
examples, the blob list (such as fwd_blob_list.txt) would grow at every
configuration.
`library/src/tensor_operation_instance/gpu/mha/CMakeLists.txt` worked around
this issue by removing the output file if it exists.

Now, generate.py overrides the content of the output file.
There is no need for the workaround in the CMakeLists.txt;
and the issue is solved for the example projects too.
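
The difference between the old and new behavior can be sketched as follows. This is a minimal illustration, not the actual code in generate.py: the helper names `append_blob_list` and `write_blob_list` are hypothetical, and it assumes the blob list is a plain text file with one entry per line.

```python
def append_blob_list(output_path, blob_names):
    """Hypothetical sketch of the old behavior: append to the file."""
    # Mode "a" keeps existing content, so every run adds the same entries again.
    with open(output_path, "a") as f:
        for name in blob_names:
            f.write(name + "\n")


def write_blob_list(output_path, blob_names):
    """Hypothetical sketch of the new behavior: replace the file content."""
    # Mode "w" truncates the file first, so repeated runs yield the same list.
    with open(output_path, "w") as f:
        for name in blob_names:
            f.write(name + "\n")
```

With the second form, running the cmake configuration step several times leaves the list at the same length, so no separate removal step is needed beforehand.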

[ROCm/composable_kernel commit: 464abd235e]
2024-11-05 10:09:52 +01:00
carlushuang
232d1462a1 [CK_TILE] layernorm has more accurate residual (#1623)
* more accurate residual

* modify comment

* Fix literal case in README.md

---------

Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>

[ROCm/composable_kernel commit: cb6c5d39dc]
2024-11-02 13:30:16 +08:00
carlushuang
38d6f8a8e2 [CK_TILE] layernorm support fused-quant/fused-add (#1604)
* add prenorm/postnorm support, refactor using generate.py

* update README

* update README

* fix format

* update some description and fix format

* update format

* format

* use non-raw for loading

* format and update n4096

* dynamic-quant ready

* update readme

* support fused dynamic-quant

* update fused-quant, with smooth

* update README

* update args

* update some based on comment

[ROCm/composable_kernel commit: c3a4800c5f]
2024-10-31 14:54:53 +08:00