* 50ms -> 28ms
* Fix bug in non-fuse_add_store cases
* Fine-tune settings for 2-pass pipeline
* adjust workload
* remove unnecessary change
* add layernorm
* Add output of quant and unquant results at the same time
* fix test
* fix format
* tune for cases 128x640 and 128x1024
* bug fix
* Add shortcut to RMSNorm
* Modify test for adding shortcut for RMSNorm
* Add fused parameter into tests
* 1. Add YDataType. 2. Move rmsnorm2d_fwd_traits_ from rmsnorm2d_fwd.hpp to rmsnorm2d_fwd_api.cpp and rmsnorm2d_fwd_instance_common.hpp
* Support various strides and precisions
* Add support of Epilogue
* Add fuse and epilogue support to rmsnorm ref
* Modify rmsnorm example
* Refactor tests/examples
* Bug fix for newly added tests/examples
* Bug fix for new tests 2
* Modify smoke test scripts
* Remove debug code
* Support non-smooth dynamic quant
* Update Rmsnorm2dFwd::GetName()
* rename xscale and prec_sx to smoothscale and prec_sm
* Bug fix after rename
* Remove files
* change example_rmsnorm2d_fwd.cpp
* update performance calculator
* Fix issue in two-pass when fuse add is enabled
* Remove comment of beta
---------
Co-authored-by: rocking <ChunYu.Lai@amd.com>