* Add an output stride parameter to ck_tile/smoothquant
* Remove the default stride value
---------
Co-authored-by: so <a.com>
[ROCm/composable_kernel commit: 4e73177684]
* Fix cmake example build
* Support max3 in smoothquant one pass
* Support max3 in smoothquant two pass
* Support max3 in add_rmsnorm_rdquant
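
For context, a minimal host-side sketch of what a max3-based absolute-max row reduction could look like. This is an illustration only, not the ck_tile implementation: the names `max3` and `row_absmax_max3` are hypothetical, and the assumption is that "max3" refers to a three-operand maximum that folds two new elements into the running maximum per step.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: maximum of three values. A GPU backend may map this
// to a single three-operand max instruction; here it is plain C++.
inline float max3(float a, float b, float c) { return std::max(a, std::max(b, c)); }

// Row-wise absolute maximum that consumes two elements per max3 step
// instead of one element per two-operand max.
inline float row_absmax_max3(const std::vector<float>& row)
{
    float acc = 0.0f; // |x| >= 0, so 0 is a valid identity for abs-max
    std::size_t i = 0;
    for(; i + 1 < row.size(); i += 2)
        acc = max3(acc, std::fabs(row[i]), std::fabs(row[i + 1]));
    if(i < row.size()) // odd-length tail: one element left over
        acc = std::max(acc, std::fabs(row[i]));
    return acc;
}
```

In a quantization kernel, a value like this would typically feed the per-row scale; the sketch leaves that step out.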
[ROCm/composable_kernel commit: abae2afc72]