* Fuse add+add+fused_rms
* Try this
* Macro to easily enable/disable fusion
* Various:
* Check that all tensors involved are on the same device before applying fusion
* Fuse sigmoid+scale+sum_rows+div
* Fix the fused bailingmoe2 experts selection
The issue there was that the bias was not per row, but per
expert group, so only the first n_per_group biases were used
for all experts.
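
A minimal sketch of the indexing problem described in the last item, as one reading of that note and not the actual kernel code. The function names, the flat `std::vector` layout, and the `n_per_group` parameter are hypothetical; the sketch assumes the scores for one token are viewed as rows of `n_per_group` experts and that there is one bias value per expert.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical illustration only: scores and bias each hold one value per
// expert, and the experts are processed as rows of n_per_group entries.

// Buggy variant: the bias is indexed by the position within the row, so
// every row reuses the first n_per_group biases.
std::vector<float> add_bias_buggy(const std::vector<float>& scores,
                                  const std::vector<float>& bias,
                                  std::size_t n_per_group) {
    assert(n_per_group > 0 && scores.size() % n_per_group == 0);
    assert(bias.size() == scores.size());
    std::vector<float> out(scores.size());
    const std::size_t n_rows = scores.size() / n_per_group;
    for (std::size_t row = 0; row < n_rows; ++row) {
        for (std::size_t i = 0; i < n_per_group; ++i) {
            out[row * n_per_group + i] = scores[row * n_per_group + i] + bias[i];
        }
    }
    return out;
}

// Fixed variant: the bias index is offset by the row (expert group), so
// each expert gets its own bias.
std::vector<float> add_bias_fixed(const std::vector<float>& scores,
                                  const std::vector<float>& bias,
                                  std::size_t n_per_group) {
    assert(n_per_group > 0 && scores.size() % n_per_group == 0);
    assert(bias.size() == scores.size());
    std::vector<float> out(scores.size());
    const std::size_t n_rows = scores.size() / n_per_group;
    for (std::size_t row = 0; row < n_rows; ++row) {
        for (std::size_t i = 0; i < n_per_group; ++i) {
            const std::size_t e = row * n_per_group + i;  // global expert index
            out[e] = scores[e] + bias[e];
        }
    }
    return out;
}
```

With four experts in two groups of two and all scores equal, the buggy variant applies biases 0 and 1 to both groups, while the fixed variant applies biases 2 and 3 to the second group.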
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>