* Merge Q and K into a single tensor

* Make V mul mat follow QK mul mat so they can be fused, which gives slightly better TG performance.

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>