* feat(grouped_gemm_multi_d): add new example that integrates the grouped_gemm and multi_d_gemm features
* feat: generalized grouped_gemm_kernel.hpp
* feat: generalized grouped_gemm_kernel.hpp even further by removing a hardcoded 0
* refactor: grouped_gemm_multi_d relies on grouped_gemm_kernel
* test(grouped_gemm): grouped_gemm test suite passes with minor adjustments
* fix: resolve segfault by passing the correct parameters for D tensors
* docs: add multi_d info and trim outdated content
* test: add unit tests for grouped_gemm_multi_d and make minor compatibility changes to the grouped_gemm-related tests
* style: clang format
* fix: correct the validation method and D tensor layout in the test suite
* test(grouped_gemm): add gtests for the example to maintain its integrity
* test(grouped_gemm_preshuffle): add prefill variant to the testbed to widen coverage
* fix: remove leftover code so b_shuffle() works again
* test(grouped_gemm_preshuffle): limit the test suite to the gfx942 arch because it fails on gfx90a
* build: add gfx950 as build target for gtests
* test(grouped_gemm_preshuffle): temporarily disable fp8 precision tests due to numerical errors
* fix(grouped_gemm_preshuffle): resolve fp8 test failures on gfx950 by adding the correct compiler flag