* feat(grouped_gemm_multi_d): add new example that integrates grouped_gemm and multi_d_gemm feature
* refactor: grouped_gemm_multi_d relies on grouped_gemm_kernel
* tests(grouped_gemm): grouped_gemm test suite passes with minor adjustments
* fix: segfault fix by passing correct parameters for d tensors
* style: clang format
* WIP: host code for grouped_gemm_multi_d persistent kernel compiles but segfaults
* feat(grouped_gemm_multi_d): add functionality to run persistant kernel
* feat(grouped_gemm_multi_d): add new example that integrates grouped_gemm and multi_d_gemm feature
* refactor: grouped_gemm_multi_d relies on grouped_gemm_kernel
* tests(grouped_gemm): grouped_gemm test suite passes with minor adjustments
* fix: segfault fix by passing correct parameters for d tensors
* style: clang format
* fix: incorrect validation method and Dtensor layout in test suite
* docs: improved README text based on review comments
* fix: parameterize NumDTensor in GroupedGemmHostArgs and remove lint