[CK] Grouped Convolution Global Load/Store instances
## Motivation
Add support for global load and store in grouped convolutions through the
instance factory.
## Technical Details
- Add new instances for each convolution direction (forward, backward data, backward weight)
- Add new tests for large cases
## Test Plan
New tests for large cases.
## Test Result
Pending.
## Submission Checklist
- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
AICK-1255
Refactor the CK GPU reference implementations and integrate them into ckProfiler.
- All convolution layouts and groupings are supported in all three directions
- Unit tests verify that the GPU and CPU references produce the same results
- Support added to ckProfiler (`do_verification = 2` enables the GPU reference)
- One profiler-based test per direction switched to the GPU reference to demonstrate usage
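The verification flow described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not ckProfiler code: `VerificationMode`, `parse_verification_mode`, and `results_match` are hypothetical names, and treating `do_verification = 1` as the CPU reference is an assumption inferred from `2` selecting the GPU reference. The comparison uses a combined absolute/relative tolerance, since CPU and GPU references may accumulate in a different order and need not agree bit for bit.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical verification modes. Only the value 2 (GPU reference) is
// stated in the PR description; the meaning of 1 is an assumption.
enum class VerificationMode
{
    None         = 0, // skip verification
    CpuReference = 1, // assumed: compare against the host (CPU) reference
    GpuReference = 2  // compare against the GPU reference kernel
};

// Map the integer do_verification flag onto a mode (hypothetical helper).
VerificationMode parse_verification_mode(int do_verification)
{
    switch(do_verification)
    {
    case 0: return VerificationMode::None;
    case 1: return VerificationMode::CpuReference;
    case 2: return VerificationMode::GpuReference;
    default: throw std::invalid_argument("do_verification must be 0, 1, or 2");
    }
}

// Element-wise comparison of two result buffers, e.g. the CPU reference
// output and the GPU reference output copied back to the host.
// An element passes if |a - b| <= atol + rtol * |b|.
bool results_match(const std::vector<float>& a,
                   const std::vector<float>& b,
                   float rtol = 1e-5f,
                   float atol = 1e-6f)
{
    if(a.size() != b.size())
        return false;
    for(std::size_t i = 0; i < a.size(); ++i)
    {
        const float diff = std::fabs(a[i] - b[i]);
        // Written as !(diff <= tol) so that NaN differences also fail.
        if(!(diff <= atol + rtol * std::fabs(b[i])))
            return false;
    }
    return true;
}
```

In a unit test of the kind listed above, one would run both references on the same inputs and check `results_match(cpu_out, gpu_out)`; the tolerances here are placeholders and would need tuning per data type.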
Closes AICK-427