[CK][CK TILE] Improve oob check
## Motivation
Improve OOB checks. Remove permutes which have been generated by thread
buffer zero clear. at now in assembly there is only condmask instead of
permute + condmask.
Change number of KPack for generated instances
## Technical Details
Remove permute instructions from assembly
## Test Plan
test_grouped_convnd_fwd_tile
## Test Result
passed
## Submission Checklist
- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.