* Add TailHandler for V3, V4 and Mem pipelines
* Adapt examples and tests to use TailHandler
* move tail-handling logic to pipeline in persistent grouped gemm
* Fix Mem pipeline dispatching, add CompV4 dispatching
* Use a macro for handling the many tails of Mem pipeline
* Fix formatting again
* Use const-ref RunFunction, remove unnecessary try_run