1. Parallelized dtrsm_small across the m-dimension or the n-dimension,
   depending on side (Left/Right).
2. Fine-tuned AOCL_DYNAMIC to improve performance.

AMD-Internal: [CPUPL-2103]
Change-Id: I6be6a2b579de7df9a3141e0d68bdf3e8a869a005
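The C sketch below shows why side can decide which dimension is split. It is a minimal illustration, not the BLIS kernel. For side Left, A * X = alpha * B with A of size m x m, so each of the n columns of B is an independent triangular solve. For side Right, X * A = alpha * B with A of size n x n, so each of the m rows is independent. Only the non-transposed, non-unit-diagonal cases are covered: lower-triangular Left and upper-triangular Right. The helper names (thread_range, trsm_left_lower_cols, trsm_right_upper_rows, trsm_partitioned) are hypothetical. The side-to-dimension mapping (Left splits n, Right splits m) follows from the math and is assumed, not taken from the patch.

```c
#include <stddef.h>

/* Hypothetical helper: split `len` items into `nt` near-equal contiguous
   chunks and return the [start, end) range owned by chunk `tid`. */
void thread_range(size_t len, size_t nt, size_t tid,
                  size_t *start, size_t *end)
{
    size_t base = len / nt, rem = len % nt;
    *start = tid * base + (tid < rem ? tid : rem);
    *end = *start + base + (tid < rem ? 1 : 0);
}

/* Side = Left: solve L * X = alpha * B in place for columns [j0, j1) of B.
   L is m x m lower triangular (non-unit diagonal), column-major; B is
   m x n, column-major. Columns are independent, so disjoint column
   ranges need no synchronization. */
void trsm_left_lower_cols(size_t m, double alpha, const double *L,
                          size_t lda, double *B, size_t ldb,
                          size_t j0, size_t j1)
{
    for (size_t j = j0; j < j1; ++j) {
        double *b = B + j * ldb;
        for (size_t i = 0; i < m; ++i) b[i] *= alpha;
        for (size_t k = 0; k < m; ++k) {          /* forward substitution */
            b[k] /= L[k + k * lda];
            for (size_t i = k + 1; i < m; ++i)
                b[i] -= L[i + k * lda] * b[k];
        }
    }
}

/* Side = Right: solve X * U = alpha * B in place for rows [i0, i1) of B.
   U is n x n upper triangular (non-unit diagonal), column-major; B is
   m x n, column-major. Rows are independent. */
void trsm_right_upper_rows(size_t n, double alpha, const double *U,
                           size_t ldu, double *B, size_t ldb,
                           size_t i0, size_t i1)
{
    for (size_t i = i0; i < i1; ++i) {
        for (size_t j = 0; j < n; ++j) B[i + j * ldb] *= alpha;
        for (size_t k = 0; k < n; ++k) {
            B[i + k * ldb] /= U[k + k * ldu];
            for (size_t j = k + 1; j < n; ++j)
                B[i + j * ldb] -= B[i + k * ldb] * U[k + j * ldu];
        }
    }
}

/* Left splits the n columns; Right splits the m rows into `nt` chunks.
   The pragma runs the chunks on separate threads when built with OpenMP;
   without OpenMP it is ignored and the chunks run serially. */
void trsm_partitioned(char side, size_t m, size_t n, double alpha,
                      const double *A, size_t lda, double *B, size_t ldb,
                      size_t nt)
{
    #pragma omp parallel for
    for (size_t t = 0; t < nt; ++t) {
        size_t s, e;
        if (side == 'L') {
            thread_range(n, nt, t, &s, &e);
            trsm_left_lower_cols(m, alpha, A, lda, B, ldb, s, e);
        } else {
            thread_range(m, nt, t, &s, &e);
            trsm_right_upper_rows(n, alpha, A, lda, B, ldb, s, e);
        }
    }
}
```

Splitting along the independent dimension keeps each chunk's writes disjoint. The triangular matrix is only read and can be shared by all threads.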