Interchanged some loops to favour column-major storage.

Added a check to identify the last column and load it using a 'for' loop, avoiding out-of-bounds memory accesses.

Change-Id: Id5d2e16c65017a7f4b641d33228d23903efd09ac
b426f9e
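
A minimal sketch of the two ideas in this change, not the code from the commit itself. It assumes a plain C routine over a column-major float matrix; the function name row_sums_col_major, the BLOCK width of 4, and the row-sum operation are all hypothetical and chosen only for illustration. Columns sit in the outer loop so the inner loop walks consecutive addresses, and a block that would read past the end of the buffer is loaded element by element instead.

```c
#include <stddef.h>
#include <string.h>

#define BLOCK 4 /* hypothetical load width, in elements */

/* Sums a column-major matrix across its columns: out[r] = sum over c of A(r, c).
 * Element (r, c) is stored at a[c * rows + r]. */
void row_sums_col_major(const float *a, size_t rows, size_t cols, float *out)
{
    for (size_t r = 0; r < rows; ++r)
        out[r] = 0.0f;

    /* Columns outermost, rows innermost: the inner loop is unit-stride. */
    for (size_t c = 0; c < cols; ++c) {
        const float *col = a + c * rows;

        for (size_t r = 0; r < rows; r += BLOCK) {
            size_t valid = (rows - r < BLOCK) ? rows - r : BLOCK;
            float tmp[BLOCK] = {0};

            if (c * rows + r + BLOCK > rows * cols) {
                /* A full BLOCK-wide load would run past the end of the
                 * buffer. This happens at the tail of the last column
                 * (or of the last few columns when rows < BLOCK), so
                 * load only the valid elements with a scalar loop. */
                for (size_t k = 0; k < valid; ++k)
                    tmp[k] = col[r + k];
            } else {
                /* The full-width load stays inside the buffer. Lanes
                 * beyond 'valid' may hold the start of the next column
                 * and are ignored below. */
                memcpy(tmp, col + r, sizeof tmp);
            }

            for (size_t k = 0; k < valid; ++k)
                out[r + k] += tmp[k];
        }
    }
}
```

The bounds check compares against the total buffer size rather than testing only the column index, so the sketch also stays in bounds when a column is shorter than BLOCK.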