mirror of
https://github.com/amd/blis.git
synced 2026-05-11 01:30:00 +00:00
Details:
- Redefined bli_is_last_iter() to take thread_id and num_thread arguments,
  which allows the macro to correctly compute whether a given iteration is
  the last one that the calling thread will execute in that particular loop.
  The new definition remains disabled (commented out) for now because it
  appears to hurt performance slightly; it needs a closer look before it is
  enabled.
- Whitespace and related updates to level-3 macro-kernels.
- Updated the test suite so that performance results in the hundreds of
  gigaflops do not disrupt the column alignment of the output.
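
For illustration only, here is a minimal sketch of what a thread-aware "last iteration" test could look like. It is not the macro from this commit. It assumes a round-robin partitioning in which thread tid executes iterations tid, tid + nth, tid + 2*nth, and so on; the function names is_last_iter_rr and is_last_iter_global are hypothetical, and the partitioning actually used by the macro-kernels may differ.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper, NOT the BLIS macro. Assumes round-robin assignment:
   thread tid runs iterations tid, tid + nth, tid + 2*nth, ... < end_iter. */
bool is_last_iter_rr(int i, int end_iter, int tid, int nth)
{
    assert(nth > 0 && tid >= 0 && tid < nth);

    /* A thread whose first iteration is already out of range runs nothing,
       so it has no last iteration. */
    if (tid >= end_iter) return false;

    /* Largest index below end_iter that is congruent to tid modulo nth. */
    int last = end_iter - 1 - ((end_iter - 1 - tid) % nth);
    return i == last;
}

/* Thread-unaware test: true only for the final iteration of the whole loop,
   regardless of which thread executes it. */
bool is_last_iter_global(int i, int end_iter)
{
    return i == end_iter - 1;
}
```

Under these assumptions, with end_iter = 10 and nth = 4, thread 0 runs iterations 0, 4 and 8, so its last iteration is 8 rather than 9. The thread-unaware test would never report a last iteration for that thread, which is the kind of discrepancy the redefinition is meant to address.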