mirror of
https://github.com/amd/blis.git
synced 2026-05-12 01:59:59 +00:00
Details: - Reorganized logic of bli_thread_partition_2x2() so that the primary guts were factored out into "fast" and "slow" variants. Then added logic to the "fast" variant that allows for more optimal thread factorizations in some situations where there is at least one factor of 2. - Changed BLIS_THREAD_RATIO_M from 2 to 1 in bli_kernel_macro_defs.h and added comments to that file describing BLIS_THREAD_RATIO_? and BLIS_THREAD_MAX_?R. - In bli_family_zen.h and bli_family_zen2.h, preprocessed out several macros not used in vanilla BLIS and removed the unused macro BLIS_ENABLE_ZEN_BLOCK_SIZES from the former file. - Disabled AMD's small matrix handling entry points in bli_syrk_front.c and bli_trsm_front.c. (These branches of small matrix handling have not been reviewed by vanilla BLIS developers.) - Added commented-out calls printf() to bli_rntm.c. - Whitespace changes to bli_thread.c.