blis/kernels/zen/1 at dev - blis

amd/blis

mirror of https://github.com/amd/blis.git synced 2026-06-29 02:37:05 +00:00

Files

S, Hari Govind cdd181a7d7 Optimize complex scalv kernels with inline assembly and FMA instructions

This PR optimizes the complex scalv kernels (in-place scaling of a vector by a scalar, x := alpha * x) by replacing
intrinsics with inline assembly and using FMA (fused multiply-add) instructions
to improve performance.

Changes:
- Replaced intrinsic-based implementation with inline assembly
- Uses the packed `vfmaddsub231ps` and the scalar `vfmadd231ss` and `vfmsub231ss` FMA instructions
- Improved instruction scheduling and register usage
- Handles both unit-stride (vectorized) and non-unit-stride (scalar) cases
- Processes up to 16 complex elements per iteration in the main loop

2026-04-13 16:06:02 +05:30

bli_addv_zen_int.c

Code cleanup: Copyright notices

2024-08-05 15:35:08 -04:00

bli_amaxv_zen_int.c

Ensure consistency across AVX2 and AVX512 AMAX kernels

2026-03-03 13:01:52 +05:30

bli_axpbyv_zen_int_10.c

Standardize Zen kernel names

2025-08-19 18:19:51 +01:00

bli_axpbyv_zen_int.c

Standardize Zen kernel names