mirror of
https://github.com/NVIDIA/cutlass.git
synced 2026-05-13 01:35:45 +00:00
Before this fix, combining two Boolean (i1) DSL values with Python `and` triggered a verbose i1→i32→i1 round-trip in `__dsl_and__`: `arith.extui` (×3), `arith.select` and `arith.cmpi ne` (×2), for 6 extra MLIR ops.

Add a fast path: when both operands are Boolean, `__dsl_and__` delegates directly to `__and__` and emits a single `arith.andi %a, %b : i1`, identical to what `&` produces. The two operators were already semantically equivalent; this fix makes the generated MLIR identical as well.

Includes:
- `repro_dsl_and_bool.py`: minimal standalone reproducer / bug-report script
- `test_dsl_and_fix.py`: pytest tests verifying the fixed behaviour
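The dispatch described above can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the CuTe DSL implementation: `ToyBuilder`, `ToyBoolean` and `make_bool` are hypothetical names, the "builder" only records op text as strings, and the generic (widening) path for mixed-type operands is deliberately not modelled. It shows only the fast path: when both operands are Boolean, `__dsl_and__` reuses `__and__`, so `and` and `&` produce the same single `arith.andi`.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyBuilder:
    """Hypothetical stand-in for an MLIR builder: it only records op text."""
    ops: List[str] = field(default_factory=list)
    counter: int = 0

    def emit(self, op_text: str) -> str:
        # Give each result a fresh SSA-style name: "%0", "%1", ...
        result = f"%{self.counter}"
        self.counter += 1
        self.ops.append(f"{result} = {op_text}")
        return result


@dataclass
class ToyBoolean:
    """Hypothetical i1 DSL value; not the real CuTe DSL Boolean class."""
    builder: ToyBuilder
    ssa: str

    def __and__(self, other: "ToyBoolean") -> "ToyBoolean":
        # `&` on two i1 values lowers to a single arith.andi.
        result = self.builder.emit(f"arith.andi {self.ssa}, {other.ssa} : i1")
        return ToyBoolean(self.builder, result)

    def __dsl_and__(self, other: object) -> "ToyBoolean":
        # Fast path: Boolean `and` Boolean delegates to __and__, so the
        # emitted IR is exactly what `&` would emit.
        if isinstance(other, ToyBoolean):
            return self & other
        # The generic i1 -> i32 -> i1 path is not modelled in this sketch.
        raise NotImplementedError("generic __dsl_and__ path not modelled")


def make_bool(builder: ToyBuilder, name: str) -> ToyBoolean:
    # Hypothetical helper: treat `name` as an existing i1 SSA value.
    return ToyBoolean(builder, f"%{name}")


if __name__ == "__main__":
    via_and = ToyBuilder()
    make_bool(via_and, "a").__dsl_and__(make_bool(via_and, "b"))

    via_amp = ToyBuilder()
    make_bool(via_amp, "a") & make_bool(via_amp, "b")

    print(via_and.ops)  # ['%0 = arith.andi %a, %b : i1']
    print(via_and.ops == via_amp.ops)  # True
```

Comparing the recorded op lists, rather than the results, mirrors the point of the fix: the two spellings were already equal in value, and the change is about the emitted IR.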