| kernel | gload | gstore | LDS write | LDS read | LDS bank conflicts | LDS insts | time (µs) | BW (TB/s) | rules |
|---|---|---|---|---|---|---|---|---|---|
| 01 row-major | bx4 | store_short x8 | ds_write_b128 | ds_read_u16 x8 | 3,670,016 | 294,912 | 14.08 | 4.77 | R1 X, R2 X |
| 02 col-major | bx4 | store_short x8 | ds_write_u16 x8 | ds_read_b128 | 1,572,864 | 294,912 | 12.25 | 5.48 | R1 X on write |
| 03 padding | bx4 | store_short x8 | ds_write_b128 | ds_read_u16 x8 | 786,432 | 327,680 | 11.93 | 5.62 | padding helps writes; reads only partly |
| 05 XOR+pad | bx4 | store_short x8 | ds_write_b128 | ds_read_u16 x8 | 0 | 294,912 | 12.42 | 5.40 | R1 OK via XOR+pad |
| 08 XOR clean | bx4 | store_short x8 | ds_write_b128 | ds_read_b128 | 0 | 65,536 | 11.88 | 5.65 | R1 R2 R3 OK |
| 10 M-vec store | bx4 | store_dwordx4 | ds_write_b128 | ds_read_u16 x8 | 0 | 294,912 | 12.69 | 5.29 | R1 OK, wider gstore |
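The BW and LDS insts columns follow a simple accounting model. Here is a minimal Python sketch of that arithmetic. It rests on assumptions the table does not state. First, each kernel reads and writes a 4096×4096 fp16 matrix exactly once, so 67,108,864 bytes move in total. This size was chosen because it reproduces the table's time × BW product of about 67.1 MB in every row. Second, wavefronts are 64 lanes wide. Third, one b128 LDS access covers 8 fp16 elements per lane. Under these assumptions the modeled bandwidth matches every row within rounding. The modeled LDS instruction count matches every row except 03 padding, which reports 327,680 = 10 × 32,768 rather than 9 × 32,768. The names `ROWS`, `bandwidth_tbps`, and `lds_insts` are hypothetical helpers for illustration.

```python
# Sanity-check sketch for the benchmark table.
# ASSUMPTION (not stated in the table): each kernel reads and writes a
# 4096x4096 fp16 matrix once, so bytes moved = 2 * N * N * 2.
N = 4096
ELEM_BYTES = 2                          # fp16 (assumed)
BYTES_MOVED = 2 * N * N * ELEM_BYTES    # one read + one write = 67,108,864 B

# ASSUMPTION: 64-lane wavefronts, 8 fp16 elements per lane per b128 access.
LANES = 64
ELEMS_PER_B128 = 8
# Wave-level instructions needed to touch the whole matrix once with b128.
WAVE_INSTS_PER_PASS = N * N // (LANES * ELEMS_PER_B128)  # 32,768


def bandwidth_tbps(time_us: float) -> float:
    """Effective bandwidth in TB/s (1 TB = 1e12 B) for a kernel time in microseconds."""
    return BYTES_MOVED / (time_us * 1e-6) / 1e12


def lds_insts(writes_per_chunk: int, reads_per_chunk: int) -> int:
    """Modeled wave-level LDS instruction count.

    Each 8-element chunk costs `writes_per_chunk` LDS writes and
    `reads_per_chunk` LDS reads (e.g. 1 for b128, 8 for u16 x8).
    """
    return WAVE_INSTS_PER_PASS * (writes_per_chunk + reads_per_chunk)


# (kernel, time_us, table_bw_tbps, writes_per_chunk, reads_per_chunk, table_lds_insts)
ROWS = [
    ("01 row-major", 14.08, 4.77, 1, 8, 294_912),
    ("02 col-major", 12.25, 5.48, 8, 1, 294_912),
    ("03 padding", 11.93, 5.62, 1, 8, 327_680),
    ("05 XOR+pad", 12.42, 5.40, 1, 8, 294_912),
    ("08 XOR clean", 11.88, 5.65, 1, 1, 65_536),
    ("10 M-vec store", 12.69, 5.29, 1, 8, 294_912),
]

if __name__ == "__main__":
    for name, t_us, bw, w, r, insts in ROWS:
        print(
            f"{name:15s} BW model {bandwidth_tbps(t_us):.3f} vs table {bw:.2f} TB/s; "
            f"LDS insts model {lds_insts(w, r):,} vs table {insts:,}"
        )
```

Because BW is derived from time, ranking the kernels by either column gives the same order; the conflict and instruction columns are what explain the differences between them.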



