mirror of
https://github.com/NVIDIA/cutlass.git
synced 2026-05-14 02:02:25 +00:00
Introduces cutlass::epilogue::thread::Snake, a two-operand activation functor implementing Snake_a(x) = x + (1/a) * sin^2(a*x) from Ziyin et al. 2020 (arXiv:2006.08195).

The per-channel learnable frequency `a` flows through an EVT child (e.g. Sm90RowBroadcast), composing into Sm90EVT<Sm90Compute<Snake, ...>, x_node, alpha_node> for fused GEMM+Snake epilogues used in neural vocoders.

Adds unit tests in test/unit/epilogue/thread/activation.cu covering the f32 and bf16 paths, validated against float64 reference goldens.

Closes #3141
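
For readers unfamiliar with the formula, here is a minimal host-side sketch of the two-operand computation described above. It is not the CUTLASS source: `SnakeSketch` and `snake_reference` are hypothetical names chosen for illustration, and the sketch omits the device annotations, array specializations, and bf16 handling that the real functor would need. It also assumes `a` is nonzero, since the formula divides by `a`.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for the functor described in this change;
// it only illustrates Snake_a(x) = x + (1/a) * sin^2(a*x).
template <typename T>
struct SnakeSketch {
  // Two operands: x is the value being activated, a is the
  // per-channel frequency. Assumes a != 0 (no guard here).
  T operator()(T const& x, T const& a) const {
    T s = std::sin(a * x);
    return x + (s * s) / a;
  }
};

// Float64 reference of the same formula, in the spirit of the
// "float64 reference goldens" mentioned above (hypothetical helper).
inline double snake_reference(double x, double a) {
  double s = std::sin(a * x);
  return x + (s * s) / a;
}
```

Comparing `SnakeSketch<float>{}(x, a)` against `snake_reference(x, a)` with a small tolerance mirrors, in miniature, how a lower-precision path can be checked against a double-precision golden.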