ikawrakow
Based on PR 13095 in mainline. I did not measure it, but my impression is that CUDA compile time is reduced.