custom_flashinfer

mirror of https://github.com/kvcache-ai/custom_flashinfer.git synced 2026-06-29 10:47:12 +00:00

Files

Zihao Ye 61e049a02e perf: Fix python API overhead when CUDAGraph is not enabled (#969)

This PR fixes issue #960. We identified several performance bottlenecks
in our Python APIs when kernels are not captured by CUDAGraph:
1. The device guard in Python is slow (`with input.device as device:`).
2. Getting the current CUDA stream in Python is time-consuming.

These issues were introduced in the JIT refactor after v0.1.6 (done mainly
to speed up JIT compilation). In this PR, we go back to obtaining the
stream and applying the device guard in C++.
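
As a rough illustration of why this matters outside CUDAGraph, the sketch below times a call that enters a Python-side guard and looks up a stream on every invocation against a call that skips both. It is a minimal pure-Python model, not the code from this PR: `fake_device_guard`, `fake_get_current_stream`, and `fake_kernel` are hypothetical stand-ins, and real guards and stream lookups also call into the CUDA runtime, so the absolute numbers will differ.

```python
import timeit
from contextlib import contextmanager


@contextmanager
def fake_device_guard(device_index):
    # Hypothetical stand-in for a Python-side device guard.
    # A real guard would also switch and restore the active CUDA device.
    yield device_index


def fake_get_current_stream():
    # Hypothetical stand-in for a Python-side current-stream lookup.
    return 0


def fake_kernel(x):
    # Hypothetical stand-in for the compiled kernel launch.
    return x + 1


def call_with_python_guard(x):
    # Pays the guard and stream lookup cost in Python on every call.
    with fake_device_guard(0):
        _stream = fake_get_current_stream()
        return fake_kernel(x)


def call_direct(x):
    # Models the case where guard and stream handling live in native code,
    # so the Python wrapper only forwards the call.
    return fake_kernel(x)


def per_call_microseconds(fn, number=100_000):
    # Average wall-clock time per call, in microseconds.
    total = timeit.timeit(lambda: fn(1), number=number)
    return total / number * 1e6


if __name__ == "__main__":
    print(f"python guard: {per_call_microseconds(call_with_python_guard):.3f} us/call")
    print(f"direct:       {per_call_microseconds(call_direct):.3f} us/call")
```

When kernels are replayed from a captured CUDAGraph, this per-call Python wrapper work is not on the hot path, which is consistent with the overhead only showing up when CUDAGraph is not enabled.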

@MichoChan @xiaoqi35

2025-03-23 21:19:35 -07:00

ci-flashinfer.env.example

[CI] Setup github actions for release wheel (#91)

2024-01-28 19:54:33 -08:00

ci-flashinfer.service

[CI] Setup github actions for release wheel (#91)

2024-01-28 19:54:33 -08:00

formatter.sh

refactor: modernize packaging (#643)

2024-12-09 13:42:56 -08:00

run-ci-build-wheel.sh

set pip path (#834)