Author | Commit | Message | Date
blzheng | 0dcfae5553 | [CPU] Add gemma4_rmsnorm_cpu kernel (#22842); Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>; Co-authored-by: Ma Mingfei <mingfei.ma@intel.com> | 2026-04-17 13:03:16 +08:00
Ma Mingfei | af62bd9486 | [CPU] Implement MXFP4 Gemm kernels for intel AMX to support GPT OSS series. (#14385) | 2026-03-29 23:44:12 -07:00
blzheng | ed01e1d5d6 | [CPU] add kernel apply_rotary_pos_emb_cpu for Qwen3-VL and Qwen3-Omni (#13121); Co-authored-by: Ma Mingfei <mingfei.ma@intel.com> | 2026-03-29 23:43:46 -07:00
Cao E | 274581fb77 | Add support for more batch sizes in cpu_graph_runner (#13881) | 2026-03-19 09:50:56 -07:00
blzheng | cd22aa27a9 | [CPU] Add FP8 Bmm support (#9744); Co-authored-by: Fan Yin <1106310035@qq.com> | 2026-03-18 22:19:48 -07:00
Ma Mingfei | 687d9eb66f | [CPU] Optimize image preprocessor performance for Qwen2VLImageProcessorFast (#15168) | 2026-03-18 22:18:15 -07:00
Ma Mingfei | 62d7454976 | optimize conv3d used in patch embedding (#16040) | 2026-03-18 22:17:53 -07:00
Zaili Wang | 2f4babe32b | [CPU] support LayerNorm with 3D shape (#15075); Co-authored-by: Ma Mingfei <mingfei.ma@intel.com> | 2026-03-18 22:15:24 -07:00
blzheng | dc6aa26ce9 | [CPU] Add mrope kernel for Qwen3-vl (#12531); Co-authored-by: Ma Mingfei <mingfei.ma@intel.com> | 2026-03-18 22:12:48 -07:00
Cao E | 4f0f6cd9d0 | Add torch.compile support for qwen3-next on CPU (#12444) | 2026-02-26 23:28:03 -08:00
jianan-gu | c35aa0238c | [CPU][INT4] Add INT4 kernels for CPU (#8226); Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> | 2026-01-29 22:30:13 -08:00
Ma Mingfei | 88f7759402 | [CPU] optimize flash_attn_varlen_func (#15708) | 2026-01-29 22:07:05 -08:00
blzheng | d16ff357db | [CPU] Add Gemma3RMSNorm kernel in sgl-kernel and add ut (#9324) | 2025-12-15 00:24:02 -08:00
Zaili Wang | d6bd2d1126 | [CPU] layernorm & fused add-layernorm kernels (#14074) | 2025-12-11 16:58:23 -08:00
blzheng | d257bf87b9 | [CPU] add mamba fla kernels for Qwen3-next (#12324) | 2025-12-06 14:16:23 +08:00
jianan-gu | 70d2587324 | [CPU] Optimize small oc GEMM for Qwen3-next on CPU (#12446); Co-authored-by: Zheng, Beilei <beilei.zheng@intel.com> | 2025-12-04 00:38:47 -08:00
Ma Mingfei | f90b400431 | [CPU] add support for mamba causal conv1d for qwen3-next (#12309) | 2025-12-04 13:41:42 +08:00
blzheng | 974c562a25 | [CPU] add fused_qkvzba_split_reshape_cat kernel for Qwen3-next (#12330) | 2025-12-03 23:46:08 +08:00
Xuan Liao | c233e9d7a9 | [CPU] Support chunk_gated_delta_rule kernel for Qwen3-Next (#12441) | 2025-12-03 17:03:48 +08:00
YanbingJiang | acde21d8d5 | Add fused_rmsnorm_gated_cpu kernel for CPU to support Qwen3-Next (#11577) | 2025-11-21 01:33:31 +08:00
blzheng | d1d4074c4e | [CPU] Add gelu_and_mul kernel in sgl-kernel and add ut (#9300) | 2025-09-08 23:23:13 -07:00
Cao E | 7577f0e40f | Add graph runner support with torch compile on CPU (#7843) | 2025-09-07 21:33:58 -07:00
Chunyuan WU | 36cc3ffdc7 | [CPU] [sgl-kernel] set dispatch key of initialize to CatchAll (#7734) | 2025-07-02 22:39:24 -07:00
Chunyuan WU | 6005eceee3 | [CPU] remove process_group from inputs of shm_allreduce and shm_allgather (#7486) | 2025-06-30 21:54:11 -07:00
Chunyuan WU | c5131f7a2f | [CPU] add c++ kernel to bind CPU cores and memory node (#7524) | 2025-06-29 19:45:25 -07:00
YanbingJiang | fcde67b016 | CPU: map changes from developing branch in sgl-kernel (#6833); Co-authored-by: mingfeima <mingfei.ma@intel.com> | 2025-06-10 01:08:15 -07:00
jianan-gu | ff00895c46 | Add CPU optimized kernels for topk and rope fusions (#6456) | 2025-06-02 17:37:34 -07:00
Chunyuan WU | 3ded6235c9 | Add fp8 fused_experts kernel for CPU in sgl-kernel and add UT (#6404) | 2025-05-23 02:01:55 -07:00
blzheng | 4ba1eea83f | Add fp8 qkv_proj_with_rope kernel for CPU in sgl-kernel and add UT (#6493) | 2025-05-23 00:14:46 -07:00
blzheng | cfe48c5902 | [CPU] Fix build issue (#6419) | 2025-05-21 11:17:10 -07:00
YanbingJiang | 32cc66efa5 | Update extend/decode attention kernel for CPU in sgl-kernel and add UTs (#6405); Co-authored-by: mingfeima <mingfei.ma@intel.com> | 2025-05-19 21:23:17 -07:00
Chunyuan WU | 5dd62c3a6f | Add fp8 shared_expert kernel for CPU in sgl-kernel and add UT (#6339); Co-authored-by: Jiang, Yanbing <yanbing.jiang@intel.com>; Co-authored-by: mingfeima <mingfei.ma@intel.com> | 2025-05-18 12:42:15 -07:00
Chunyuan WU | fb4959b2c5 | Add fp8 gemm kernel for CPU in sgl-kernel and add gemm UT (#6216); Co-authored-by: YanbingJiang <yanbing.jiang@intel.com>; Co-authored-by: mingfeima <mingfei.ma@intel.com> | 2025-05-15 09:10:40 -07:00
blzheng | 0f75b907c6 | [CPU] Add CMakeLists.txt for sgl-kernel (#6115) | 2025-05-13 15:30:37 -07:00
Ma Mingfei | a73c4df438 | Add optimized native kernels in sgl-kernel (#5150); Co-authored-by: Chunyuan WU <chunyuan.wu@intel.com>; Co-authored-by: YanbingJiang <yanbing.jiang@intel.com>; Co-authored-by: blzheng <beilei.zheng@intel.com> | 2025-04-08 09:37:46 -07:00