1797 Commits

Author SHA1 Message Date
coderfeli
d8731a7599 add logits 2025-04-16 09:31:37 +00:00
Bernard
9f73300e80 hack for cap_logits naive 2025-04-16 09:25:10 +00:00
coderfeli
795e2b1e43 fix 2 2025-04-15 09:52:10 +00:00
coderfeli
ff281e135d fix multi batch and hack page idx core 2025-04-15 01:21:23 +00:00
coderfeli
234528e06c tile gather support async pipe ok 2025-04-12 14:45:38 +00:00
coderfeli
c2cdfda718 merge pa 2025-04-11 02:40:23 +00:00
coderfeli
b9d4a20a42 v cache ok 2025-04-09 12:00:59 +00:00
coderfeli
4b5859f717 refine apis 2025-04-08 03:42:46 +00:00
coderfeli
8d62ff557a rename to tile_scatter_gather 2025-04-08 02:50:46 +00:00
coderfeli
bfbd28c9d8 support page dim configure in tile window 2025-04-08 01:53:17 +00:00
coderfeli
738a7427cb fix hacks 2025-04-07 12:10:55 +00:00
coderfeli
4e644a33ab fix k origin 2025-04-07 10:04:22 +00:00
coderfeli
57c9d84eb1 run ok 2025-04-07 03:45:21 +00:00
coderfeli
c3a84fa680 merge 2025-04-05 13:46:49 +00:00
coderfeli
fe2ea699e5 paged window run ok 2025-04-05 13:45:46 +00:00
coderfeli
867a4e527c fix bugs 2025-04-04 13:40:10 +00:00
coderfeli
491178276f fix fp8 scale 2025-04-03 11:10:37 +00:00
lalala-sh
3037d8bac1 mul int4 scale 2025-04-03 18:06:12 +08:00
root
20f6674bf6 fix no quant case 2025-04-03 02:46:01 +00:00
coderfeli
86d172cdfa add fa hack 2025-04-03 01:22:25 +00:00
root
b2b34fffbb fix fp8 16x16 2025-04-02 16:27:52 +00:00
root
85f83330b5 fuse moe activation 2025-04-02 07:02:09 +00:00
root
45a0463f1f moe gemm draft v0.1 2025-03-26 08:14:11 +00:00
lalala-sh
938f4234f6 scatter gemm v1 2025-03-25 07:28:50 +00:00
coderfeli
e285c77c5f fix buid 2025-03-18 06:58:54 +00:00
coderfeli
1c90d50b5b update moe api fix aiter build 2025-03-18 05:59:24 +00:00
coderfeli
98cee8d02b fix merge 2025-03-18 05:45:04 +00:00
coderfeli
5f49b91237 merge develop 2025-03-18 04:49:40 +00:00
Illia Silin
1342ecf7fb Add a daily CI build on gfx908. (#1987)
* add one daily ci build on gfx908

* add redis invocation tag for gfx908

* make ci build for gfx908 conditional

* fix groovy logic

* add option to run perf tests for gfx908

* disable a few tests on mi100
2025-03-17 18:08:53 -07:00
Illia Silin
07f25186b2 disable ck_tile basic gemm (#1986) 2025-03-17 15:26:43 -07:00
aledudek
5095906975 Async grouped gemm v3 (#1940)
* Fully async grouped gemm

* Remove commented code

* Remvoe maybe_unused

* host kernel args

* Checkpoint segfault debugging...

* Working part1

* Working part2

* Remvoe comments...

* Use void ptr for gemm kernel host args

* Fix device_grouped_gemm_multiple_d_dl build issue

* Fix device_grouped_gemm_xdl build issue
2025-03-17 16:42:43 +01:00
Bartłomiej Kocot
c2e4898b4b Grouped conv bwd data NGCHW (#1967)
* Grouped conv bwd data NGCHW

* fixes

* fix

* Improvements

* Fix

* Fix

* add client example
2025-03-17 13:32:00 +01:00
coderfeli
7dbdff9f9f moe sorting fix moebuf 2025-03-17 06:20:57 +00:00
coderfeli
5eaa36be18 mork to support 13w tokens 2025-03-17 01:45:34 +00:00
coderfeli
ef8c1333b9 use uint32 2025-03-17 01:45:09 +00:00
coderfeli
6c0e021235 revert v1 test 2025-03-17 01:39:57 +00:00
coderfeli
bccc5192cf fix uint32 2025-03-17 01:18:32 +00:00
coderfeli
da2659d502 input output all ok 2025-03-15 14:26:30 +00:00
coderfeli
d1e999c05c int64 index ok now 2025-03-15 13:28:49 +00:00
coderfeli
f911cf7396 impl int64 but result not correct 2025-03-14 13:01:07 +00:00
coderfeli
d4925e1637 fix output oob 2025-03-14 03:19:26 +00:00
valarLip
52b1cd7780 hotfix fmoe build issue (#1976) 2025-03-13 15:11:59 +08:00
dependabot[bot]
de7a745ca6 Bump rocm-docs-core from 1.17.1 to 1.18.1 in /docs/sphinx (#1977)
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.17.1 to 1.18.1.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.17.1...v1.18.1)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-03-12 23:36:36 -07:00
carlushuang
3e81279d26 Reapply "[CK_TILE] support hdim=192/128 pair for deepseekv3 (#1961)" … (#1971)
* Reapply "[CK_TILE] support hdim=192/128 pair for deepseekv3 (#1961)" (#1969)

This reverts commit 8cbcd3e0d0.

* fix codegen problem

* Update config.hpp

---------

Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
2025-03-13 11:41:39 +08:00
lalala-sh
948aaa5641 prepare moe gemm 2025-03-13 03:33:55 +00:00
illsilin
f8464d2087 fix clang format 2025-03-12 20:21:14 -07:00
coderfeli
d85c034977 fix2 2025-03-13 02:30:07 +00:00
coderfeli
8b05fa935d fix coredump in e2e test 2025-03-13 02:12:18 +00:00
Illia Silin
d4a6d69643 disable tests that take too long to build for gfx90a (#1975) 2025-03-12 17:54:03 -07:00
feli
251afab3b7 ck_moe: fix useless code and remove usless oob (#1972)
* fix useless code and remove usless oob

* clang format

---------

Co-authored-by: coderfeli <coderfeli@163.com>
2025-03-12 09:22:42 -07:00