illsilin
d3fb5a9b8d
fix clang format
2025-03-28 11:52:40 -07:00
lalala-sh
656a3657cb
remove useless code change
2025-03-28 16:25:00 +08:00
root
7054d45579
remove useless changes
2025-03-28 08:21:42 +00:00
root
c9ce26837c
Merge remote-tracking branch 'origin/develop' into moe_gemm_activation
2025-03-28 07:57:53 +00:00
felix
a82f338fb9
hotfix fix sorting int64 (#2025)
* fix sorting int64
* clang format
* fix example issue
* update WA issue #
---------
Co-authored-by: coderfeli <coderfeli@163.com>
Co-authored-by: carlushuang <carlus.huang@amd.com>
2025-03-28 11:31:52 +08:00
root
529a1732cd
fp8 with act ready
2025-03-28 02:34:38 +00:00
lalala-sh
d0f3e87129
Merge remote-tracking branch 'origin/develop' into moe_gemm_activation
2025-03-28 10:02:27 +08:00
root
4f4bce30cf
fuse gelu silu act in moe gemm1
2025-03-27 16:59:53 +08:00
coderfeli
bd4a8c71d4
i4 gemm2 ok and i4 gemm1 build
2025-03-27 06:41:31 +00:00
Rostyslav Geyyer
441343a23d
Add MX FP4 device conversion tests (#1889)
* Add conversion tests
* Fix ctor
* Fix nan logic
* Fix conversion logic
* Permute packed f4_t values
* Fix conversion to float, repack vector elements
* Fix device tests
* Permute elements in a vector
* Add a repro test
* Add a conversion for a repro test
* Update test vectors
* Update conversion
* Fix the test
* Update test vector generator
* Fix vector sr conversion
* Permute conversion args
* Update conversion
* Test
* Fix packing
* Simplify conversion function
* Pack conversion in a loop
* Pack conversion in a loop
* Pack another conversion in a loop
* Pack one more conversion in a loop
* Pack the last conversion in a loop
* Clean up
* Add printf to fix intrinsic
* Add a sw-based workaround
2025-03-26 19:23:01 -05:00
Bartłomiej Kocot
54c81a1fcf
Add support for GKCYX grouped conv fwd (#2015)
* Add support for GKCYX grouped conv fwd
* fixes
* fix
* changelog
* Fixes
2025-03-26 21:13:38 +01:00
coderfeli
9729c9e3f7
i4 gemm2 ok
2025-03-26 11:41:21 +00:00
coderfeli
6a0cc4aad1
gu fusion for m32 m64 ok
2025-03-26 05:58:22 +00:00
coderfeli
74d8ac608f
gufusion compatible ok, fix warnings
2025-03-26 02:20:30 +00:00
Andriy Roshchenko
72d888821c
MX GEMM examples with FP8, FP16, and E8M0 scales (#2016)
* Add `scalar_type` specification for E8M0 exponent
* Specialize `nnvb_data_t_selector` for E8M0 exponent
* Remove partial specializations for `scalar_type` of `non_native_vector_base` template
* Reword command line helper string
* Create MX GEMM examples for different scales
2025-03-25 15:33:03 -06:00
Max Podkorytov
1a58522f01
use fast path for sequence generation in old CK (#1993)
2025-03-25 11:28:44 -07:00
coderfeli
6ca5892256
gemm2 ok
2025-03-25 15:01:10 +00:00
ruanjm
d49abdaa87
[CK_TILE] Improve RMS/Layer Normalization 2 Pass Pipeline Performance (#1861)
* 50ms -> 28ms
* Fix bug in non fuse_add_store cases
* Fine tuned setting for 2 pass pipeline
* adjust workload
* remove unnecessary change
* add layernorm
* Adding output quant and unquant results at the same time.
* fix test
* fix format
* tune for cases 128x640 and 128x1024
* bug fix
2025-03-25 20:09:45 +08:00
coderfeli
234b8d415c
change code
2025-03-25 09:44:32 +00:00
coderfeli
0d266bfd65
add silu
2025-03-25 03:01:27 +00:00
coderfeli
2b15b67b3f
acale ok
2025-03-25 02:52:04 +00:00
Illia Silin
d2eab23958
Split up data_type header. (#1996)
* split fp64 vector data type
* add missing header
* move e8m0 structs
* split off numeric_utils header
* fix typo
* split off numeric limits header
* update data_type header
* fix clang format
* split off vector type header
* fix clang format
* fix typo for binary_inf
2025-03-24 15:08:54 -07:00
Andriy Roshchenko
6660dc6b8e
Introduce MX GEMM for FP8 data type (#2000)
2025-03-24 15:41:07 -06:00
MHYang-gh
c027637a8f
Fix A/B lds transform (#2007)
2025-03-22 23:13:50 -07:00
Bartłomiej Kocot
5b0873c31a
Fix split N for large images in grouped conv fwd (#2004)
* Fix split N for large images in grouped conv fwd
* Fix comments
2025-03-22 23:19:49 +01:00
coderfeli
b865e2cf83
silu ok
2025-03-22 14:03:45 +00:00
coderfeli
d69c1c9590
fuse silu
2025-03-21 07:31:49 +00:00
BingYuan.Zhou
5a0d693b86
fix ck_tile/basic_gemm build error (#1988)
2025-03-20 22:01:14 -07:00
Attila T. Áfra
c79bf11148
Fix compile errors on Windows and Linux (#2002)
* Fix compile error on Windows (call to 'amd_wave_read_first_lane' is ambiguous)
* Fix compile error (no matching function for call to 'cast_to_f32_from_f8')
2025-03-20 12:37:25 -07:00
carlushuang
e3c9886cdf
[CK_TILE] return value with macro in ck_tile::kernel_launch API (#1982)
* return value with macro and revert the return value
* [CK-TILE] no-macro launch api solution (#1992)
* no-macro solution
* address -Wcomma
---------
Co-authored-by: Max Podkorytov <4273004+tenpercent@users.noreply.github.com>
2025-03-20 11:00:29 -07:00
jakpiase
0e91d32c61
[CK_TILE] Switch to universal gemm for batched and grouped gemms (#1919)
* switch to universal gemm for batched and grouped gemms
* added reviewer comments
* fixed grouped gemm tests
2025-03-20 11:17:04 +01:00
rocking
b819c217e4
Sync the kname with instance name (#1989)
Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
2025-03-20 00:06:45 +08:00
felix
7eaedeb36c
Ck moe hot fix (#1979)
* fix useless code and remove useless oob
* clang format
* fix coredump in e2e test
* fix2
* fix clang format
* fix output oob
* clang format
* rm useless comments
---------
Co-authored-by: coderfeli <coderfeli@163.com>
Co-authored-by: illsilin <Illia.Silin@amd.com>
2025-03-19 22:58:27 +08:00
coderfeli
98cee8d02b
fix merge
2025-03-18 05:45:04 +00:00
coderfeli
5f49b91237
merge develop
2025-03-18 04:49:40 +00:00
aledudek
5095906975
Async grouped gemm v3 (#1940)
* Fully async grouped gemm
* Remove commented code
* Remove maybe_unused
* host kernel args
* Checkpoint segfault debugging...
* Working part1
* Working part2
* Remove comments...
* Use void ptr for gemm kernel host args
* Fix device_grouped_gemm_multiple_d_dl build issue
* Fix device_grouped_gemm_xdl build issue
2025-03-17 16:42:43 +01:00
Bartłomiej Kocot
c2e4898b4b
Grouped conv bwd data NGCHW (#1967)
* Grouped conv bwd data NGCHW
* fixes
* fix
* Improvements
* Fix
* Fix
* add client example
2025-03-17 13:32:00 +01:00
coderfeli
7dbdff9f9f
moe sorting fix moebuf
2025-03-17 06:20:57 +00:00
coderfeli
5eaa36be18
mork to support 13w tokens
2025-03-17 01:45:34 +00:00
coderfeli
ef8c1333b9
use uint32
2025-03-17 01:45:09 +00:00
coderfeli
6c0e021235
revert v1 test
2025-03-17 01:39:57 +00:00
coderfeli
bccc5192cf
fix uint32
2025-03-17 01:18:32 +00:00
coderfeli
da2659d502
input output all ok
2025-03-15 14:26:30 +00:00
coderfeli
d1e999c05c
int64 index ok now
2025-03-15 13:28:49 +00:00
coderfeli
f911cf7396
impl int64 but result not correct
2025-03-14 13:01:07 +00:00
coderfeli
d4925e1637
fix output oob
2025-03-14 03:19:26 +00:00
carlushuang
3e81279d26
Reapply "[CK_TILE] support hdim=192/128 pair for deepseekv3 (#1961)" … (#1971)
* Reapply "[CK_TILE] support hdim=192/128 pair for deepseekv3 (#1961)" (#1969)
This reverts commit 8cbcd3e0d0.
* fix codegen problem
* Update config.hpp
---------
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
2025-03-13 11:41:39 +08:00
illsilin
f8464d2087
fix clang format
2025-03-12 20:21:14 -07:00
coderfeli
d85c034977
fix2
2025-03-13 02:30:07 +00:00
coderfeli
8b05fa935d
fix coredump in e2e test
2025-03-13 02:12:18 +00:00