Commit Graph

407 Commits

Author SHA1 Message Date
Ville Pietilä
7280df1bc3 Add one more unit test for tensor view. 2025-09-24 12:10:26 +00:00
Ville Pietilä
73fb5a026a Initial unit tests for tensor descriptor. 2025-09-24 08:31:42 +00:00
Ville Pietilä
437599c517 Improved CShuffle test. 2025-09-19 13:45:39 +00:00
Ville Pietilä
af6838e5dc Integration test for CShuffle epilogue. 2025-09-19 12:09:08 +00:00
Ville Pietilä
9db02f2564 Added integration test for cshuffle epilogue LDS tile distribution. 2025-09-17 11:45:00 +00:00
Ville Pietilä
4eba92c290 Improve test for accessing diagonal blocks. 2025-09-17 08:27:50 +00:00
Ville Pietilä
9175bef679 Improved integration test. 2025-09-16 15:29:01 +00:00
Ville Pietilä
0d802a305f Integration test for reading diagonal blocks from grouped distributed tensor. 2025-09-16 13:57:57 +00:00
Ville Pietilä
13e4ad093e Add more tests for tensor view. 2025-09-15 15:32:51 +00:00
Ville Pietilä
ff9732b937 Add tests for tile_distribution_encoding. 2025-09-12 14:03:37 +00:00
Ville Pietilä
0d5c1b9638 WIP: Merged conv groups epilogue. 2025-09-11 15:24:36 +00:00
Ville Pietilä
419fd88494 Adde unit tests for tensor view. 2025-09-10 13:33:08 +00:00
Ville Pietilä
d9f0a9cdd0 Fully working conv group merging for TransformConvBwdWeightToGemm. 2025-09-09 09:58:43 +00:00
Ville Pietilä
bc63757ad6 Add unit tests for coordinate transforms. 2025-09-09 07:09:07 +00:00
Ville Pietilä
8845b23254 WIP: Tensor transformations. 2025-09-08 15:41:54 +00:00
Ville Pietilä
1a2b0dcb44 Added unit tests for TransformConvBwdWeightToGemm conv groups are merged. 2025-09-05 13:05:25 +00:00
Ville Pietilä
81a617c108 Initial set of tests for TransformConvBwdWeightToGemm. 2025-09-05 12:32:38 +00:00
kylasa
80ce6a573b gtest to test atomic_add for a tensor (#2716)
* Code drop for gtest to test atomic_add for a tensor

* Adding additional test cases

* Fix clang errors in CI pipeline

* Updated test cases

* Fix the Navi card atomic add problem

* solved the define problem

* add more print out traces

* Fix the float4 missing case

* solved the gfx9 errors

* Address the comment

---------

Co-authored-by: Khushbu <khuagarw@amd.com>
Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>
2025-09-03 15:32:54 -07:00
msaffari-amd
47d020a993 refactor: use snake_case naming in ck_tile/core components (#2766) 2025-09-03 09:34:11 +02:00
rahjain-amd
4d041837ad Add json dump support to output details from CK/CKTile Examples. (#2551)
* Adding RapidJson Library

* Adding Json Dumps in all CK_Tile Examples

Not verified yet

* Adding json to cktile Batched Transpose

* adding json dumps to layernorm2d_fwd

* Adding  json dump to flatmm_basic

* Adding RapidJson Library

* Adding Json Dumps in all CK_Tile Examples

Not verified yet

* Adding json to cktile Batched Transpose

* adding json dumps to layernorm2d_fwd

* Adding  json dump to flatmm_basic

* Adding json in 03_gemm

* Add json dump to 16_batched_gemm

* Add json dump to gemm_multi_d_fp16

* Add json dump to grouped_gemm

* fix fmha_bwd/fwd

* Fix clang-format errors

exclude include/rapidjson in jenkins as its a third-party library

* Saparating function and defination.

* Update Documentation of 03_gemm

* Refactoring as per code review

* Disable fp8 instances on unsupported targets (#2592)

* Restrict building of gemm_universal_preshuffle_f8 instances to specific targets in CMakeLists.txt

* Add condition to skip gemm_xdl_universal_preshuffle_f8 instances for unsupported targets in CMakeLists.txt

* Add conditions to skip unsupported targets for gemm_universal_preshuffle_f8 and gemm_xdl_universal_preshuffle_f8 instances in CMakeLists.txt

* Refine conditions to exclude gemm_universal_preshuffle_f8 instances for unsupported targets in CMakeLists.txt

---------

Co-authored-by: AviralGoelAMD <aviralgoel@amd.com>

* fix clang format

* remove duplicate lines of code from library/src/tensor_operation_instance/gpu/CMakeLists.txt

* Fixing Readme and unifying jsondumps

* adding moe_smoothquant

* adding fused_moe

* Fixing Readme for batched_gemm

* Fixing Readme for grouped_gemm

* adding flatmm

* adding gemm_multi_d_fp16

* adding elementwise

* adding File name when json is dumped

* Fixing Reduce after merge

* adding batched_transpose

* Adding Warptile in Gemm

* Fixing Clang Format

---------

Co-authored-by: Aviral Goel <aviral.goel@amd.com>
Co-authored-by: AviralGoelAMD <aviralgoel@amd.com>
Co-authored-by: illsilin_amdeng <Illia.Silin@amd.com>
2025-09-02 23:31:29 -07:00
Cong Ma
e1ab460d2d [CK TILE GEMM] Fix building issues (#2772)
- Add `WarpGemmMfma_f32_16x16x128_[fp8|bf8]_[fp8|bf8]_CTransposed`
- Replace `__gfx950__` with `CK_GFX950_SUPPORT`
2025-09-02 22:40:18 -07:00
Cong Ma
428090f749 Support transposed C tile in Aquant (#2679)
The performance of Aquant has increased after enabling transposed C.

Do not need to exchange AQ elements among lanes after enabling
transposed C as one thread only holds data from one row.
2025-08-28 13:28:09 -07:00
Mateusz Ozga
0758883fa4 [CK-TILE] Default2DEpilogue, example and adding nullptr_t type for D (#2752)
* Init commit

* Quick fix, CI fails

* Remove CDElementWise

* Add CDEELementWise

---------

Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>
2025-08-28 12:45:50 -07:00
msaffari-amd
b951416cdb Ck tile gemm low prec data types int4 int8 unit tests (#2718)
* add gemm unit tests for int4, int8 datatypes

* minor changes based on reviews

---------

Co-authored-by: msaffari-amd <msaffari@banff-cyxtera-s78-2.ctr.dcgpu>
2025-08-28 10:47:16 +02:00
John Shumway
9751583f95 Replace auto with function template for c++17. (#2754)
In #2443, a helper function was added test new print functionality, but it used auto for the function parameter types.

We need to support c++17 for downstream libraries, so we cannot use auto there. This PR replaces it witht the equivalent function template implementation.
2025-08-27 17:10:39 -07:00
Cong Ma
245467f359 [CK TILE] Fix bugs in AQuant preshuffle (#2700)
* [CK TILE] Fix bugs in AQuant preshuffle

- Make Aquant works with block Mx64x256. `M` could be 16, 32, 64
- Make Aquant works with warp 16x16x32 and 32x32x16.

* [CK TILE] Rename Preshuffle to PreshuffleQuant

The new name, PreshuffleQuant, explicitly states the function's purpose:
to preshuffle the quantization matrix.

* [CK TILE Block Scale] Use GemmConfig to save tile properties

- Remove specialization of GemmQuantTypeConfig
- Pass GemmConfig around which contains tile properties. Stop using hard
  coded tile properties in `gemm_calc_aquant()`

* [CK TILE Block Scale] Rename GemmConfig used in block scale

    - Remove unused GemmConfig
    - Rename GemmConfig used in block scale

---------

Co-authored-by: ThomasNing <thomas.ning@amd.com>
2025-08-27 00:05:54 -07:00
JH-Leon-KIM-AMD
19d5327c45 Test comprehensive dataset (#2685)
* Add CSV-driven convolution test pipeline

- Add test_grouped_convnd_fwd_dataset_xdl.cpp with CSV reader functionality
- Add complete dataset generation toolchain in test_data/
- Add Jenkins integration with RUN_CONV_COMPREHENSIVE_DATASET parameter
- Ready for comprehensive convolution testing with scalable datasets

* Update convolution test dataset generation pipeline

* add 2d, 3d dataset csv files

* Remove CSV test dataset files from repository

* Update generate_test_dataset.sh

* Fix channel division for MIOpen to CK conversion

* Remove unnecessary test files

* Fix clang-format-18 formatting issues

* TEST: Enable comprehensive dataset tests by default

* Fix test_data path in Jenkins - build runs from build directory

* Add Python dependencies and debug output for CSV generation

* Remove Python package installation - not needed

* Add better debugging for generate_test_dataset.sh execution

* Fix Jenkinsfile syntax error - escape dollar signs

* Add PyTorch to Docker image for convolution test dataset generation

- Install PyTorch CPU version for lightweight model execution
- Fixes Jenkins CI failures where CSV files were empty due to missing PyTorch
- Model generation scripts require PyTorch to extract convolution parameters

* Add debugging to understand Jenkins directory structure and CSV file status

- Print current working directory
- List CSV files in test_data directory
- Show line counts of CSV files
- Will help diagnose why tests fail in Jenkins

* Fix clang-format-18 formatting issues

- Applied clang-format-18 to test file
- Fixed brace placement and whitespace issues

* Add detailed debugging for CSV dataset investigation

- Check generated_datasets directory contents
- List all CSV files with line counts
- Show first 5 lines of main CSV file
- Applied clang-format-18 formatting
- This will help identify why CSV files are empty in Jenkins

* keep testing add pytorch installation in shell script

* Use virtual environment for PyTorch installation

- Jenkins user doesn't have permission to write to /.local
- Create virtual environment in current directory (./pytorch_venv)
- Install PyTorch in virtual environment to avoid permission issues
- Use PYTHON_CMD variable to run all Python scripts with correct interpreter
- Virtual environment will be reused if it already exists

* Remove debug code and reduce verbose logging in Jenkins

- Remove bash -x and debug commands from Jenkinsfile execute_args
- Remove all debug system() calls and getcwd from C++ test file
- Remove unistd.h include that was only needed for getcwd
- Remove debug print in CSV parser
- Add set +x to generate_test_dataset.sh to disable command echo
- Redirect Python script stdout to /dev/null for cleaner output

This makes Jenkins logs much cleaner while still showing progress messages.

* install gpu torch

* Clean up and optimize comprehensive dataset test pipeline

- Reorder Jenkinsfile execution: build -> generate data -> run test
- Remove commented-out debug code from generate_test_dataset.sh
- Ensure all files end with proper newline character (POSIX compliance)
- Keep useful status messages while removing development debug prints
- Set MAX_ITERATIONS=0 for unlimited test generation in production

* Add configuration modes to reduce test execution time

- Add --mode option (half/full) to generate_model_configs.py
  - half mode (default): ~278 configs (224 2D + 54 3D) -> ~1,058 total tests
  - full mode: ~807 configs (672 2D + 135 3D) -> ~3,093 total tests
- Update generate_test_dataset.sh to use CONFIG_MODE environment variable
- Keeps all model types but reduces parameter combinations intelligently
- Fixes Jenkins timeout issue (was running 3,669 tests taking 17+ hours)
- Default half mode should complete in ~4-5 hours instead of 17+ hours

* Add small mode for quick testing of comprehensive dataset

* jenkins pipeline test done

* jenkins test done

* Trigger CI build

* remove test comment and update data generation option as half

---------

Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>
2025-08-26 22:18:05 +02:00
John Afaganis
508e7912f9 Revert "[CK-TILE] Default epilogue, adding support for D (#2629)" (#2746)
This reverts commit d43228fbca.
2025-08-26 09:48:49 -07:00
SamiAario-AMD
5e85c38d7d Lwpck 3548 gemm test cleanups (#2717)
* Remove some unnecessary calls to create_args in basic and universal GEMM tests

* Remove unnecessary include statements in universal GEMM tests

* Improve compilation time of basic GEMM tests by only compiling the precision variants that we need

* Universal GEMM PrecType should be the same as CDataType

* Improve compilation time of universal GEMM tests by only compiling the precision variants that we need

* Revert to constexpr when defining some constants
2025-08-26 13:25:48 +03:00
Mateusz Ozga
d43228fbca [CK-TILE] Default epilogue, adding support for D (#2629)
* Extend 2d-epilogue, D support

* Added tests & update

* Remove unused attribute

* Extend tests

---------

Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>
2025-08-25 19:29:35 -07:00
Max Podkorytov
f38751fc2a invoke script directly (#2687) 2025-08-19 00:23:07 -07:00
linqunAMD
9fcc1ee9fd Support Wave32 in CK_TILE - Part 1 (#2594)
* Support wave32/wave64 in CK_TILE - Part 1

* remove blocksize in kernel launch

* fix build error

* fix clang format

* fix clang format 2

* fix clang format 3

* fix fmha build error

* fix fmha build 2

* fix fmha build 3

* fix build error 4

* address review comment

* update change log

* replace KernelBlockSize with kBlockSize

* fix CI fail

* fix clang format

* address review comment and rebase code.

* fix universal test fail

---------

Co-authored-by: Lin, Qun <Quentin.Lin+amdeng@amd.com>
Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>
2025-08-18 10:08:31 -07:00
Tianyuan Wu
68134b60e4 [CK_TILE] CK_TILE GEMM WMMA Support for GFX11/GFX12 (#2466)
* WMMA GEMM F16 Implementation

Signed-off-by: root <tianyuwu@amd.com>

* Self-review

Signed-off-by: root <tianyuwu@amd.com>

* ASIC check minor tweak

Signed-off-by: root <tianyuwu@amd.com>

* add missing include file

* Set GPU_TARGETS to gfx11/12 generic

Signed-off-by: root <tianyuwu@amd.com>

* INT8 GFX12

Signed-off-by: root <tianyuwu@amd.com>

* add int8x16 branch

* Fix CI script

Signed-off-by: root <tianyuwu@amd.com>

* Fix typo

Signed-off-by: root <tianyuwu@amd.com>

* Add CK_Tile WMMA example

Signed-off-by: Tianyuan Wu <tianyuwu@amd.com>

* Fix CI

Signed-off-by: Tianyuan Wu <tianyuwu@amd.com>

* fix clang format

* Set M/N_Warp Back to Constant

Signed-off-by: Tianyuan Wu <tianyuwu@amd.com>

* Use GemmConfigComputeV3 by default

Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com>

* Enable CK_TILE_USE_AMD_BUFFER_ATOMIC_ADD_FLOAT for gfx12

Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com>

* Remove CK_Tile wmma gemm examples from the CI list

Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com>

* Add atomic add fallback method for gfx11

Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com>

* Fix typo

Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com>

* Omit copyright year

Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com>

* Support non-square cases

Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com>

* Fix CI

Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com>

* Add get_device_ip()

Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com>

* Revert "Add atomic add fallback method for gfx11"

This reverts commit 07a79e797d.

Signed-off-by: Tianyuan Wu <Tianyuan.Wu@amd.com>

* Revert "Enable CK_TILE_USE_AMD_BUFFER_ATOMIC_ADD_FLOAT for gfx12"

This reverts commit ceee918007.

* Revise method name and typos

Signed-off-by: Tianyuan Wu <Tianyuan.Wu@amd.com>

* clang-format

Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com>

* Try fix CI

Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com>

* Revert "Try fix CI"

This reverts commit 7a7241085e.

* clang-format

Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com>

* Fix typo caused by merge

Signed-off-by: Tianyuan Wu <Tianyuan.Wu@amd.com>

* Fix typo caused by merging

Signed-off-by: Tianyuan Wu <Tianyuan.Wu@amd.com>

---------

Signed-off-by: root <tianyuwu@amd.com>
Signed-off-by: Tianyuan Wu <tianyuwu@amd.com>
Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com>
Signed-off-by: Tianyuan Wu <Tianyuan.Wu@amd.com>
Co-authored-by: joye <joye@amd.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
Co-authored-by: illsilin_amdeng <Illia.Silin@amd.com>
2025-08-15 16:22:27 -07:00
Emily Martins
10395fc895 [CK_Tile] Refactor Permute and MOE Smoothquant ctests to gtests (#2622)
* Refactor CK tile permute ctests to gtests

* Refactor CK tile MOE smoothquant ctests to gtests

* fix typo in comment

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update invalid case in else clause for get_precision_string

* Refactor permute gtests to use templated versions of matrix_core_swizzle and permute functions

---------

Co-authored-by: root <root@splinter-126-wr-c2.aus.dcgpu>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-08-14 11:01:54 -07:00
Emily Martins
70dce4e0c6 [CK_Tile] Refactor MOE Sorting and Smoothquant ctests to gtests (#2596)
* refactor moe_sorting ctests to use gtest framework

* Refactor ctests for smoothquant to gtests

* fix clang format to use version 18

* Print local_eid in MOE sorting gtests

* Remove extra space in smoothquant output
2025-08-14 10:54:57 -07:00
joyeamd
bcc38deff7 [CK_TILE]fix elementwise example in gfx11/12 (#2676)
* fix elementwise examples

* improve the robust

* fix ck_tile's elementwise test

* update elementwise test
2025-08-13 15:21:46 -07:00
JH-Leon-KIM-AMD
b963478759 CSV-driven convolution test pipeline (#2581)
* Add CSV-driven convolution test pipeline

- Add test_grouped_convnd_fwd_dataset_xdl.cpp with CSV reader functionality
- Add complete dataset generation toolchain in test_data/
- Add Jenkins integration with RUN_CONV_COMPREHENSIVE_DATASET parameter
- Ready for comprehensive convolution testing with scalable datasets

* Update convolution test dataset generation pipeline

* add 2d, 3d dataset csv files

* Remove CSV test dataset files from repository

* Update generate_test_dataset.sh

* Fix channel division for MIOpen to CK conversion

* Remove unnecessary test files

* Fix clang-format-18 formatting issues

---------

Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>
2025-08-13 16:24:34 +02:00
Cong Ma
452791a3ba Preshuffle AQ matrix in block scale gemm (#2624)
* Preshuffle AQ matrix in block scale gemm

* turns the output to fp16. Increase the repetition time.

---------

Co-authored-by: ThomasNing <thomas.ning@amd.com>
2025-08-12 21:32:51 -07:00
Yashvardhan Agarwal
191c62967b Fixes to "General 2D Reduction Kernel" (#2535) (#2656)
* fix reduce2d

- revret the combine_partial_results() chnages
- remove auto from function def

* clang-format
2025-08-11 15:01:33 +02:00
Max Podkorytov
ab26026835 [CK-tile] add more tests for batched transpose testing the rectangular block tile sizes (#2634)
* add failing tests

* swap out and reference

* add constraint assert to transpose input distribution

* test both pipelines with rectangular block tile

* print mismatched indices

* add a smaller failing test for old pipeline

* print grid and block

* fill output before operating on it

* swap m/n tile sizes and make one test pass

* add device syncs

* add one more flipped test case

* flip block tile at host arg init

* fix tiles for lds pipeline

* clang-format

* rename tests

* roll back error check

* remove device syncs

* reduce large test case's size
2025-08-07 16:51:53 -07:00
Gino Lu
5d6d236b25 Add e8m0 scaled convert into CK_TILE (#2617)
* first commit

* remove redundent code

* modify according to comments.

* fix type_convert error with scaled_type_convert
2025-08-07 21:37:28 +08:00
Sami Remes
ffdee5e774 [CK_TILE] Enable printing more structures in CK-Tile (#2443)
* Add more printing to core cktile

* Revert other changes in static encoding pattern

* Refactor to using a free print() function

* Remove loops and print just the containers

* Print tuple with better formatting, fix sequence compilation

* Add some tests for print utility

* Add print utility header

* Print for static_encoding_pattern

* add buffer_view printing

* Align vector_traits

* Fix formatting

* Lower-case enum strings

Co-authored-by: Christopher Millette <63608002+cgmillette@users.noreply.github.com>

* Remove empty comment lines

* Fix test with lower-case too

* Reduce repeated code in print tests, move helper function closer to type definition, test X&Y

* Add test_print_common.hpp

* add print.hpp in core.hpp

---------

Co-authored-by: Aviral Goel <aviral.goel@amd.com>
Co-authored-by: Christopher Millette <63608002+cgmillette@users.noreply.github.com>
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
2025-08-07 15:45:27 +03:00
Bartłomiej Kocot
5328b232b2 Grouped Convolution Forward Infer Bias Bnorm Activ (#2621)
* Grouped Convolution Forward Infer Bias Bnorm Activ

* 3d
2025-08-07 08:36:47 +02:00
Yashvardhan Agarwal
4750b293fe General 2D Reduction Kernel (#2535)
* General 2D Reduction Kernel

* Move the reduction kernel from the example
* Split the code and add the necessary policy, problem, shape files as
per ck_tile convention
* Add/modify the headers
* Modified the example to work with the 'new' kernel
* Added tests for the kernel
* N-D refernce reduce
* Added support for N-D input with transform to 2D
* Added padding to support various input sized tensors
* Bug fix in the thread buffer constructor
* Some comments to explain the reduce2d block kernel

* comments resolution

* clang-format

* comments resolution

* clang-format

* clang-format

* comments resolution

* clang-format
2025-08-06 15:36:59 +02:00
Max Podkorytov
2a78da4708 fix build for test_ck_tile_fp8 on rhel8 (#2615) 2025-08-04 17:43:15 -07:00
Illia Silin
fb96b49666 fix test_mx_mfma errors (#2614) 2025-08-04 11:43:47 -07:00
Max Podkorytov
0d9439760f remove std::format (#2604) 2025-08-01 19:22:07 -07:00
Max Podkorytov
f36cb5b2aa [CK-tile] remove old ck-tile transpose test (#2591)
* remove old ck-tile transpose test

* rename test exe for consistency

* replace batched transpose regression test
2025-08-01 14:50:09 -07:00
Thomas Ning
7c44a763fa Fix the GFX 950 Universal GEMM (#2597)
* solve the gfx950 error

* clang format

* fix a typo error

---------

Co-authored-by: ThomasNing <thomasning@amd.com>
2025-08-01 09:32:24 -07:00
Ville Pietilä
e962a41638 Automatic deduction of split-K value for grouped convolution (#2491)
* Split-K autodeduction for DeviceGroupedConvBwdWeight_Xdl_CShuffle and DeviceGroupedConvBwdWeight_Xdl_CShuffleV3.

* Split-K autodeduction for DeviceGroupedConvBwdWeightTwoStage_Xdl_CShuffle.

* Use simple best occupancy model to calculate the split-K.

* Handle split-K autodeduction in explicit gemm conv.

* Add unit tests for split-K autodeduction.

* Remove oversubscription.

* Small fixes.

* Added split-K autodeduction for DeviceGroupedConvBwdWeightMultipleD_Xdl_CShuffle.

* Run clang formatting.

* Fix error handling in the conv profiler.

* Add missing documentation for the autodeducted split-K values.

* Add split-K autodeduction to DeviceGroupedConvBwdWeight_Explicit_Xdl solver.

* Fix clang formatting and split-K profiler documentation.

* Rename max_occupancy value variable.

* Calculate grid size for split-K autodeduction directly from input array shapes and template params.

---------

Co-authored-by: Ville Pietilä <>
2025-07-31 12:08:45 +02:00