Commit Graph

24 Commits

Author SHA1 Message Date
linqunAMD
07def6b13d Extend XDL kernel to Support RDNA3/4 - Part 4 (#2724)
* Fix example

* fix build error

* update pk_i4 & moe test case

* fix all instance build (examples)

* fix batched_gemm_gemm (example)

* disable example_gemm_bias_softmax_gemm_permute on gfx11

* remove unnecessary disable gfx11

* update tests

* update tests2

[ROCm/composable_kernel commit: 321627aec5]
2025-09-12 08:17:07 -07:00
JH-Leon-KIM-AMD
b22e283e7c Test comprehensive dataset (#2685)
* Add CSV-driven convolution test pipeline

- Add test_grouped_convnd_fwd_dataset_xdl.cpp with CSV reader functionality
- Add complete dataset generation toolchain in test_data/
- Add Jenkins integration with RUN_CONV_COMPREHENSIVE_DATASET parameter
- Ready for comprehensive convolution testing with scalable datasets

* Update convolution test dataset generation pipeline

* add 2d, 3d dataset csv files

* Remove CSV test dataset files from repository

* Update generate_test_dataset.sh

* Fix channel division for MIOpen to CK conversion

* Remove unnecessary test files

* Fix clang-format-18 formatting issues

* TEST: Enable comprehensive dataset tests by default

* Fix test_data path in Jenkins - build runs from build directory

* Add Python dependencies and debug output for CSV generation

* Remove Python package installation - not needed

* Add better debugging for generate_test_dataset.sh execution

* Fix Jenkinsfile syntax error - escape dollar signs

* Add PyTorch to Docker image for convolution test dataset generation

- Install PyTorch CPU version for lightweight model execution
- Fixes Jenkins CI failures where CSV files were empty due to missing PyTorch
- Model generation scripts require PyTorch to extract convolution parameters

* Add debugging to understand Jenkins directory structure and CSV file status

- Print current working directory
- List CSV files in test_data directory
- Show line counts of CSV files
- Will help diagnose why tests fail in Jenkins

* Fix clang-format-18 formatting issues

- Applied clang-format-18 to test file
- Fixed brace placement and whitespace issues

* Add detailed debugging for CSV dataset investigation

- Check generated_datasets directory contents
- List all CSV files with line counts
- Show first 5 lines of main CSV file
- Applied clang-format-18 formatting
- This will help identify why CSV files are empty in Jenkins

* keep testing add pytorch installation in shell script

* Use virtual environment for PyTorch installation

- Jenkins user doesn't have permission to write to /.local
- Create virtual environment in current directory (./pytorch_venv)
- Install PyTorch in virtual environment to avoid permission issues
- Use PYTHON_CMD variable to run all Python scripts with correct interpreter
- Virtual environment will be reused if it already exists

* Remove debug code and reduce verbose logging in Jenkins

- Remove bash -x and debug commands from Jenkinsfile execute_args
- Remove all debug system() calls and getcwd from C++ test file
- Remove unistd.h include that was only needed for getcwd
- Remove debug print in CSV parser
- Add set +x to generate_test_dataset.sh to disable command echo
- Redirect Python script stdout to /dev/null for cleaner output

This makes Jenkins logs much cleaner while still showing progress messages.

* install gpu torch

* Clean up and optimize comprehensive dataset test pipeline

- Reorder Jenkinsfile execution: build -> generate data -> run test
- Remove commented-out debug code from generate_test_dataset.sh
- Ensure all files end with proper newline character (POSIX compliance)
- Keep useful status messages while removing development debug prints
- Set MAX_ITERATIONS=0 for unlimited test generation in production

* Add configuration modes to reduce test execution time

- Add --mode option (half/full) to generate_model_configs.py
  - half mode (default): ~278 configs (224 2D + 54 3D) -> ~1,058 total tests
  - full mode: ~807 configs (672 2D + 135 3D) -> ~3,093 total tests
- Update generate_test_dataset.sh to use CONFIG_MODE environment variable
- Keeps all model types but reduces parameter combinations intelligently
- Fixes Jenkins timeout issue (was running 3,669 tests taking 17+ hours)
- Default half mode should complete in ~4-5 hours instead of 17+ hours

* Add small mode for quick testing of comprehensive dataset

* jenkins pipeline test done

* jenkins test done

* Trigger CI build

* remove test comment and update data generation option as half

---------

Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>

[ROCm/composable_kernel commit: 19d5327c45]
2025-08-26 22:18:05 +02:00
JH-Leon-KIM-AMD
68eecd0d5e CSV-driven convolution test pipeline (#2581)
* Add CSV-driven convolution test pipeline

- Add test_grouped_convnd_fwd_dataset_xdl.cpp with CSV reader functionality
- Add complete dataset generation toolchain in test_data/
- Add Jenkins integration with RUN_CONV_COMPREHENSIVE_DATASET parameter
- Ready for comprehensive convolution testing with scalable datasets

* Update convolution test dataset generation pipeline

* add 2d, 3d dataset csv files

* Remove CSV test dataset files from repository

* Update generate_test_dataset.sh

* Fix channel division for MIOpen to CK conversion

* Remove unnecessary test files

* Fix clang-format-18 formatting issues

---------

Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>

[ROCm/composable_kernel commit: b963478759]
2025-08-13 16:24:34 +02:00
aledudek
c26c2b1fdc Add grouped conv fwd 3d GKCYX instances for f32, f16, bf16 (#2069)
* Part1

* Add grouped conv fwd 3d GKCYX instances for f32, f16, bf16

* Add missing coma

* Add missing cpp instance files

* Fix 3d layout

* Add missing closing bracket

* Add missing comp x2 and part2 instances

* Fix typo in instance name

* fix

* Fix

---------

Co-authored-by: Bartlomiej Kocot <barkocot@amd.com>

[ROCm/composable_kernel commit: 7c32652e03]
2025-04-16 11:00:55 +02:00
Bartłomiej Kocot
67c3bcfce1 Grouped conv fwd v3 fix for SplitN an G > 1 (#2038)
* Grouped conv fwd v3 fix for SplitN an G > 1

* Remove int8 large test

* Retore int8 test

[ROCm/composable_kernel commit: ec742908bd]
2025-04-01 13:19:35 -07:00
Bartłomiej Kocot
6ccfb817e4 Add support for GKCYX grouped conv fwd (#2015)
* Add support for GKCYX grouped conv fwd

* fixes

* fix

* changelog

* Fixes

[ROCm/composable_kernel commit: 54c81a1fcf]
2025-03-26 21:13:38 +01:00
Bartłomiej Kocot
d72d2475fd Add NGCHW bf16 grouped conv fwd instances (#1783)
* Add NGCHW bf16 grouped conv fwd instances

* add missed cmake

[ROCm/composable_kernel commit: 159fa31946]
2025-01-01 18:00:06 +01:00
Lin Sun
6cc9f5e486 Linsun/convint8 fwd instances (#1626)
Add instances for int8 grouped conv2d fwd
---------

Co-authored-by: root <root@dell300x-pla-t28-03.pla.dcgpu>
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>

[ROCm/composable_kernel commit: 0c9012fb70]
2024-11-04 16:33:20 -08:00
Bartłomiej Kocot
e4f4e04add Add support for NGCHW in grouped conv fwd (#1499)
* Support NGCHW in grouped conv fwd

* Remove not needed variable

* Fixes

[ROCm/composable_kernel commit: 4ba52b35dc]
2024-09-20 10:45:46 +02:00
Bartłomiej Kocot
0f7c45915c Add performance and large tensor tests for grouped conv (#1456)
* Add performance and large tensor tests for grouped conv

* Resize tests

* Resize tests

* update the python script to parse the grouped_conv results

* Remove int8 tests

* change bwd wei layout

---------

Co-authored-by: illsilin <Illia.Silin@amd.com>

[ROCm/composable_kernel commit: 2581727d2a]
2024-08-16 07:48:30 -07:00
Bartłomiej Kocot
458d8bef26 Add Grouped Conv Fwd Large Tensor kernel (#1432)
* Support 64 bit indexing

* Add new grouped conv fwd kernel for large tensors

* Add instances large tensor

* Fixes for transform conv to gemm

* Fixes

* fixes

* Remove not needed instances

* examples fixes

* Remove not need ds arrays

* Fix tests

* Add 2GB check in gridwise dl

* Fixes

[ROCm/composable_kernel commit: 4ec5c52a0c]
2024-08-06 10:06:10 +02:00
Bartłomiej Kocot
c885afdaae Support access per groups and filter3x3 in grouped conv fwd (#1382)
* Support access per groups and filter3x3 in grouped conv fwd

* Fixes for large cases

* Fixes for large tensors

[ROCm/composable_kernel commit: 82e8a78a3f]
2024-07-12 11:08:42 -07:00
Jun Liu
fa73739812 Fix issue with multiple targets and remove smfmac tests from unsupported test targets (#1372)
[ROCm/composable_kernel commit: 959073842c]
2024-07-03 23:34:38 -07:00
Bartłomiej Kocot
cc0dd8a45e Fix cmake warnings (#1342)
* Cmake add -Wno-nvcc-compt

* Remove template without initialization list

* dpp remove template without init list

* Fixes

[ROCm/composable_kernel commit: 510325a468]
2024-06-21 09:47:58 +02:00
Bartłomiej Kocot
d413c30ff4 Support large tensors in grouped conv fwd (#1332)
* Support large tensors in grouped conv fwd

* Multi ABD fixes

* Fix calculate element space size

[ROCm/composable_kernel commit: dc1e9c5df9]
2024-06-14 09:53:03 -05:00
Bartłomiej Kocot
41c68496e6 Integrate universal gemm with conv forward (#1320)
* Integrate universal gemm with conv fwd

* Fix conv fwd wmma test

* Fix instances

* Remove direct load check

[ROCm/composable_kernel commit: ac58cc5d1d]
2024-06-05 13:01:29 -05:00
Illia Silin
1f4d13b2b5 Split the instances by architecture. (#1223)
* parse examples inside the add_example_executable function

* fix the example 64 cmake file

* add xdl flag to the gemm_bias_softmax_gemm_permute example

* add filtering of tests based on architecture type

* enable test_grouped_gemm for gfx9 only

* enable test_transpose only for gfx9

* only linnk test_transpose if it gets built

* split the gemm instances by architectures

* split gemm_bilinear,grouped_conv_bwd_weight instances by targets

* split instances by architecture

* split grouped_conv instances by architecture

* fix clang format

* fix the if-else logic in group_conv headers

* small fix for grouped convolution instances

* fix the grouped conv bwd weight dl instances

* fix client examples

* only enable client examples 3 and 4 on gfx9

* set the gfx9 macro

* make sure the architecture macros are set by cmake

* use separate set of xdl/wmma flags for host code

* sinmplify the main cmake file

* add conv_fwd_bf8 instance declaration

[ROCm/composable_kernel commit: ae57e5938e]
2024-04-02 09:42:17 -07:00
Bartłomiej Kocot
6a98ad9d89 Introduce multiABD api and deprecate multiD (#1035)
* Introduce multiABD api and deprecate multiD

* Replace multiD with multiABD

* Mark structures as deprecated

* Change doxygen deprecated to note to avoid warnings

[ROCm/composable_kernel commit: f2398f612d]
2023-11-14 17:00:40 +01:00
Bartłomiej Kocot
4f95517ccc Support multi AB for grouped conv fwd xdl (#1027)
* Support multi AB for grouped conv fwd xdl

* Add instances

* Add client example

* Add example

* Add interface test

* Minor fixes

Minor fixes

Minor fixes

* Comment fixes

* Fixes

* Reference fix

* Test xdl fixes

* Improve multi_ab interface test

[ROCm/composable_kernel commit: 49e52bb357]
2023-11-10 15:54:44 +01:00
Bartłomiej Kocot
a85bb57471 Add 3d grouped conv fwd wmma instances (#935)
* Add 3d grouped conv fwd wmma instances

* Refactor fwd conv tests

* Split wmma instances for each specialization

* Minor stylistic fixes

[ROCm/composable_kernel commit: c95538325b]
2023-09-23 18:56:31 +02:00
Bartłomiej Kocot
ac574360c7 Enable grouped conv with small K or C (#822)
* Enable grouped conv with small K or C

* Add missing instances

* Refactor grouped conv fwd instances

* Fix fp16 instances since it supports src_per_vec %2 = 0

* Add generic instances

[ROCm/composable_kernel commit: 472fa029ba]
2023-08-09 10:40:55 -05:00
Illia Silin
b57fbee2f1 update copyright headers (#726)
[ROCm/composable_kernel commit: b94fd0b227]
2023-05-31 18:46:57 -05:00
Po Yen Chen
02db748e74 Modularize ckProfiler operations (#514)
* Re-structure ckProfiler source files

* Rename profiler.cpp to main.cpp

* Modularize ckProfiler operations

* Add description for profiler operations

* Use longer name to avoid name collision

* Use macro to delay expansion

* Use std::move() to avoid object copying

* Prohibit users from calling dtor

* Use macro to eliminate redundant code

* Make friend function hidden

* Add missing include directive <iostream>

* Fix wrong include directives

* Remove int8 from batchnorm-forward instances since it is not needed for forward training and could fail test

Co-authored-by: Qianfeng Zhang <Qianfeng.Zhang@amd.com>

[ROCm/composable_kernel commit: 8784a72e23]
2022-12-01 15:15:02 -06:00
Chao Liu
236f946292 Clean up conv example, Instances, profiler and test (#324)
* convnd_fwd fp16 example

* update example

* update example

* update instance

* updating refernce conv

* update reference conv

* update conv fwd profiler

* update conv 1d and 3d instance

* update include path

* clean

* update profiler for conv bwd data and weight

* update conv bwd weight

* clean

* update conv example

* update profiler for conv bwd weight

* update ckprofiler for conv bwd data

* fix reference conv bwd data bug; update conv bwd data test

* update examples

* fix initialization issue

* update test for conv fwd

* clean

* clean

* remove test case too sensitive to error threshhold

* fix test

* clean

* fix build

* adding conv multiple d

* adding conv multiple D

* add matrix padder

* add gemm padding to convnd

* adding group conv

* update gemm multi-d

* refactor

* refactor

* refactor

* clean

* clean

* refactor

* refactor

* reorg

* add ds

* add bias

* clean

* add G

* adding group

* adding group

* adding group

* update Tensor

* clean

* update example

* update DeviceGemmMultipleD_Xdl_CShuffle

* update conv bwd-data and bwd-weight

* upate contraction example

* update gemm and batch gemm with e permute

* fix example build

* instance for grouped conv1d

* update example

* adding group conv instance

* update gemm bilinear instance

* update gemm+add+add+fastgelu instance

* update profiler

* update profiler

* update test

* update test and client example

* clean

* add grouped conv into profiler

* update profiler

* clean

* add test grouped conv, update all conv test to gtest

* update test

[ROCm/composable_kernel commit: 500fa99512]
2022-07-29 18:19:25 -05:00