composable_kernel

mirror of https://github.com/ROCm/composable_kernel.git synced 2026-05-19 04:19:36 +00:00

Author	SHA1	Message	Date
linqunAMD	07def6b13d	Extend XDL kernel to Support RDNA3/4 - Part 4 (#2724) * Fix example * fix build error * update pk_i4 & moe test case * fix all instance build (examples) * fix batched_gemm_gemm (example) * disable example_gemm_bias_softmax_gemm_permute on gfx11 * remove unnecessary disable gfx11 * update tests * update tests2 [ROCm/composable_kernel commit: `321627aec5`]	2025-09-12 08:17:07 -07:00
JH-Leon-KIM-AMD	b22e283e7c	Test comprehensive dataset (#2685) * Add CSV-driven convolution test pipeline - Add test_grouped_convnd_fwd_dataset_xdl.cpp with CSV reader functionality - Add complete dataset generation toolchain in test_data/ - Add Jenkins integration with RUN_CONV_COMPREHENSIVE_DATASET parameter - Ready for comprehensive convolution testing with scalable datasets * Update convolution test dataset generation pipeline * add 2d, 3d dataset csv files * Remove CSV test dataset files from repository * Update generate_test_dataset.sh * Fix channel division for MIOpen to CK conversion * Remove unnecessary test files * Fix clang-format-18 formatting issues * TEST: Enable comprehensive dataset tests by default * Fix test_data path in Jenkins - build runs from build directory * Add Python dependencies and debug output for CSV generation * Remove Python package installation - not needed * Add better debugging for generate_test_dataset.sh execution * Fix Jenkinsfile syntax error - escape dollar signs * Add PyTorch to Docker image for convolution test dataset generation - Install PyTorch CPU version for lightweight model execution - Fixes Jenkins CI failures where CSV files were empty due to missing PyTorch - Model generation scripts require PyTorch to extract convolution parameters * Add debugging to understand Jenkins directory structure and CSV file status - Print current working directory - List CSV files in test_data directory - Show line counts of CSV files - Will help diagnose why tests fail in Jenkins * Fix clang-format-18 formatting issues - Applied clang-format-18 to test file - Fixed brace placement and whitespace issues * Add detailed debugging for CSV dataset investigation - Check generated_datasets directory contents - List all CSV files with line counts - Show first 5 lines of main CSV file - Applied clang-format-18 formatting - This will help identify why CSV files are empty in Jenkins * keep testing add pytorch installation in shell script * Use virtual environment for PyTorch installation - Jenkins user doesn't have permission to write to /.local - Create virtual environment in current directory (./pytorch_venv) - Install PyTorch in virtual environment to avoid permission issues - Use PYTHON_CMD variable to run all Python scripts with correct interpreter - Virtual environment will be reused if it already exists * Remove debug code and reduce verbose logging in Jenkins - Remove bash -x and debug commands from Jenkinsfile execute_args - Remove all debug system() calls and getcwd from C++ test file - Remove unistd.h include that was only needed for getcwd - Remove debug print in CSV parser - Add set +x to generate_test_dataset.sh to disable command echo - Redirect Python script stdout to /dev/null for cleaner output This makes Jenkins logs much cleaner while still showing progress messages. * install gpu torch * Clean up and optimize comprehensive dataset test pipeline - Reorder Jenkinsfile execution: build -> generate data -> run test - Remove commented-out debug code from generate_test_dataset.sh - Ensure all files end with proper newline character (POSIX compliance) - Keep useful status messages while removing development debug prints - Set MAX_ITERATIONS=0 for unlimited test generation in production * Add configuration modes to reduce test execution time - Add --mode option (half/full) to generate_model_configs.py - half mode (default): ~278 configs (224 2D + 54 3D) -> ~1,058 total tests - full mode: ~807 configs (672 2D + 135 3D) -> ~3,093 total tests - Update generate_test_dataset.sh to use CONFIG_MODE environment variable - Keeps all model types but reduces parameter combinations intelligently - Fixes Jenkins timeout issue (was running 3,669 tests taking 17+ hours) - Default half mode should complete in ~4-5 hours instead of 17+ hours * Add small mode for quick testing of comprehensive dataset * jenkins pipeline test done * jenkins test done * Trigger CI build * remove test comment and update data generation option as half --------- Co-authored-by: Bartłomiej Kocot <barkocot@amd.com> [ROCm/composable_kernel commit: `19d5327c45`]	2025-08-26 22:18:05 +02:00
JH-Leon-KIM-AMD	68eecd0d5e	CSV-driven convolution test pipeline (#2581) * Add CSV-driven convolution test pipeline - Add test_grouped_convnd_fwd_dataset_xdl.cpp with CSV reader functionality - Add complete dataset generation toolchain in test_data/ - Add Jenkins integration with RUN_CONV_COMPREHENSIVE_DATASET parameter - Ready for comprehensive convolution testing with scalable datasets * Update convolution test dataset generation pipeline * add 2d, 3d dataset csv files * Remove CSV test dataset files from repository * Update generate_test_dataset.sh * Fix channel division for MIOpen to CK conversion * Remove unnecessary test files * Fix clang-format-18 formatting issues --------- Co-authored-by: Bartłomiej Kocot <barkocot@amd.com> [ROCm/composable_kernel commit: `b963478759`]	2025-08-13 16:24:34 +02:00
aledudek	c26c2b1fdc	Add grouped conv fwd 3d GKCYX instances for f32, f16, bf16 (#2069) * Part1 * Add grouped conv fwd 3d GKCYX instances for f32, f16, bf16 * Add missing coma * Add missing cpp instance files * Fix 3d layout * Add missing closing bracket * Add missing comp x2 and part2 instances * Fix typo in instance name * fix * Fix --------- Co-authored-by: Bartlomiej Kocot <barkocot@amd.com> [ROCm/composable_kernel commit: `7c32652e03`]	2025-04-16 11:00:55 +02:00
Bartłomiej Kocot	67c3bcfce1	Grouped conv fwd v3 fix for SplitN an G > 1 (#2038) * Grouped conv fwd v3 fix for SplitN an G > 1 * Remove int8 large test * Retore int8 test [ROCm/composable_kernel commit: `ec742908bd`]	2025-04-01 13:19:35 -07:00
Bartłomiej Kocot	6ccfb817e4	Add support for GKCYX grouped conv fwd (#2015) * Add support for GKCYX grouped conv fwd * fixes * fix * changelog * Fixes [ROCm/composable_kernel commit: `54c81a1fcf`]	2025-03-26 21:13:38 +01:00
Bartłomiej Kocot	d72d2475fd	Add NGCHW bf16 grouped conv fwd instances (#1783) * Add NGCHW bf16 grouped conv fwd instances * add missed cmake [ROCm/composable_kernel commit: `159fa31946`]	2025-01-01 18:00:06 +01:00
Lin Sun	6cc9f5e486	Linsun/convint8 fwd instances (#1626) Add instances for int8 grouped conv2d fwd --------- Co-authored-by: root <root@dell300x-pla-t28-03.pla.dcgpu> Co-authored-by: Bartłomiej Kocot <barkocot@amd.com> [ROCm/composable_kernel commit: `0c9012fb70`]	2024-11-04 16:33:20 -08:00
Bartłomiej Kocot	e4f4e04add	Add support for NGCHW in grouped conv fwd (#1499) * Support NGCHW in grouped conv fwd * Remove not needed variable * Fixes [ROCm/composable_kernel commit: `4ba52b35dc`]	2024-09-20 10:45:46 +02:00
Bartłomiej Kocot	0f7c45915c	Add performance and large tensor tests for grouped conv (#1456) * Add performance and large tensor tests for grouped conv * Resize tests * Resize tests * update the python script to parse the grouped_conv results * Remove int8 tests * change bwd wei layout --------- Co-authored-by: illsilin <Illia.Silin@amd.com> [ROCm/composable_kernel commit: `2581727d2a`]	2024-08-16 07:48:30 -07:00
Bartłomiej Kocot	458d8bef26	Add Grouped Conv Fwd Large Tensor kernel (#1432) * Support 64 bit indexing * Add new grouped conv fwd kernel for large tensors * Add instances large tensor * Fixes for transform conv to gemm * Fixes * fixes * Remove not needed instances * examples fixes * Remove not need ds arrays * Fix tests * Add 2GB check in gridwise dl * Fixes [ROCm/composable_kernel commit: `4ec5c52a0c`]	2024-08-06 10:06:10 +02:00
Bartłomiej Kocot	c885afdaae	Support access per groups and filter3x3 in grouped conv fwd (#1382) * Support access per groups and filter3x3 in grouped conv fwd * Fixes for large cases * Fixes for large tensors [ROCm/composable_kernel commit: `82e8a78a3f`]	2024-07-12 11:08:42 -07:00
Jun Liu	fa73739812	Fix issue with multiple targets and remove smfmac tests from unsupported test targets (#1372) [ROCm/composable_kernel commit: `959073842c`]	2024-07-03 23:34:38 -07:00
Bartłomiej Kocot	cc0dd8a45e	Fix cmake warnings (#1342) * Cmake add -Wno-nvcc-compt * Remove template without initialization list * dpp remove template without init list * Fixes [ROCm/composable_kernel commit: `510325a468`]	2024-06-21 09:47:58 +02:00
Bartłomiej Kocot	d413c30ff4	Support large tensors in grouped conv fwd (#1332) * Support large tensors in grouped conv fwd * Multi ABD fixes * Fix calculate element space size [ROCm/composable_kernel commit: `dc1e9c5df9`]	2024-06-14 09:53:03 -05:00
Bartłomiej Kocot	41c68496e6	Integrate universal gemm with conv forward (#1320) * Integrate universal gemm with conv fwd * Fix conv fwd wmma test * Fix instances * Remove direct load check [ROCm/composable_kernel commit: `ac58cc5d1d`]	2024-06-05 13:01:29 -05:00
Illia Silin	1f4d13b2b5	Split the instances by architecture. (#1223) * parse examples inside the add_example_executable function * fix the example 64 cmake file * add xdl flag to the gemm_bias_softmax_gemm_permute example * add filtering of tests based on architecture type * enable test_grouped_gemm for gfx9 only * enable test_transpose only for gfx9 * only linnk test_transpose if it gets built * split the gemm instances by architectures * split gemm_bilinear,grouped_conv_bwd_weight instances by targets * split instances by architecture * split grouped_conv instances by architecture * fix clang format * fix the if-else logic in group_conv headers * small fix for grouped convolution instances * fix the grouped conv bwd weight dl instances * fix client examples * only enable client examples 3 and 4 on gfx9 * set the gfx9 macro * make sure the architecture macros are set by cmake * use separate set of xdl/wmma flags for host code * sinmplify the main cmake file * add conv_fwd_bf8 instance declaration [ROCm/composable_kernel commit: `ae57e5938e`]	2024-04-02 09:42:17 -07:00
Bartłomiej Kocot	6a98ad9d89	Introduce multiABD api and deprecate multiD (#1035) * Introduce multiABD api and deprecate multiD * Replace multiD with multiABD * Mark structures as deprecated * Change doxygen deprecated to note to avoid warnings [ROCm/composable_kernel commit: `f2398f612d`]	2023-11-14 17:00:40 +01:00
Bartłomiej Kocot	4f95517ccc	Support multi AB for grouped conv fwd xdl (#1027) * Support multi AB for grouped conv fwd xdl * Add instances * Add client example * Add example * Add interface test * Minor fixes Minor fixes Minor fixes * Comment fixes * Fixes * Reference fix * Test xdl fixes * Improve multi_ab interface test [ROCm/composable_kernel commit: `49e52bb357`]	2023-11-10 15:54:44 +01:00
Bartłomiej Kocot	a85bb57471	Add 3d grouped conv fwd wmma instances (#935) * Add 3d grouped conv fwd wmma instances * Refactor fwd conv tests * Split wmma instances for each specialization * Minor stylistic fixes [ROCm/composable_kernel commit: `c95538325b`]	2023-09-23 18:56:31 +02:00
Bartłomiej Kocot	ac574360c7	Enable grouped conv with small K or C (#822) * Enable grouped conv with small K or C * Add missing instances * Refactor grouped conv fwd instances * Fix fp16 instances since it supports src_per_vec %2 = 0 * Add generic instances [ROCm/composable_kernel commit: `472fa029ba`]	2023-08-09 10:40:55 -05:00
Illia Silin	b57fbee2f1	update copyright headers (#726) [ROCm/composable_kernel commit: `b94fd0b227`]	2023-05-31 18:46:57 -05:00
Po Yen Chen	02db748e74	Modularize ckProfiler operations (#514) * Re-structure ckProfiler source files * Rename profiler.cpp to main.cpp * Modularize ckProfiler operations * Add description for profiler operations * Use longer name to avoid name collision * Use macro to delay expansion * Use std::move() to avoid object copying * Prohibit users from calling dtor * Use macro to eliminate redundant code * Make friend function hidden * Add missing include directive <iostream> * Fix wrong include directives * Remove int8 from batchnorm-forward instances since it is not needed for forward training and could fail test Co-authored-by: Qianfeng Zhang <Qianfeng.Zhang@amd.com> [ROCm/composable_kernel commit: `8784a72e23`]	2022-12-01 15:15:02 -06:00
Chao Liu	236f946292	Clean up conv example, Instances, profiler and test (#324) * convnd_fwd fp16 example * update example * update example * update instance * updating refernce conv * update reference conv * update conv fwd profiler * update conv 1d and 3d instance * update include path * clean * update profiler for conv bwd data and weight * update conv bwd weight * clean * update conv example * update profiler for conv bwd weight * update ckprofiler for conv bwd data * fix reference conv bwd data bug; update conv bwd data test * update examples * fix initialization issue * update test for conv fwd * clean * clean * remove test case too sensitive to error threshhold * fix test * clean * fix build * adding conv multiple d * adding conv multiple D * add matrix padder * add gemm padding to convnd * adding group conv * update gemm multi-d * refactor * refactor * refactor * clean * clean * refactor * refactor * reorg * add ds * add bias * clean * add G * adding group * adding group * adding group * update Tensor * clean * update example * update DeviceGemmMultipleD_Xdl_CShuffle * update conv bwd-data and bwd-weight * upate contraction example * update gemm and batch gemm with e permute * fix example build * instance for grouped conv1d * update example * adding group conv instance * update gemm bilinear instance * update gemm+add+add+fastgelu instance * update profiler * update profiler * update test * update test and client example * clean * add grouped conv into profiler * update profiler * clean * add test grouped conv, update all conv test to gtest * update test [ROCm/composable_kernel commit: `500fa99512`]	2022-07-29 18:19:25 -05:00

24 Commits