Commit Graph

6 Commits

Author SHA1 Message Date
Aviral Goel
004784ef98 chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)
* chore(copyright) update library wide CMakeLists.txt files copyright header template

* Fix build

---------

Co-authored-by: Sami Remes <samremes@amd.com>
2025-11-28 13:49:54 -08:00
Max Podkorytov
79aae7c7f7 [CK Tile] enable building examples by default (#3259)
* remove EXCLUDE_FROM_ALL from ck-tile examples
-> +15 min build time w/ 64 threads for a single arch

* fix cpp17 compile error in the ck-tile examples

---------

Co-authored-by: khuagarw <khuagarw@amd.com>
Co-authored-by: Ding, Yi <yi.ding@amd.com>
2025-11-26 16:24:44 -08:00
Aviral Goel
aed0f5880c Label CMakeLists message() as DEBUG or STATUS for clean build output (#2301)
* - elevate important build messages to log level STATUS
- comment out the rest (temporarily)

* - marked all low importance build messages as log_level=DEBUG
2025-06-10 10:46:47 -07:00
Illia Silin
9a9f59ae69 Revert "Add ck tile examples to package (#1880)" (#2150) 2025-04-30 10:20:16 -07:00
jakpiase
434d19f696 Add ck tile examples to package (#1880)
* add ck tile examples to package

* Update jenkinsfile

* fix for jenkinsfile

* fix for building ck tile code on non gfx9

* compile ck tile examples only for gfx94

* include ck tile examples in all target

* fix for basic gemm UseStructuredSparsity

* Update CMakeLists.txt

* Update gemm_pipeline_problem.hpp

* add targets to rocm install

---------

Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
2025-04-28 09:53:19 -07:00
ltqin
0394f8a713 update layernorm (#1570)
* port layernorm

* change warp_welford.hpp

* Update warpshuffle

* 1. Add save mean and save std back
2. Move construction of tensor_view and tile_window to operator()

* refine welford max count calculation

* unify layernorm api

* Rename file

* Remove save mean and inv std

* Revert "refine welford max count calculation"

This reverts commit 022365802b.

* Fix order of parameter

* refine welford max count calculation again

* Remove fp32 instances

* Fix bug of padding

* refactor api

* Support bf16

* Extract common function

* Refine arg of operator()

* Add kMThreadPerBlock to template parameter

* clang format

* Refine variable name

* Refine file name

* remove redundant line

* refactor layernorm2d pipeline and add block-per-block utility

* fix name

* rename more

* add more block-per-tile instance

* remove duplicated define

* update instance for 2048, 1024 case

* support up to 2048 now

* opt loading

* add n1536

* Add two pass pipeline

* format

* Fix incorrect type

* parallel compilation

* Use smaller N

* fix 2p pass

* Support Repeat_M in distribution

* Refine nameing

* Add reduce example

---------

Co-authored-by: letaoqin <letaoqin@amd.com>
Co-authored-by: aska-0096 <haocwang@amd.com>
Co-authored-by: rocking <ChunYu.Lai@amd.com>
Co-authored-by: carlushuang <carlus.huang@amd.com>
2024-10-22 09:26:18 +08:00