Commit Graph

4 Commits

Author SHA1 Message Date
Illia Silin
717f2efef7 [rocm-libraries] ROCm/rocm-libraries#6978 (commit e58096d)
[CK] add composable kernel support on gfx1250 (#6978)

## Motivation

Add composable kernel support on gfx1250.

## Technical Details

<!-- Explain the changes along with any relevant GitHub links. -->

## Test Plan

<!-- Explain any relevant testing done to verify this PR. -->

## Test Result

<!-- Briefly summarize test outcomes. -->

## Submission Checklist

- [ ] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

---------

Co-authored-by: Qun Lin <qlin@amd.com>
Co-authored-by: jialuo12_amdeng <jia.luo@amd.com>
Co-authored-by: Andriy Roshchenko <andriy.roshchenko@amd.com>
Co-authored-by: hsivasun_amdeng <haresh.sivasuntharampillai@amd.com>
2026-05-15 06:46:51 -07:00
Aviral Goel
004784ef98 chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)
* chore(copyright) update library wide CMakeLists.txt files copyright header template

* Fix build

---------

Co-authored-by: Sami Remes <samremes@amd.com>
2025-11-28 13:49:54 -08:00
rocking5566
82d7d9938f Hotfix binary elementwise (for broadcast on fastest axis) (#254)
* Support different length of ScalarPerVector

* Add example of broadcast on fastest axis

* Typo

* Refine fastest example

* Add dimension check

* Modify fastest broadcast example to 3d

* Enforce users give scalarPerVector explicitely

* 1. Add CscalarPerVedctor
2. Not only broadcast on fastest need to set scalarPerVector to 1

* Rename var

* Move IsScalarPerVectorValid() inside IsSupportedArgument()

* Separate GridDesc_M0 into A, B and C

* rename var

* Rename var of length

Co-authored-by: rocking <chunylai@amd.com>
2022-05-25 11:17:27 -05:00
rocking5566
aafc3ac27a elementwise op (#238)
* Add elementwise operation kernel and example

* Add comment

* Add template argument of dim . Prepare to support multiple dimension

* Rename example

* Support 1 dimension

* Add static assert

* Add comment

* Extract pad

* Remove redundant argument

* Support any dimension for elementwise operation

* Remove line

* Let it be the multiple number of CU

* Move thread per block to the parameter of constructor

* rename threadPerBlock with blockSize

* Support double

* rename kernel function name

* remove redundant include header

* Refine type

* Need to the final dimension

* Refine variable name

* Refine type

* Use index_t instead of int in API

Co-authored-by: rocking <chunylai@amd.com>
2022-05-18 23:34:35 -05:00