mirror of
https://github.com/ROCm/composable_kernel.git
synced 2026-05-15 02:27:57 +00:00
* Format
* Format
* Format
* Remove const
* Use the right template
* Format
* Format
* add row/col instances
* Add missing file
* fixed
* fixing block-to-etile error
* Format
* Updates
* Format
* fixed rrr layout
* generating a sample JSON file: currently contains includes, prologue/epilogue and instances
* version where the json is passed into the instances to generate a key
* updated run function to just launch kernel
* updated run function: only contains kernel object, json file is updated but still needs to be cleaned up, added front-end API to parse JSON into character buffer
* adding in testing files
* cleaned up comments, still need to work on including header files
* removed unneeded files
* removed/commented out JSON implementation
* added fusion(prologue/epilogue) into instance generation
* working on instance selection
* added instance selection, need to fix instance validation
* removed block2etile map validity check for testing purposes
* test running: failing due to incorrect files/input
* all grid descs/ptrs completed, but device file not found
* Update test and embed modules
* Restore older version
* added convolution operation, written test, debugging generated code for compilation
* attempting to include CK in host directory: _Float16 error
* CK header file issues
* slight fix
* don't crash when hip can't report total memory
* dump generated code to a file
* changing sizes
* creating tensor descriptors using CK methods: set up grid desc manually, also trying to set up an argument pointer - this needs to be fixed
* some fixes to call the device code
* separating test files for conv and gemm
* completed arg ptr, now have linking errors
* clang format fix
* resolved linker issues in conv test
* remove dependency on libutility from ck
* resolved num dim error
* properly passing arg ptr, errors with passing typenames: redefinition/redeclaration
* undo the commenting of device function
* hand created kernel code to find rtc issues
* dump the full src to file
* resolved redeclaration errors, cleaned up errors for Amber's kernel code
* debugging purposes: redeclaration error
* config files
* resolved errors for NumTensor and redeclaration, formatted version.h
* resolved most errors in manually added kernel and my own. error with calling kernel object: overloaded function type
* WIP: close to getting kernel compiled
* WIP: fixing rtc errors
* fixed sequence errors, formatting, still one error with run function
* yay: kernel compiles and runs
* updated templated/generated version to run and compile
* minor fixes
* working generated example, resolved memory access error due to padding
* adding in reference kernel, validation failing against reference
* debugging: printing kernel argsz
* reduced error in results
* debugged reference kernel and output errors, added to generated version, currently debugging prologue function issues
* working validation (using reference convolution) with prologue function for both hard-coded and generated version
* WIP: create an alt version that creates Argument on the device
* wip: added new duplicate files, fixed fusion templating errors from working example, setting up kernel arguments
* wip: making necessary methods device code
* added grid descs, working on grid pointers, errors with stl numerics
* wip: updating kernel args - issue, replacing some std functions
* replaced std::accumulate call with temp hardcoded version
* wip: args causing memory issue
* Construct Argument object inside the kernel and use it to call convolution device function. Code runs and verification passes
* adding object file dump
* temporary hardcoding of grid size, can remove device op inst + arg ptr
* minor fix for grid size
* added modified example where arg ptr is created on the device for generated version as well
* removed device op instance and arg ptr from modified examples
* moving device op file for testing purposes and to properly build CK
* commenting out print-outs
* adjust compiler args to produce a valid ELF file
* temporary removal of validation
* reverting compiler args back for working example
* retrieve necessary arguments from generated template parameters in correct format
* calculating grid size on host-side, still need to clean up process, pass parameters to host functions properly
* scaled up factory functions/wrapper structs to implement host-side launch parameter calculations using CK host side functions - in hard-coded example
* temporary change to generate ELF format binary object file
* removed unnecessary code, added comments
* formatting fix
* cleaned up code, added new tests, restructured library: move helper into CK
* refactored launch parameter calculation to be more concise
* renamed files and variables for more clarity/uniformity
* more code cleaning, removed debug statements
* moved majority of my files into codegen directory, running properly
* updated Embed.cmake(string_view) in codegen directory
* updated host directory to match Embed.cmake as well
* added old tests in
* updated instance generation methods to be more concise
* removed layout from launch parameter calculation
* working test
* fixed issue with verification, all instances working
* updated verification in other tests
* removed duplicate matrix padder file, removed code dumps
* removed old hard-coded tests
* removed old host directory, all files in codegen directory now
* fixed copyright in files
* commenting out validation
* renamed files
* made changes for review: fixed copyright, renamed files for clarity, removed comments, refactored code
* updated headers
* removing duplicate file for fwd conv to gemm, merging with original file
* fix building codegen with clang++ directly
* resolving build error from conv_fwd_to_gemm
* fix for previous error
* renaming tests
* created common test file
* cleaned up code, added comments
* renamed device op
* fixed typos in comments
* removed extra space
* code cleanup: resolving Amber's comments
* removed wrapper struct for matrix padder, fixed template
* cleaned up if statements for better readability
---------
Co-authored-by: Paul <pfultz2@yahoo.com>
Co-authored-by: Jing Zhang <jizha@amd.com>
Co-authored-by: M. Amber Hassaan <amber_474@yahoo.com>
Co-authored-by: illsilin <Illia.Silin@amd.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
[ROCm/composable_kernel commit: 3e9711f0cb]
107 lines
3.0 KiB
C++
#include <rtc/hip.hpp>
#include <rtc/manage_ptr.hpp>
#include <stdexcept>
#include <cassert>
#include <iostream>

namespace rtc {

// Owning pointer type for device allocations; releases memory via hipFree.
using hip_ptr = RTC_MANAGE_PTR(void, hipFree);

// Translate a HIP status code into a human-readable message.
std::string hip_error(int error) { return hipGetErrorString(static_cast<hipError_t>(error)); }

// Return the currently active HIP device id, or throw if none is available.
int get_device_id()
{
    int device;
    auto status = hipGetDevice(&device);
    if(status != hipSuccess)
        throw std::runtime_error("No device");
    return device;
}

// Return the architecture name (e.g. "gfx90a:sramecc+:xnack-") of the active device.
std::string get_device_name()
{
    hipDeviceProp_t props{};
    auto status = hipGetDeviceProperties(&props, get_device_id());
    if(status != hipSuccess)
        throw std::runtime_error("Failed to get device properties");
    return props.gcnArchName;
}

// Check whether ptr refers to device memory; unrecognized pointers report false.
bool is_device_ptr(const void* ptr)
{
    hipPointerAttribute_t attr;
    auto status = hipPointerGetAttributes(&attr, ptr);
    if(status != hipSuccess)
        return false;
    return attr.type == hipMemoryTypeDevice;
}

// Block until all outstanding work on the device has finished.
void gpu_sync()
{
    auto status = hipDeviceSynchronize();
    if(status != hipSuccess)
        throw std::runtime_error("hip device synchronization failed: " + hip_error(status));
}

// Query free device memory; fall back to assuming 8 GiB when the query fails.
std::size_t get_available_gpu_memory()
{
    size_t free;
    size_t total;
    auto status = hipMemGetInfo(&free, &total);
    if(status != hipSuccess)
    {
        std::cerr << "Failed getting available memory: " + hip_error(status) << std::endl;
        return (8ull * 1024ull * 1024ull * 1024ull);
    }
    return free;
}

// Allocate sz bytes of device memory (or pinned host memory when host is
// true) as a shared_ptr that frees the buffer when the last owner drops.
// A failed device allocation is retried once as a pinned host allocation.
std::shared_ptr<void> allocate_gpu(std::size_t sz, bool host)
{
    if(sz > get_available_gpu_memory())
        throw std::runtime_error("Memory not available to allocate buffer: " + std::to_string(sz));
    void* alloc_ptr = nullptr;
    auto status = host ? hipHostMalloc(&alloc_ptr, sz) : hipMalloc(&alloc_ptr, sz);
    if(status != hipSuccess)
    {
        if(host)
            throw std::runtime_error("Gpu allocation failed: " + hip_error(status));
        else
            return allocate_gpu(sz, true);
    }
    assert(alloc_ptr != nullptr);
    std::shared_ptr<void> result = share(hip_ptr{alloc_ptr});
    return result;
}

// Copy sz bytes from host buffer x into a freshly allocated device buffer.
std::shared_ptr<void> write_to_gpu(const void* x, std::size_t sz, bool host)
{
    gpu_sync();
    auto result = allocate_gpu(sz, host);
    assert(is_device_ptr(result.get()));
    assert(not is_device_ptr(x));
    auto status = hipMemcpy(result.get(), x, sz, hipMemcpyHostToDevice);
    if(status != hipSuccess)
        throw std::runtime_error("Copy to gpu failed: " + hip_error(status));
    return result;
}

// Copy sz bytes from device buffer x into a freshly allocated host buffer.
std::shared_ptr<void> read_from_gpu(const void* x, std::size_t sz)
{
    gpu_sync();
    // Array allocation needs an array deleter, otherwise delete (not
    // delete[]) would run when the last reference drops.
    std::shared_ptr<char> result(new char[sz], std::default_delete<char[]>{});
    assert(not is_device_ptr(result.get()));
    if(not is_device_ptr(x))
    {
        throw std::runtime_error(
            "read_from_gpu() requires Src buffer to be on the GPU, Copy from gpu failed\n");
    }
    auto status = hipMemcpy(result.get(), x, sz, hipMemcpyDeviceToHost);
    if(status != hipSuccess)
        throw std::runtime_error("Copy from gpu failed: " + hip_error(status)); // NOLINT
    return std::static_pointer_cast<void>(result);
}

} // namespace rtc