Changes to fix errors and warnings when using gcc 16.1.0:
- Copy changes from 5c2b22da81 in upstream BLIS to extend disabling of
tree-vectorization in affected kernels to gcc 16 and later.
- Remove unused variables in bli_packm_blk_var1_md.c and bli_util_unb_var1.c
to fix warning messages.
Background
bp (base pointer) is the %rbp/%ebp register on x86/x86-64. Inline assembly
kernels in BLIS use asm volatile blocks where they manually manage registers
- including saving and restoring bp themselves to use it as a general-purpose
register for holding loop counters or matrix pointers.
When GCC's tree-vectorizer (specifically the superword-level parallelism (SLP)
pass) runs on a translation unit containing inline asm, it can generate code
that itself needs bp as a frame pointer or in the vectorized prologue/epilogue.
At that point GCC internally marks bp as unavailable and then, when it tries to
compile the inline asm block that also references bp, it throws an error.
As a workaround, disabling tree vectorization for the entire file removes the
conflict - with no vectorizer-generated code, bp stays free for the inline asm.
* Resolved memory-access issues in the SGEMM SUP kernels on AVX2 and AVX-512 by correcting instructions that could read invalid addresses in the C matrix.
* Removed k=0 kernel gtests for the native SGEMM and DGEMM paths as these tests caused spurious failures for kernels that are not intended to handle this case.
* Standardized all instruction macros to lowercase in the Zen4 kernel to improve readability and code consistency.
---------
AMD-Internal: CPUPL-8117
Co-authored-by: Rayan <rohrayan@amd.com>
* ZGEMM SUP: Add conjugate support for AVX-512 kernels on Zen4/Zen5/Zen6
- Add CONJA, CONJB and CONJA_CONJB variants to zgemm SUP micro-tiles
- Enable SUP path for conjugate cases when both are same type
- Unify RRC/CRC storage to use CV kernel variant
- Update SUP dispatch to handle conjugate flags correctly
Note: CONJ_NO_TRANSPOSE + CONJ_NO_TRANSPOSE and
CONJ_TRANSPOSE + CONJ_TRANSPOSE remain unsupported
---------
Co-authored-by: harsdave <harsdave@amd.com>
Copy of similar change in upstream BLIS (843a5e8) to fix issues
https://github.com/flame/blis/issues/873 and
https://github.com/amd/blis/issues/50
Details:
- Previously, `<omp.h>` was included in `bli_thrcomm_openmp.h` so that the
framework could access the necessary OpenMP functions.
- As @melven reported (#873), this causes issues when `blis.h` is included
in C++ code since the `<omp.h>` include happens with `extern "C"`.
- Move the include from the header to the necessary .c files so that it
does not "pollute" `blis.h`.
Thanks to @DaAwesomeP and @bartoldeman for reporting this issue in
AOCL BLIS
AMD-Internal: [CPUPL-7303]
Fixing some inefficiencies on the zen (AVX2) SUP RD kernel for SGEMM.
After performing the iteration for the 8 loop, the next loop that was being performed was the 1 loop for the k-direction.
This caused a lot of unnecessary iterations when the remainder of k < 8.
This has been fixed by introducing masked operations for k < 8
When remainder of k == 1, we handle this with the original non-masked code (with a branch) as the masked code introduces more penalty because of the masking operation.
There were also some unnecessary instructions in the zen4 kernels which have been removed.
AMD-Internal: https://amd.atlassian.net/browse/CPUPL-7775
Co-authored-by: rohrayan@amd.com
Adding SGEMM tiny path for Zen architectures.
Needed to cover some performance gaps seen wrt MKL
Only allowing matrices that all fit into the L1 cache to the tiny path
Only tuned for single threaded operation at the moment
Todo: Tune cases where AVX2 performs better than AVX512 on Zen4
Todo: The current ranges are very conservative, there may be scope to increase the matrix sizes that go into the tiny path
AMD-Internal: CPUPL-7555
Co-authored-by: Rohan Rayan rohrayan@amd.com
Previous commit (30c42202d7) for this problem turned off
-ftree-slp-vectorize optimizations for all kernels. Instead, copy
the approach of upstream BLIS commit 36effd70b6a323856d98 and disable
these optimizations only for the affected files by using GCC pragmas
AMD-Internal: [CPUPL-6579]
Naming of Zen kernels and associated files was inconsistent with BLIS
conventions for other sub-configurations and between different Zen
generations. Other anomalies existed, e.g. dgemmsup 24x column
preferred kernels names with _rv_ instead of _cv_. This patch renames
kernels and file names to address these issues.
AMD-Internal: [CPUPL-6579]
In DTRSM small code path lower triangular kernels, extra data from upper triangular region is being read.
To fix this, new macros have been added to make sure only relevant data is read.
AMD-Internal: [SWLCSG-3611]
* Improve consistency of optimized BLAS3 code
Tidy AMD optimized GEMM and TRSM framework code to reduce
differences between different data type variants:
- Improve consistency of code indentation and white space
- Added some missing AOCL_DTL calls
- Removed some dead code
- Consistent naming of variables for function return status
- GEMM: More consistent early return when k=1
- Correct data type of literal values used for single precision data
In kernels/zen/3/bli_gemm_small.c and bli_family_*.h files:
- Set default values for thresholds if not set in the relevant
bli_family_*.h file
- Remove unused definitions and commented out code
AMD-Internal: [CPUPL-6579]
- Added the support for Tiny-CGEMM as part of the existing
macro based Tiny-GEMM interface. This involved definining
the appropriate AVX2/AVX512 lookup tables and functions for
the target architectures(as per the design), for compile-time
instantiation and runtime usage.
- Also extended the current Tiny-GEMM design to incorporate packing
kernels as part of its lookup tables. These kernels will be queried
through lookup functions and used in case of wanting to support
non-trivial storage schemes(such as dot-product computation).
- This allows for a plug-and-play fashion of experimenting with
pack and outer product method against native inner product implementations.
- Further updated the existing AVX512 pack routine that packs the A matrix
(in blocks of 24xk). This utilizes masked loads/stores instructions to
handle fringe cases of the input(i.e, when m < 24).
- Also added the AVX512 outer product kernels for CGEMM as part of the
ZEN4 and ZEN5 contexts, to handle RRC and CRC storage schemes. This is
facilitated through optional packing of A matrix in the SUP framework.
AMD-Internal: [CPUPL-6498]
Co-authored-by: Vignesh Balasubramanian <vignbala@amd.com>
Co-authored-by: Varaganti, Kiran <Kiran.Varaganti@amd.com>
More changes to standardize copyright formatting and correct years
for some files modified in recent commits.
AMD-Internal: [CPUPL-5895]
Change-Id: Ie95d599710c1e0605f14bbf71467ca5f5352af12
- As part of AOCL-BLAS, there exists a set of vectorized
SUP kernels for GEMM, that are performant when invoked
in a bare-metal fashion.
- Designed a macro-based interface for handling tiny
sizes in GEMM, that would utilize there kernels. This
is currently instantiated for 'Z' datatype(double-precision
complex).
- Design breakdown :
- Tiny path requires the usage of AVX2 and/or AVX512
SUP kernels, based on the micro-architecture. The
decision logic for invoking tiny-path is specific
to the micro-architecture. These thresholds are defined
in their respective configuration directories(header files).
- List of AVX2/AVX512 SUP kernels(lookup table), and their
lookup functions are defined in the base-architecture from
which the support starts. Since we need to support backward
compatibility when defining the lookup table/functions, they
are present in the kernels folder(base-architecture).
- Defined a new type to be used to create the lookup table and its
entries. This type holds the kernel pointer, blocking dimensions
and the storage preference.
- This design would only require the appropriate thresholds and
the associated lookup table to be defined for the other datatypes
and micro-architecture support. Thus, is it extensible.
- NOTE : The SUP kernels that are listed for Tiny GEMM are m-var
kernels. Thus, the blocking in framework is done accordingly.
In case of adding the support for n-var, the variant
information could be encoded in the object definition.
- Added test-cases to validate the interface for functionality(API
level tests). Also added exception value tests, which have been
disabled due to the SUP kernel optimizations.
AMD-Internal: [CPUPL-6040][CPUPL-6018][CPUPL-5319][CPUPL-3799]
Change-Id: I84f734f8e683c90efa63f2fa79d2c03484e07956
Some kernel file names were the same for different sub-configurations,
which could result in duplicate copies of the same object being archived
depending upon the order of (re-)compiling the source files. Rename the
files to be specific to each sub-configuration to avoid this problem.
AMD-Internal: [CPUPL-5895]
Change-Id: I182ac706e04a364f1df20fd0fb5b633eb10eeafb
This patch introduces comprehensive optimizations to the DGEMM kernel, focusing on loop
efficiency and edge kernel performance. The following technical improvements have been implemented:
1. **IR Loop Optimization:**
- The IR loop has been re-implemented in hand-written assembly to eliminate the overhead associated
with `begin_asm` and `end_asm` calls, resulting in more efficient execution.
2. **JR Loop Integration:**
- The JR loop is now incorporated into the micro kernel. This integration avoids the repetitive overhead
of stack frame management for each JR iteration, thereby enhancing loop performance.
3. **Kernel Decomposition Strategy:**
- The m dimension is decomposed into specific sizes: 20, 18, 17, 16, 12, 11, 10, 9, 8, 4, 2, and 1.
- For remaining cases, masked variants of edge kernels are utilized to handle the decomposition efficiently.
1. **Interleaved Scaling by Alpha:**
- Scaling by the alpha factor is interleaved with load instructions to optimize the instruction pipeline
and reduce latency.
2. **Efficient Mask Preparation:**
- Masks are prepared within inline assembly code only at points where masked load-store operations are necessary,
minimizing unnecessary overhead.
3. **Broadcast Instruction Optimization:**
- In edge kernels where each FMA (Fused Multiply-Add) operation requires a broadcast without subsequent reuse,
the broadcast instruction is replaced with `mem_1to8`.
- This allows the compiler to optimize by assigning separate vector registers for broadcasting, thus avoiding
dependency chains and improving execution efficiency.
4. **C Matrix Update Optimization:**
- During the update of the C matrix in edge kernels, columns are pre-loaded into multiple vector registers.
This approach breaks dependency chains during FMA operations following the scaling by alpha, thereby mitigating
performance bottlenecks and enhancing throughput.
These optimizations collectively improve the performance of the DGEMM kernel, particularly in handling edge cases and
reducing overhead in critical loops. The changes are expected to yield significant performance gains in matrix multiplication
operations.
This patch also involves changes for tiny gemm interface. A light
interface for calling kernels and removing calls to avx2 dgemm kernels
as we use avx512 dgemm kernels for all the sizes for zen4 and zen5.
For zen4 and zen5 when A matrix transposed(CRC, RRC), tiny kernel does not have
the support to handle such inputs and thus such inputs are routed to
gemm_small path.
AMD-Internal: [CPUPL-6054]
Change-Id: I57b430f9969ca39aa111b54fa169e4225b900c4a
- Standardize formatting (spacing etc).
- Add full copyright to cmake files (excluding .json)
- Correct copyright and disclaimer text for frame and
zen, skx and a couple of other kernels to cover all
contributors, as is commonly used in other files.
- Fixed some typos and missing lines in copyright
statements.
AMD-Internal: [CPUPL-4415]
Change-Id: Ib248bb6033c4d0b408773cf0e2a2cda6c2a74371
- Implemented bli_zgemm_16x4_avx512_k1_nn( ... ) AVX512 kernel to
be used as part of BLAS/CBLAS calls to ZGEMM. The kernel is built
for handling the GEMM computation with inputs having k = 1,
with the transpose values being N(for column-major) and T(for
row-major).
- Updated the zgemm_blis_impl( ... ) layer to query the architecture
ID and invoke the AVX2 or AVX512 kernel accordingly.
- Added API level tests for accuracy and code-coverage, as well as
micro-kernel tests for verifying functionality and out-of-bounds
memory accesses.
AMD-Internal: [CPUPL-5249]
Change-Id: Id1f8bebff3e0da83c7febe86299564fd658b2e84
Use of AOCL_ENABLE_INSTRUCTIONS in dgemm tiny code path is
unnecessary and incorrectly caused AVX512 code to be run
on zen4 and later processors when AOCL_ENABLE_INSTRUCTIONS=avx2
or equivalent options was selected.
Replace with code to select kernel in a similar way to other
dgemm code paths and other APIs. Note that at present AVX2 code
is used the smallest matrix sizes on all zen platforms.
AMD-Internal: [CPUPL-5078]
Change-Id: Ie6b4895461cbbb915d2b48b92fc063f5cd6adb85
BUG:
When alpha real and imaginary is zero
Output is computed as C= Beta * C + A * B instead of C = Beta * C
FIX:
Updated kernel to scale A * B product with alpha in case of alpha=0
Existing framework design:
- When alpha real and imaginary value is zero, framework handles to skip
kernel call to avoid alpha * A * B operation
- SCALM is invoked to perform Beta * C
- Accuracy issue was not observed as alpha=0 was handled in framework
- If we call kernel directly with alpha=0, results would be wrong
- Issue was figured out during microkernel testing using gtestsuite
AMD-Internal: [CPUPL-4454]
Change-Id: Ib6113f5226cd7c26a63781cdd20d35660f453803
- Kernel dimensions are 4x4.
- Two kernels are implemented, Right Upper and
Right lower.
- In case of Left variants of TRSM, transpose is
induced so that Right variant kernels can be used.
- No packing is performed in these kernels.
- Changes are made in the threshold to pick ZTRSM small
code path.
- BLIS_INLINE is removed from signature of
"TRSMSMALL_KER_PROT".
- These kernels do not support "ENABLE_TRSM_PREINVERSION".
- Newly added kernels do not support conjugate
transpose.
- Added multithreading to ZTRSM small code path.
AMD-Internal: [CPUPL-4324]
Change-Id: I683b1d5239593e54f433e7f27497d72dfbd9141c
- In 2x1 fringe case in [RUN/RLT] kernel, 3 scomplex
precision numbers are being read instead of 1 scomplex.
- Fixed the code to read only one scomplex.
AMD-Internal: [CPUPL-4403]
Change-Id: If3ac03ed864618382d3a382a8cdff7ff8a94eb7d
Implement full support for zen5 as a separate BLIS sub-configuration
and code path within amdzen configuration family.
AMD-Internal: [CPUPL-3518]
Change-Id: Iaa5096e0b83bf0f0c3fd1c41e601ccd29bda3c09
- In 3x1 fringe case in [RLN/RUT] kernel, 4 double
precision floats are being read instead of 3 doubles.
- Fixed the code to read only 3 double.
AMD-Internal: [CPUPL-4403]
Change-Id: If0afb155efefabe13487cf322d479981f1838aa2
Some text files were missing a newline at the end of the file.
One has been added.
AMD-Internal: [CPUPL-3519]
Change-Id: I4b00876b1230b036723d6b56755c6ca844a7ffce
1. Prefetch only MR rows or rows required for fringe cases
2. Specify prefetching offset - the least column address supported
by masked functions
3. Removed unnecessary prefetches in fringe case for mx4 kernels
Updated gtestuite for sgemm calls
AMD_Internal: [CPUPL-4221]
Change-Id: I1e2e7d3ebce37dc54a2f0a5c1c70ce0a6d4c8d6c
Segfault was reported through nightly jenkins job.
Issue was observed when running in MT mode.
Issue was due to extra broadcast being used.
Extra broadcast would access out of bound memory on input buffer
Cleaned up cobbler list by removing unused registers.
AMD_Internal: [CPUPL-4180]
Change-Id: I1c8715b2850ef855328f2ef12f215987299bdb2b
Added all fringe kernels with mask load store support
Fringe kernels cover m direction from 5 to 1 and
n direction from 15 to 1 for row storage format
- New edge kernels that uses masked load-store
instructions for handling corner cases.
- Mask load-store instruction macros are added.
vmaskmovps, VMASKMOVPS for masked load-store.
- It improves performance by reducing branching overhead
and by being more cache friendly.
- Mask load-store is added only for row storage format
AMD-Internal: [CPUPL-4041]
Change-Id: I563c036c79bf8e476a8ebde37f8f6db751fb3456
- This commit helps improving performance for very small input
by reducing framework check and routing all such inputs to
bli_dgemm_tiny_6x8_kernel. It forces single threaded computation
for such sizes.
- It invokes bli_dgemm_tiny_6x8_kernel for ZEN, ZEN2, ZEN3 and ZEN4
code path. Except for the case AOCL_ENABLE_INSTRUCTIONS environment
variable is set to avx512. In that case, such a small inputs are
routed to bli_dgemm_tiny_24x8_kernel avx512 kernel.
AMD-Internal: [CPUPL-1701]
Change-Id: Idf59f4a8ee76ee8f2514a33be2b618e3ce02383e
- This commit implements avx512 dgemm kernel for k=1 cases.
which gets called for zen4 codepath.
- Added architecture check for k=1 kernel in dgemm code path
to pick correct kernel based on cpu arhcitecture since now
blis is having avx2 and avx512 dgemm kernels for k=1 case.
- Previously in dgemm path bli_dgemm_8x6_avx2_k1_nn kernel was
being called irrespective of architecture type.
- Added architecture check before calling the kernel for case where
k=1, so only for respective architectures this kernel is invoked.
AMD-Internal: [CPUPL-4017]
Change-Id: I418bbc933b41db41d323b331c6d89893868a6971
- For k > KC, C matrix is getting scaled by beta on each
iteration. It should be scaled only once. Fixed the scaling
of C matrix by beta in K loop.
- Corrected A and B matrix buffer offsets, for cases where k > KC.
AMD-Internal: [CPUPL-4078]
AMD-Internal: [CPUPL-4079]
AMD-Internal: [CPUPL-4081]
AMD-Internal: [CPUPL-4080]
AMD-Internal: [CPUPL-4087]
Change-Id: I27f426caf48e094fd75f1f719acb4ac37d9daeaa
- This commit focused on enhancing the performance of dgemm
for matrices for very small dimenstions.
- blis_dgemm_tiny function re-uses dgemm sup kernels, bypassing
the conventional SUP framework code path. As SUP framework code path
requires the creation and initilization of blis objects,
accessing all the needed meta-information from objects, querying contexts
which adds performance penaulty while computing for matrices with very
small dimensions.
- To avoid such performance penaulty blis_dgemm_tiny function implements
a lightweight support code so that it can re-use dgemm SUP kernels such a way
that it directly operates on input buffers. It avoids framework overhead of
creating and intializing blis objects, context intialization, accessing other
large framework data structures.
- blis_dgemm_tiny function checks for threshold condition to match before
picking the kernel. For zen, zen2, zen3 architecture tiny kernel is invoked
for any shape as long as m < 8 and k <= 1500 or m < 1000 and n <= 24 and k <=1500.
While for zen4 as long as dimensions are less than 1500 for m,n,k tiny kernel is
invoked.
-blis_dgemm_tiny function supports single threaded computation as of now.
AMD-Internal: [CPUPL-3574]
Change-Id: Ife66d35b51add4fccbeebd29911e0c957e59a05f
- Added 2x6 ZGEMM row-preferred kernel.
- Kernel supports prefetch_a, prefetch_b,
prefetch_a_next and prefetch_b_next.
- Multiple Ways to prefetch c are supported.
- prefetch_a and prefetch_c are enabled by
default.
- K loop is divided into multiple subloops for
better c prefetch.
- Added 2x6 ZTRSM row-preferred lower
and upper kernels using AVX2 ISA.
- These kernels are used for ZTRSM only, zgemm
still uses 3x4 kernel.
- Kernels support row/col/gen storage.
- Updated the zen3 and zen4 config to enable
use of these kernels for TRSM in zen3 and
zen4 path.
- Updated CMakeLists.txt with ZGEMM kernels for
windows build.
AMD-Internal: [CPUPL-3781]
Change-Id: I236205f63a7f6b60bf1a5127a677d27425511e73
* commit 'b683d01b':
Use extra #undef when including ba/ex API headers.
Minor preprocessor/header cleanup.
Fixed typo in cpp guard in bli_util_ft.h.
Defined eqsc, eqv, eqm to test object equality.
Defined setijv, getijv to set/get vector elements.
Minor API breakage in bli_pack API.
Add err_t* "return" parameter to malloc functions.
Always stay initialized after BLAS compat calls.
Renamed membrk files/vars/functions to pba.
Switch allocator mutexes to static initialization.
AMD-Internal: [CPUPL-2698]
Change-Id: Ied2ca8619f144d4b8a7123ac45a1be0dda3875df
- Implemented bli_zgemm_4x4_avx2_k1_nn( ... ) kernel to replace
bli_zgemm_4x6_avx2_k1_nn( ... ) kernel in the BLAS layer of
ZGEMM. The kernel is built for handling the GEMM computation
with inputs having k = 1, and the transpose values for A and
B as N.
- The kernel dimension has been changed from 4x6 to 4x4,
due to the following reasons :
- The 1xNR block of B in the n-loop can be reused over multiple
MRx1 blocks of A in the m-loop during computation. Similar
analogy exists for the fringe cases.
- Every 1xNR block of B was scaled with alpha and stored in
registers before traversing in the m-dimension. Similar change
was done for fringe cases in n-dimension.
- These registers should not be modified during compute, hence
the kernel dimension was changed from 4x6 to 4x4.
- The check for early exit(with regards to BLAS mandate) has been
removed, since it is already present in the BLAS layer.
- The check for parallel ZGEMM has been moved post the redirection to
this kernel, since the kernel is single-threaded.
- The bli_kernels_zen.h file was updated with the new kernel signature.
AMD-Internal: [CPUPL-3622]
Change-Id: Iaf03b00d5075dd74cc412290d77a401986ba0bea
Defining BLIS_IS_BUILDING_LIBRARY if BUILD_SHARED_LIBS=ON for the object libraries created in kernels/ directory.
The macro definition was not propagated from high level CMake, so we need to define explicitly for the object libraries.
AMD-Internal: [CPUPL-3241]
Change-Id: Ifc5243861eb94670e7581367ef4bc7467c664d52
Prevented calling avx2 based bli_zgemm_ref_k1_nn code on
non-supported systems.
Changed the name of the function bli_zgemm_ref_k1_nn to bli_zgemm_4x6_avx2_k1_nn().
Changed the name of the function bli_dgemm_ref_k1_nn to bli_dgemm_8x6_avx2_k1_nn().
Thanks to Kiran Varaganti <Kiran.Varaganti@amd.com>
for identifying and helping to fix the issue.
AMD-Internal: [CPUPL-3352]
Change-Id: I02530ab197ed84c96cbad4f7dd56eedca0109c35
- In C/Z TRSM small, packing in case of unit diagonal
is not handled properly.
- Diagonal elements are still being read even in case of
unit diagonal.
- This causes "Conditional jump or move depends on
uninitialised value" error during valgrind tests.
- To fix this, diagonal elements should not be read
in case of unit diagonal.
AMD-Internal: [CPUPL-3406]
Change-Id: If3d6965299998a83d87f3a032f654fc7f8c43d4e
- Previously, this flag was set as a default at the high-level CMakeLists.txt which means that this flag is used to build everything,all files and all subdirectories, including ref_kernels and testsuite. Also, all files as target sources for this project and compiled with the same flags.
- Now, we create object files using the source in kernels/ directory and add to the object files the AVX2 flag explicitly. So, now only those files will have this flag and it should not be used to compile ref_kernels, etc.
- This is a quick solution to enable runs on non-AVX2 machines.
AMD-Internal: [CPUPL-3241]
Change-Id: Id569b26ffeea40eaa36ab4465b0c52b6446d7650
Some text files were missing a newline at the end of the file.
One has been added.
Also correct file format of windows/tests/inputs.yaml, which
was missed in commit 0f0277e104
AMD-Internal: [CPUPL-2870]
Change-Id: Icb83a4a27033dc0ff325cb84a1cf399e953ec549
Source and other files in some directories were a mixture of
Unix and DOS file formats. Convert all relevant files to Unix
format for consistency. Some Windows-specific files remain in
DOS format.
AMD-Internal: [CPUPL-2870]
Change-Id: Ic9a0fddb2dba6dc8bcf0ad9b3cc93774a46caeeb
Corrections for spelling and other mistakes in code comments
and doc files.
AMD-Internal: [CPUPL-2870]
Change-Id: Ifbb5df7df2d6312fe73e06ee6d41c00b16c593ce
- Set the variables to zero to avoid the compiler warning
(-Wmaybe-uninitialized) in bli_dgemm_ref_k1.c,
bli_gemm_small.c, bli_trsm_small.c, bli_zgemm_ref_k1.c and
bli_trsm_small_AVX512.c
- Changed the datatype from dim_t to siz_t for i,k,j
in bli_hemv_unf_var1_amd.c and bli_hemv_unf_var3_amd.c to
avoid the compiler warning (-Waggressive-loop-optimizations)
AMD-Internal: [CPUPL-2870]
Change-Id: Ib2bc050fa47cb8a280d719283ab4539c70e19d03