This patch removes the classes GenericStreamManagerTy and GenericEventManagerTy
from the PluginInterface header.
Differential Revision: https://reviews.llvm.org/D138769
This patch modifies the PluginInterface to define functions for initializing
and deinitializing GenericPluginTy instances instead of using the constructor
and destructor. This way, we can return errors from these functions. Also, it
defines some functions that each plugin should implement for creating
plugin-specific objects.
This patch prepares the PluginInterface for the new AMDGPU NextGen plugin.
Differential Revision: https://reviews.llvm.org/D138625
List of fixes:
- omptarget_device_environment symbol is not mandatory in device images
- Do not synchronize in ~AsyncInfoWrapperTy() if the async info's queue is null
- GenericDeviceResourceRef's create() and destroy() require the device as parameter
Differential Revision: https://reviews.llvm.org/D138619
The OpenMP target's NextGen plugins retrieve symbol information in the ELF image
(i.e., address and size) through the ELF section and ELF symbol objects. However,
the images of CUDA programs compute the address differently from the images of
AMDGPU programs:
- Address for CUDA symbols: image begin + section's offset + symbol's st_value
- Address for AMDGPU symbols: image + begin + symbol's st_value
Differential Revision: https://reviews.llvm.org/D138604
The purpose of this patch is to have tool-provided callbacks registered
in libomptarget. The overall design document is in
https://rice.app.box.com/s/pf3gix2hs4d4o1aatwir1set05xmjljc
Defined a class OmptDeviceCallbacksTy that will be used by libomptarget
and a plugin for callbacks registered by a tool. Once the callbacks are
registered in libomp, a lookup function is passed to libomptarget that is
used to retrieve the callbacks and register them in libomptarget.
Patch from John Mellor-Crummey <johnmc@rice.edu>
(With contributions from Dhruva Chakrabarti <Dhruva.Chakrabarti@amd.com>)
Reviewed By: jplehr
Differential Revision: https://reviews.llvm.org/D123974
Summary:
This header file uses the `DP` prefixes but does not define
`DEBUG_PREFIX`. This patch adds a simple fix, but realistically the `DP`
system isn't ideal. Now that we have access to LLVM libraries and other
utilities we should consider rewriting all of the debugging and error
handling glue.
Summary:
This commit sets the default visibility of PluginInterface's symbols (in
nextgen plugins) as protected. This prevents symbols from a plugin
library to be preempted by another plugin library's symbol. It applies
the same fix introduced by D136365.
Issue reported by @ggeorgakoudis.
Differential Revision: https://reviews.llvm.org/D138002
This is part of a set of patches implementing OMPT target callback support and has been split out of the originally submitted https://reviews.llvm.org/D113728. The overall design can be found in https://rice.app.box.com/s/pf3gix2hs4d4o1aatwir1set05xmjljc
The purpose of this patch is to provide a way to register tool-provided callbacks into libomp when libomptarget is loaded.
Introduced a cmake variable LIBOMPTARGET_OMPT_SUPPORT that can be used to control OMPT target support. It follows host OMPT support, controlled by LIBOMP_HAVE_OMPT_SUPPORT.
Added a connector that can be used to communicate between OMPT implementations in libomp and libomptarget or libomptarget and a plugin.
Added a global constructor in libomptarget that uses the connector to force registration of tool-provided callbacks in libomp. A pair of init and fini functions are provided to libomp as part of the connect process which will be used to register the tool-provided callbacks in libomptarget.
Patch from John Mellor-Crummey <johnmc@rice.edu>
(With contributions from Dhruva Chakrabarti <Dhruva.Chakrabarti@amd.com>)
Reviewed By: dreachem, jhuber6
Differential Revision: https://reviews.llvm.org/D123572
But I can not reproduce the problem on my local machine. My local machine run:
222 0x5a6780
222 0x7fffbef9400e
222 0x5a677e 0x5a6780 0x7fffbef936c8
222 0x376f8e 0x376f90 0x7fffbef94008
222 0x281f20
222 0x7fffbef9400e
PASSED
It is caused by regenerate captured var value when processing the
has_device_addr, the captured var value has been generated in
GenerateOpenMPCapturedVars and passed as Arg in generateInfoForCapture.
The fix just use Arg instead regenerated just same as is_device_ptr
The AsyncInfoTy should be created in the same device as the async operation will be issued. In omp_target_memcpy, the AsyncInfoTy for the host to destination device transfer was created referring to the source device.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D137225
This is part of a set of patches implementing OMPT target callback support and has been split out of the originally submitted https://reviews.llvm.org/D113728. The overall design can be found in https://rice.app.box.com/s/pf3gix2hs4d4o1aatwir1set05xmjljc
The purpose of this patch is to provide a way to register tool-provided callbacks into libomp when libomptarget is loaded.
Introduced a cmake variable LIBOMPTARGET_OMPT_SUPPORT that can be used to control OMPT target support. It follows host OMPT support, controlled by LIBOMP_HAVE_OMPT_SUPPORT.
Added a connector that can be used to communicate between OMPT implementations in libomp and libomptarget or libomptarget and a plugin.
Added a global constructor in libomptarget that uses the connector to force registration of tool-provided callbacks in libomp. A pair of init and fini functions are provided to libomp as part of the connect process which will be used to register the tool-provided callbacks in libomptarget.
Depends on D123429
Patch from John Mellor-Crummey <johnmc@rice.edu>
(With contributions from Dhruva Chakrabarti <Dhruva.Chakrabarti@amd.com>)
Reviewed By: dreachem
Differential Revision: https://reviews.llvm.org/D123572
When multiple threads invoke checkDeviceAndCtors, both of them may read true
from the shared variable Device.HasPendingGlobals, and then invoke initLibrary
redundantly. Therefore only protecting the access to Device.HasPendingGlobals
is not sufficient to guarantee that initLibrary is invoked just once.
To fix this race condition, we move the invocation of initLibrary into the
critical section, and remove the same lock inside initLibrary.
Differential Revision: https://reviews.llvm.org/D136952
Need both add_custom_command to resolve file-level dependency and add_custom_target to resolve target-level dependency.
From CMake add_custom_command doc:
Do not list the output in more than one independent target that may build in parallel or the two instances of the rule may conflict (instead use the add_custom_target() command to drive the command and make the other targets depend on that one).
${CMAKE_CURRENT_BINARY_DIR}/${bclib_name} is used by multiple targets and thus requires a custom target to avoid racing.
Differential Revision: https://reviews.llvm.org/D136911
This patch adds a new infrastructure for OpenMP target plugins. It also implements the CUDA and GenericELF64bit plugins under this new infrastructure. We place the sources in a separate directory named plugins-nextgen, and we build the new plugins as different plugin libraries. The original plugins, which remain untouched, will be used by default. However, the user can change this behavior at run-time through the boolean envar LIBOMPTARGET_NEXTGEN_PLUGINS. If enabled, the libomptarget will try to load the NextGen version of each plugin, falling back to the original if they are not present or valid.
The idea of this new plugin infrastructure is to implement the common parts of target plugins in generic classes (defined in files inside plugins-next/common/PluginInterface folder), and then, each specific plugin defines its own specific classes inheriting from the common ones. In this way, most logic remains on the common interface while reducing the plugin-specific source code. It is also beneficial in the sense that now most code and behavior are the same across the different plugins. As an example, we define classes for a plugin, a device, a device image, a stream manager, etc. The plugin object (a single instance per plugin library) holds different device objects (i.e., one per available device), while these latter are the responsible for managing its own resources.
Most code on this patch is based on the changes made by @jdoerfert (Johannes Doerfert)
Reviewed By: jhuber6, jdoerfert
Differential Revision: https://reviews.llvm.org/D134396
The plugins all define the same interface symbols. This is generally not
a problem when calling the plugin directly from the dynamic library's
handle. However, when calling from within the plugin itself it is
possible for another plugin's symbols to preempt the symbols. This was
observed with the `__tgt_rtl_is_valid_binary` call in the
`__tgt_rtl_is_valid_binary_info` function being mapped to the x86_64
plugin.
This patch changes the default visibility to `protected` intead. This
visibility ensures that these symbols are all externally visible from
the plugin, but ensures their definitions are fixed within the shared
library. Having protected visiiblity makes such symbol preemption
impossible.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D136365
Summary:
This patch changes the `exports` file to export all `__tgt_rtl`
functions. This is a better option as not each plugin implements all of
these functions, furthermore any new functions added will be
automatically included.
This reverts commit 096f93e73d.
Revert "[Libomptarget] Make the plugins ingore undefined exported symbols"
This reverts commit 3f62314c23.
Revert "[LLD] Enable --no-undefined-version by default."
This reverts commit 7ec8b0d162.
Three commits are reverted because of the current omp build fail
with GNU ld. See discussion here: https://reviews.llvm.org/rG096f93e73dc3
Summary:
Recent changes made the default behaviour to error when given an
undefined symbol in a version script. A previous patch fixed this for
`libomptarget` by removing the single undefined symbol. However, the
plguins are expected to only define a subset of the availible functions
so we shouldn't treat it as an error. This patch updates the build flags
to work appropriately.
Summary:
A recent patch made undefined symbols in version scripts cause errors by
default. The `omp_get_interop_rc_desc` function is declared but not
defined, so it is undefined in the final link unit. This patch removes
it from the exports list, it should be added back in when actually
defined and used.
File-level dependency should not be used on files generated during the build. The next command may execute before the generating command finishes writing the file. Use add_custom_target and use target-level dependency.
Differential Revision: https://reviews.llvm.org/D135630
These debugging definitions are no longer used in the new runtime. The
old runtime has been removed since Clang-14 so we can safely get rid of
these leftover variables.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D135452
Summary:
This patch just cleans up the unused flags in the DeviceRTL. These
should no longer be necessary or are redundant. Also add the extract
tool and packager to the check and error message if not found. This will
make it easier to tell if they are not present.
The shared memory stack in the device runtime assumes no intervined uses.
D135037 breaks the assumption, potentially causing the shared stack corruption.
This patch moves the thread array to heap memory. Since it is already the slow
path, it doesn't matter that much anyway.
Reviewed By: jhuber6
Differential Revision: https://reviews.llvm.org/D135391
A previous patch merged the static and bitcode versions of the
deviceRTL. We previously used the static library's separate compilation
to set a special flag that prevented `IsSPMDMode` from being put in the
used list and preventing it from being optimized out. When they were
merged we could no longer do this separate compilation that allowed
users of LTO to get more optimal code.
This patch rearranges the code. The `IsSPMDMode` global is now
transitively used by its inclusion in the changed `__keep_alive`
function. This allows us to then manually delete the `__keep_alive`
function from the module when building the static library via
`llvm-extract`. The result is that the bitcode library correctly will
maintain the needed shared state, while the static library will be able
to internalize it and optimize it out.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D135280
If we have thread states, the program is going to be rather slow. If we
don't, we want to avoid wasting shared memory. This patch introduces a
slight penalty (malloc + indirection) for the slow path and reduces
resource usage for the fast path.
Differential Revision: https://reviews.llvm.org/D135037
We should use OpenMP atomics but they don't take variable orderings.
Maybe we should expose all of this in the header but that solves only
part of the problem anyway.
Differential Revision: https://reviews.llvm.org/D135036