The modifier bits in the schedule type is not used/supported in the
static scheduler, so it should be ignored.
Differential Revision: https://reviews.llvm.org/D134983
In preparation for OMPT target changes, create separate categories of events that will be used by OMPT target support.
Split up existing macro FOREACH_OMPT_EVENT into new ones. There is no change to the original macro. Created new macros FOREACH_OMPT_HOST_EVENT, FOREACH_OMPT_DEVICE_EVENT, FOREACH_OMPT_NOEMI_EVENT, FOREACH_OMPT_EMI_EVENT, and a few other sub-categories that can be used as required. One such use is in D123974 which uses events selectively.
Patch from John Mellor-Crummey <johnmc@rice.edu>
Reviewed By: dreachem
Differential Revision: https://reviews.llvm.org/D123429
It is data mapping ordering problem.
According omp spec
If one or more map clauses are present, the list item conversions that
are performed for any use_device_ptr or use_device_addr clause occur
after all variables are mapped on entry to the region according to those
map clauses.
The change is to put mapping data for use_device_addr at end of data
mapping array.
Differential Revision: https://reviews.llvm.org/D134556
Add OpenMP device runtime build support for the gfx1100, gfx1101,
gfx1102, and gfx1103 targets.
Differential Revision: https://reviews.llvm.org/D134465
This patch add codegen support for the has_device_addr clause. It use
the same logic of is_device_ptr. But passing &var instead pointer to var
to kernal.
Differential Revision: https://reviews.llvm.org/D134268
Summary: This patch add codegen support for the has_device_addr clause. It
use the same logic of is_device_ptr.
Differential Revision: https://reviews.llvm.org/D134186
GCC, glibc, binutils, and LLVM have added support for LoongArch64.
This patch adds support for LLVM OpenMP following D59880 for RISCV64.
Reviewed By: MaskRay, SixWeining
Differential Revision: https://reviews.llvm.org/D132925
These patches exposed a lot of problems in the AMD toolchain. Rather
than keep it broken we should revert it to its old semi-functional
state. This will prevent us from using device destructors but should
remove some new bugs. In the future this interface should be changed
once these problems are addressed more correctly.
This reverts commit ed0f218115.
This reverts commit 2b7203a359.
Fixes#57536
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D133997
This patch changes the CMake to instead embed the already generated
LLVM-IR bitcode library into an object file to create the static
library. This is different from the previous method which generated them
separately. This will make the build faster and allow us to perform the
same internalization into a single library we do with the bitcode
library.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D133952
Parallel regions are outlined as functions with capture variables explicitly generated as distinct parameters in the function's argument list. That complicates the fork_call interface in the OpenMP runtime: (1) the fork_call is variadic since there is a variable number of arguments to forward to the outlined function, (2) wrapping/unwrapping arguments happens in the OpenMP runtime, which is sub-optimal, has been a source of ABI bugs, and has a hardcoded limit (16) in the number of arguments, (3) forwarded arguments must cast to pointer types, which complicates debugging. This patch avoids those issues by aggregating captured arguments in a struct to pass to the fork_call.
Reviewed By: jdoerfert, jhuber6, ABataev
Differential Revision: https://reviews.llvm.org/D102107
Previous support for device memory allocators used a single free
routine and did not provide the original kind of the allocation. This is
problematic as some of these memory types required different handling.
Previously this was worked around using a map in runtime to record the
original kind of each pointer. Instead, this patch introduces new free
routines similar to the existing allocation routines. This allows us to
avoid a map traversal every time we free a device pointer.
The only interfaces defined by the standard are `omp_target_alloc` and
`omp_target_free`, these do not take a kind as `omp_alloc` does. The
standard dictates the following:
"The omp_target_alloc routine returns a device pointer that references
the device address of a storage location of size bytes. The storage
location is dynamically allocated in the device data environment of the
device specified by device_num."
Which suggests that these routines only allocate the default device
memory for the kind. So this has been changed to reflect this. This
change is somewhat breaking if users were using `omp_target_free` as
previously shown in the tests.
Reviewed By: JonChesterfield, tianshilei1992
Differential Revision: https://reviews.llvm.org/D133053
Sumnmary:
A previous patch introduces an `exports` file which contains all the
symbol names that are not internalized in the bitcode library. This is
done to reduce the size of the bitcode library and only export needed
functions. This export file must contain all the functoins expected to
be called from the device. Since its introduction the `__assert_fail`
function used to be provided but was mistakenly not included. This patch
adds it.
Fixes#57656
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D133594
Summary:
The AMDGPU and CUDA plugins now relies on the Object and Support
libraries. This patch adds them explicitly rather than hoping that they
share the symbols loaded from the standard `libomptarget`.
Have it be simple KMP_MFENCE() which incorporates x86-specific logic and
reduces to KMP_MB() for other architectures.
Differential Revision: https://reviews.llvm.org/D130928
In OpenMP 5.2, §5.8.6, page 160 line 32-33, when a device pointer
allocated by omp_target_alloc has implicitly been included on a target
construct as a zero-length array, the pointer initialisation should not
find a matching mapped list item, and so should retain its value as a
firstprivate variable. Previously, we would return a null pointer if the
list item was not found. This patch updates the map handling to the
OpenMP 5.2 semantics.
Reviewed By: jdoerfert, ye-luo
Differential Revision: https://reviews.llvm.org/D133447
This patch replaces the dependency on `libelf` with LLVM's ELF support.
With this patch the user no-longer needs to have `libelf` on their
system to build and configure OpenMP offloading. The replacement is
mostly mechanical, with the exception of the hash table support which
was added in D131309.
Depends on D131309
Reviewed By: JonChesterfield, saiislam
Differential Revision: https://reviews.llvm.org/D131401
The `SHT_HASH` sections in an ELF are used to look up a symbol in the
symbol table using a symbol's name. This is done by obtaining the
`SHT_HASH` section and using its `sh_link` attribute to access the
associated symbol table, from which we can access the string table
containing the associated name. We can then search for the symbol using
the hash of the name and the buckets and chains in the hash table
itself
This patch adds utility functions that allow us to look up a symbol in
an ELF file by name. It will first attempt to look through the hash
tables, and then search the section tables manually if failed. This
allows us to pull out constants necessary for setting up offloading
without first loading the object.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D131309
support for OpenMP programs.
This is 5th of 6 patches started from https://reviews.llvm.org/D100181
This plugin code, when loaded in gdb, adds a few commands like
ompd icv, ompd bt, ompd parallel.
These commands create an interface for GDB to read the OpenMP
runtime through libompd.
Reviewed By: @dreachem
Differential Revision: https://reviews.llvm.org/D100185
This patch adds support for the device memory type, this is currently equivalent
to the default type so it should be treated as the same.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D133128
Some code previous needed the `used` attribute to prevent the GCC
compiler versions 5 and 6 from removing it. This is no longer required
as the minimum supported GCC version for LLVM 16 is >=7.1.0.
Reviewed By: JonChesterfield, vzakhari
Differential Revision: https://reviews.llvm.org/D132976
This test is an expected failure on AMDGPU. The expected failure is a GPU memory
failure, which will typically result in the device totally failing. This isn't
an issue for some GPU configurations that do not use the offloading device to
also drive the display server. However, if the main GPU is used for testing it
will reliably result in the user's display becoming unresponsive. This makes it
difficult to run the GPU offloading tests on many systems.
This patch simply makes this test unsupported so it no longer runs and freezes
my computer when using `ninja check-openmp`.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D132891
Previously, the tripcount was set by a push call. We moved away from
this with the new interface that added the tripcount to the kernel
arguments struct, but kept around the old interface for legacy purposes
for the LLVM 15 release. This patch removes the support for the legacy
method.
This removes the support for the old method, but does not break
backwards compatibility. This will result in applications using the old
interface being slower when run on the device.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D132885
Previously time tracing features were hidden behind an optional CMake
option. This was because `libomptarget` was not based on the LLVM
libraries at that time. Now that `libomptarget` is an LLVM library we
should be able to freely use the `LLVMSupport` library whenever we want
and do not need to guard it in this way.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D132852
The only RTLs that get added to the `UsedRTLs` list have already been
checked is they were valid binaries. We shouldn't need to do this again
when we unregister all the used binaries as they wouldn't have been used
if they were invalid anyway. Let me know if I'm incorrect in this
assumption.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D131443
Recently OpenMP has transitioned to using the "new" driver which
primarily merges the device and host linking phases into a single
wrapper that handles both at the same time. This replaced a few tools
that were only used for OpenMP offloading, such as the
`clang-offload-wrapper` and `clang-nvlink-wrapper`. The new driver
carries some marked benefits compared to the old driver that is now
being deprecated. Things like device-side LTO, static library
support, and more compatible tooling. As such, we should be able to
completely deprecate the old driver, at least for OpenMP. The old driver
support will still exist for CUDA and HIP, although both of these can
currently be compiled on Linux with `--offload-new-driver` to use the new
method.
Note that this does not deprecate the `clang-offload-bundler`, although
it is unused by OpenMP now, it is still used by the HIP toolchain both
as their device binary format and object format.
When I proposed deprecating this code I heard some vendors voice
concernes about needing to update their code in their fork. They should
be able to just revert this commit if it lands.
Reviewed By: jdoerfert, MaskRay, ye-luo
Differential Revision: https://reviews.llvm.org/D130020
The cuda plugin maps TARGET_ALLOC_HOST onto cuMemAllocHost
which is page locked host memory. Fine grain HSA memory is not
necessarily page locked but has the same read/write from host or
device semantics.
The cuda plugin does this per-gpu and this patch makes it accessible
from any gpu, but it can be locked down to match the cuda behaviour
if preferred.
Enabling tests requires an equivalent to
// RUN: %libomptarget-compile-run-and-check-nvptx64-nvidia-cuda
for amdgpu which doesn't seem to be in use yet.
Reviewed By: jhuber6
Differential Revision: https://reviews.llvm.org/D132660
if(TARGET amdgpu-arch) doesn't work when ENABLE_LLVM_PROJECTS=openmp because openmp subdirectory is processed before clang subdirectory. Adopt the same logic of enabling tests like the CUDA plugin.
Differential Revision: https://reviews.llvm.org/D132579
This patch replaces uses of `dlopen` and `dlsym` with LLVM's support
with `loadPermanentLibrary` and `getSymbolAddress`. This allows us to
remove the explicit dependency on the `dl` libraries in the CMake. This
removes another explicit dependency and solves an issue encountered
while building on Windows platforms. The one downside to this is that
the LLVM library does not currently support `dlclose` functionality, but
this could be added in the future.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D131507
We use the offloading entires array to determine the relative names and
addressed of device-side kernel functions. The x86_64 plugin previously
derived the device-side entry table by first identifying the
`omp_offloading_entries` section offset in the loaded elf. Then we would
use the base offset of the loaded dyanmic library to identify the
entries array within the loaded image. This relied on some more
unconventional methods which prevented us from using the LLVM dynamic
library loader for this plugin. This patch simplifies this by instead
copying the host-side entry and replacing its address with the
device-side address looked up through `dlsym`.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D131516
The OpenMP device runtime needs to support the OpenMP standard. However
constructs like nested parallelism are very uncommon in real application
yet lead to complexity in the runtime that is sometimes difficult to
optimize out. As a stop-gap for performance we should supply an argument
that selectively disables this feature. This patch adds the
`-fopenmp-assume-no-nested-parallelism` argument which explicitly
disables the usee of nested parallelism in OpenMP.
Reviewed By: carlo.bertolli
Differential Revision: https://reviews.llvm.org/D132074
We held off on this before as `LLVM_LIBDIR_SUFFIX` conflicted with it.
Now we return this.
`LLVM_LIBDIR_SUFFIX` is kept as a deprecated way to set
`CMAKE_INSTALL_LIBDIR`. The other `*_LIBDIR_SUFFIX` are just removed
entirely.
I imagine this is too potentially-breaking to make LLVM 15. That's fine.
I have a more minimal version of this in the disto (NixOS) patches for
LLVM 15 (like previous versions). This more expansive version I will
test harder after the release is cut.
Reviewed By: sebastian-ne, ldionne, #libc, #libc_abi
Differential Revision: https://reviews.llvm.org/D130586
This patch fixes a condition in the openmp/libomptarget/src/device.cpp file. The code was checking if the run_region plugin API function was implemented, but it should actually check the run_region_async function instead.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D131782
Currently, the field just emit map info for this pointer variable. It is
failed at run time. For the fields, the PartialStruct is created and it
needs call to emitCombinedEntry which create the base that covers all
the pieces.
The change is to generate map info as regular fields.
Differential Revision: https://reviews.llvm.org/D129608
Serialized parallels allocate lightweight task teams on the heap
but never free them in the corresponding join. This patch adds a wrapper
around the allocation (if ompt enabled) and also adds the corresponding
free in the join call.
Differential Revision: https://reviews.llvm.org/D131690
The problem is we create the call to __kmpc_kernel_parallel in the
openmp-opt pass but while we optimize the code, the call is not there
yet. Thus, we assume we never reach it from __kmpc_target_deinit. That
allows us to remove the store in there (`ParallelRegionFn = nullptr`),
which leads to bad results later on.
This is a shortstop solution until we come up with something better.
Fixes https://github.com/llvm/llvm-project/issues/57064
We recently added support for multi-architecture binaries in
libomptarget. This is done by extracting the architecture from the
embedded image and comparing it with the major and minor version
supported by the current CUDA installation. Previously we just compared
these directly, which was not correct for binary compatibility. The CUDA
documentation states that we can consider any image with an equivalent
major or a greater or equal to minor compatible with the current image.
Change the check to use this new logic in the CUDA plugin.
Fixes#57049
Reviewed By: jdoerfert, ye-luo
Differential Revision: https://reviews.llvm.org/D131567
integer value 40962 is outside the valid range of values [0, 31] for this enumeration type [-Wenum-constexpr-conversion]` (Issue #57022)
turn on -Wno-enum-constexpr-conversion to buy some time to fix the more egregious issue in hsa_agent_into_t and hsa_amd_agent_info_t interfaces.
relates to https://reviews.llvm.org/D131307/new/
Differential Revision: https://reviews.llvm.org/D131477
We will add some simple implementation of libc functions starting from
this patch, and the first one is `memcmp`, which is reported in #56929. Note that
`malloc` and `free` are not included in this patch because of the use of
`declare variant`. In the near future we will implement the two functions w/o
using any vendor provided function.
This fixes#56929.
Reviewed By: jhuber6
Differential Revision: https://reviews.llvm.org/D131182
A previous patch made the destruction of the HSA plugin more
deterministic. However, there were still other global values that are not
handled this way. When attempting to call a destructor kernel, the
device would have already been uninitialized and we could not find the
appropriate kernel to call. This is because they were stored in global
containers that had their destructors called already. Merges this global
state into the rest of the info state by putting those global values
inside of the global pointer already allocated and deallocated by the
constructor and destructor. This should allow the AMDGPU plugin to
correctly identify the destructors if we were to run them.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D131011
This fixes warnings like these:
../runtime/src/kmp_dispatch.cpp:2159:24: warning: left operand of comma operator has no effect [-Wunused-value]
OMPT_LOOP_DISPATCH(*p_lb, *p_ub, pr->u.p.st, status);
^~~~~
../runtime/src/kmp_dispatch.cpp:2159:31: warning: left operand of comma operator has no effect [-Wunused-value]
OMPT_LOOP_DISPATCH(*p_lb, *p_ub, pr->u.p.st, status);
^~~~~
../runtime/src/kmp_dispatch.cpp:2159:46: warning: left operand of comma operator has no effect [-Wunused-value]
OMPT_LOOP_DISPATCH(*p_lb, *p_ub, pr->u.p.st, status);
~~~~~~~ ^~
../runtime/src/kmp_dispatch.cpp:2159:50: warning: expression result unused [-Wunused-value]
OMPT_LOOP_DISPATCH(*p_lb, *p_ub, pr->u.p.st, status);
^~~~~~
CMAKE_DL_LIBS is documented as "Name of library containing dlopen
and dlclose".
On Windows platforms, there's no system provided dlopen/dlclose, but
it can be argued that if you really intend to call dlopen/dlclose,
you're going to be using a third party compat library like
https://github.com/dlfcn-win32/dlfcn-win32, and CMAKE_DL_LIBS should
expand to its name.
This has been argued upstream in CMake in
https://gitlab.kitware.com/cmake/cmake/-/issues/17600 and
https://gitlab.kitware.com/cmake/cmake/-/merge_requests/1642, that
CMAKE_DL_LIBS should expand to "dl" on mingw platforms.
The merge request wasn't merged though, as it caused some amount of
breakage, but in practice, Fedora still carries a custom CMake patch
with the same effect.
Thus, this patch fixes cross compiling OpenMP for mingw targets
on Fedora with their custom-patched CMake.
Differential Revision: https://reviews.llvm.org/D130892
The runtime makes some use of `std::vector` data structures. We should
be able to replace these trivially with `llvm::SmallVector` instead.
This should allow us to avoid heap allocations in the majority of cases
now.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D130927
Fix the LD_LIBRARY_PATH prepending order to make sure that
config.library_path ends up before any potentially-system directories
(e.g. config.hwloc_library_dir). This makes sure that we are testing
against the just-built openmp libraries rather than the version that is
already installed.
Also rename the function to `prepend_*` to make it clearer what it
actually does.
https://github.com/llvm/llvm-project/issues/56821
Differential Revision: https://reviews.llvm.org/D130825
This test hasn't been fixed and causes spurious failures when testing.
This patch sets it as unsupported until we have a reliable fix.
Reviewed By: ronlieb
Differential Revision: https://reviews.llvm.org/D130789
Moves DeviceInfo global to heap to accurately control lifetime.
Moves calls from libomptarget to deinit_plugin later, plugins need to stay
alive until very shortly before libomptarget is destructed.
Leaving the deinit_plugin calls where initially inserted hits use after
free from the dynamic_module.c offloading test (verified with valgrind
that the new location is sound with respect to this)
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D130714
Will allow plugins to migrate away from using global variables to
manage lifetime, which will fix a segfault discovered in relation to D127432
Reviewed By: jhuber6
Differential Revision: https://reviews.llvm.org/D130712
The behaviour of this patch is not great, but it has some side-effects
that are required for OpenMPOpt to work. The problem is that when we use
`-mlink-builtin-bitcode` we only import used symbols from the runtime.
Then OpenMPOpt will insert calls to symbols that were not previously
included. This patch removed this implicit behaviour as these functions
were kept alive by the `noinline` simply because it kept calls to them
in the module. This caused regression in some tests that relied on some
OpenMPOpt passes without using LTO. Reverting for the LLVM15 release but
will try to fix it more correctly on main.
This reverts commit d61d72dae6.
Fixes#56752
This patch extends the is_valid_binary routine to also check if the
binary's target ID matches the one parsed from the system's runtime
environment.
This should allow us to only use the binary whose compute capability
matches, allowing us to support basic multi-architecture binaries for
AMDGPU.
It also handles compatibility testing of target IDs of the image and
the enviornment.
Depends on D127432
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D127769
A previous patch removed the need to set the auxiliary architecture as
it was no longer needed for the clang invocation after moving to using
the clang frontend. However, this had a second use of preventing
unsupported host architectures from building the device runtime. This
caused failures when trying to build on 32-bit hosts for example.
Fixes#56699
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D130509
We previously used the `noinline` attributes to specify some defintions
which should be kept alive in the runtime. These were then stripped
immediately in the OpenMPOpt module pass. However, Since the changes in
D130298, we not explicitly state which functions will have external
visiblity in the bitcode library. Additionally the OpenMPOpt module pass
should run before the inliner pass, so this shouldn't make a difference
in whether or not the functions will be alive for the initial pass of
OpenMPOpt. This should simplify the interface, and additionally save
time spend on scanning funciton names for noinline.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D130368
This patch extends the is_valid_binary routine to also check if the
binary's target ID matches the one parsed from the system's runtime
environment.
This should allow us to only use the binary whose compute capability
matches, allowing us to support basic multi-architecture binaries for
AMDGPU.
It also handles compatibility testing of target IDs of the image and
the enviornment.
Depends on D127432
Differential Revision: https://reviews.llvm.org/D127769
Sometimes libomptarget's CUDA plugin produces unhelpful diagnostics
about a lack of CUDA devices before an application runs:
```
$ clang -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa hello-world.c
$ ./a.out
CUDA error: Error returned from cuInit
CUDA error: no CUDA-capable device is detected
Hello World: 4
```
This can happen when the CUDA plugin was built but all CUDA devices
are currently disabled in some manner, perhaps because
`CUDA_VISIBLE_DEVICES` is set to the empty string. As shown in the
above example, it can even happen when we haven't compiled the
application for offloading to CUDA.
The following code from `openmp/libomptarget/plugins/cuda/src/rtl.cpp`
appears to be intended to handle this case, and it chooses not to
write a diagnostic to stderr unless debugging is enabled:
```
if (NumberOfDevices == 0) {
DP("There are no devices supporting CUDA.\n");
return;
}
```
The problem is that the above code is never reached because the
earlier `cuInit` returns `CUDA_ERROR_NO_DEVICE`. This patch handles
that `cuInit` case in the same manner as the above code handles the
`NumberOfDevices == 0` case.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D130371
Multiple calls to `omp_get_wtime` could be optimized out due to the function
is mistakenly marked as `readnone`. This patch fixes the issue, and also add the
support to run optimization on `libomptarget` tests.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D130179
Multiple calls to `omp_get_wtime` could be optimized out due to the function
is mistakenly marked as `readnone`. This patch fixes the issue, and also add the
support to run optimization on `libomptarget` tests.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D130179
Previously we made `libomptarget` link as an LLVM library so we have
access to the LLVM core libraries. After the initial patch stuck we can
now apply the same changes to the plugins. This will allow us to use
LLVM in all of `libomptarget` when we have uses for them. In the future
this should allow us to remove the dependencies on `libelf`, `libffi`,
and `dl`.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D130262
This patch adds the use of the `-internalize-public-api-file` option in
the internalization pass to internalize any definition that isn't
explicitly needed for the interface. This will allow us to perform more
optimizations on the file that normally would not have been possible
with functions internal to the library not being internal.
Depends on D130293
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D130298
Currently the bitcode library is build using the clang front-end
manually. This was originally done because we did not support device
only compilation. Now we support device only compilation, at least for a
single offloading toolchain, so we can instead use clang directly rather
than using the front-end. This saves us needing to define things like
`aux_triple`.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D130293
Summary:
Some of the buildbots don't find the libraries because they don't build
for the GPU. Although it should always be there it's unclear why these
buildbots are having problemsd. LTO is only interesting on the GPU and
these tests take extra time anyway so I'm just going to disable them for
now.
First of all, `LLVM_TOOLS_INSTALL_DIR` put there breaks our NixOS
builds, because `LLVM_TOOLS_INSTALL_DIR` defined the same as
`CMAKE_INSTALL_BINDIR` becomes an *absolute* path, and then when
downstream projects try to install there too this breaks because our
builds always install to fresh directories for isolation's sake.
Second of all, note that `LLVM_TOOLS_INSTALL_DIR` stands out against the
other specially crafted `LLVM_CONFIG_*` variables substituted in
`llvm/cmake/modules/LLVMConfig.cmake.in`.
@beanz added it in d0e1c2a550 to fix a
dangling reference in `AddLLVM`, but I am suspicious of how this
variable doesn't follow the pattern.
Those other ones are carefully made to be build-time vs install-time
variables depending on which `LLVMConfig.cmake` is being generated, are
carefully made relative as appropriate, etc. etc. For my NixOS use-case
they are also fine because they are never used as downstream install
variables, only for reading not writing.
To avoid the problems I face, and restore symmetry, I deleted the
exported and arranged to have many `${project}_TOOLS_INSTALL_DIR`s.
`AddLLVM` now instead expects each project to define its own, and they
do so based on `CMAKE_INSTALL_BINDIR`. `LLVMConfig` still exports
`LLVM_TOOLS_BINARY_DIR` which is the location for the tools defined in
the usual way, matching the other remaining exported variables.
For the `AddLLVM` changes, I tried to copy the existing pattern of
internal vs non-internal or for LLVM vs for downstream function/macro
names, but it would good to confirm I did that correctly.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D117977
We can help optimizations by making sure we use the team state whenever
it is clear there is no thread state. To this end we introduce a new
state flag (`state::HasThreadState`) and explicit control for the
`state::ValueRAII` helpers, including a dedicated "assert equal".
Differential Revision: https://reviews.llvm.org/D130113
Our conditional writes in the runtime look like this:
```
if (active)
*ptr = value;
```
In the RAII we need to assign `ptr` which comes from a lookup call.
If a thread that is not the main thread calls lookup with the intention
to write the pointer, we'll create a new thread state. As such, we need
to avoid calling lookup for inactive threads. We used to use `nullptr`
as their `ptr` value but that can cause pessimistic reasoning. We now
use `undef` instead.
Differential Revision: https://reviews.llvm.org/D130114
We used to inline the `lookup` calls such that the runtime had "known"
access offsets when it was shipped. With the new static library build it
doesn't as the lookup is an indirection we cannot look through. This
should help us optimize the code better until we can do LTO for the
runtime again.
Differential Revision: https://reviews.llvm.org/D130111
This patch extends the `is_valid_binary` routine to also check if the
binary's architecture string matches the one parsed from the runtime.
This should allow us to only use the binary whose compute capability
matches, allowing us to support basic multi-architecture binaries for
CUDA.
Depends on D127432
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D127505
The previous path changed the linker wrapper to embed the offloading
binary format inside the target image instead. This will allow us to
more generically bundle metadata with these images, such as requires
clauses or the target architecture it was compiled for.
I wasn't sure how to handle this best, so I introduced a new type that
replaces the old `__tgt_device_image` struct that we can expand inside
the runtime library. I made the new `__tgt_device_binary` struct pretty
much the same for now. In the future we could change this struct to
pretty much be the `OffloadBinary` class in the future.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D127432
We previously had some logic that stopped us from building the device runtime if
there were no NVPTX architectures provided. This is incorrect because we could
have AMDGPU libraries. Even if the lists are empty we should be able to attempt
to build these and get dummy output. THis wilil make it much easier for our
tooling which expects certain libraries. If the user wishes to disable the
library entirely they should use `-DLIBOMPTARGET_BUILD_DEVICERTL_BCLIB=OFF"
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D130266
This patch makes libomptarget depend on LLVM libraries to be built. The
reason for this is because we already have an implicit dependency on
LLVM headers for ELF identification and extraction as well as an
optional dependenly on the LLVMSupport library for time tracing
information. Furthermore, there are changes in the future that require
using more LLVM libraries, and will heavily simplify some future code as
well as open up the large amount of useful LLVM libraries to
libomptarget.
This will make "standalone" builds of `libomptarget' more difficult for
vendors wishing to ship their own. This will require a sufficiently new
version of LLVM to be installed on the system that should be picked up
by the existing handling for the implicit headers.
The things this patch changes are as follows:
- `libomptarget.so` links against LLVMSupport and LLVMObject
- `libomptarget.so` is a symbolic link to `libomptarget.so.15`
- If using a shared library build, user applications will depend on LLVM
libraries as well
- We can now use LLVM resources in Libomptarget.
Note that this patch only changes this to apply to libomptarget itself,
not the plugins. Additional patches will be necessary for that.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D129875
This patch makes libomptarget depend on LLVM libraries to be built. The
reason for this is because we already have an implicit dependency on
LLVM headers for ELF identification and extraction as well as an
optional dependenly on the LLVMSupport library for time tracing
information. Furthermore, there are changes in the future that require
using more LLVM libraries, and will heavily simplify some future code as
well as open up the large amount of useful LLVM libraries to
libomptarget.
This will make "standalone" builds of `libomptarget' more difficult for
vendors wishing to ship their own. This will require a sufficiently new
version of LLVM to be installed on the system that should be picked up
by the existing handling for the implicit headers.
The things this patch changes are as follows:
- `libomptarget.so` links against LLVMSupport and LLVMObject
- `libomptarget.so` is a symbolic link to `libomptarget.so.15`
- If using a shared library build, user applications will depend on LLVM
libraries as well
- We can now use LLVM resources in Libomptarget.
Note that this patch only changes this to apply to libomptarget itself,
not the plugins. Additional patches will be necessary for that.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D129875
Warnings that occur during affinity initialization are supposed
to be guarded by KMP_AFFINITY=nowarnings,noverbose, but some had been
missed by this logic. Create one macro for affinity warnings that takes
these settings into account.
Differential Revision: https://reviews.llvm.org/D125991
Added control to reset affinity of primary thread after outermost parallel
region to initial affinity encountered before OpenMP runtime was initialized.
KMP_AFFINITY environment variable reset/noreset modifier introduced.
Default behavior is unchanged.
Differential Revision: https://reviews.llvm.org/D125993
icc does not properly detect lack of fallthrough attribute since it
defines __GNU__ > 7 and also icc's __has_cpp_attribute/__has_attribute
feature detectors do not properly detect the lack of fallthrough attribute.
Differential Revision: https://reviews.llvm.org/D126001
Made library registration conditional and skip it in the __kmp_atfork_child
handler, postponed it till middle initialization in the child.
This fixes the problem of applications those use e.g. popen/pclose
which terminate the forked child process.
Differential Revision: https://reviews.llvm.org/D125996
This patch makes libomptarget depend on LLVM libraries to be built. The
reason for this is because we already have an implicit dependency on
LLVM headers for ELF identification and extraction as well as an
optional dependenly on the LLVMSupport library for time tracing
information. Furthermore, there are changes in the future that require
using more LLVM libraries, and will heavily simplify some future code as
well as open up the large amount of useful LLVM libraries to
libomptarget.
This will make "standalone" builds of `libomptarget' more difficult for
vendors wishing to ship their own. This will require a sufficiently new
version of LLVM to be installed on the system that should be picked up
by the existing handling for the implicit headers.
The things this patch changes are as follows:
- `libomptarget.so` links against LLVMSupport and LLVMObject
- `libomptarget.so` is a symbolic link to `libomptarget.so.15`
- If using a shared library build, user applications will depend on LLVM
libraries as well
- We can now use LLVM resources in Libomptarget.
Note that this patch only changes this to apply to libomptarget itself,
not the plugins. Additional patches will be necessary for that.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D129875
The device runtime uses the address space attribute to control the
placement of important constants on the GPU. The changes made in D126061
caused these to start emitting errors as they were not applied to the
type. This patch fixes the issues to make the warnings go away.
Reviewed By: ye-luo
Differential Revision: https://reviews.llvm.org/D129896
The OMPD patches introduces GDB plugin. When it is built, it will create a
coulple of temp files in `.eggs`. This patch add it into `.gitignore` in case it
messed up the git tracking.
Reviewed By: jhuber6
Differential Revision: https://reviews.llvm.org/D129711
Summary:
We use a static assert to make sure that someone doesn't change the size
of an argument struct without properly updating all the other logic.
This originally only checked the size on a 64-bit system with 8-byte
pointers, causing builds on 32-bit systems to fail. This patch allows
either pointer size to work.
Fixes#56486
commit: 51d3f421f4
failed due to missing python-pip om machine.
Now the ompd gdb-plugin code will be skipped with a warning
if pip is not available in the machine.
support for OpenMP programs.
This is 5th of 6 patches started from https://reviews.llvm.org/D100181
This plugin code, when loaded in gdb, adds a few commands like
ompd icv, ompd bt, ompd parallel.
These commands create an interface for GDB to read the OpenMP
runtime through libompd.
Reviewed By: @dreachem
Differential Revision: https://reviews.llvm.org/D100185
This patch moves the old legacy interfaces into `libomptarget` to a
separate file. These do not need to be included anywhere and are simply
provided for backwards compatibility with the ABI. This cleans up the
interface greatly.
Depends on D128817
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D128818
The previous patch added an argument to the `__tgt_target_kernel`
runtime function which includes the tripcount used for the loop clause.
This was originally passed in via the `__kmpc_push_target_tripcount`
function. Now we move this logic to the kernel launch itself and remove
the need for the push function.
Depends on D128816
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D128817
This patch implements a unified kernel entry function that will be
targeted from both teams and non-teams clauses. We introduce a new
interface and make the old functions call in using the new one. A
following patch will include the necessary changes to Clang to call
these new functions instead.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D128549
This check-in adds 4 APIs to support MSVC, specifically:
* 3 APIs (__kmpc_sections_init, __kmpc_next_section,
__kmpc_end_sections) to support the dynamic scheduling of OMP sections.
* 1 API (__kmpc_copyprivate_light, a light-weight version of
__kmpc_copyrprivate) to support the OMP single copyprivate clause.
Differential Revision: https://reviews.llvm.org/D128403
Libomptarget grew out of a project that was originally not in LLVM. As
we develop libomptarget this has led to an increasingly large clash
between the naming conventions used. This patch fixes most of the
variable names that did not confrom to the LLVM standard, that is
`VariableName` for variables and `functionName` for functions.
This patch was primarily done using my editor's linting messages, if
there are any issues I missed arising from the automation let me know.
Reviewed By: saiislam
Differential Revision: https://reviews.llvm.org/D128997
This patch implements omp_get_device_num() in the host and the device.
It uses the already existing getDeviceNum in the device config for the device.
And in the host it uses the omp_get_num_devices().
Two simple tests added
Differential Revision: https://reviews.llvm.org/D128347
This patch fixes the issue that P2P memcpy doesn't work. The root cause is we didn't set current context when calling the API function. In addition, a matrix to track the states of each pair of devices is also added such that we only need to query and configure the device once.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D122764
When many nested teams are formed, __kmp_threads may be reallocated
to accommodate new threads. This reallocation causes a data
race when another existing team's thread simultaneously references
__kmp_threads. This patch keeps the old thread arrays around until library
shutdown so these lingering references can complete without issue and
access to __kmp_threads remains a simple array reference.
Fixes: https://github.com/llvm/llvm-project/issues/54708
Differential Revision: https://reviews.llvm.org/D125013
Summary:
This patch removes a duplicated exit from the OpenMP data envrionment.
We already have an RAII method that guards this environment so it is
unnecessary.
Make libomptarget.device.a built when using -DLLVM_ENABLE_PROJECTS=openmp
Use add_custom_command.
Reviewed By: jhuber6
Differential Revision: https://reviews.llvm.org/D128130
Old LLVM installation may expose its internal omptarget CMake target when being used by find_package(LLVM) and caused issues in the CMake of libomptarget that is being built. Trap the issue early.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D128129
Summary:
The static linking test ensures that we can statically link offloading
programs. To create the test we used `llvm-ar`. However, this may not
exist in the user's environment. This patch changes it to use the
binutils `ar` which should exist on every system running these tests
currently. In the future we should set up the dependencies properly.
We are planning on making LTO the default compilation mode for
offloading. In order to make sure it works we should run these tests on
the test suite. AMDGPU already uses the LTO compilation path for its
linking, but in LTO mode it also links the static library late.
Performing LTO requires the static library to be built, if we make the
change this will be a hard requirement and the old bitcode library will
go away. This means users will need to use either a two-step build or a
runtimes build for libomptarget.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D127512
First of all, `LLVM_TOOLS_INSTALL_DIR` put there breaks our NixOS
builds, because `LLVM_TOOLS_INSTALL_DIR` defined the same as
`CMAKE_INSTALL_BINDIR` becomes an *absolute* path, and then when
downstream projects try to install there too this breaks because our
builds always install to fresh directories for isolation's sake.
Second of all, note that `LLVM_TOOLS_INSTALL_DIR` stands out against the
other specially crafted `LLVM_CONFIG_*` variables substituted in
`llvm/cmake/modules/LLVMConfig.cmake.in`.
@beanz added it in d0e1c2a550 to fix a
dangling reference in `AddLLVM`, but I am suspicious of how this
variable doesn't follow the pattern.
Those other ones are carefully made to be build-time vs install-time
variables depending on which `LLVMConfig.cmake` is being generated, are
carefully made relative as appropriate, etc. etc. For my NixOS use-case
they are also fine because they are never used as downstream install
variables, only for reading not writing.
To avoid the problems I face, and restore symmetry, I deleted the
exported and arranged to have many `${project}_TOOLS_INSTALL_DIR`s.
`AddLLVM` now instead expects each project to define its own, and they
do so based on `CMAKE_INSTALL_BINDIR`. `LLVMConfig` still exports
`LLVM_TOOLS_BINARY_DIR` which is the location for the tools defined in
the usual way, matching the other remaining exported variables.
For the `AddLLVM` changes, I tried to copy the existing pattern of
internal vs non-internal or for LLVM vs for downstream function/macro
names, but it would good to confirm I did that correctly.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D117977
The code expanded from kmp_barrier.h uses some `KMP_INTERNAL_*`s,
so the definitions have to be placed before it.
Fixes#55815
Differential Revision: https://reviews.llvm.org/D126873
Summary:
This test was failing because of an implicit declaration of `printf`
which isn't legal with newer C, causing it to fail. This patch just adds
the necessary header.
When we build the libomptarget device runtime library targeting bitcode,
we need special care to make sure that certain functions are not
optimized out. This is because we manually internalize and optimize
these definitions, ignoring their standard linkage semantics. When we
build with the static library, we can maintain these semantics and we do
not need these to be kept-alive. Furthermore, if they are kept-alive it
prevents them from being removed during LTO. This prevents us from
completely internalizing `IsSPMDMode` and removing several other
functions. This patch removes these for the static library target by
using a macro definition to enable them.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D126701
MSVC may not supply source location information to kmpc_reduce passing
NULL for the value. The patch adds a check for the loc value being NULL
in kmp_determine_reduction_method.
Differential Revision: https://reviews.llvm.org/D126564
The memkind library is only available for linux. Calling dlopen here
can also be problematic in a client app that fork'ed.
Differential Revision: https://reviews.llvm.org/D126579